* [PATCH v4 00/22] target/xtensa: implement double precision FPU
@ 2020-07-11 11:06 Max Filippov
  2020-07-11 11:06 ` [PATCH v4 01/22] softfloat: make NO_SIGNALING_NANS runtime property Max Filippov
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Hello,

This series implements the double precision floating point unit option for
target/xtensa, updates the FPU tests and adds two new CPU cores, one with
the FPU2000 option and one with the DFPU option.

It is tagged xtensa-5.1-dfp-v4 in the qemu-xtensa tree at
git://github.com/OSLL/qemu-xtensa.git

I'm not posting the last two patches because they are too big for the list;
they can be found in the git tree mentioned above.

Changes v3->v4:
- split DFPU option addition into a separate patch, change DFP unit
  detection logic
- avoid calling set_use_first_nan on every FPU operation in FPU2000
  and single-precision only DFPU configurations

Changes v2->v3:
- handle infzero case in pickNaNMulAdd properly and reword commit
  message
- add more infzero tests for FPU2000 and DFPU
- fix test names in test_dfp0_arith.S
- add licenses to newly imported cores
- rename DE_233L_FPU to de233_fpu to be more consistent with other
  core names

Changes v1->v2:
- use inline function for no_signaling_nans property to allow for
  constant folding on architectures that have this property fixed.
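
A minimal standalone sketch of the constant-folding idea (not the QEMU code itself): a static inline predicate returns a literal false unless a target macro is defined, so on such builds the compiler can drop the runtime branch entirely. TARGET_HAS_RUNTIME_NO_SNAN, demo_status and the demo_* helpers are hypothetical names, and the NaN tests assume the snan_bit_is_one == false encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for float_status, reduced to one field. */
typedef struct {
    bool no_signaling_nans;
} demo_status;

/*
 * Without TARGET_HAS_RUNTIME_NO_SNAN this returns the constant false,
 * so every "if (demo_no_signaling_nans(s))" branch folds away.
 */
static inline bool demo_no_signaling_nans(const demo_status *s)
{
#ifdef TARGET_HAS_RUNTIME_NO_SNAN
    return s->no_signaling_nans;
#else
    (void)s;
    return false;
#endif
}

/* float32 sNaN check: exponent all ones, quiet bit clear, fraction != 0. */
static bool demo_f32_is_snan(uint32_t a, const demo_status *s)
{
    if (demo_no_signaling_nans(s)) {
        return false;
    }
    return ((a >> 22) & 0x1FF) == 0x1FE && (a & 0x003FFFFF);
}

/* Like parts_silence_nan(): must never run when all NaNs are quiet. */
static uint32_t demo_f32_silence_nan(uint32_t a, const demo_status *s)
{
    assert(!demo_no_signaling_nans(s));
    return a | 0x00400000u; /* set the quiet bit */
}
```

Built without the macro, the field is ignored and 0x7F800001 is still classified as a signaling NaN, which is exactly the "property fixed at compile time" case.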

Max Filippov (22):
  softfloat: make NO_SIGNALING_NANS runtime property
  softfloat: pass float_status pointer to pickNaN
  softfloat: add xtensa specialization for pickNaNMulAdd
  target/xtensa: add geometry to xtensa_get_regfile_by_name
  target/xtensa: support copying registers up to 64 bits wide
  target/xtensa: rename FPU2000 translators and helpers
  target/xtensa: move FSR/FCR register accessors
  target/xtensa: don't access BR regfile directly
  target/xtensa: add DFPU option
  target/xtensa: add DFPU registers and opcodes
  target/xtensa: implement FPU division and square root
  tests/tcg/xtensa: fix test execution on ISS
  tests/tcg/xtensa: update test_fp0_arith for DFPU
  tests/tcg/xtensa: expand madd tests
  tests/tcg/xtensa: update test_fp0_conv for DFPU
  tests/tcg/xtensa: update test_fp1 for DFPU
  tests/tcg/xtensa: update test_lsc for DFPU
  tests/tcg/xtensa: add fp0 div and sqrt tests
  tests/tcg/xtensa: test double precision load/store
  tests/tcg/xtensa: add DFP0 arithmetic tests
  target/xtensa: import de233_fpu core
  target/xtensa: import DSP3400 core

 fpu/softfloat-specialize.inc.c                |    285 +-
 fpu/softfloat.c                               |      2 +-
 include/fpu/softfloat-helpers.h               |     10 +
 include/fpu/softfloat-types.h                 |      8 +-
 target/xtensa/Makefile.objs                   |      2 +
 target/xtensa/core-de233_fpu.c                |     58 +
 target/xtensa/core-de233_fpu/core-isa.h       |    727 +
 target/xtensa/core-de233_fpu/core-matmap.h    |    717 +
 target/xtensa/core-de233_fpu/gdb-config.inc.c |    277 +
 .../core-de233_fpu/xtensa-modules.inc.c       |  20758 ++
 target/xtensa/core-dsp3400.c                  |     58 +
 target/xtensa/core-dsp3400/core-isa.h         |    452 +
 target/xtensa/core-dsp3400/core-matmap.h      |    312 +
 target/xtensa/core-dsp3400/gdb-config.inc.c   |    400 +
 .../xtensa/core-dsp3400/xtensa-modules.inc.c  | 171906 +++++++++++++++
 target/xtensa/cpu.c                           |      5 +
 target/xtensa/cpu.h                           |      8 +-
 target/xtensa/fpu_helper.c                    |    342 +-
 target/xtensa/helper.c                        |      4 +-
 target/xtensa/helper.h                        |     58 +-
 target/xtensa/overlay_tool.h                  |     24 +
 target/xtensa/translate.c                     |   1437 +-
 tests/tcg/xtensa/fpu.h                        |    142 +
 tests/tcg/xtensa/macros.inc                   |     10 +-
 tests/tcg/xtensa/test_dfp0_arith.S            |    162 +
 tests/tcg/xtensa/test_fp0_arith.S             |    282 +-
 tests/tcg/xtensa/test_fp0_conv.S              |    299 +-
 tests/tcg/xtensa/test_fp0_div.S               |     82 +
 tests/tcg/xtensa/test_fp0_sqrt.S              |     76 +
 tests/tcg/xtensa/test_fp1.S                   |     62 +-
 tests/tcg/xtensa/test_lsc.S                   |    170 +-
 31 files changed, 198581 insertions(+), 554 deletions(-)
 create mode 100644 target/xtensa/core-de233_fpu.c
 create mode 100644 target/xtensa/core-de233_fpu/core-isa.h
 create mode 100644 target/xtensa/core-de233_fpu/core-matmap.h
 create mode 100644 target/xtensa/core-de233_fpu/gdb-config.inc.c
 create mode 100644 target/xtensa/core-de233_fpu/xtensa-modules.inc.c
 create mode 100644 target/xtensa/core-dsp3400.c
 create mode 100644 target/xtensa/core-dsp3400/core-isa.h
 create mode 100644 target/xtensa/core-dsp3400/core-matmap.h
 create mode 100644 target/xtensa/core-dsp3400/gdb-config.inc.c
 create mode 100644 target/xtensa/core-dsp3400/xtensa-modules.inc.c
 create mode 100644 tests/tcg/xtensa/fpu.h
 create mode 100644 tests/tcg/xtensa/test_dfp0_arith.S
 create mode 100644 tests/tcg/xtensa/test_fp0_div.S
 create mode 100644 tests/tcg/xtensa/test_fp0_sqrt.S

-- 
2.20.1




* [PATCH v4 01/22] softfloat: make NO_SIGNALING_NANS runtime property
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 02/22] softfloat: pass float_status pointer to pickNaN Max Filippov
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Max Filippov, Richard Henderson,
	Philippe Mathieu-Daudé,
	Peter Maydell

target/xtensa, the only user of the NO_SIGNALING_NANS macro, has FPU
implementations both with and without the corresponding property. With
NO_SIGNALING_NANS being a compile-time macro, they cannot be part of the
same QEMU executable.
Replace the macro with a new property in float_status to allow cores
with different FPU implementations to coexist.
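
To illustrate why a runtime flag lets both FPU flavours live in one binary, here is a minimal standalone sketch, not QEMU code: two hypothetical per-core status objects classify the same float32 bit pattern differently. The quiet-NaN test follows the snan_bit_is_one == false branch of float32_is_quiet_nan in the diff below; core_status, f32_is_any_nan, f32_is_quiet_nan and demo_two_cores are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-core FPU state carrying only the new property. */
typedef struct {
    bool no_signaling_nans;
} core_status;

/* Exponent all ones and a non-zero fraction. */
static bool f32_is_any_nan(uint32_t a)
{
    return (uint32_t)(a << 1) > 0xFF000000u;
}

static bool f32_is_quiet_nan(uint32_t a, const core_status *s)
{
    if (s->no_signaling_nans) {
        /* On such cores every NaN is treated as quiet. */
        return f32_is_any_nan(a);
    }
    /* Otherwise the quiet bit (bit 22) must be set. */
    return (uint32_t)(a << 1) >= 0xFF800000u;
}

/* Usage: one executable, two cores, decided at run time. */
static void demo_two_cores(void)
{
    core_status core_a = { .no_signaling_nans = true };
    core_status core_b = { .no_signaling_nans = false };

    assert(f32_is_quiet_nan(0x7F800001u, &core_a));  /* sNaN pattern, quiet here */
    assert(!f32_is_quiet_nan(0x7F800001u, &core_b)); /* sNaN pattern, signaling */
}
```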

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v1->v2:
- use inline function for no_signaling_nans property to allow for
  constant folding on architectures that have this property fixed.

 fpu/softfloat-specialize.inc.c  | 229 ++++++++++++++++----------------
 include/fpu/softfloat-helpers.h |   5 +
 include/fpu/softfloat-types.h   |   7 +-
 3 files changed, 128 insertions(+), 113 deletions(-)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index 44f5b661f831..9d919ee2d993 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -79,12 +79,18 @@ this code that are retained.
  * version 2 or later. See the COPYING file in the top-level directory.
  */
 
-/* Define for architectures which deviate from IEEE in not supporting
+/*
+ * Define whether architecture deviates from IEEE in not supporting
  * signaling NaNs (so all NaNs are treated as quiet).
  */
+static inline bool no_signaling_nans(float_status *status)
+{
 #if defined(TARGET_XTENSA)
-#define NO_SIGNALING_NANS 1
+    return status->no_signaling_nans;
+#else
+    return false;
 #endif
+}
 
 /* Define how the architecture discriminates signaling NaNs.
  * This done with the most significant bit of the fraction.
@@ -111,12 +117,12 @@ static inline bool snan_bit_is_one(float_status *status)
 
 static bool parts_is_snan_frac(uint64_t frac, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return false;
-#else
-    bool msb = extract64(frac, DECOMPOSED_BINARY_POINT - 1, 1);
-    return msb == snan_bit_is_one(status);
-#endif
+    if (no_signaling_nans(status)) {
+        return false;
+    } else {
+        bool msb = extract64(frac, DECOMPOSED_BINARY_POINT - 1, 1);
+        return msb == snan_bit_is_one(status);
+    }
 }
 
 /*----------------------------------------------------------------------------
@@ -170,9 +176,8 @@ static FloatParts parts_default_nan(float_status *status)
 
 static FloatParts parts_silence_nan(FloatParts a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    g_assert_not_reached();
-#elif defined(TARGET_HPPA)
+    g_assert(!no_signaling_nans(status));
+#if defined(TARGET_HPPA)
     a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
     a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
 #else
@@ -247,16 +252,16 @@ typedef struct {
 
 bool float16_is_quiet_nan(float16 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return float16_is_any_nan(a_);
-#else
-    uint16_t a = float16_val(a_);
-    if (snan_bit_is_one(status)) {
-        return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
+    if (no_signaling_nans(status)) {
+        return float16_is_any_nan(a_);
     } else {
-        return ((a & ~0x8000) >= 0x7C80);
+        uint16_t a = float16_val(a_);
+        if (snan_bit_is_one(status)) {
+            return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
+        } else {
+            return ((a & ~0x8000) >= 0x7C80);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -266,16 +271,16 @@ bool float16_is_quiet_nan(float16 a_, float_status *status)
 
 bool float16_is_signaling_nan(float16 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return 0;
-#else
-    uint16_t a = float16_val(a_);
-    if (snan_bit_is_one(status)) {
-        return ((a & ~0x8000) >= 0x7C80);
+    if (no_signaling_nans(status)) {
+        return 0;
     } else {
-        return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
+        uint16_t a = float16_val(a_);
+        if (snan_bit_is_one(status)) {
+            return ((a & ~0x8000) >= 0x7C80);
+        } else {
+            return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -285,16 +290,16 @@ bool float16_is_signaling_nan(float16 a_, float_status *status)
 
 bool float32_is_quiet_nan(float32 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return float32_is_any_nan(a_);
-#else
-    uint32_t a = float32_val(a_);
-    if (snan_bit_is_one(status)) {
-        return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
+    if (no_signaling_nans(status)) {
+        return float32_is_any_nan(a_);
     } else {
-        return ((uint32_t)(a << 1) >= 0xFF800000);
+        uint32_t a = float32_val(a_);
+        if (snan_bit_is_one(status)) {
+            return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
+        } else {
+            return ((uint32_t)(a << 1) >= 0xFF800000);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -304,16 +309,16 @@ bool float32_is_quiet_nan(float32 a_, float_status *status)
 
 bool float32_is_signaling_nan(float32 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return 0;
-#else
-    uint32_t a = float32_val(a_);
-    if (snan_bit_is_one(status)) {
-        return ((uint32_t)(a << 1) >= 0xFF800000);
+    if (no_signaling_nans(status)) {
+        return 0;
     } else {
-        return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
+        uint32_t a = float32_val(a_);
+        if (snan_bit_is_one(status)) {
+            return ((uint32_t)(a << 1) >= 0xFF800000);
+        } else {
+            return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -639,17 +644,17 @@ static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status)
 
 bool float64_is_quiet_nan(float64 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return float64_is_any_nan(a_);
-#else
-    uint64_t a = float64_val(a_);
-    if (snan_bit_is_one(status)) {
-        return (((a >> 51) & 0xFFF) == 0xFFE)
-            && (a & 0x0007FFFFFFFFFFFFULL);
+    if (no_signaling_nans(status)) {
+        return float64_is_any_nan(a_);
     } else {
-        return ((a << 1) >= 0xFFF0000000000000ULL);
+        uint64_t a = float64_val(a_);
+        if (snan_bit_is_one(status)) {
+            return (((a >> 51) & 0xFFF) == 0xFFE)
+                && (a & 0x0007FFFFFFFFFFFFULL);
+        } else {
+            return ((a << 1) >= 0xFFF0000000000000ULL);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -659,17 +664,17 @@ bool float64_is_quiet_nan(float64 a_, float_status *status)
 
 bool float64_is_signaling_nan(float64 a_, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return 0;
-#else
-    uint64_t a = float64_val(a_);
-    if (snan_bit_is_one(status)) {
-        return ((a << 1) >= 0xFFF0000000000000ULL);
+    if (no_signaling_nans(status)) {
+        return 0;
     } else {
-        return (((a >> 51) & 0xFFF) == 0xFFE)
-            && (a & UINT64_C(0x0007FFFFFFFFFFFF));
+        uint64_t a = float64_val(a_);
+        if (snan_bit_is_one(status)) {
+            return ((a << 1) >= 0xFFF0000000000000ULL);
+        } else {
+            return (((a >> 51) & 0xFFF) == 0xFFE)
+                && (a & UINT64_C(0x0007FFFFFFFFFFFF));
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -778,21 +783,21 @@ static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status)
 
 int floatx80_is_quiet_nan(floatx80 a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return floatx80_is_any_nan(a);
-#else
-    if (snan_bit_is_one(status)) {
-        uint64_t aLow;
-
-        aLow = a.low & ~0x4000000000000000ULL;
-        return ((a.high & 0x7FFF) == 0x7FFF)
-            && (aLow << 1)
-            && (a.low == aLow);
+    if (no_signaling_nans(status)) {
+        return floatx80_is_any_nan(a);
     } else {
-        return ((a.high & 0x7FFF) == 0x7FFF)
-            && (UINT64_C(0x8000000000000000) <= ((uint64_t)(a.low << 1)));
+        if (snan_bit_is_one(status)) {
+            uint64_t aLow;
+
+            aLow = a.low & ~0x4000000000000000ULL;
+            return ((a.high & 0x7FFF) == 0x7FFF)
+                && (aLow << 1)
+                && (a.low == aLow);
+        } else {
+            return ((a.high & 0x7FFF) == 0x7FFF)
+                && (UINT64_C(0x8000000000000000) <= ((uint64_t)(a.low << 1)));
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -803,21 +808,21 @@ int floatx80_is_quiet_nan(floatx80 a, float_status *status)
 
 int floatx80_is_signaling_nan(floatx80 a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return 0;
-#else
-    if (snan_bit_is_one(status)) {
-        return ((a.high & 0x7FFF) == 0x7FFF)
-            && ((a.low << 1) >= 0x8000000000000000ULL);
+    if (no_signaling_nans(status)) {
+        return 0;
     } else {
-        uint64_t aLow;
+        if (snan_bit_is_one(status)) {
+            return ((a.high & 0x7FFF) == 0x7FFF)
+                && ((a.low << 1) >= 0x8000000000000000ULL);
+        } else {
+            uint64_t aLow;
 
-        aLow = a.low & ~UINT64_C(0x4000000000000000);
-        return ((a.high & 0x7FFF) == 0x7FFF)
-            && (uint64_t)(aLow << 1)
-            && (a.low == aLow);
+            aLow = a.low & ~UINT64_C(0x4000000000000000);
+            return ((a.high & 0x7FFF) == 0x7FFF)
+                && (uint64_t)(aLow << 1)
+                && (a.low == aLow);
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -941,17 +946,17 @@ floatx80 propagateFloatx80NaN(floatx80 a, floatx80 b, float_status *status)
 
 bool float128_is_quiet_nan(float128 a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return float128_is_any_nan(a);
-#else
-    if (snan_bit_is_one(status)) {
-        return (((a.high >> 47) & 0xFFFF) == 0xFFFE)
-            && (a.low || (a.high & 0x00007FFFFFFFFFFFULL));
+    if (no_signaling_nans(status)) {
+        return float128_is_any_nan(a);
     } else {
-        return ((a.high << 1) >= 0xFFFF000000000000ULL)
-            && (a.low || (a.high & 0x0000FFFFFFFFFFFFULL));
+        if (snan_bit_is_one(status)) {
+            return (((a.high >> 47) & 0xFFFF) == 0xFFFE)
+                && (a.low || (a.high & 0x00007FFFFFFFFFFFULL));
+        } else {
+            return ((a.high << 1) >= 0xFFFF000000000000ULL)
+                && (a.low || (a.high & 0x0000FFFFFFFFFFFFULL));
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -961,17 +966,17 @@ bool float128_is_quiet_nan(float128 a, float_status *status)
 
 bool float128_is_signaling_nan(float128 a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    return 0;
-#else
-    if (snan_bit_is_one(status)) {
-        return ((a.high << 1) >= 0xFFFF000000000000ULL)
-            && (a.low || (a.high & 0x0000FFFFFFFFFFFFULL));
+    if (no_signaling_nans(status)) {
+        return 0;
     } else {
-        return (((a.high >> 47) & 0xFFFF) == 0xFFFE)
-            && (a.low || (a.high & UINT64_C(0x00007FFFFFFFFFFF)));
+        if (snan_bit_is_one(status)) {
+            return ((a.high << 1) >= 0xFFFF000000000000ULL)
+                && (a.low || (a.high & 0x0000FFFFFFFFFFFFULL));
+        } else {
+            return (((a.high >> 47) & 0xFFFF) == 0xFFFE)
+                && (a.low || (a.high & UINT64_C(0x00007FFFFFFFFFFF)));
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -981,16 +986,16 @@ bool float128_is_signaling_nan(float128 a, float_status *status)
 
 float128 float128_silence_nan(float128 a, float_status *status)
 {
-#ifdef NO_SIGNALING_NANS
-    g_assert_not_reached();
-#else
-    if (snan_bit_is_one(status)) {
-        return float128_default_nan(status);
+    if (no_signaling_nans(status)) {
+        g_assert_not_reached();
     } else {
-        a.high |= UINT64_C(0x0000800000000000);
-        return a;
+        if (snan_bit_is_one(status)) {
+            return float128_default_nan(status);
+        } else {
+            a.high |= UINT64_C(0x0000800000000000);
+            return a;
+        }
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index 735ed6b653ee..e842f83a1285 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -95,6 +95,11 @@ static inline void set_snan_bit_is_one(bool val, float_status *status)
     status->snan_bit_is_one = val;
 }
 
+static inline void set_no_signaling_nans(bool val, float_status *status)
+{
+    status->no_signaling_nans = val;
+}
+
 static inline bool get_float_detect_tininess(float_status *status)
 {
     return status->tininess_before_rounding;
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 7680193ebc1c..d6f167c1b0c4 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -165,8 +165,13 @@ typedef struct float_status {
     /* should denormalised inputs go to zero and set the input_denormal flag? */
     bool flush_inputs_to_zero;
     bool default_nan_mode;
-    /* not always used -- see snan_bit_is_one() in softfloat-specialize.h */
+    /*
+     * The flags below are not used on all specializations and may
+     * constant fold away (see snan_bit_is_one()/no_signaling_nans() in
+     * softfloat-specialize.inc.c)
+     */
     bool snan_bit_is_one;
+    bool no_signaling_nans;
 } float_status;
 
 #endif /* SOFTFLOAT_TYPES_H */
-- 
2.20.1




* [PATCH v4 02/22] softfloat: pass float_status pointer to pickNaN
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
  2020-07-11 11:06 ` [PATCH v4 01/22] softfloat: make NO_SIGNALING_NANS runtime property Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 03/22] softfloat: add xtensa specialization for pickNaNMulAdd Max Filippov
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Max Filippov, Richard Henderson, Peter Maydell

Pass a pointer to the float_status structure to pickNaN so that
machine-specific settings are available to the NaN selection code.
Add a use_first_nan property to float_status and use it in the
Xtensa-specific pickNaN.
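
A minimal standalone sketch of the two Xtensa propagation modes selected by use_first_nan, mirroring the TARGET_XTENSA branch of pickNaN in the diff below. cls_t, nan_status, is_nan_cls and xtensa_pick_nan are simplified, hypothetical stand-ins for FloatClass, float_status, is_nan and pickNaN.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for softfloat's FloatClass and float_status. */
typedef enum { CLS_NORMAL, CLS_INF, CLS_QNAN, CLS_SNAN } cls_t;
typedef struct {
    bool use_first_nan;
} nan_status;

static bool is_nan_cls(cls_t c)
{
    return c == CLS_QNAN || c == CLS_SNAN;
}

/*
 * Returns 0 to propagate operand a, 1 to propagate operand b.
 * Like pickNaN, it is only meaningful when at least one input is a NaN.
 */
static int xtensa_pick_nan(cls_t a, cls_t b, const nan_status *s)
{
    assert(is_nan_cls(a) || is_nan_cls(b));
    if (s->use_first_nan) {
        /* Prefer the first NaN operand. */
        return is_nan_cls(a) ? 0 : 1;
    }
    /* Prefer the last NaN operand. */
    return is_nan_cls(b) ? 1 : 0;
}
```

Note that neither mode looks at whether a NaN is signaling or quiet, unlike the PowerPC rules this target previously shared.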

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 fpu/softfloat-specialize.inc.c  | 30 ++++++++++++++++++++++++------
 fpu/softfloat.c                 |  2 +-
 include/fpu/softfloat-helpers.h |  5 +++++
 include/fpu/softfloat-types.h   |  1 +
 4 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index 9d919ee2d993..f519beca1b74 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -379,7 +379,7 @@ static float32 commonNaNToFloat32(commonNaNT a, float_status *status)
 *----------------------------------------------------------------------------*/
 
 static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-                   bool aIsLargerSignificand)
+                   bool aIsLargerSignificand, float_status *status)
 {
 #if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA)
     /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
@@ -412,7 +412,7 @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     } else {
         return 1;
     }
-#elif defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
+#elif defined(TARGET_PPC) || defined(TARGET_M68K)
     /* PowerPC propagation rules:
      *  1. A if it sNaN or qNaN
      *  2. B if it sNaN or qNaN
@@ -437,6 +437,24 @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     } else {
         return 1;
     }
+#elif defined(TARGET_XTENSA)
+    /*
+     * Xtensa has two NaN propagation modes.
+     * Which one is active is controlled by float_status::use_first_nan.
+     */
+    if (status->use_first_nan) {
+        if (is_nan(a_cls)) {
+            return 0;
+        } else {
+            return 1;
+        }
+    } else {
+        if (is_nan(b_cls)) {
+            return 1;
+        } else {
+            return 0;
+        }
+    }
 #else
     /* This implements x87 NaN propagation rules:
      * SNaN + QNaN => return the QNaN
@@ -624,7 +642,7 @@ static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status)
         aIsLargerSignificand = (av < bv) ? 1 : 0;
     }
 
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand)) {
+    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
         if (is_snan(b_cls)) {
             return float32_silence_nan(b, status);
         }
@@ -762,7 +780,7 @@ static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status)
         aIsLargerSignificand = (av < bv) ? 1 : 0;
     }
 
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand)) {
+    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
         if (is_snan(b_cls)) {
             return float64_silence_nan(b, status);
         }
@@ -926,7 +944,7 @@ floatx80 propagateFloatx80NaN(floatx80 a, floatx80 b, float_status *status)
         aIsLargerSignificand = (a.high < b.high) ? 1 : 0;
     }
 
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand)) {
+    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
         if (is_snan(b_cls)) {
             return floatx80_silence_nan(b, status);
         }
@@ -1074,7 +1092,7 @@ static float128 propagateFloat128NaN(float128 a, float128 b,
         aIsLargerSignificand = (a.high < b.high) ? 1 : 0;
     }
 
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand)) {
+    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
         if (is_snan(b_cls)) {
             return float128_silence_nan(b, status);
         }
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5e9746c2876f..a89056a1816e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -881,7 +881,7 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
     } else {
         if (pickNaN(a.cls, b.cls,
                     a.frac > b.frac ||
-                    (a.frac == b.frac && a.sign < b.sign))) {
+                    (a.frac == b.frac && a.sign < b.sign), s)) {
             a = b;
         }
         if (is_snan(a.cls)) {
diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index e842f83a1285..2f0674fbddec 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -95,6 +95,11 @@ static inline void set_snan_bit_is_one(bool val, float_status *status)
     status->snan_bit_is_one = val;
 }
 
+static inline void set_use_first_nan(bool val, float_status *status)
+{
+    status->use_first_nan = val;
+}
+
 static inline void set_no_signaling_nans(bool val, float_status *status)
 {
     status->no_signaling_nans = val;
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index d6f167c1b0c4..c7ddcab8caee 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -171,6 +171,7 @@ typedef struct float_status {
      * softfloat-specialize.inc.c)
      */
     bool snan_bit_is_one;
+    bool use_first_nan;
     bool no_signaling_nans;
 } float_status;
 
-- 
2.20.1




* [PATCH v4 03/22] softfloat: add xtensa specialization for pickNaNMulAdd
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
  2020-07-11 11:06 ` [PATCH v4 01/22] softfloat: make NO_SIGNALING_NANS runtime property Max Filippov
  2020-07-11 11:06 ` [PATCH v4 02/22] softfloat: pass float_status pointer to pickNaN Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 04/22] target/xtensa: add geometry to xtensa_get_regfile_by_name Max Filippov
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Max Filippov, Richard Henderson, Peter Maydell

The pickNaNMulAdd logic on Xtensa is to apply pickNaN to the inputs of
the expression (a * b) + c. However, if a default NaN is produced as a
result of the (a * b) calculation, it is not considered when c is a NaN.
So with two pickNaN variants there must be two pickNaNMulAdd variants.
In addition, the invalid flag is always set when (a * b) produces a NaN.
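
A minimal standalone sketch of the selection rule described above, mirroring the TARGET_XTENSA branch of pickNaNMulAdd in the diff below. cls_t, madd_status (with a plain bool standing in for float_flag_invalid) and xtensa_pick_nan_muladd are hypothetical simplifications, not softfloat types.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CLS_ZERO, CLS_NORMAL, CLS_INF, CLS_QNAN, CLS_SNAN } cls_t;

typedef struct {
    bool use_first_nan;
    bool invalid;       /* stand-in for the float_flag_invalid flag */
} madd_status;

static bool is_nan_cls(cls_t c)
{
    return c == CLS_QNAN || c == CLS_SNAN;
}

/*
 * Choose which input NaN of (a * b) + c to propagate: 0 = a, 1 = b, 2 = c.
 * infzero means a * b is inf * 0 (or 0 * inf); on this path c is the NaN.
 */
static int xtensa_pick_nan_muladd(cls_t a, cls_t b, cls_t c, bool infzero,
                                  madd_status *s)
{
    if (infzero) {
        assert(is_nan_cls(c));
        s->invalid = true;  /* inf * 0 always raises InvalidOp */
        return 2;           /* the default NaN from a * b is not considered */
    }
    if (s->use_first_nan) {
        return is_nan_cls(a) ? 0 : is_nan_cls(b) ? 1 : 2;
    }
    return is_nan_cls(c) ? 2 : is_nan_cls(b) ? 1 : 0;
}
```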

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v2->v3:
- handle infzero case in pickNaNMulAdd properly and reword commit
  message

 fpu/softfloat-specialize.inc.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index f519beca1b74..914deac46ecf 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -585,6 +585,32 @@ static int pickNaNMulAdd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
     } else {
         return 1;
     }
+#elif defined(TARGET_XTENSA)
+    /*
+     * For Xtensa, the (inf,zero,nan) case sets InvalidOp and returns
+     * an input NaN if we have one (ie c).
+     */
+    if (infzero) {
+        float_raise(float_flag_invalid, status);
+        return 2;
+    }
+    if (status->use_first_nan) {
+        if (is_nan(a_cls)) {
+            return 0;
+        } else if (is_nan(b_cls)) {
+            return 1;
+        } else {
+            return 2;
+        }
+    } else {
+        if (is_nan(c_cls)) {
+            return 2;
+        } else if (is_nan(b_cls)) {
+            return 1;
+        } else {
+            return 0;
+        }
+    }
 #else
     /* A default implementation: prefer a to b to c.
      * This is unlikely to actually match any real implementation.
-- 
2.20.1




* [PATCH v4 04/22] target/xtensa: add geometry to xtensa_get_regfile_by_name
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (2 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 03/22] softfloat: add xtensa specialization for pickNaNMulAdd Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 05/22] target/xtensa: support copying registers up to 64 bits wide Max Filippov
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

A register file name may not uniquely identify a register file across
the set of configurations. For example, floating point registers may
have different sizes in different configurations. Use the register
file geometry as an additional identifier.
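
A minimal standalone sketch of the geometry-qualified lookup, using the same "%s %dx%d" key format as the patch. To stay free of GLib it assumes a small static table and a fixed-size snprintf buffer instead of GHashTable and g_strdup_printf; regfile_entry, regfile_table and lookup_regfile are hypothetical names, and the strings stand in for the cpu_R/cpu_FR/cpu_BR handles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *key;      /* "<name> <entries>x<bits>" */
    const char *regfile;  /* stand-in for the TCG register array */
} regfile_entry;

static const regfile_entry regfile_table[] = {
    { "AR 16x32", "cpu_R" },
    { "AR 32x32", "cpu_R" },
    { "AR 64x32", "cpu_R" },
    { "FR 16x32", "cpu_FR" },
    { "BR 16x1",  "cpu_BR" },
};

/* Build the geometry-qualified key and look it up; NULL if unknown. */
static const char *lookup_regfile(const char *name, int entries, int bits)
{
    char key[64];
    size_t i;
    int n = snprintf(key, sizeof(key), "%s %dx%d", name, entries, bits);

    /* The fixed buffer is an assumption of this sketch; names are short. */
    assert(n > 0 && (size_t)n < sizeof(key));
    for (i = 0; i < sizeof(regfile_table) / sizeof(regfile_table[0]); ++i) {
        if (strcmp(regfile_table[i].key, key) == 0) {
            return regfile_table[i].regfile;
        }
    }
    return NULL;
}
```

With this scheme an FR file of 16 entries x 64 bits, as a double precision configuration might have, no longer matches the single precision "FR 16x32" entry, so a separate register array can be registered for it.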

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 target/xtensa/cpu.h       |  2 +-
 target/xtensa/helper.c    |  4 +++-
 target/xtensa/translate.c | 35 +++++++++++++++++++++++++++--------
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 0c96181212a5..0409aa6189cf 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -598,7 +598,7 @@ void xtensa_cpu_do_unaligned_access(CPUState *cpu, vaddr addr,
 
 void xtensa_collect_sr_names(const XtensaConfig *config);
 void xtensa_translate_init(void);
-void **xtensa_get_regfile_by_name(const char *name);
+void **xtensa_get_regfile_by_name(const char *name, int entries, int bits);
 void xtensa_breakpoint_handler(CPUState *cs);
 void xtensa_register_core(XtensaConfigList *node);
 void xtensa_sim_open_console(Chardev *chr);
diff --git a/target/xtensa/helper.c b/target/xtensa/helper.c
index 7073381f03b2..05e2b7f70a1e 100644
--- a/target/xtensa/helper.c
+++ b/target/xtensa/helper.c
@@ -133,8 +133,10 @@ static void init_libisa(XtensaConfig *config)
     config->regfile = g_new(void **, regfiles);
     for (i = 0; i < regfiles; ++i) {
         const char *name = xtensa_regfile_name(config->isa, i);
+        int entries = xtensa_regfile_num_entries(config->isa, i);
+        int bits = xtensa_regfile_num_bits(config->isa, i);
 
-        config->regfile[i] = xtensa_get_regfile_by_name(name);
+        config->regfile[i] = xtensa_get_regfile_by_name(name, entries, bits);
 #ifdef DEBUG
         if (config->regfile[i] == NULL) {
             fprintf(stderr, "regfile '%s' not found for %s\n",
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 03d796d7a1ed..9838bf6b3ec5 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -227,24 +227,43 @@ void xtensa_translate_init(void)
                                "exclusive_val");
 }
 
-void **xtensa_get_regfile_by_name(const char *name)
+void **xtensa_get_regfile_by_name(const char *name, int entries, int bits)
 {
+    char *geometry_name;
+    void **res;
+
     if (xtensa_regfile_table == NULL) {
         xtensa_regfile_table = g_hash_table_new(g_str_hash, g_str_equal);
+        /*
+         * AR is special. Xtensa translator uses it as a current register
+         * window, but configuration overlays represent it as a complete
+         * physical register file.
+         */
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"AR", (void *)cpu_R);
+                            (void *)"AR 16x32", (void *)cpu_R);
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"MR", (void *)cpu_MR);
+                            (void *)"AR 32x32", (void *)cpu_R);
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"FR", (void *)cpu_FR);
+                            (void *)"AR 64x32", (void *)cpu_R);
+
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"BR", (void *)cpu_BR);
+                            (void *)"MR 4x32", (void *)cpu_MR);
+
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"BR4", (void *)cpu_BR4);
+                            (void *)"FR 16x32", (void *)cpu_FR);
+
         g_hash_table_insert(xtensa_regfile_table,
-                            (void *)"BR8", (void *)cpu_BR8);
+                            (void *)"BR 16x1", (void *)cpu_BR);
+        g_hash_table_insert(xtensa_regfile_table,
+                            (void *)"BR4 4x4", (void *)cpu_BR4);
+        g_hash_table_insert(xtensa_regfile_table,
+                            (void *)"BR8 2x8", (void *)cpu_BR8);
     }
-    return (void **)g_hash_table_lookup(xtensa_regfile_table, (void *)name);
+
+    geometry_name = g_strdup_printf("%s %dx%d", name, entries, bits);
+    res = (void **)g_hash_table_lookup(xtensa_regfile_table, geometry_name);
+    g_free(geometry_name);
+    return res;
 }
 
 static inline bool option_enabled(DisasContext *dc, int opt)
-- 
2.20.1
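In this patch the register file table is keyed by name plus geometry ("AR 16x32", "FR 16x32", "BR 16x1", ...). A core whose register file has a known name but a different number of entries or bits therefore no longer matches. The following is a minimal plain-C sketch of that keying scheme. It assumes a small static array searched with strcmp() in place of the GLib hash table. `lookup_regfile()`, `struct regfile_entry` and the `cpu_*` placeholders are hypothetical, and the table lists only a few of the patch's entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholders for the TCG register arrays. */
static int cpu_R, cpu_FR, cpu_BR;

struct regfile_entry {
    const char *key;   /* "<name> <entries>x<bits>" */
    void *regs;
};

static const struct regfile_entry regfile_table[] = {
    /* AR maps to the current register window for every physical size. */
    { "AR 16x32", &cpu_R },
    { "AR 32x32", &cpu_R },
    { "AR 64x32", &cpu_R },
    { "FR 16x32", &cpu_FR },
    { "BR 16x1",  &cpu_BR },
};

/* Build the geometry-qualified key and look it up; NULL if absent. */
static void *lookup_regfile(const char *name, int entries, int bits)
{
    char key[64];
    size_t i;

    assert(name != NULL);
    snprintf(key, sizeof(key), "%s %dx%d", name, entries, bits);
    for (i = 0; i < sizeof(regfile_table) / sizeof(regfile_table[0]); ++i) {
        if (strcmp(regfile_table[i].key, key) == 0) {
            return regfile_table[i].regs;
        }
    }
    return NULL;
}
```

With these entries, lookup_regfile("FR", 16, 32) returns &cpu_FR. lookup_regfile("FR", 16, 64) returns NULL because this sketch's table has no 64-bit FR entry.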




* [PATCH v4 05/22] target/xtensa: support copying registers up to 64 bits wide
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (3 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 04/22] target/xtensa: add geometry to xtensa_get_regfile_by_name Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 06/22] target/xtensa: rename FPU2000 translators and helpers Max Filippov
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

FLIX dependency-breaking code assumes that all registers are 32 bits
wide. This is not necessarily true.
Extract the actual register width from the associated register file and
use it to create temporaries of the correct width and to generate the
correct data movement instructions.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 target/xtensa/cpu.h       |  1 +
 target/xtensa/translate.c | 26 +++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 0409aa6189cf..960f6573447f 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -359,6 +359,7 @@ typedef struct opcode_arg {
     uint32_t raw_imm;
     void *in;
     void *out;
+    uint32_t num_bits;
 } OpcodeArg;
 
 typedef struct DisasContext DisasContext;
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 9838bf6b3ec5..bc01a720719d 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -943,10 +943,10 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
 
         for (opnd = vopnd = 0; opnd < opnds; ++opnd) {
             void **register_file = NULL;
+            xtensa_regfile rf;
 
             if (xtensa_operand_is_register(isa, opc, opnd)) {
-                xtensa_regfile rf = xtensa_operand_regfile(isa, opc, opnd);
-
+                rf = xtensa_operand_regfile(isa, opc, opnd);
                 register_file = dc->config->regfile[rf];
 
                 if (rf == dc->config->a_regfile) {
@@ -972,6 +972,9 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
                 if (register_file) {
                     arg[vopnd].in = register_file[v];
                     arg[vopnd].out = register_file[v];
+                    arg[vopnd].num_bits = xtensa_regfile_num_bits(isa, rf);
+                } else {
+                    arg[vopnd].num_bits = 32;
                 }
                 ++vopnd;
             }
@@ -1111,8 +1114,15 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
         for (i = j = 0; i < n_arg_copy; ++i) {
             if (i == 0 || arg_copy[i].resource != resource) {
                 resource = arg_copy[i].resource;
-                temp = tcg_temp_local_new();
-                tcg_gen_mov_i32(temp, arg_copy[i].arg->in);
+                if (arg_copy[i].arg->num_bits <= 32) {
+                    temp = tcg_temp_local_new_i32();
+                    tcg_gen_mov_i32(temp, arg_copy[i].arg->in);
+                } else if (arg_copy[i].arg->num_bits <= 64) {
+                    temp = tcg_temp_local_new_i64();
+                    tcg_gen_mov_i64(temp, arg_copy[i].arg->in);
+                } else {
+                    g_assert_not_reached();
+                }
                 arg_copy[i].temp = temp;
 
                 if (i != j) {
@@ -1143,7 +1153,13 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
     }
 
     for (i = 0; i < n_arg_copy; ++i) {
-        tcg_temp_free(arg_copy[i].temp);
+        if (arg_copy[i].arg->num_bits <= 32) {
+            tcg_temp_free_i32(arg_copy[i].temp);
+        } else if (arg_copy[i].arg->num_bits <= 64) {
+            tcg_temp_free_i64(arg_copy[i].temp);
+        } else {
+            g_assert_not_reached();
+        }
     }
 
     if (dc->base.is_jmp == DISAS_NEXT) {
-- 
2.20.1




* [PATCH v4 06/22] target/xtensa: rename FPU2000 translators and helpers
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (4 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 05/22] target/xtensa: support copying registers up to 64 bits wide Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 07/22] target/xtensa: move FSR/FCR register accessors Max Filippov
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Add the _s suffix to all FPU2000 opcode translators and helpers that
also have a double-precision variant, to unify naming and allow adding
DFPU implementations. Add _fpu2k_ to the names of helpers that will have
a different implementation for the DFPU.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v3->v4:
- add _fpu2k_ to single-precision arithmetic helpers that do NaN
  selection to make room for helpers that will have to call
  set_use_first_nan

 target/xtensa/fpu_helper.c | 22 ++++++------
 target/xtensa/helper.h     | 20 +++++------
 target/xtensa/translate.c  | 70 +++++++++++++++++++-------------------
 3 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index 87487293f9a1..46e231bdaa51 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -33,7 +33,7 @@
 #include "exec/exec-all.h"
 #include "fpu/softfloat.h"
 
-void HELPER(wur_fcr)(CPUXtensaState *env, uint32_t v)
+void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
 {
     static const int rounding_mode[] = {
         float_round_nearest_even,
@@ -56,33 +56,35 @@ float32 HELPER(neg_s)(float32 v)
     return float32_chs(v);
 }
 
-float32 HELPER(add_s)(CPUXtensaState *env, float32 a, float32 b)
+float32 HELPER(fpu2k_add_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     return float32_add(a, b, &env->fp_status);
 }
 
-float32 HELPER(sub_s)(CPUXtensaState *env, float32 a, float32 b)
+float32 HELPER(fpu2k_sub_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     return float32_sub(a, b, &env->fp_status);
 }
 
-float32 HELPER(mul_s)(CPUXtensaState *env, float32 a, float32 b)
+float32 HELPER(fpu2k_mul_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     return float32_mul(a, b, &env->fp_status);
 }
 
-float32 HELPER(madd_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
+float32 HELPER(fpu2k_madd_s)(CPUXtensaState *env,
+                             float32 a, float32 b, float32 c)
 {
     return float32_muladd(b, c, a, 0, &env->fp_status);
 }
 
-float32 HELPER(msub_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
+float32 HELPER(fpu2k_msub_s)(CPUXtensaState *env,
+                             float32 a, float32 b, float32 c)
 {
     return float32_muladd(b, c, a, float_muladd_negate_product,
                           &env->fp_status);
 }
 
-uint32_t HELPER(ftoi)(float32 v, uint32_t rounding_mode, uint32_t scale)
+uint32_t HELPER(ftoi_s)(float32 v, uint32_t rounding_mode, uint32_t scale)
 {
     float_status fp_status = {0};
 
@@ -90,7 +92,7 @@ uint32_t HELPER(ftoi)(float32 v, uint32_t rounding_mode, uint32_t scale)
     return float32_to_int32(float32_scalbn(v, scale, &fp_status), &fp_status);
 }
 
-uint32_t HELPER(ftoui)(float32 v, uint32_t rounding_mode, uint32_t scale)
+uint32_t HELPER(ftoui_s)(float32 v, uint32_t rounding_mode, uint32_t scale)
 {
     float_status fp_status = {0};
     float32 res;
@@ -106,13 +108,13 @@ uint32_t HELPER(ftoui)(float32 v, uint32_t rounding_mode, uint32_t scale)
     }
 }
 
-float32 HELPER(itof)(CPUXtensaState *env, uint32_t v, uint32_t scale)
+float32 HELPER(itof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
 {
     return float32_scalbn(int32_to_float32(v, &env->fp_status),
                           (int32_t)scale, &env->fp_status);
 }
 
-float32 HELPER(uitof)(CPUXtensaState *env, uint32_t v, uint32_t scale)
+float32 HELPER(uitof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
 {
     return float32_scalbn(uint32_to_float32(v, &env->fp_status),
                           (int32_t)scale, &env->fp_status);
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 8532de0b35f5..bce31cbd9ff1 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -46,18 +46,18 @@ DEF_HELPER_3(wsr_dbreaka, void, env, i32, i32)
 DEF_HELPER_3(wsr_dbreakc, void, env, i32, i32)
 #endif
 
-DEF_HELPER_2(wur_fcr, void, env, i32)
+DEF_HELPER_2(wur_fpu2k_fcr, void, env, i32)
 DEF_HELPER_FLAGS_1(abs_s, TCG_CALL_NO_RWG_SE, f32, f32)
 DEF_HELPER_FLAGS_1(neg_s, TCG_CALL_NO_RWG_SE, f32, f32)
-DEF_HELPER_3(add_s, f32, env, f32, f32)
-DEF_HELPER_3(sub_s, f32, env, f32, f32)
-DEF_HELPER_3(mul_s, f32, env, f32, f32)
-DEF_HELPER_4(madd_s, f32, env, f32, f32, f32)
-DEF_HELPER_4(msub_s, f32, env, f32, f32, f32)
-DEF_HELPER_FLAGS_3(ftoi, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
-DEF_HELPER_FLAGS_3(ftoui, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
-DEF_HELPER_3(itof, f32, env, i32, i32)
-DEF_HELPER_3(uitof, f32, env, i32, i32)
+DEF_HELPER_3(fpu2k_add_s, f32, env, f32, f32)
+DEF_HELPER_3(fpu2k_sub_s, f32, env, f32, f32)
+DEF_HELPER_3(fpu2k_mul_s, f32, env, f32, f32)
+DEF_HELPER_4(fpu2k_madd_s, f32, env, f32, f32, f32)
+DEF_HELPER_4(fpu2k_msub_s, f32, env, f32, f32, f32)
+DEF_HELPER_FLAGS_3(ftoi_s, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
+DEF_HELPER_FLAGS_3(ftoui_s, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
+DEF_HELPER_3(itof_s, f32, env, i32, i32)
+DEF_HELPER_3(uitof_s, f32, env, i32, i32)
 
 DEF_HELPER_4(un_s, void, env, i32, f32, f32)
 DEF_HELPER_4(oeq_s, void, env, i32, f32, f32)
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index bc01a720719d..47951acd1669 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -2813,10 +2813,10 @@ static void translate_wur(DisasContext *dc, const OpcodeArg arg[],
     tcg_gen_mov_i32(cpu_UR[par[0]], arg[0].in);
 }
 
-static void translate_wur_fcr(DisasContext *dc, const OpcodeArg arg[],
-                              const uint32_t par[])
+static void translate_wur_fpu2k_fcr(DisasContext *dc, const OpcodeArg arg[],
+                                    const uint32_t par[])
 {
-    gen_helper_wur_fcr(cpu_env, arg[0].in);
+    gen_helper_wur_fpu2k_fcr(cpu_env, arg[0].in);
 }
 
 static void translate_wur_fsr(DisasContext *dc, const OpcodeArg arg[],
@@ -5583,7 +5583,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .par = (const uint32_t[]){EXPSTATE},
     }, {
         .name = "wur.fcr",
-        .translate = translate_wur_fcr,
+        .translate = translate_wur_fpu2k_fcr,
         .par = (const uint32_t[]){FCR},
         .coprocessor = 0x1,
     }, {
@@ -6331,11 +6331,11 @@ static void translate_abs_s(DisasContext *dc, const OpcodeArg arg[],
     gen_helper_abs_s(arg[0].out, arg[1].in);
 }
 
-static void translate_add_s(DisasContext *dc, const OpcodeArg arg[],
-                            const uint32_t par[])
+static void translate_fpu2k_add_s(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
 {
-    gen_helper_add_s(arg[0].out, cpu_env,
-                     arg[1].in, arg[2].in);
+    gen_helper_fpu2k_add_s(arg[0].out, cpu_env,
+                           arg[1].in, arg[2].in);
 }
 
 enum {
@@ -6373,9 +6373,9 @@ static void translate_float_s(DisasContext *dc, const OpcodeArg arg[],
     TCGv_i32 scale = tcg_const_i32(-arg[2].imm);
 
     if (par[0]) {
-        gen_helper_uitof(arg[0].out, cpu_env, arg[1].in, scale);
+        gen_helper_uitof_s(arg[0].out, cpu_env, arg[1].in, scale);
     } else {
-        gen_helper_itof(arg[0].out, cpu_env, arg[1].in, scale);
+        gen_helper_itof_s(arg[0].out, cpu_env, arg[1].in, scale);
     }
     tcg_temp_free(scale);
 }
@@ -6387,11 +6387,11 @@ static void translate_ftoi_s(DisasContext *dc, const OpcodeArg arg[],
     TCGv_i32 scale = tcg_const_i32(arg[2].imm);
 
     if (par[1]) {
-        gen_helper_ftoui(arg[0].out, arg[1].in,
-                         rounding_mode, scale);
+        gen_helper_ftoui_s(arg[0].out, arg[1].in,
+                           rounding_mode, scale);
     } else {
-        gen_helper_ftoi(arg[0].out, arg[1].in,
-                        rounding_mode, scale);
+        gen_helper_ftoi_s(arg[0].out, arg[1].in,
+                          rounding_mode, scale);
     }
     tcg_temp_free(rounding_mode);
     tcg_temp_free(scale);
@@ -6433,11 +6433,11 @@ static void translate_ldstx(DisasContext *dc, const OpcodeArg arg[],
     tcg_temp_free(addr);
 }
 
-static void translate_madd_s(DisasContext *dc, const OpcodeArg arg[],
-                             const uint32_t par[])
+static void translate_fpu2k_madd_s(DisasContext *dc, const OpcodeArg arg[],
+                                   const uint32_t par[])
 {
-    gen_helper_madd_s(arg[0].out, cpu_env,
-                      arg[0].in, arg[1].in, arg[2].in);
+    gen_helper_fpu2k_madd_s(arg[0].out, cpu_env,
+                            arg[0].in, arg[1].in, arg[2].in);
 }
 
 static void translate_mov_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6471,18 +6471,18 @@ static void translate_movp_s(DisasContext *dc, const OpcodeArg arg[],
     tcg_temp_free(zero);
 }
 
-static void translate_mul_s(DisasContext *dc, const OpcodeArg arg[],
-                            const uint32_t par[])
+static void translate_fpu2k_mul_s(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
 {
-    gen_helper_mul_s(arg[0].out, cpu_env,
-                     arg[1].in, arg[2].in);
+    gen_helper_fpu2k_mul_s(arg[0].out, cpu_env,
+                           arg[1].in, arg[2].in);
 }
 
-static void translate_msub_s(DisasContext *dc, const OpcodeArg arg[],
-                             const uint32_t par[])
+static void translate_fpu2k_msub_s(DisasContext *dc, const OpcodeArg arg[],
+                                   const uint32_t par[])
 {
-    gen_helper_msub_s(arg[0].out, cpu_env,
-                      arg[0].in, arg[1].in, arg[2].in);
+    gen_helper_fpu2k_msub_s(arg[0].out, cpu_env,
+                            arg[0].in, arg[1].in, arg[2].in);
 }
 
 static void translate_neg_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6497,11 +6497,11 @@ static void translate_rfr_s(DisasContext *dc, const OpcodeArg arg[],
     tcg_gen_mov_i32(arg[0].out, arg[1].in);
 }
 
-static void translate_sub_s(DisasContext *dc, const OpcodeArg arg[],
-                            const uint32_t par[])
+static void translate_fpu2k_sub_s(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
 {
-    gen_helper_sub_s(arg[0].out, cpu_env,
-                     arg[1].in, arg[2].in);
+    gen_helper_fpu2k_sub_s(arg[0].out, cpu_env,
+                           arg[1].in, arg[2].in);
 }
 
 static void translate_wfr_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6517,7 +6517,7 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .coprocessor = 0x1,
     }, {
         .name = "add.s",
-        .translate = translate_add_s,
+        .translate = translate_fpu2k_add_s,
         .coprocessor = 0x1,
     }, {
         .name = "ceil.s",
@@ -6560,7 +6560,7 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .coprocessor = 0x1,
     }, {
         .name = "madd.s",
-        .translate = translate_madd_s,
+        .translate = translate_fpu2k_madd_s,
         .coprocessor = 0x1,
     }, {
         .name = "mov.s",
@@ -6598,11 +6598,11 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .coprocessor = 0x1,
     }, {
         .name = "msub.s",
-        .translate = translate_msub_s,
+        .translate = translate_fpu2k_msub_s,
         .coprocessor = 0x1,
     }, {
         .name = "mul.s",
-        .translate = translate_mul_s,
+        .translate = translate_fpu2k_mul_s,
         .coprocessor = 0x1,
     }, {
         .name = "neg.s",
@@ -6658,7 +6658,7 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .coprocessor = 0x1,
     }, {
         .name = "sub.s",
-        .translate = translate_sub_s,
+        .translate = translate_fpu2k_sub_s,
         .coprocessor = 0x1,
     }, {
         .name = "trunc.s",
-- 
2.20.1




* [PATCH v4 07/22] target/xtensa: move FSR/FCR register accessors
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (5 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 06/22] target/xtensa: rename FPU2000 translators and helpers Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 08/22] target/xtensa: don't access BR regfile directly Max Filippov
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Move the FSR/FCR register accessors from the core opcodes to the FPU2000
opcodes, as they are FPU2000-specific.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 target/xtensa/translate.c | 64 +++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 47951acd1669..1b643881e6e9 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -2813,18 +2813,6 @@ static void translate_wur(DisasContext *dc, const OpcodeArg arg[],
     tcg_gen_mov_i32(cpu_UR[par[0]], arg[0].in);
 }
 
-static void translate_wur_fpu2k_fcr(DisasContext *dc, const OpcodeArg arg[],
-                                    const uint32_t par[])
-{
-    gen_helper_wur_fpu2k_fcr(cpu_env, arg[0].in);
-}
-
-static void translate_wur_fsr(DisasContext *dc, const OpcodeArg arg[],
-                              const uint32_t par[])
-{
-    tcg_gen_andi_i32(cpu_UR[par[0]], arg[0].in, 0xffffff80);
-}
-
 static void translate_xor(DisasContext *dc, const OpcodeArg arg[],
                           const uint32_t par[])
 {
@@ -4665,16 +4653,6 @@ static const XtensaOpcodeOps core_ops[] = {
         .name = "rur.expstate",
         .translate = translate_rur,
         .par = (const uint32_t[]){EXPSTATE},
-    }, {
-        .name = "rur.fcr",
-        .translate = translate_rur,
-        .par = (const uint32_t[]){FCR},
-        .coprocessor = 0x1,
-    }, {
-        .name = "rur.fsr",
-        .translate = translate_rur,
-        .par = (const uint32_t[]){FSR},
-        .coprocessor = 0x1,
     }, {
         .name = "rur.threadptr",
         .translate = translate_rur,
@@ -5581,16 +5559,6 @@ static const XtensaOpcodeOps core_ops[] = {
         .name = "wur.expstate",
         .translate = translate_wur,
         .par = (const uint32_t[]){EXPSTATE},
-    }, {
-        .name = "wur.fcr",
-        .translate = translate_wur_fpu2k_fcr,
-        .par = (const uint32_t[]){FCR},
-        .coprocessor = 0x1,
-    }, {
-        .name = "wur.fsr",
-        .translate = translate_wur_fsr,
-        .par = (const uint32_t[]){FSR},
-        .coprocessor = 0x1,
     }, {
         .name = "wur.threadptr",
         .translate = translate_wur,
@@ -6510,6 +6478,18 @@ static void translate_wfr_s(DisasContext *dc, const OpcodeArg arg[],
     tcg_gen_mov_i32(arg[0].out, arg[1].in);
 }
 
+static void translate_wur_fpu2k_fcr(DisasContext *dc, const OpcodeArg arg[],
+                                    const uint32_t par[])
+{
+    gen_helper_wur_fpu2k_fcr(cpu_env, arg[0].in);
+}
+
+static void translate_wur_fpu2k_fsr(DisasContext *dc, const OpcodeArg arg[],
+                                    const uint32_t par[])
+{
+    tcg_gen_andi_i32(cpu_UR[par[0]], arg[0].in, 0xffffff80);
+}
+
 static const XtensaOpcodeOps fpu2000_ops[] = {
     {
         .name = "abs.s",
@@ -6632,6 +6612,16 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .translate = translate_ftoi_s,
         .par = (const uint32_t[]){float_round_nearest_even, false},
         .coprocessor = 0x1,
+    }, {
+        .name = "rur.fcr",
+        .translate = translate_rur,
+        .par = (const uint32_t[]){FCR},
+        .coprocessor = 0x1,
+    }, {
+        .name = "rur.fsr",
+        .translate = translate_rur,
+        .par = (const uint32_t[]){FSR},
+        .coprocessor = 0x1,
     }, {
         .name = "ssi",
         .translate = translate_ldsti,
@@ -6699,6 +6689,16 @@ static const XtensaOpcodeOps fpu2000_ops[] = {
         .name = "wfr",
         .translate = translate_wfr_s,
         .coprocessor = 0x1,
+    }, {
+        .name = "wur.fcr",
+        .translate = translate_wur_fpu2k_fcr,
+        .par = (const uint32_t[]){FCR},
+        .coprocessor = 0x1,
+    }, {
+        .name = "wur.fsr",
+        .translate = translate_wur_fpu2k_fsr,
+        .par = (const uint32_t[]){FSR},
+        .coprocessor = 0x1,
     },
 };
 
-- 
2.20.1




* [PATCH v4 08/22] target/xtensa: don't access BR regfile directly
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (6 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 07/22] target/xtensa: move FSR/FCR register accessors Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 09/22] target/xtensa: add DFPU option Max Filippov
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

The BR registers used by FPU comparison opcodes are available to
translators as opcode arguments. Use them. This simplifies the
comparison helpers' interface and makes them usable in FLIX bundles.
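After this change the helper only reports whether the comparison holds. The translator then selects between the BR value with bit arg[0].imm set and the same value with that bit cleared. A minimal plain-C model of that tcg_gen_movcond_i32() selection follows. `update_br()` is a hypothetical name, `br` stands for the value read through arg[0].in, and `bit_index` stands for arg[0].imm.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the movcond in translate_compare_s(): return br with bit
 * 'bit_index' set when the comparison result is nonzero, otherwise
 * with that bit cleared.  All other BR bits are preserved.
 */
static uint32_t update_br(uint32_t br, unsigned bit_index, uint32_t res)
{
    uint32_t set_br, clr_br;

    assert(bit_index < 16);   /* BR has 16 one-bit entries */
    set_br = br | (1u << bit_index);
    clr_br = br & ~(1u << bit_index);
    return res != 0 ? set_br : clr_br;
}
```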

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 target/xtensa/fpu_helper.c | 42 +++++++++++++++++---------------------
 target/xtensa/helper.h     | 14 ++++++-------
 target/xtensa/translate.c  | 20 ++++++++++++++----
 3 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index 46e231bdaa51..35dacbd14d68 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -120,49 +120,45 @@ float32 HELPER(uitof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
                           (int32_t)scale, &env->fp_status);
 }
 
-static inline void set_br(CPUXtensaState *env, bool v, uint32_t br)
+uint32_t HELPER(un_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    if (v) {
-        env->sregs[BR] |= br;
-    } else {
-        env->sregs[BR] &= ~br;
-    }
-}
-
-void HELPER(un_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
-{
-    set_br(env, float32_unordered_quiet(a, b, &env->fp_status), br);
+    return float32_unordered_quiet(a, b, &env->fp_status);
 }
 
-void HELPER(oeq_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(oeq_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_br(env, float32_eq_quiet(a, b, &env->fp_status), br);
+    return float32_eq_quiet(a, b, &env->fp_status);
 }
 
-void HELPER(ueq_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(ueq_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     FloatRelation v = float32_compare_quiet(a, b, &env->fp_status);
-    set_br(env, v == float_relation_equal || v == float_relation_unordered, br);
+
+    return v == float_relation_equal ||
+           v == float_relation_unordered;
 }
 
-void HELPER(olt_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(olt_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_br(env, float32_lt_quiet(a, b, &env->fp_status), br);
+    return float32_lt_quiet(a, b, &env->fp_status);
 }
 
-void HELPER(ult_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(ult_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     FloatRelation v = float32_compare_quiet(a, b, &env->fp_status);
-    set_br(env, v == float_relation_less || v == float_relation_unordered, br);
+
+    return v == float_relation_less ||
+           v == float_relation_unordered;
 }
 
-void HELPER(ole_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(ole_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_br(env, float32_le_quiet(a, b, &env->fp_status), br);
+    return float32_le_quiet(a, b, &env->fp_status);
 }
 
-void HELPER(ule_s)(CPUXtensaState *env, uint32_t br, float32 a, float32 b)
+uint32_t HELPER(ule_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     FloatRelation v = float32_compare_quiet(a, b, &env->fp_status);
-    set_br(env, v != float_relation_greater, br);
+
+    return v != float_relation_greater;
 }
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index bce31cbd9ff1..02c00d8461c0 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -59,13 +59,13 @@ DEF_HELPER_FLAGS_3(ftoui_s, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
 DEF_HELPER_3(itof_s, f32, env, i32, i32)
 DEF_HELPER_3(uitof_s, f32, env, i32, i32)
 
-DEF_HELPER_4(un_s, void, env, i32, f32, f32)
-DEF_HELPER_4(oeq_s, void, env, i32, f32, f32)
-DEF_HELPER_4(ueq_s, void, env, i32, f32, f32)
-DEF_HELPER_4(olt_s, void, env, i32, f32, f32)
-DEF_HELPER_4(ult_s, void, env, i32, f32, f32)
-DEF_HELPER_4(ole_s, void, env, i32, f32, f32)
-DEF_HELPER_4(ule_s, void, env, i32, f32, f32)
+DEF_HELPER_3(un_s,  i32, env, f32, f32)
+DEF_HELPER_3(oeq_s, i32, env, f32, f32)
+DEF_HELPER_3(ueq_s, i32, env, f32, f32)
+DEF_HELPER_3(olt_s, i32, env, f32, f32)
+DEF_HELPER_3(ult_s, i32, env, f32, f32)
+DEF_HELPER_3(ole_s, i32, env, f32, f32)
+DEF_HELPER_3(ule_s, i32, env, f32, f32)
 
 DEF_HELPER_2(rer, i32, env, i32)
 DEF_HELPER_3(wer, void, env, i32, i32)
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 1b643881e6e9..67a92379f9dc 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -6319,7 +6319,7 @@ enum {
 static void translate_compare_s(DisasContext *dc, const OpcodeArg arg[],
                                 const uint32_t par[])
 {
-    static void (* const helper[])(TCGv_env env, TCGv_i32 bit,
+    static void (* const helper[])(TCGv_i32 res, TCGv_env env,
                                    TCGv_i32 s, TCGv_i32 t) = {
         [COMPARE_UN] = gen_helper_un_s,
         [COMPARE_OEQ] = gen_helper_oeq_s,
@@ -6329,10 +6329,22 @@ static void translate_compare_s(DisasContext *dc, const OpcodeArg arg[],
         [COMPARE_OLE] = gen_helper_ole_s,
         [COMPARE_ULE] = gen_helper_ule_s,
     };
-    TCGv_i32 bit = tcg_const_i32(1 << arg[0].imm);
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 res = tcg_temp_new_i32();
+    TCGv_i32 set_br = tcg_temp_new_i32();
+    TCGv_i32 clr_br = tcg_temp_new_i32();
 
-    helper[par[0]](cpu_env, bit, arg[1].in, arg[2].in);
-    tcg_temp_free(bit);
+    tcg_gen_ori_i32(set_br, arg[0].in, 1 << arg[0].imm);
+    tcg_gen_andi_i32(clr_br, arg[0].in, ~(1 << arg[0].imm));
+
+    helper[par[0]](res, cpu_env, arg[1].in, arg[2].in);
+    tcg_gen_movcond_i32(TCG_COND_NE,
+                        arg[0].out, res, zero,
+                        set_br, clr_br);
+    tcg_temp_free(zero);
+    tcg_temp_free(res);
+    tcg_temp_free(set_br);
+    tcg_temp_free(clr_br);
 }
 
 static void translate_float_s(DisasContext *dc, const OpcodeArg arg[],
-- 
2.20.1




* [PATCH v4 09/22] target/xtensa: add DFPU option
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (7 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 08/22] target/xtensa: don't access BR regfile directly Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 10/22] target/xtensa: add DFPU registers and opcodes Max Filippov
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

The double precision floating point unit (DFPU) is an FPU implementation
that differs from the FPU2000 in the following ways:
- it may be configured with single precision only or with both single
  and double precision operations;
- it may be configured with division and square root opcodes;
- the FSR register accumulates the inValid, division by Zero, Overflow,
  Underflow and Inexact result flags of operations;
- QNaNs and SNaNs are handled properly;
- NaN propagation rules are different.
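Here, "accumulates" means that each operation ORs the flags it raises into FSR without clearing flags left by earlier operations. The sketch below models only that behaviour. The flag encoding, `FSR_FLAGS_SHIFT` and `fsr_accumulate()` are hypothetical, because this commit message does not specify the DFPU FSR layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-operation exception flags (not the real encoding). */
enum {
    FPF_INEXACT   = 1u << 0,
    FPF_UNDERFLOW = 1u << 1,
    FPF_OVERFLOW  = 1u << 2,
    FPF_DIVZERO   = 1u << 3,
    FPF_INVALID   = 1u << 4,
};

/* Hypothetical position of the flag field inside FSR. */
#define FSR_FLAGS_SHIFT 7

/* OR the flags raised by one operation into the accumulated FSR value. */
static uint32_t fsr_accumulate(uint32_t fsr, uint32_t op_flags)
{
    assert((op_flags & ~0x1fu) == 0);   /* only the five flags above */
    return fsr | (op_flags << FSR_FLAGS_SHIFT);
}
```

After a division by zero followed by an inexact operation, both flags stay set, even if a third operation raises none.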

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v3->v4:
- new patch split from the next one

 target/xtensa/cpu.h          |  2 ++
 target/xtensa/overlay_tool.h | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 960f6573447f..6fc1565000b6 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -52,6 +52,8 @@ enum {
     XTENSA_OPTION_COPROCESSOR,
     XTENSA_OPTION_BOOLEAN,
     XTENSA_OPTION_FP_COPROCESSOR,
+    XTENSA_OPTION_DFP_COPROCESSOR,
+    XTENSA_OPTION_DFPU_SINGLE_ONLY,
     XTENSA_OPTION_MP_SYNCHRO,
     XTENSA_OPTION_CONDITIONAL_STORE,
     XTENSA_OPTION_ATOMCTL,
diff --git a/target/xtensa/overlay_tool.h b/target/xtensa/overlay_tool.h
index eb9f08af0bf6..9f0846c86b65 100644
--- a/target/xtensa/overlay_tool.h
+++ b/target/xtensa/overlay_tool.h
@@ -39,6 +39,26 @@
 #define XCHAL_HAVE_DEPBITS 0
 #endif
 
+#ifndef XCHAL_HAVE_DFP
+#define XCHAL_HAVE_DFP 0
+#endif
+
+#ifndef XCHAL_HAVE_DFPU_SINGLE_ONLY
+#define XCHAL_HAVE_DFPU_SINGLE_ONLY 0
+#endif
+
+#ifndef XCHAL_HAVE_DFPU_SINGLE_DOUBLE
+#define XCHAL_HAVE_DFPU_SINGLE_DOUBLE XCHAL_HAVE_DFP
+#endif
+
+/*
+ * We need to know the type of FP unit, not only its precision.
+ * Unfortunately XCHAL macros don't tell this explicitly.
+ */
+#define XCHAL_HAVE_DFPU (XCHAL_HAVE_DFP || \
+                         XCHAL_HAVE_DFPU_SINGLE_ONLY || \
+                         XCHAL_HAVE_DFPU_SINGLE_DOUBLE)
+
 #ifndef XCHAL_HAVE_DIV32
 #define XCHAL_HAVE_DIV32 0
 #endif
@@ -99,6 +119,9 @@
     XCHAL_OPTION(XCHAL_HAVE_CP, XTENSA_OPTION_COPROCESSOR) | \
     XCHAL_OPTION(XCHAL_HAVE_BOOLEANS, XTENSA_OPTION_BOOLEAN) | \
     XCHAL_OPTION(XCHAL_HAVE_FP, XTENSA_OPTION_FP_COPROCESSOR) | \
+    XCHAL_OPTION(XCHAL_HAVE_DFPU, XTENSA_OPTION_DFP_COPROCESSOR) | \
+    XCHAL_OPTION(XCHAL_HAVE_DFPU_SINGLE_ONLY, \
+                 XTENSA_OPTION_DFPU_SINGLE_ONLY) | \
     XCHAL_OPTION(XCHAL_HAVE_RELEASE_SYNC, XTENSA_OPTION_MP_SYNCHRO) | \
     XCHAL_OPTION(XCHAL_HAVE_S32C1I, XTENSA_OPTION_CONDITIONAL_STORE) | \
     XCHAL_OPTION(((XCHAL_HAVE_S32C1I && XCHAL_HW_VERSION >= 230000) || \
-- 
2.20.1




* [PATCH v4 10/22] target/xtensa: add DFPU registers and opcodes
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

The DFPU may be configured with either 32-bit or 64-bit registers. The
Xtensa ISA does not specify how single-precision values are stored in
64-bit registers; existing implementations store them in the low half of
the register.
Add value extraction and write-back to the single-precision opcodes, add
the new double precision opcodes and the 64-bit register file, and dump
64-bit register values in xtensa_cpu_dump_state.
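The low-half convention described above can be sketched with raw bit patterns in plain C. The sketch assumes registers are held as uint64_t values. Reads take the low 32 bits (as tcg_gen_extrl_i64_i32 does in get_f32_o1_i3() below), and writes zero-extend (as tcg_gen_extu_i32_i64 does in put_f32_o1_i3()), so a single-precision result clears the upper half of its destination. The function names are hypothetical.

```c
#include <stdint.h>

/* Read a single-precision value from the low half of a 64-bit register. */
static uint32_t f64reg_get_f32(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Write a single-precision result back, zero-extended to 64 bits. */
static uint64_t f64reg_put_f32(uint32_t f32)
{
    return (uint64_t)f32;
}

/* ABS.S on a 64-bit register: extract, clear the sign bit (which is what
 * float32_abs does), then write back. */
static uint64_t f64reg_abs_s(uint64_t reg)
{
    return f64reg_put_f32(f64reg_get_f32(reg) & UINT32_C(0x7fffffff));
}
```

For instance, applying f64reg_abs_s() to 0xdeadbeefbf800000 (-1.0f in the low half) yields 0x000000003f800000.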

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v3->v4:
- split into two patches
- add single-precision helpers that call set_use_first_nan
- call the fpu2k helpers or the new helpers depending on whether the DFPU
  has only single precision configured.
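The helper choice in the last change note can be sketched as follows. This is an inference from that note and from the cover letter, because the translator code that makes the choice is not shown here. The enum and function names are hypothetical. The underlying reasoning comes from fpu_helper.c below: add_d, sub_d and mul_d select first-NaN propagation, while the new single-precision helpers restore the core's use_first_nan setting. Only a DFPU with double precision therefore needs the mode switched on each operation. Elsewhere, the mode chosen at reset (set_use_first_nan(!dfpu)) stays valid.

```c
#include <stdbool.h>

/* Hypothetical labels for the two sets of single-precision helpers. */
enum sp_helper_set {
    SP_HELPERS_FPU2K, /* helper_fpu2k_*_s: leave use_first_nan alone */
    SP_HELPERS_DFPU,  /* helper_*_s: set use_first_nan on every call */
};

/* Per-operation NaN mode switching is only needed when double precision
 * helpers may have changed the mode since the last single op. */
static enum sp_helper_set pick_sp_helpers(bool has_dfpu, bool dfpu_single_only)
{
    if (has_dfpu && !dfpu_single_only) {
        return SP_HELPERS_DFPU;
    }
    return SP_HELPERS_FPU2K; /* FPU2000 or single-precision-only DFPU */
}
```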

 target/xtensa/cpu.c          |    5 +
 target/xtensa/cpu.h          |    3 +
 target/xtensa/fpu_helper.c   |  278 ++++++++-
 target/xtensa/helper.h       |   34 +-
 target/xtensa/overlay_tool.h |    1 +
 target/xtensa/translate.c    | 1126 +++++++++++++++++++++++++++++++++-
 6 files changed, 1413 insertions(+), 34 deletions(-)

diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index 82c2ee0679f8..6a033e778c95 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -31,6 +31,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "cpu.h"
+#include "fpu/softfloat.h"
 #include "qemu/module.h"
 #include "migration/vmstate.h"
 
@@ -73,6 +74,8 @@ static void xtensa_cpu_reset(DeviceState *dev)
     XtensaCPU *cpu = XTENSA_CPU(s);
     XtensaCPUClass *xcc = XTENSA_CPU_GET_CLASS(cpu);
     CPUXtensaState *env = &cpu->env;
+    bool dfpu = xtensa_option_enabled(env->config,
+                                      XTENSA_OPTION_DFP_COPROCESSOR);
 
     xcc->parent_reset(dev);
 
@@ -104,6 +107,8 @@ static void xtensa_cpu_reset(DeviceState *dev)
     reset_mmu(env);
     s->halted = env->runstall;
 #endif
+    set_no_signaling_nans(!dfpu, &env->fp_status);
+    set_use_first_nan(!dfpu, &env->fp_status);
 }
 
 static ObjectClass *xtensa_cpu_class_by_name(const char *cpu_model)
diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 6fc1565000b6..3bd4f691c1a0 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -422,6 +422,7 @@ typedef struct XtensaOpcodeTranslators {
 
 extern const XtensaOpcodeTranslators xtensa_core_opcodes;
 extern const XtensaOpcodeTranslators xtensa_fpu2000_opcodes;
+extern const XtensaOpcodeTranslators xtensa_fpu_opcodes;
 
 struct XtensaConfig {
     const char *name;
@@ -484,6 +485,8 @@ struct XtensaConfig {
     unsigned n_mpu_fg_segments;
     unsigned n_mpu_bg_segments;
     const xtensa_mpu_entry *mpu_bg;
+
+    bool use_first_nan;
 };
 
 typedef struct XtensaConfigList {
diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index 35dacbd14d68..b5faf34ad080 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -33,6 +33,30 @@
 #include "exec/exec-all.h"
 #include "fpu/softfloat.h"
 
+enum {
+    XTENSA_FP_I = 0x1,
+    XTENSA_FP_U = 0x2,
+    XTENSA_FP_O = 0x4,
+    XTENSA_FP_Z = 0x8,
+    XTENSA_FP_V = 0x10,
+};
+
+enum {
+    XTENSA_FCR_FLAGS_SHIFT = 2,
+    XTENSA_FSR_FLAGS_SHIFT = 7,
+};
+
+static const struct {
+    uint32_t xtensa_fp_flag;
+    int softfloat_fp_flag;
+} xtensa_fp_flag_map[] = {
+    { XTENSA_FP_I, float_flag_inexact, },
+    { XTENSA_FP_U, float_flag_underflow, },
+    { XTENSA_FP_O, float_flag_overflow, },
+    { XTENSA_FP_Z, float_flag_divbyzero, },
+    { XTENSA_FP_V, float_flag_invalid, },
+};
+
 void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
 {
     static const int rounding_mode[] = {
@@ -46,11 +70,72 @@ void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
     set_float_rounding_mode(rounding_mode[v & 3], &env->fp_status);
 }
 
+void HELPER(wur_fpu_fcr)(CPUXtensaState *env, uint32_t v)
+{
+    static const int rounding_mode[] = {
+        float_round_nearest_even,
+        float_round_to_zero,
+        float_round_up,
+        float_round_down,
+    };
+
+    if (v & 0xfffff000) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "MBZ field of FCR is written non-zero: %08x\n", v);
+    }
+    env->uregs[FCR] = v & 0x0000007f;
+    set_float_rounding_mode(rounding_mode[v & 3], &env->fp_status);
+}
+
+void HELPER(wur_fpu_fsr)(CPUXtensaState *env, uint32_t v)
+{
+    uint32_t flags = v >> XTENSA_FSR_FLAGS_SHIFT;
+    int fef = 0;
+    unsigned i;
+
+    if (v & 0xfffff000) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "MBZ field of FSR is written non-zero: %08x\n", v);
+    }
+    env->uregs[FSR] = v & 0x00000f80;
+    for (i = 0; i < ARRAY_SIZE(xtensa_fp_flag_map); ++i) {
+        if (flags & xtensa_fp_flag_map[i].xtensa_fp_flag) {
+            fef |= xtensa_fp_flag_map[i].softfloat_fp_flag;
+        }
+    }
+    set_float_exception_flags(fef, &env->fp_status);
+}
+
+uint32_t HELPER(rur_fpu_fsr)(CPUXtensaState *env)
+{
+    uint32_t flags = 0;
+    int fef = get_float_exception_flags(&env->fp_status);
+    unsigned i;
+
+    for (i = 0; i < ARRAY_SIZE(xtensa_fp_flag_map); ++i) {
+        if (fef & xtensa_fp_flag_map[i].softfloat_fp_flag) {
+            flags |= xtensa_fp_flag_map[i].xtensa_fp_flag;
+        }
+    }
+    env->uregs[FSR] = flags << XTENSA_FSR_FLAGS_SHIFT;
+    return flags << XTENSA_FSR_FLAGS_SHIFT;
+}
+
+float64 HELPER(abs_d)(float64 v)
+{
+    return float64_abs(v);
+}
+
 float32 HELPER(abs_s)(float32 v)
 {
     return float32_abs(v);
 }
 
+float64 HELPER(neg_d)(float64 v)
+{
+    return float64_chs(v);
+}
+
 float32 HELPER(neg_s)(float32 v)
 {
     return float32_chs(v);
@@ -84,28 +169,144 @@ float32 HELPER(fpu2k_msub_s)(CPUXtensaState *env,
                           &env->fp_status);
 }
 
-uint32_t HELPER(ftoi_s)(float32 v, uint32_t rounding_mode, uint32_t scale)
+float64 HELPER(add_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    set_use_first_nan(true, &env->fp_status);
+    return float64_add(a, b, &env->fp_status);
+}
+
+float32 HELPER(add_s)(CPUXtensaState *env, float32 a, float32 b)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_add(a, b, &env->fp_status);
+}
+
+float64 HELPER(sub_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    set_use_first_nan(true, &env->fp_status);
+    return float64_sub(a, b, &env->fp_status);
+}
+
+float32 HELPER(sub_s)(CPUXtensaState *env, float32 a, float32 b)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_sub(a, b, &env->fp_status);
+}
+
+float64 HELPER(mul_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    set_use_first_nan(true, &env->fp_status);
+    return float64_mul(a, b, &env->fp_status);
+}
+
+float32 HELPER(mul_s)(CPUXtensaState *env, float32 a, float32 b)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_mul(a, b, &env->fp_status);
+}
+
+float64 HELPER(madd_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float64_muladd(b, c, a, 0, &env->fp_status);
+}
+
+float32 HELPER(madd_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_muladd(b, c, a, 0, &env->fp_status);
+}
+
+float64 HELPER(msub_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float64_muladd(b, c, a, float_muladd_negate_product,
+                          &env->fp_status);
+}
+
+float32 HELPER(msub_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_muladd(b, c, a, float_muladd_negate_product,
+                          &env->fp_status);
+}
+
+uint32_t HELPER(ftoi_d)(CPUXtensaState *env, float64 v,
+                        uint32_t rounding_mode, uint32_t scale)
+{
+    float_status fp_status = env->fp_status;
+    uint32_t res;
+
+    set_float_rounding_mode(rounding_mode, &fp_status);
+    res = float64_to_int32(float64_scalbn(v, scale, &fp_status), &fp_status);
+    set_float_exception_flags(get_float_exception_flags(&fp_status),
+                              &env->fp_status);
+    return res;
+}
+
+uint32_t HELPER(ftoi_s)(CPUXtensaState *env, float32 v,
+                        uint32_t rounding_mode, uint32_t scale)
+{
+    float_status fp_status = env->fp_status;
+    uint32_t res;
+
+    set_float_rounding_mode(rounding_mode, &fp_status);
+    res = float32_to_int32(float32_scalbn(v, scale, &fp_status), &fp_status);
+    set_float_exception_flags(get_float_exception_flags(&fp_status),
+                              &env->fp_status);
+    return res;
+}
+
+uint32_t HELPER(ftoui_d)(CPUXtensaState *env, float64 v,
+                         uint32_t rounding_mode, uint32_t scale)
 {
-    float_status fp_status = {0};
+    float_status fp_status = env->fp_status;
+    float64 res;
+    uint32_t rv;
 
     set_float_rounding_mode(rounding_mode, &fp_status);
-    return float32_to_int32(float32_scalbn(v, scale, &fp_status), &fp_status);
+
+    res = float64_scalbn(v, scale, &fp_status);
+
+    if (float64_is_neg(v) && !float64_is_any_nan(v)) {
+        set_float_exception_flags(float_flag_invalid, &fp_status);
+        rv = float64_to_int32(res, &fp_status);
+    } else {
+        rv = float64_to_uint32(res, &fp_status);
+    }
+    set_float_exception_flags(get_float_exception_flags(&fp_status),
+                              &env->fp_status);
+    return rv;
 }
 
-uint32_t HELPER(ftoui_s)(float32 v, uint32_t rounding_mode, uint32_t scale)
+uint32_t HELPER(ftoui_s)(CPUXtensaState *env, float32 v,
+                         uint32_t rounding_mode, uint32_t scale)
 {
-    float_status fp_status = {0};
+    float_status fp_status = env->fp_status;
     float32 res;
+    uint32_t rv;
 
     set_float_rounding_mode(rounding_mode, &fp_status);
 
     res = float32_scalbn(v, scale, &fp_status);
 
     if (float32_is_neg(v) && !float32_is_any_nan(v)) {
-        return float32_to_int32(res, &fp_status);
+        rv = float32_to_int32(res, &fp_status);
+        if (rv) {
+            set_float_exception_flags(float_flag_invalid, &fp_status);
+        }
     } else {
-        return float32_to_uint32(res, &fp_status);
+        rv = float32_to_uint32(res, &fp_status);
     }
+    set_float_exception_flags(get_float_exception_flags(&fp_status),
+                              &env->fp_status);
+    return rv;
+}
+
+float64 HELPER(itof_d)(CPUXtensaState *env, uint32_t v, uint32_t scale)
+{
+    return float64_scalbn(int32_to_float64(v, &env->fp_status),
+                          (int32_t)scale, &env->fp_status);
 }
 
 float32 HELPER(itof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
@@ -114,22 +315,56 @@ float32 HELPER(itof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
                           (int32_t)scale, &env->fp_status);
 }
 
+float64 HELPER(uitof_d)(CPUXtensaState *env, uint32_t v, uint32_t scale)
+{
+    return float64_scalbn(uint32_to_float64(v, &env->fp_status),
+                          (int32_t)scale, &env->fp_status);
+}
+
 float32 HELPER(uitof_s)(CPUXtensaState *env, uint32_t v, uint32_t scale)
 {
     return float32_scalbn(uint32_to_float32(v, &env->fp_status),
                           (int32_t)scale, &env->fp_status);
 }
 
+float64 HELPER(cvtd_s)(CPUXtensaState *env, float32 v)
+{
+    return float32_to_float64(v, &env->fp_status);
+}
+
+float32 HELPER(cvts_d)(CPUXtensaState *env, float64 v)
+{
+    return float64_to_float32(v, &env->fp_status);
+}
+
+uint32_t HELPER(un_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    return float64_unordered_quiet(a, b, &env->fp_status);
+}
+
 uint32_t HELPER(un_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     return float32_unordered_quiet(a, b, &env->fp_status);
 }
 
+uint32_t HELPER(oeq_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    return float64_eq_quiet(a, b, &env->fp_status);
+}
+
 uint32_t HELPER(oeq_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     return float32_eq_quiet(a, b, &env->fp_status);
 }
 
+uint32_t HELPER(ueq_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    FloatRelation v = float64_compare_quiet(a, b, &env->fp_status);
+
+    return v == float_relation_equal ||
+           v == float_relation_unordered;
+}
+
 uint32_t HELPER(ueq_s)(CPUXtensaState *env, float32 a, float32 b)
 {
     FloatRelation v = float32_compare_quiet(a, b, &env->fp_status);
@@ -138,9 +373,22 @@ uint32_t HELPER(ueq_s)(CPUXtensaState *env, float32 a, float32 b)
            v == float_relation_unordered;
 }
 
+uint32_t HELPER(olt_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    return float64_lt(a, b, &env->fp_status);
+}
+
 uint32_t HELPER(olt_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    return float32_lt_quiet(a, b, &env->fp_status);
+    return float32_lt(a, b, &env->fp_status);
+}
+
+uint32_t HELPER(ult_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    FloatRelation v = float64_compare_quiet(a, b, &env->fp_status);
+
+    return v == float_relation_less ||
+           v == float_relation_unordered;
 }
 
 uint32_t HELPER(ult_s)(CPUXtensaState *env, float32 a, float32 b)
@@ -151,9 +399,21 @@ uint32_t HELPER(ult_s)(CPUXtensaState *env, float32 a, float32 b)
            v == float_relation_unordered;
 }
 
+uint32_t HELPER(ole_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    return float64_le(a, b, &env->fp_status);
+}
+
 uint32_t HELPER(ole_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    return float32_le_quiet(a, b, &env->fp_status);
+    return float32_le(a, b, &env->fp_status);
+}
+
+uint32_t HELPER(ule_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    FloatRelation v = float64_compare_quiet(a, b, &env->fp_status);
+
+    return v != float_relation_greater;
 }
 
 uint32_t HELPER(ule_s)(CPUXtensaState *env, float32 a, float32 b)
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 02c00d8461c0..095f754671ce 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -54,10 +54,11 @@ DEF_HELPER_3(fpu2k_sub_s, f32, env, f32, f32)
 DEF_HELPER_3(fpu2k_mul_s, f32, env, f32, f32)
 DEF_HELPER_4(fpu2k_madd_s, f32, env, f32, f32, f32)
 DEF_HELPER_4(fpu2k_msub_s, f32, env, f32, f32, f32)
-DEF_HELPER_FLAGS_3(ftoi_s, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
-DEF_HELPER_FLAGS_3(ftoui_s, TCG_CALL_NO_RWG_SE, i32, f32, i32, i32)
+DEF_HELPER_4(ftoi_s, i32, env, f32, i32, i32)
+DEF_HELPER_4(ftoui_s, i32, env, f32, i32, i32)
 DEF_HELPER_3(itof_s, f32, env, i32, i32)
 DEF_HELPER_3(uitof_s, f32, env, i32, i32)
+DEF_HELPER_2(cvtd_s, f64, env, f32)
 
 DEF_HELPER_3(un_s,  i32, env, f32, f32)
 DEF_HELPER_3(oeq_s, i32, env, f32, f32)
@@ -67,5 +68,34 @@ DEF_HELPER_3(ult_s, i32, env, f32, f32)
 DEF_HELPER_3(ole_s, i32, env, f32, f32)
 DEF_HELPER_3(ule_s, i32, env, f32, f32)
 
+DEF_HELPER_2(wur_fpu_fcr, void, env, i32)
+DEF_HELPER_1(rur_fpu_fsr, i32, env)
+DEF_HELPER_2(wur_fpu_fsr, void, env, i32)
+DEF_HELPER_FLAGS_1(abs_d, TCG_CALL_NO_RWG_SE, f64, f64)
+DEF_HELPER_FLAGS_1(neg_d, TCG_CALL_NO_RWG_SE, f64, f64)
+DEF_HELPER_3(add_d, f64, env, f64, f64)
+DEF_HELPER_3(add_s, f32, env, f32, f32)
+DEF_HELPER_3(sub_d, f64, env, f64, f64)
+DEF_HELPER_3(sub_s, f32, env, f32, f32)
+DEF_HELPER_3(mul_d, f64, env, f64, f64)
+DEF_HELPER_3(mul_s, f32, env, f32, f32)
+DEF_HELPER_4(madd_d, f64, env, f64, f64, f64)
+DEF_HELPER_4(madd_s, f32, env, f32, f32, f32)
+DEF_HELPER_4(msub_d, f64, env, f64, f64, f64)
+DEF_HELPER_4(msub_s, f32, env, f32, f32, f32)
+DEF_HELPER_4(ftoi_d, i32, env, f64, i32, i32)
+DEF_HELPER_4(ftoui_d, i32, env, f64, i32, i32)
+DEF_HELPER_3(itof_d, f64, env, i32, i32)
+DEF_HELPER_3(uitof_d, f64, env, i32, i32)
+DEF_HELPER_2(cvts_d, f32, env, f64)
+
+DEF_HELPER_3(un_d,  i32, env, f64, f64)
+DEF_HELPER_3(oeq_d, i32, env, f64, f64)
+DEF_HELPER_3(ueq_d, i32, env, f64, f64)
+DEF_HELPER_3(olt_d, i32, env, f64, f64)
+DEF_HELPER_3(ult_d, i32, env, f64, f64)
+DEF_HELPER_3(ole_d, i32, env, f64, f64)
+DEF_HELPER_3(ule_d, i32, env, f64, f64)
+
 DEF_HELPER_2(rer, i32, env, i32)
 DEF_HELPER_3(wer, void, env, i32, i32)
diff --git a/target/xtensa/overlay_tool.h b/target/xtensa/overlay_tool.h
index 9f0846c86b65..78720734fe92 100644
--- a/target/xtensa/overlay_tool.h
+++ b/target/xtensa/overlay_tool.h
@@ -538,6 +538,7 @@
     .ndepc = (XCHAL_XEA_VERSION >= 2), \
     .inst_fetch_width = XCHAL_INST_FETCH_WIDTH, \
     .max_insn_size = XCHAL_MAX_INSTRUCTION_SIZE, \
+    .use_first_nan = !XCHAL_HAVE_DFPU, \
     EXCEPTIONS_SECTION, \
     INTERRUPTS_SECTION, \
     TLB_SECTION, \
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 67a92379f9dc..fff29cc25dd1 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -79,6 +79,7 @@ struct DisasContext {
 static TCGv_i32 cpu_pc;
 static TCGv_i32 cpu_R[16];
 static TCGv_i32 cpu_FR[16];
+static TCGv_i64 cpu_FRD[16];
 static TCGv_i32 cpu_MR[4];
 static TCGv_i32 cpu_BR[16];
 static TCGv_i32 cpu_BR4[4];
@@ -169,6 +170,13 @@ void xtensa_translate_init(void)
                                            fregnames[i]);
     }
 
+    for (i = 0; i < 16; i++) {
+        cpu_FRD[i] = tcg_global_mem_new_i64(cpu_env,
+                                            offsetof(CPUXtensaState,
+                                                     fregs[i].f64),
+                                            fregnames[i]);
+    }
+
     for (i = 0; i < 4; i++) {
         cpu_MR[i] = tcg_global_mem_new_i32(cpu_env,
                                            offsetof(CPUXtensaState,
@@ -251,6 +259,8 @@ void **xtensa_get_regfile_by_name(const char *name, int entries, int bits)
 
         g_hash_table_insert(xtensa_regfile_table,
                             (void *)"FR 16x32", (void *)cpu_FR);
+        g_hash_table_insert(xtensa_regfile_table,
+                            (void *)"FR 16x64", (void *)cpu_FRD);
 
         g_hash_table_insert(xtensa_regfile_table,
                             (void *)"BR 16x1", (void *)cpu_BR);
@@ -1398,12 +1408,25 @@ void xtensa_cpu_dump_state(CPUState *cs, FILE *f, int flags)
         qemu_fprintf(f, "\n");
 
         for (i = 0; i < 16; ++i) {
-            qemu_fprintf(f, "F%02d=%08x (%+10.8e)%c", i,
+            qemu_fprintf(f, "F%02d=%08x (%-+15.8e)%c", i,
                          float32_val(env->fregs[i].f32[FP_F32_LOW]),
                          *(float *)(env->fregs[i].f32 + FP_F32_LOW),
                          (i % 2) == 1 ? '\n' : ' ');
         }
     }
+
+    if ((flags & CPU_DUMP_FPU) &&
+        xtensa_option_enabled(env->config, XTENSA_OPTION_DFP_COPROCESSOR) &&
+        !xtensa_option_enabled(env->config, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        qemu_fprintf(f, "\n");
+
+        for (i = 0; i < 16; ++i) {
+            qemu_fprintf(f, "F%02d=%016"PRIx64" (%-+24.16le)%c", i,
+                         float64_val(env->fregs[i].f64),
+                         *(double *)(&env->fregs[i].f64),
+                         (i % 2) == 1 ? '\n' : ' ');
+        }
+    }
 }
 
 void restore_state_to_opc(CPUXtensaState *env, TranslationBlock *tb,
@@ -6293,10 +6316,138 @@ const XtensaOpcodeTranslators xtensa_core_opcodes = {
 };
 
 
+static inline void get_f32_o1_i3(const OpcodeArg *arg, OpcodeArg *arg32,
+                                 int o0, int i0, int i1, int i2)
+{
+    if ((i0 >= 0 && arg[i0].num_bits == 64) ||
+        (o0 >= 0 && arg[o0].num_bits == 64)) {
+        if (o0 >= 0) {
+            arg32[o0].out = tcg_temp_new_i32();
+        }
+        if (i0 >= 0) {
+            arg32[i0].in = tcg_temp_new_i32();
+            tcg_gen_extrl_i64_i32(arg32[i0].in, arg[i0].in);
+        }
+        if (i1 >= 0) {
+            arg32[i1].in = tcg_temp_new_i32();
+            tcg_gen_extrl_i64_i32(arg32[i1].in, arg[i1].in);
+        }
+        if (i2 >= 0) {
+            arg32[i2].in = tcg_temp_new_i32();
+            tcg_gen_extrl_i64_i32(arg32[i2].in, arg[i2].in);
+        }
+    } else {
+        if (o0 >= 0) {
+            arg32[o0].out = arg[o0].out;
+        }
+        if (i0 >= 0) {
+            arg32[i0].in = arg[i0].in;
+        }
+        if (i1 >= 0) {
+            arg32[i1].in = arg[i1].in;
+        }
+        if (i2 >= 0) {
+            arg32[i2].in = arg[i2].in;
+        }
+    }
+}
+
+static inline void put_f32_o1_i3(const OpcodeArg *arg, const OpcodeArg *arg32,
+                                 int o0, int i0, int i1, int i2)
+{
+    if ((i0 >= 0 && arg[i0].num_bits == 64) ||
+        (o0 >= 0 && arg[o0].num_bits == 64)) {
+        if (o0 >= 0) {
+            tcg_gen_extu_i32_i64(arg[o0].out, arg32[o0].out);
+            tcg_temp_free_i32(arg32[o0].out);
+        }
+        if (i0 >= 0) {
+            tcg_temp_free_i32(arg32[i0].in);
+        }
+        if (i1 >= 0) {
+            tcg_temp_free_i32(arg32[i1].in);
+        }
+        if (i2 >= 0) {
+            tcg_temp_free_i32(arg32[i2].in);
+        }
+    }
+}
+
+static inline void get_f32_o1_i2(const OpcodeArg *arg, OpcodeArg *arg32,
+                                 int o0, int i0, int i1)
+{
+    get_f32_o1_i3(arg, arg32, o0, i0, i1, -1);
+}
+
+static inline void put_f32_o1_i2(const OpcodeArg *arg, const OpcodeArg *arg32,
+                                 int o0, int i0, int i1)
+{
+    put_f32_o1_i3(arg, arg32, o0, i0, i1, -1);
+}
+
+static inline void get_f32_o1_i1(const OpcodeArg *arg, OpcodeArg *arg32,
+                                 int o0, int i0)
+{
+    get_f32_o1_i2(arg, arg32, o0, i0, -1);
+}
+
+static inline void put_f32_o1_i1(const OpcodeArg *arg, const OpcodeArg *arg32,
+                                 int o0, int i0)
+{
+    put_f32_o1_i2(arg, arg32, o0, i0, -1);
+}
+
+static inline void get_f32_o1(const OpcodeArg *arg, OpcodeArg *arg32,
+                              int o0)
+{
+    get_f32_o1_i1(arg, arg32, o0, -1);
+}
+
+static inline void put_f32_o1(const OpcodeArg *arg, const OpcodeArg *arg32,
+                              int o0)
+{
+    put_f32_o1_i1(arg, arg32, o0, -1);
+}
+
+static inline void get_f32_i2(const OpcodeArg *arg, OpcodeArg *arg32,
+                              int i0, int i1)
+{
+    get_f32_o1_i2(arg, arg32, -1, i0, i1);
+}
+
+static inline void put_f32_i2(const OpcodeArg *arg, const OpcodeArg *arg32,
+                              int i0, int i1)
+{
+    put_f32_o1_i2(arg, arg32, -1, i0, i1);
+}
+
+static inline void get_f32_i1(const OpcodeArg *arg, OpcodeArg *arg32,
+                              int i0)
+{
+    get_f32_i2(arg, arg32, i0, -1);
+}
+
+static inline void put_f32_i1(const OpcodeArg *arg, const OpcodeArg *arg32,
+                              int i0)
+{
+    put_f32_i2(arg, arg32, i0, -1);
+}
+
+
+static void translate_abs_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    gen_helper_abs_d(arg[0].out, arg[1].in);
+}
+
 static void translate_abs_s(DisasContext *dc, const OpcodeArg arg[],
                             const uint32_t par[])
 {
-    gen_helper_abs_s(arg[0].out, arg[1].in);
+    OpcodeArg arg32[2];
+
+    get_f32_o1_i1(arg, arg32, 0, 1);
+    gen_helper_abs_s(arg32[0].out, arg32[1].in);
+    put_f32_o1_i1(arg, arg32, 0, 1);
 }
 
 static void translate_fpu2k_add_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6316,6 +6467,37 @@ enum {
     COMPARE_ULE,
 };
 
+static void translate_compare_d(DisasContext *dc, const OpcodeArg arg[],
+                                const uint32_t par[])
+{
+    static void (* const helper[])(TCGv_i32 res, TCGv_env env,
+                                   TCGv_i64 s, TCGv_i64 t) = {
+        [COMPARE_UN] = gen_helper_un_d,
+        [COMPARE_OEQ] = gen_helper_oeq_d,
+        [COMPARE_UEQ] = gen_helper_ueq_d,
+        [COMPARE_OLT] = gen_helper_olt_d,
+        [COMPARE_ULT] = gen_helper_ult_d,
+        [COMPARE_OLE] = gen_helper_ole_d,
+        [COMPARE_ULE] = gen_helper_ule_d,
+    };
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 res = tcg_temp_new_i32();
+    TCGv_i32 set_br = tcg_temp_new_i32();
+    TCGv_i32 clr_br = tcg_temp_new_i32();
+
+    tcg_gen_ori_i32(set_br, arg[0].in, 1 << arg[0].imm);
+    tcg_gen_andi_i32(clr_br, arg[0].in, ~(1 << arg[0].imm));
+
+    helper[par[0]](res, cpu_env, arg[1].in, arg[2].in);
+    tcg_gen_movcond_i32(TCG_COND_NE,
+                        arg[0].out, res, zero,
+                        set_br, clr_br);
+    tcg_temp_free(zero);
+    tcg_temp_free(res);
+    tcg_temp_free(set_br);
+    tcg_temp_free(clr_br);
+}
+
 static void translate_compare_s(DisasContext *dc, const OpcodeArg arg[],
                                 const uint32_t par[])
 {
@@ -6329,6 +6511,7 @@ static void translate_compare_s(DisasContext *dc, const OpcodeArg arg[],
         [COMPARE_OLE] = gen_helper_ole_s,
         [COMPARE_ULE] = gen_helper_ule_s,
     };
+    OpcodeArg arg32[3];
     TCGv_i32 zero = tcg_const_i32(0);
     TCGv_i32 res = tcg_temp_new_i32();
     TCGv_i32 set_br = tcg_temp_new_i32();
@@ -6337,26 +6520,101 @@ static void translate_compare_s(DisasContext *dc, const OpcodeArg arg[],
     tcg_gen_ori_i32(set_br, arg[0].in, 1 << arg[0].imm);
     tcg_gen_andi_i32(clr_br, arg[0].in, ~(1 << arg[0].imm));
 
-    helper[par[0]](res, cpu_env, arg[1].in, arg[2].in);
+    get_f32_i2(arg, arg32, 1, 2);
+    helper[par[0]](res, cpu_env, arg32[1].in, arg32[2].in);
     tcg_gen_movcond_i32(TCG_COND_NE,
                         arg[0].out, res, zero,
                         set_br, clr_br);
+    put_f32_i2(arg, arg32, 1, 2);
     tcg_temp_free(zero);
     tcg_temp_free(res);
     tcg_temp_free(set_br);
     tcg_temp_free(clr_br);
 }
 
+static void translate_const_d(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    static const uint64_t v[] = {
+        UINT64_C(0x0000000000000000),
+        UINT64_C(0x3ff0000000000000),
+        UINT64_C(0x4000000000000000),
+        UINT64_C(0x3fe0000000000000),
+    };
+
+    tcg_gen_movi_i64(arg[0].out, v[arg[1].imm % ARRAY_SIZE(v)]);
+    if (arg[1].imm >= ARRAY_SIZE(v)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "const.d f%d, #%d, immediate value is reserved\n",
+                      arg[0].imm, arg[1].imm);
+    }
+}
+
+static void translate_const_s(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    static const uint32_t v[] = {
+        0x00000000,
+        0x3f800000,
+        0x40000000,
+        0x3f000000,
+    };
+
+    if (arg[0].num_bits == 32) {
+        tcg_gen_movi_i32(arg[0].out, v[arg[1].imm % ARRAY_SIZE(v)]);
+    } else {
+        tcg_gen_movi_i64(arg[0].out, v[arg[1].imm % ARRAY_SIZE(v)]);
+    }
+    if (arg[1].imm >= ARRAY_SIZE(v)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "const.s f%d, #%d, immediate value is reserved\n",
+                      arg[0].imm, arg[1].imm);
+    }
+}
+
+static void translate_float_d(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    TCGv_i32 scale = tcg_const_i32(-arg[2].imm);
+
+    if (par[0]) {
+        gen_helper_uitof_d(arg[0].out, cpu_env, arg[1].in, scale);
+    } else {
+        gen_helper_itof_d(arg[0].out, cpu_env, arg[1].in, scale);
+    }
+    tcg_temp_free(scale);
+}
+
 static void translate_float_s(DisasContext *dc, const OpcodeArg arg[],
                               const uint32_t par[])
 {
     TCGv_i32 scale = tcg_const_i32(-arg[2].imm);
+    OpcodeArg arg32[1];
 
+    get_f32_o1(arg, arg32, 0);
     if (par[0]) {
-        gen_helper_uitof_s(arg[0].out, cpu_env, arg[1].in, scale);
+        gen_helper_uitof_s(arg32[0].out, cpu_env, arg[1].in, scale);
+    } else {
+        gen_helper_itof_s(arg32[0].out, cpu_env, arg[1].in, scale);
+    }
+    put_f32_o1(arg, arg32, 0);
+    tcg_temp_free(scale);
+}
+
+static void translate_ftoi_d(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    TCGv_i32 rounding_mode = tcg_const_i32(par[0]);
+    TCGv_i32 scale = tcg_const_i32(arg[2].imm);
+
+    if (par[1]) {
+        gen_helper_ftoui_d(arg[0].out, cpu_env, arg[1].in,
+                           rounding_mode, scale);
     } else {
-        gen_helper_itof_s(arg[0].out, cpu_env, arg[1].in, scale);
+        gen_helper_ftoi_d(arg[0].out, cpu_env, arg[1].in,
+                          rounding_mode, scale);
     }
+    tcg_temp_free(rounding_mode);
     tcg_temp_free(scale);
 }
 
@@ -6365,14 +6623,17 @@ static void translate_ftoi_s(DisasContext *dc, const OpcodeArg arg[],
 {
     TCGv_i32 rounding_mode = tcg_const_i32(par[0]);
     TCGv_i32 scale = tcg_const_i32(arg[2].imm);
+    OpcodeArg arg32[2];
 
+    get_f32_i1(arg, arg32, 1);
     if (par[1]) {
-        gen_helper_ftoui_s(arg[0].out, arg[1].in,
+        gen_helper_ftoui_s(arg[0].out, cpu_env, arg32[1].in,
                            rounding_mode, scale);
     } else {
-        gen_helper_ftoi_s(arg[0].out, arg[1].in,
+        gen_helper_ftoi_s(arg[0].out, cpu_env, arg32[1].in,
                           rounding_mode, scale);
     }
+    put_f32_i1(arg, arg32, 1);
     tcg_temp_free(rounding_mode);
     tcg_temp_free(scale);
 }
@@ -6420,35 +6681,84 @@ static void translate_fpu2k_madd_s(DisasContext *dc, const OpcodeArg arg[],
                             arg[0].in, arg[1].in, arg[2].in);
 }
 
+static void translate_mov_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    tcg_gen_mov_i64(arg[0].out, arg[1].in);
+}
+
 static void translate_mov_s(DisasContext *dc, const OpcodeArg arg[],
                             const uint32_t par[])
 {
-    tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    if (arg[0].num_bits == 32) {
+        tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    } else {
+        tcg_gen_mov_i64(arg[0].out, arg[1].in);
+    }
+}
+
+static void translate_movcond_d(DisasContext *dc, const OpcodeArg arg[],
+                                const uint32_t par[])
+{
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 arg2 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(arg2, arg[2].in);
+    tcg_gen_movcond_i64(par[0], arg[0].out,
+                        arg2, zero,
+                        arg[1].in, arg[0].in);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(arg2);
 }
 
 static void translate_movcond_s(DisasContext *dc, const OpcodeArg arg[],
                                 const uint32_t par[])
 {
-    TCGv_i32 zero = tcg_const_i32(0);
+    if (arg[0].num_bits == 32) {
+        TCGv_i32 zero = tcg_const_i32(0);
 
-    tcg_gen_movcond_i32(par[0], arg[0].out,
-                        arg[2].in, zero,
+        tcg_gen_movcond_i32(par[0], arg[0].out,
+                            arg[2].in, zero,
+                            arg[1].in, arg[0].in);
+        tcg_temp_free(zero);
+    } else {
+        translate_movcond_d(dc, arg, par);
+    }
+}
+
+static void translate_movp_d(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i32 tmp1 = tcg_temp_new_i32();
+    TCGv_i64 tmp2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i32(tmp1, arg[2].in, 1 << arg[2].imm);
+    tcg_gen_extu_i32_i64(tmp2, tmp1);
+    tcg_gen_movcond_i64(par[0],
+                        arg[0].out, tmp2, zero,
                         arg[1].in, arg[0].in);
-    tcg_temp_free(zero);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i32(tmp1);
+    tcg_temp_free_i64(tmp2);
 }
 
 static void translate_movp_s(DisasContext *dc, const OpcodeArg arg[],
                              const uint32_t par[])
 {
-    TCGv_i32 zero = tcg_const_i32(0);
-    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (arg[0].num_bits == 32) {
+        TCGv_i32 zero = tcg_const_i32(0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
 
-    tcg_gen_andi_i32(tmp, arg[2].in, 1 << arg[2].imm);
-    tcg_gen_movcond_i32(par[0],
-                        arg[0].out, tmp, zero,
-                        arg[1].in, arg[0].in);
-    tcg_temp_free(tmp);
-    tcg_temp_free(zero);
+        tcg_gen_andi_i32(tmp, arg[2].in, 1 << arg[2].imm);
+        tcg_gen_movcond_i32(par[0],
+                            arg[0].out, tmp, zero,
+                            arg[1].in, arg[0].in);
+        tcg_temp_free(tmp);
+        tcg_temp_free(zero);
+    } else {
+        translate_movp_d(dc, arg, par);
+    }
 }
 
 static void translate_fpu2k_mul_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6465,16 +6775,36 @@ static void translate_fpu2k_msub_s(DisasContext *dc, const OpcodeArg arg[],
                             arg[0].in, arg[1].in, arg[2].in);
 }
 
+static void translate_neg_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    gen_helper_neg_d(arg[0].out, arg[1].in);
+}
+
 static void translate_neg_s(DisasContext *dc, const OpcodeArg arg[],
                             const uint32_t par[])
 {
-    gen_helper_neg_s(arg[0].out, arg[1].in);
+    OpcodeArg arg32[2];
+
+    get_f32_o1_i1(arg, arg32, 0, 1);
+    gen_helper_neg_s(arg32[0].out, arg32[1].in);
+    put_f32_o1_i1(arg, arg32, 0, 1);
+}
+
+static void translate_rfr_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    tcg_gen_extrh_i64_i32(arg[0].out, arg[1].in);
 }
 
 static void translate_rfr_s(DisasContext *dc, const OpcodeArg arg[],
                             const uint32_t par[])
 {
-    tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    if (arg[1].num_bits == 32) {
+        tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    } else {
+        tcg_gen_extrl_i64_i32(arg[0].out, arg[1].in);
+    }
 }
 
 static void translate_fpu2k_sub_s(DisasContext *dc, const OpcodeArg arg[],
@@ -6484,10 +6814,20 @@ static void translate_fpu2k_sub_s(DisasContext *dc, const OpcodeArg arg[],
                            arg[1].in, arg[2].in);
 }
 
+static void translate_wfr_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    tcg_gen_concat_i32_i64(arg[0].out, arg[2].in, arg[1].in);
+}
+
 static void translate_wfr_s(DisasContext *dc, const OpcodeArg arg[],
                             const uint32_t par[])
 {
-    tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    if (arg[0].num_bits == 32) {
+        tcg_gen_mov_i32(arg[0].out, arg[1].in);
+    } else {
+        tcg_gen_ext_i32_i64(arg[0].out, arg[1].in);
+    }
 }
 
 static void translate_wur_fpu2k_fcr(DisasContext *dc, const OpcodeArg arg[],
@@ -6718,3 +7058,743 @@ const XtensaOpcodeTranslators xtensa_fpu2000_opcodes = {
     .num_opcodes = ARRAY_SIZE(fpu2000_ops),
     .opcode = fpu2000_ops,
 };
+
+static void translate_add_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    gen_helper_add_d(arg[0].out, cpu_env, arg[1].in, arg[2].in);
+}
+
+static void translate_add_s(DisasContext *dc, const OpcodeArg arg[],
+                                const uint32_t par[])
+{
+    if (option_enabled(dc, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        gen_helper_fpu2k_add_s(arg[0].out, cpu_env,
+                               arg[1].in, arg[2].in);
+    } else {
+        OpcodeArg arg32[3];
+
+        get_f32_o1_i2(arg, arg32, 0, 1, 2);
+        gen_helper_add_s(arg32[0].out, cpu_env, arg32[1].in, arg32[2].in);
+        put_f32_o1_i2(arg, arg32, 0, 1, 2);
+    }
+}
+
+static void translate_cvtd_s(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    tcg_gen_extrl_i64_i32(v, arg[1].in);
+    gen_helper_cvtd_s(arg[0].out, cpu_env, v);
+    tcg_temp_free_i32(v);
+}
+
+static void translate_cvts_d(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    gen_helper_cvts_d(v, cpu_env, arg[1].in);
+    tcg_gen_extu_i32_i64(arg[0].out, v);
+    tcg_temp_free_i32(v);
+}
+
+static void translate_ldsti_d(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    TCGv_i32 addr;
+
+    if (par[1]) {
+        addr = tcg_temp_new_i32();
+        tcg_gen_addi_i32(addr, arg[1].in, arg[2].imm);
+    } else {
+        addr = arg[1].in;
+    }
+    gen_load_store_alignment(dc, 3, addr, false);
+    if (par[0]) {
+        tcg_gen_qemu_st64(arg[0].in, addr, dc->cring);
+    } else {
+        tcg_gen_qemu_ld64(arg[0].out, addr, dc->cring);
+    }
+    if (par[2]) {
+        if (par[1]) {
+            tcg_gen_mov_i32(arg[1].out, addr);
+        } else {
+            tcg_gen_addi_i32(arg[1].out, arg[1].in, arg[2].imm);
+        }
+    }
+    if (par[1]) {
+        tcg_temp_free(addr);
+    }
+}
+
+static void translate_ldsti_s(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    TCGv_i32 addr;
+    OpcodeArg arg32[1];
+
+    if (par[1]) {
+        addr = tcg_temp_new_i32();
+        tcg_gen_addi_i32(addr, arg[1].in, arg[2].imm);
+    } else {
+        addr = arg[1].in;
+    }
+    gen_load_store_alignment(dc, 2, addr, false);
+    if (par[0]) {
+        get_f32_i1(arg, arg32, 0);
+        tcg_gen_qemu_st32(arg32[0].in, addr, dc->cring);
+        put_f32_i1(arg, arg32, 0);
+    } else {
+        get_f32_o1(arg, arg32, 0);
+        tcg_gen_qemu_ld32u(arg32[0].out, addr, dc->cring);
+        put_f32_o1(arg, arg32, 0);
+    }
+    if (par[2]) {
+        if (par[1]) {
+            tcg_gen_mov_i32(arg[1].out, addr);
+        } else {
+            tcg_gen_addi_i32(arg[1].out, arg[1].in, arg[2].imm);
+        }
+    }
+    if (par[1]) {
+        tcg_temp_free(addr);
+    }
+}
+
+static void translate_ldstx_d(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    TCGv_i32 addr;
+
+    if (par[1]) {
+        addr = tcg_temp_new_i32();
+        tcg_gen_add_i32(addr, arg[1].in, arg[2].in);
+    } else {
+        addr = arg[1].in;
+    }
+    gen_load_store_alignment(dc, 3, addr, false);
+    if (par[0]) {
+        tcg_gen_qemu_st64(arg[0].in, addr, dc->cring);
+    } else {
+        tcg_gen_qemu_ld64(arg[0].out, addr, dc->cring);
+    }
+    if (par[2]) {
+        if (par[1]) {
+            tcg_gen_mov_i32(arg[1].out, addr);
+        } else {
+            tcg_gen_add_i32(arg[1].out, arg[1].in, arg[2].in);
+        }
+    }
+    if (par[1]) {
+        tcg_temp_free(addr);
+    }
+}
+
+static void translate_ldstx_s(DisasContext *dc, const OpcodeArg arg[],
+                              const uint32_t par[])
+{
+    TCGv_i32 addr;
+    OpcodeArg arg32[1];
+
+    if (par[1]) {
+        addr = tcg_temp_new_i32();
+        tcg_gen_add_i32(addr, arg[1].in, arg[2].in);
+    } else {
+        addr = arg[1].in;
+    }
+    gen_load_store_alignment(dc, 2, addr, false);
+    if (par[0]) {
+        get_f32_i1(arg, arg32, 0);
+        tcg_gen_qemu_st32(arg32[0].in, addr, dc->cring);
+        put_f32_i1(arg, arg32, 0);
+    } else {
+        get_f32_o1(arg, arg32, 0);
+        tcg_gen_qemu_ld32u(arg32[0].out, addr, dc->cring);
+        put_f32_o1(arg, arg32, 0);
+    }
+    if (par[2]) {
+        if (par[1]) {
+            tcg_gen_mov_i32(arg[1].out, addr);
+        } else {
+            tcg_gen_add_i32(arg[1].out, arg[1].in, arg[2].in);
+        }
+    }
+    if (par[1]) {
+        tcg_temp_free(addr);
+    }
+}
+
+static void translate_madd_d(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    gen_helper_madd_d(arg[0].out, cpu_env,
+                      arg[0].in, arg[1].in, arg[2].in);
+}
+
+static void translate_madd_s(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    if (option_enabled(dc, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        gen_helper_fpu2k_madd_s(arg[0].out, cpu_env,
+                                arg[0].in, arg[1].in, arg[2].in);
+    } else {
+        OpcodeArg arg32[3];
+
+        get_f32_o1_i3(arg, arg32, 0, 0, 1, 2);
+        gen_helper_madd_s(arg32[0].out, cpu_env,
+                          arg32[0].in, arg32[1].in, arg32[2].in);
+        put_f32_o1_i3(arg, arg32, 0, 0, 1, 2);
+    }
+}
+
+static void translate_mul_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    gen_helper_mul_d(arg[0].out, cpu_env, arg[1].in, arg[2].in);
+}
+
+static void translate_mul_s(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    if (option_enabled(dc, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        gen_helper_fpu2k_mul_s(arg[0].out, cpu_env,
+                               arg[1].in, arg[2].in);
+    } else {
+        OpcodeArg arg32[3];
+
+        get_f32_o1_i2(arg, arg32, 0, 1, 2);
+        gen_helper_mul_s(arg32[0].out, cpu_env, arg32[1].in, arg32[2].in);
+        put_f32_o1_i2(arg, arg32, 0, 1, 2);
+    }
+}
+
+static void translate_msub_d(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    gen_helper_msub_d(arg[0].out, cpu_env,
+                      arg[0].in, arg[1].in, arg[2].in);
+}
+
+static void translate_msub_s(DisasContext *dc, const OpcodeArg arg[],
+                             const uint32_t par[])
+{
+    if (option_enabled(dc, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        gen_helper_fpu2k_msub_s(arg[0].out, cpu_env,
+                                arg[0].in, arg[1].in, arg[2].in);
+    } else {
+        OpcodeArg arg32[3];
+
+        get_f32_o1_i3(arg, arg32, 0, 0, 1, 2);
+        gen_helper_msub_s(arg32[0].out, cpu_env,
+                          arg32[0].in, arg32[1].in, arg32[2].in);
+        put_f32_o1_i3(arg, arg32, 0, 0, 1, 2);
+    }
+}
+
+static void translate_sub_d(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    gen_helper_sub_d(arg[0].out, cpu_env, arg[1].in, arg[2].in);
+}
+
+static void translate_sub_s(DisasContext *dc, const OpcodeArg arg[],
+                            const uint32_t par[])
+{
+    if (option_enabled(dc, XTENSA_OPTION_DFPU_SINGLE_ONLY)) {
+        gen_helper_fpu2k_sub_s(arg[0].out, cpu_env,
+                               arg[1].in, arg[2].in);
+    } else {
+        OpcodeArg arg32[3];
+
+        get_f32_o1_i2(arg, arg32, 0, 1, 2);
+        gen_helper_sub_s(arg32[0].out, cpu_env, arg32[1].in, arg32[2].in);
+        put_f32_o1_i2(arg, arg32, 0, 1, 2);
+    }
+}
+
+static void translate_wur_fpu_fcr(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
+{
+    gen_helper_wur_fpu_fcr(cpu_env, arg[0].in);
+}
+
+static void translate_rur_fpu_fsr(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
+{
+    gen_helper_rur_fpu_fsr(arg[0].out, cpu_env);
+}
+
+static void translate_wur_fpu_fsr(DisasContext *dc, const OpcodeArg arg[],
+                                  const uint32_t par[])
+{
+    gen_helper_wur_fpu_fsr(cpu_env, arg[0].in);
+}
+
+static const XtensaOpcodeOps fpu_ops[] = {
+    {
+        .name = "abs.d",
+        .translate = translate_abs_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "abs.s",
+        .translate = translate_abs_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "add.d",
+        .translate = translate_add_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "add.s",
+        .translate = translate_add_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ceil.d",
+        .translate = translate_ftoi_d,
+        .par = (const uint32_t[]){float_round_up, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ceil.s",
+        .translate = translate_ftoi_s,
+        .par = (const uint32_t[]){float_round_up, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "const.d",
+        .translate = translate_const_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "const.s",
+        .translate = translate_const_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "cvtd.s",
+        .translate = translate_cvtd_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "cvts.d",
+        .translate = translate_cvts_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "float.d",
+        .translate = translate_float_d,
+        .par = (const uint32_t[]){false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "float.s",
+        .translate = translate_float_s,
+        .par = (const uint32_t[]){false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "floor.d",
+        .translate = translate_ftoi_d,
+        .par = (const uint32_t[]){float_round_down, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "floor.s",
+        .translate = translate_ftoi_s,
+        .par = (const uint32_t[]){float_round_down, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldi",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){false, true, false},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldip",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){false, false, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldiu",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){false, true, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldx",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){false, true, false},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldxp",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){false, false, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ldxu",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){false, true, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsi",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){false, true, false},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsip",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){false, false, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsiu",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){false, true, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsx",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){false, true, false},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsxp",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){false, false, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "lsxu",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){false, true, true},
+        .op_flags = XTENSA_OP_LOAD,
+        .coprocessor = 0x1,
+    }, {
+        .name = "madd.d",
+        .translate = translate_madd_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "madd.s",
+        .translate = translate_madd_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mov.d",
+        .translate = translate_mov_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mov.s",
+        .translate = translate_mov_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "moveqz.d",
+        .translate = translate_movcond_d,
+        .par = (const uint32_t[]){TCG_COND_EQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "moveqz.s",
+        .translate = translate_movcond_s,
+        .par = (const uint32_t[]){TCG_COND_EQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movf.d",
+        .translate = translate_movp_d,
+        .par = (const uint32_t[]){TCG_COND_EQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movf.s",
+        .translate = translate_movp_s,
+        .par = (const uint32_t[]){TCG_COND_EQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movgez.d",
+        .translate = translate_movcond_d,
+        .par = (const uint32_t[]){TCG_COND_GE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movgez.s",
+        .translate = translate_movcond_s,
+        .par = (const uint32_t[]){TCG_COND_GE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movltz.d",
+        .translate = translate_movcond_d,
+        .par = (const uint32_t[]){TCG_COND_LT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movltz.s",
+        .translate = translate_movcond_s,
+        .par = (const uint32_t[]){TCG_COND_LT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movnez.d",
+        .translate = translate_movcond_d,
+        .par = (const uint32_t[]){TCG_COND_NE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movnez.s",
+        .translate = translate_movcond_s,
+        .par = (const uint32_t[]){TCG_COND_NE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movt.d",
+        .translate = translate_movp_d,
+        .par = (const uint32_t[]){TCG_COND_NE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "movt.s",
+        .translate = translate_movp_s,
+        .par = (const uint32_t[]){TCG_COND_NE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "msub.d",
+        .translate = translate_msub_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "msub.s",
+        .translate = translate_msub_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mul.d",
+        .translate = translate_mul_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mul.s",
+        .translate = translate_mul_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "neg.d",
+        .translate = translate_neg_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "neg.s",
+        .translate = translate_neg_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "oeq.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_OEQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "oeq.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_OEQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ole.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_OLE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ole.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_OLE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "olt.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_OLT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "olt.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_OLT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "rfr",
+        .translate = translate_rfr_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "rfrd",
+        .translate = translate_rfr_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "round.d",
+        .translate = translate_ftoi_d,
+        .par = (const uint32_t[]){float_round_nearest_even, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "round.s",
+        .translate = translate_ftoi_s,
+        .par = (const uint32_t[]){float_round_nearest_even, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "rur.fcr",
+        .translate = translate_rur,
+        .par = (const uint32_t[]){FCR},
+        .coprocessor = 0x1,
+    }, {
+        .name = "rur.fsr",
+        .translate = translate_rur_fpu_fsr,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdi",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){true, true, false},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdip",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){true, false, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdiu",
+        .translate = translate_ldsti_d,
+        .par = (const uint32_t[]){true, true, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdx",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){true, true, false},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdxp",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){true, false, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sdxu",
+        .translate = translate_ldstx_d,
+        .par = (const uint32_t[]){true, true, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssi",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){true, true, false},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssip",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){true, false, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssiu",
+        .translate = translate_ldsti_s,
+        .par = (const uint32_t[]){true, true, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssx",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){true, true, false},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssxp",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){true, false, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "ssxu",
+        .translate = translate_ldstx_s,
+        .par = (const uint32_t[]){true, true, true},
+        .op_flags = XTENSA_OP_STORE,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sub.d",
+        .translate = translate_sub_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sub.s",
+        .translate = translate_sub_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "trunc.d",
+        .translate = translate_ftoi_d,
+        .par = (const uint32_t[]){float_round_to_zero, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "trunc.s",
+        .translate = translate_ftoi_s,
+        .par = (const uint32_t[]){float_round_to_zero, false},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ueq.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_UEQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ueq.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_UEQ},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ufloat.d",
+        .translate = translate_float_d,
+        .par = (const uint32_t[]){true},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ufloat.s",
+        .translate = translate_float_s,
+        .par = (const uint32_t[]){true},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ule.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_ULE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ule.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_ULE},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ult.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_ULT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "ult.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_ULT},
+        .coprocessor = 0x1,
+    }, {
+        .name = "un.d",
+        .translate = translate_compare_d,
+        .par = (const uint32_t[]){COMPARE_UN},
+        .coprocessor = 0x1,
+    }, {
+        .name = "un.s",
+        .translate = translate_compare_s,
+        .par = (const uint32_t[]){COMPARE_UN},
+        .coprocessor = 0x1,
+    }, {
+        .name = "utrunc.d",
+        .translate = translate_ftoi_d,
+        .par = (const uint32_t[]){float_round_to_zero, true},
+        .coprocessor = 0x1,
+    }, {
+        .name = "utrunc.s",
+        .translate = translate_ftoi_s,
+        .par = (const uint32_t[]){float_round_to_zero, true},
+        .coprocessor = 0x1,
+    }, {
+        .name = "wfr",
+        .translate = translate_wfr_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "wfrd",
+        .translate = translate_wfr_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "wur.fcr",
+        .translate = translate_wur_fpu_fcr,
+        .par = (const uint32_t[]){FCR},
+        .coprocessor = 0x1,
+    }, {
+        .name = "wur.fsr",
+        .translate = translate_wur_fpu_fsr,
+        .coprocessor = 0x1,
+    },
+};
+
+const XtensaOpcodeTranslators xtensa_fpu_opcodes = {
+    .num_opcodes = ARRAY_SIZE(fpu_ops),
+    .opcode = fpu_ops,
+};
-- 
2.20.1




* [PATCH v4 11/22] target/xtensa: implement FPU division and square root
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (9 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 10/22] target/xtensa: add DFPU registers and opcodes Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 12/22] tests/tcg/xtensa: fix test execution on ISS Max Filippov
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

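This does not implement all opcodes related to div/sqrt as specified in
the Xtensa ISA, partly because the official specification is not
complete and partly because a precise implementation is unnecessarily
complex. Instead, the instructions specific to the div/sqrt sequences
are implemented differently: mkdadj computes the complete quotient,
mksadj the complete square root, addexpm is a plain register move, and
most of the remaining ones (such as addexp, div0, divn, maddn and
nexp01) are nops. The results of the div/sqrt sequences are
nevertheless preserved.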

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 target/xtensa/fpu_helper.c |  24 +++++++++
 target/xtensa/helper.h     |   4 ++
 target/xtensa/translate.c  | 104 +++++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+)

diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index b5faf34ad080..ba3c29d19d91 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -231,6 +231,30 @@ float32 HELPER(msub_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
                           &env->fp_status);
 }
 
+float64 HELPER(mkdadj_d)(CPUXtensaState *env, float64 a, float64 b)
+{
+    set_use_first_nan(true, &env->fp_status);
+    return float64_div(b, a, &env->fp_status);
+}
+
+float32 HELPER(mkdadj_s)(CPUXtensaState *env, float32 a, float32 b)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_div(b, a, &env->fp_status);
+}
+
+float64 HELPER(mksadj_d)(CPUXtensaState *env, float64 v)
+{
+    set_use_first_nan(true, &env->fp_status);
+    return float64_sqrt(v, &env->fp_status);
+}
+
+float32 HELPER(mksadj_s)(CPUXtensaState *env, float32 v)
+{
+    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    return float32_sqrt(v, &env->fp_status);
+}
+
 uint32_t HELPER(ftoi_d)(CPUXtensaState *env, float64 v,
                         uint32_t rounding_mode, uint32_t scale)
 {
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 095f754671ce..ae938ceedb80 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -83,6 +83,10 @@ DEF_HELPER_4(madd_d, f64, env, f64, f64, f64)
 DEF_HELPER_4(madd_s, f32, env, f32, f32, f32)
 DEF_HELPER_4(msub_d, f64, env, f64, f64, f64)
 DEF_HELPER_4(msub_s, f32, env, f32, f32, f32)
+DEF_HELPER_3(mkdadj_d, f64, env, f64, f64)
+DEF_HELPER_3(mkdadj_s, f32, env, f32, f32)
+DEF_HELPER_2(mksadj_d, f64, env, f64)
+DEF_HELPER_2(mksadj_s, f32, env, f32)
 DEF_HELPER_4(ftoi_d, i32, env, f64, i32, i32)
 DEF_HELPER_4(ftoui_d, i32, env, f64, i32, i32)
 DEF_HELPER_3(itof_d, f64, env, i32, i32)
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index fff29cc25dd1..944a157747cd 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -7314,6 +7314,38 @@ static void translate_sub_s(DisasContext *dc, const OpcodeArg arg[],
     }
 }
 
+static void translate_mkdadj_d(DisasContext *dc, const OpcodeArg arg[],
+                               const uint32_t par[])
+{
+    gen_helper_mkdadj_d(arg[0].out, cpu_env, arg[0].in, arg[1].in);
+}
+
+static void translate_mkdadj_s(DisasContext *dc, const OpcodeArg arg[],
+                               const uint32_t par[])
+{
+    OpcodeArg arg32[2];
+
+    get_f32_o1_i2(arg, arg32, 0, 0, 1);
+    gen_helper_mkdadj_s(arg32[0].out, cpu_env, arg32[0].in, arg32[1].in);
+    put_f32_o1_i2(arg, arg32, 0, 0, 1);
+}
+
+static void translate_mksadj_d(DisasContext *dc, const OpcodeArg arg[],
+                               const uint32_t par[])
+{
+    gen_helper_mksadj_d(arg[0].out, cpu_env, arg[1].in);
+}
+
+static void translate_mksadj_s(DisasContext *dc, const OpcodeArg arg[],
+                               const uint32_t par[])
+{
+    OpcodeArg arg32[2];
+
+    get_f32_o1_i1(arg, arg32, 0, 1);
+    gen_helper_mksadj_s(arg32[0].out, cpu_env, arg32[1].in);
+    put_f32_o1_i1(arg, arg32, 0, 1);
+}
+
 static void translate_wur_fpu_fcr(DisasContext *dc, const OpcodeArg arg[],
                                   const uint32_t par[])
 {
@@ -7349,6 +7381,22 @@ static const XtensaOpcodeOps fpu_ops[] = {
         .name = "add.s",
         .translate = translate_add_s,
         .coprocessor = 0x1,
+    }, {
+        .name = "addexp.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "addexp.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "addexpm.d",
+        .translate = translate_mov_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "addexpm.s",
+        .translate = translate_mov_s,
+        .coprocessor = 0x1,
     }, {
         .name = "ceil.d",
         .translate = translate_ftoi_d,
@@ -7375,6 +7423,22 @@ static const XtensaOpcodeOps fpu_ops[] = {
         .name = "cvts.d",
         .translate = translate_cvts_d,
         .coprocessor = 0x1,
+    }, {
+        .name = "div0.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "div0.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "divn.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "divn.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
     }, {
         .name = "float.d",
         .translate = translate_float_d,
@@ -7475,6 +7539,30 @@ static const XtensaOpcodeOps fpu_ops[] = {
         .name = "madd.s",
         .translate = translate_madd_s,
         .coprocessor = 0x1,
+    }, {
+        .name = "maddn.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "maddn.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mkdadj.d",
+        .translate = translate_mkdadj_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mkdadj.s",
+        .translate = translate_mkdadj_s,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mksadj.d",
+        .translate = translate_mksadj_d,
+        .coprocessor = 0x1,
+    }, {
+        .name = "mksadj.s",
+        .translate = translate_mksadj_s,
+        .coprocessor = 0x1,
     }, {
         .name = "mov.d",
         .translate = translate_mov_d,
@@ -7567,6 +7655,14 @@ static const XtensaOpcodeOps fpu_ops[] = {
         .name = "neg.s",
         .translate = translate_neg_s,
         .coprocessor = 0x1,
+    }, {
+        .name = "nexp01.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "nexp01.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
     }, {
         .name = "oeq.d",
         .translate = translate_compare_d,
@@ -7660,6 +7756,14 @@ static const XtensaOpcodeOps fpu_ops[] = {
         .par = (const uint32_t[]){true, true, true},
         .op_flags = XTENSA_OP_STORE,
         .coprocessor = 0x1,
+    }, {
+        .name = "sqrt0.d",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
+    }, {
+        .name = "sqrt0.s",
+        .translate = translate_nop,
+        .coprocessor = 0x1,
     }, {
         .name = "ssi",
         .translate = translate_ldsti_s,
-- 
2.20.1




* [PATCH v4 12/22] tests/tcg/xtensa: fix test execution on ISS
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Space for test results may be allocated in IRAM, which is only
word-accessible. Use full 32-bit words to store and load test results,
and grow the result area from 256 to 1024 bytes so that it still holds
256 results.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/macros.inc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/tcg/xtensa/macros.inc b/tests/tcg/xtensa/macros.inc
index aa8f95bce879..f88937c7bf82 100644
--- a/tests/tcg/xtensa/macros.inc
+++ b/tests/tcg/xtensa/macros.inc
@@ -3,7 +3,7 @@
 .macro test_suite name
 .data
 status: .word result
-result: .space 256
+result: .space 1024
 .text
 .global main
 .align 4
@@ -25,9 +25,9 @@ main:
     movi    a3, 0
     beqz    a2, 2f
 1:
-    l8ui    a1, a0, 0
+    l32i    a1, a0, 0
     or      a3, a3, a1
-    addi    a0, a0, 1
+    addi    a0, a0, 4
     addi    a2, a2, -1
     bnez    a2, 1b
 2:
@@ -65,7 +65,7 @@ test_\name:
     reset_ps
     movi    a2, status
     l32i    a3, a2, 0
-    addi    a3, a3, 1
+    addi    a3, a3, 4
     s32i    a3, a2, 0
 .endm
 
@@ -78,7 +78,7 @@ test_\name:
     movi    a2, status
     l32i    a2, a2, 0
     movi    a3, 1
-    s8i     a3, a2, 0
+    s32i    a3, a2, 0
 #ifdef DEBUG
     print   failed
 #endif
-- 
2.20.1

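For illustration, a minimal C model of the result bookkeeping after this
change: one 32-bit slot per test, written and read only with full-word
accesses, and the status position advancing by one word per test. The
names (struct result_area, result_fail, result_next, result_summary) are
hypothetical and do not exist in macros.inc. The loop count used by the
real summary code is computed outside the hunk shown, so the sketch
simply walks the slots recorded so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 32-bit word per test: "result: .space 1024" gives 256 slots. */
#define RESULT_BYTES 1024u
#define RESULT_SLOTS (RESULT_BYTES / sizeof(uint32_t))

struct result_area {
    uint32_t slot[RESULT_SLOTS];
    size_t next;                 /* plays the role of the `status` word */
};

static void result_init(struct result_area *ra)
{
    memset(ra->slot, 0, sizeof(ra->slot));
    ra->next = 0;
}

/* Failure path: full-word store of 1 (s32i instead of s8i). */
static void result_fail(struct result_area *ra)
{
    assert(ra->next < RESULT_SLOTS);
    ra->slot[ra->next] = 1;
}

/* End of a test: move to the next word (addi a3, a3, 4 instead of 1). */
static void result_next(struct result_area *ra)
{
    ra->next++;
}

/* Summary loop: OR together the recorded words (l32i, 4-byte stride). */
static uint32_t result_summary(const struct result_area *ra)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < ra->next; i++) {
        acc |= ra->slot[i];
    }
    return acc;
}
```

With 1024 bytes the area still holds 256 results, the same number the old
256-byte area held at one byte per test.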



* [PATCH v4 13/22] tests/tcg/xtensa: update test_fp0_arith for DFPU
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

DFPU arithmetic opcodes update FSR flags. Add FSR parameters and
expected FSR register values for the arithmetic tests.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/fpu.h            | 142 ++++++++++++++++++++++++
 tests/tcg/xtensa/test_fp0_arith.S | 178 ++++++++++++++----------------
 2 files changed, 223 insertions(+), 97 deletions(-)
 create mode 100644 tests/tcg/xtensa/fpu.h

diff --git a/tests/tcg/xtensa/fpu.h b/tests/tcg/xtensa/fpu.h
new file mode 100644
index 000000000000..42e321747304
--- /dev/null
+++ b/tests/tcg/xtensa/fpu.h
@@ -0,0 +1,142 @@
+#if XCHAL_HAVE_DFP || XCHAL_HAVE_FP_DIV
+#define DFPU 1
+#else
+#define DFPU 0
+#endif
+
+#define FCR_RM_NEAREST 0
+#define FCR_RM_TRUNC   1
+#define FCR_RM_CEIL    2
+#define FCR_RM_FLOOR   3
+
+#define FSR__ 0x00000000
+#define FSR_I 0x00000080
+#define FSR_U 0x00000100
+#define FSR_O 0x00000200
+#define FSR_Z 0x00000400
+#define FSR_V 0x00000800
+
+#define FSR_UI (FSR_U | FSR_I)
+#define FSR_OI (FSR_O | FSR_I)
+
+#define F32_0           0x00000000
+#define F32_0_5         0x3f000000
+#define F32_1           0x3f800000
+#define F32_MAX         0x7f7fffff
+#define F32_PINF        0x7f800000
+#define F32_NINF        0xff800000
+
+#define F32_DNAN        0x7fc00000
+#define F32_SNAN(v)     (0x7f800000 | (v))
+#define F32_QNAN(v)     (0x7fc00000 | (v))
+
+#define F32_MINUS       0x80000000
+
+#define F64_0           0x0000000000000000
+#define F64_MIN_NORM    0x0010000000000000
+#define F64_1           0x3ff0000000000000
+#define F64_MAX_2       0x7fe0000000000000
+#define F64_MAX         0x7fefffffffffffff
+#define F64_PINF        0x7ff0000000000000
+#define F64_NINF        0xfff0000000000000
+
+#define F64_DNAN        0x7ff8000000000000
+#define F64_SNAN(v)     (0x7ff0000000000000 | (v))
+#define F64_QNAN(v)     (0x7ff8000000000000 | (v))
+
+#define F64_MINUS       0x8000000000000000
+
+.macro test_op1_rm op, fr0, fr1, v0, r, sr
+    movi    a2, 0
+    wur     a2, fsr
+    movfp   \fr0, \v0
+    \op     \fr1, \fr0
+    check_res \fr1, \r, \sr
+.endm
+
+.macro test_op2_rm op, fr0, fr1, fr2, v0, v1, r, sr
+    movi    a2, 0
+    wur     a2, fsr
+    movfp   \fr0, \v0
+    movfp   \fr1, \v1
+    \op     \fr2, \fr0, \fr1
+    check_res \fr2, \r, \sr
+.endm
+
+.macro test_op3_rm op, fr0, fr1, fr2, fr3, v0, v1, v2, r, sr
+    movi    a2, 0
+    wur     a2, fsr
+    movfp   \fr0, \v0
+    movfp   \fr1, \v1
+    movfp   \fr2, \v2
+    \op     \fr0, \fr1, \fr2
+    check_res \fr3, \r, \sr
+.endm
+
+.macro test_op1_ex op, fr0, fr1, v0, rm, r, sr
+    movi    a2, \rm
+    wur     a2, fcr
+    test_op1_rm \op, \fr0, \fr1, \v0, \r, \sr
+    movi    a2, (\rm) | 0x7c
+    wur     a2, fcr
+    test_op1_rm \op, \fr0, \fr1, \v0, \r, \sr
+.endm
+
+.macro test_op2_ex op, fr0, fr1, fr2, v0, v1, rm, r, sr
+    movi    a2, \rm
+    wur     a2, fcr
+    test_op2_rm \op, \fr0, \fr1, \fr2, \v0, \v1, \r, \sr
+    movi    a2, (\rm) | 0x7c
+    wur     a2, fcr
+    test_op2_rm \op, \fr0, \fr1, \fr2, \v0, \v1, \r, \sr
+.endm
+
+.macro test_op3_ex op, fr0, fr1, fr2, fr3, v0, v1, v2, rm, r, sr
+    movi    a2, \rm
+    wur     a2, fcr
+    test_op3_rm \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, \r, \sr
+    movi    a2, (\rm) | 0x7c
+    wur     a2, fcr
+    test_op3_rm \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, \r, \sr
+.endm
+
+.macro test_op1 op, fr0, fr1, v0, r0, r1, r2, r3, sr0, sr1, sr2, sr3
+    test_op1_ex \op, \fr0, \fr1, \v0, 0, \r0, \sr0
+    test_op1_ex \op, \fr0, \fr1, \v0, 1, \r1, \sr1
+    test_op1_ex \op, \fr0, \fr1, \v0, 2, \r2, \sr2
+    test_op1_ex \op, \fr0, \fr1, \v0, 3, \r3, \sr3
+.endm
+
+.macro test_op2 op, fr0, fr1, fr2, v0, v1, r0, r1, r2, r3, sr0, sr1, sr2, sr3
+    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 0, \r0, \sr0
+    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 1, \r1, \sr1
+    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 2, \r2, \sr2
+    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 3, \r3, \sr3
+.endm
+
+.macro test_op3 op, fr0, fr1, fr2, fr3, v0, v1, v2, r0, r1, r2, r3, sr0, sr1, sr2, sr3
+    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 0, \r0, \sr0
+    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 1, \r1, \sr1
+    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 2, \r2, \sr2
+    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 3, \r3, \sr3
+.endm
+
+.macro test_op2_cpe op
+    set_vector  kernel, 2f
+    movi    a2, 0
+    wsr     a2, cpenable
+1:
+    \op     f2, f0, f1
+    test_fail
+2:
+    rsr     a2, excvaddr
+    movi    a3, 1b
+    assert  eq, a2, a3
+    rsr     a2, exccause
+    movi    a3, 32
+    assert  eq, a2, a3
+
+    set_vector  kernel, 0
+    movi    a2, 1
+    wsr     a2, cpenable
+.endm
diff --git a/tests/tcg/xtensa/test_fp0_arith.S b/tests/tcg/xtensa/test_fp0_arith.S
index 253d033a3398..df870eb7a013 100644
--- a/tests/tcg/xtensa/test_fp0_arith.S
+++ b/tests/tcg/xtensa/test_fp0_arith.S
@@ -1,4 +1,5 @@
 #include "macros.inc"
+#include "fpu.h"
 
 test_suite fp0_arith
 
@@ -9,84 +10,18 @@ test_suite fp0_arith
     wfr     \fr, a2
 .endm
 
-.macro check_res fr, r
+.macro check_res fr, r, sr
     rfr     a2, \fr
     dump    a2
     movi    a3, \r
     assert  eq, a2, a3
     rur     a2, fsr
-    assert  eqi, a2, 0
-.endm
-
-.macro test_op2_rm op, fr0, fr1, fr2, v0, v1, r
-    movi    a2, 0
-    wur     a2, fsr
-    movfp   \fr0, \v0
-    movfp   \fr1, \v1
-    \op     \fr2, \fr0, \fr1
-    check_res \fr2, \r
-.endm
-
-.macro test_op3_rm op, fr0, fr1, fr2, fr3, v0, v1, v2, r
-    movi    a2, 0
-    wur     a2, fsr
-    movfp   \fr0, \v0
-    movfp   \fr1, \v1
-    movfp   \fr2, \v2
-    \op     \fr0, \fr1, \fr2
-    check_res \fr3, \r
-.endm
-
-.macro test_op2_ex op, fr0, fr1, fr2, v0, v1, rm, r
-    movi    a2, \rm
-    wur     a2, fcr
-    test_op2_rm \op, \fr0, \fr1, \fr2, \v0, \v1, \r
-    movi    a2, (\rm) | 0x7c
-    wur     a2, fcr
-    test_op2_rm \op, \fr0, \fr1, \fr2, \v0, \v1, \r
-.endm
-
-.macro test_op3_ex op, fr0, fr1, fr2, fr3, v0, v1, v2, rm, r
-    movi    a2, \rm
-    wur     a2, fcr
-    test_op3_rm \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, \r
-    movi    a2, (\rm) | 0x7c
-    wur     a2, fcr
-    test_op3_rm \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, \r
-.endm
-
-.macro test_op2 op, fr0, fr1, fr2, v0, v1, r0, r1, r2, r3
-    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 0, \r0
-    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 1, \r1
-    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 2, \r2
-    test_op2_ex \op, \fr0, \fr1, \fr2, \v0, \v1, 3, \r3
-.endm
-
-.macro test_op3 op, fr0, fr1, fr2, fr3, v0, v1, v2, r0, r1, r2, r3
-    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 0, \r0
-    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 1, \r1
-    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 2, \r2
-    test_op3_ex \op, \fr0, \fr1, \fr2, \fr3, \v0, \v1, \v2, 3, \r3
-.endm
-
-.macro test_op2_cpe op
-    set_vector  kernel, 2f
-    movi    a2, 0
-    wsr     a2, cpenable
-1:
-    \op     f2, f0, f1
-    test_fail
-2:
-    rsr     a2, excvaddr
-    movi    a3, 1b
+#if DFPU
+    movi    a3, \sr
     assert  eq, a2, a3
-    rsr     a2, exccause
-    movi    a3, 32
-    assert  eq, a2, a3
-
-    set_vector  kernel, 0
-    movi    a2, 1
-    wsr     a2, cpenable
+#else
+    assert  eqi, a2, 0
+#endif
 .endm
 
 test add_s
@@ -94,78 +29,127 @@ test add_s
     wsr     a2, cpenable
 
     test_op2 add.s, f0, f1, f2, 0x3fc00000, 0x34400000, \
-        0x3fc00002, 0x3fc00001, 0x3fc00002, 0x3fc00001
+        0x3fc00002, 0x3fc00001, 0x3fc00002, 0x3fc00001, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
     test_op2 add.s, f3, f4, f5, 0x3fc00000, 0x34a00000, \
-        0x3fc00002, 0x3fc00002, 0x3fc00003, 0x3fc00002
+        0x3fc00002, 0x3fc00002, 0x3fc00003, 0x3fc00002, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
 
     /* MAX_FLOAT + MAX_FLOAT = +inf/MAX_FLOAT  */
     test_op2 add.s, f6, f7, f8, 0x7f7fffff, 0x7f7fffff, \
-        0x7f800000, 0x7f7fffff, 0x7f800000, 0x7f7fffff
+        0x7f800000, 0x7f7fffff, 0x7f800000, 0x7f7fffff, \
+            FSR_OI,     FSR_OI,     FSR_OI,     FSR_OI
 test_end
 
 test add_s_inf
     /* 1 + +inf = +inf  */
     test_op2 add.s, f6, f7, f8, 0x3fc00000, 0x7f800000, \
-        0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000
+        0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, \
+             FSR__,      FSR__,      FSR__,      FSR__
 
     /* +inf + -inf = default NaN */
     test_op2 add.s, f0, f1, f2, 0x7f800000, 0xff800000, \
-        0x7fc00000, 0x7fc00000, 0x7fc00000, 0x7fc00000
+        0x7fc00000, 0x7fc00000, 0x7fc00000, 0x7fc00000, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
 test_end
 
-test add_s_nan
-    /* 1 + NaN = NaN  */
+#if DFPU
+test add_s_nan_dfpu
+    /* 1 + QNaN = QNaN  */
     test_op2 add.s, f9, f10, f11, 0x3fc00000, 0x7fc00001, \
-        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR__,      FSR__,      FSR__,      FSR__
+    /* 1 + SNaN = QNaN  */
     test_op2 add.s, f12, f13, f14, 0x3fc00000, 0x7f800001, \
-        0x7f800001, 0x7f800001, 0x7f800001, 0x7f800001
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
 
-    /* NaN1 + NaN2 = NaN1 */
+    /* SNaN1 + SNaN2 = QNaN2 */
+    test_op2 add.s, f15, f0, f1, 0x7f800001, 0x7fbfffff, \
+        0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
+    test_op2 add.s, f2, f3, f4, 0x7fbfffff, 0x7f800001, \
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
+    /* QNaN1 + SNaN2 = QNaN2 */
+    test_op2 add.s, f5, f6, f7, 0x7fc00001, 0x7fbfffff, \
+        0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
+    /* SNaN1 + QNaN2 = QNaN2 */
+    test_op2 add.s, f8, f9, f10, 0x7fbfffff, 0x7fc00001, \
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
+test_end
+#else
+test add_s_nan_fpu2k
+    /* 1 + QNaN = QNaN  */
+    test_op2 add.s, f9, f10, f11, 0x3fc00000, 0x7fc00001, \
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR__,      FSR__,      FSR__,      FSR__
+    /* 1 + SNaN = SNaN  */
+    test_op2 add.s, f12, f13, f14, 0x3fc00000, 0x7f800001, \
+        0x7f800001, 0x7f800001, 0x7f800001, 0x7f800001, \
+             FSR__,      FSR__,      FSR__,      FSR__
+    /* SNaN1 + SNaN2 = SNaN1 */
     test_op2 add.s, f15, f0, f1, 0x7f800001, 0x7fbfffff, \
-        0x7f800001, 0x7f800001, 0x7f800001, 0x7f800001
+        0x7f800001, 0x7f800001, 0x7f800001, 0x7f800001, \
+             FSR__,      FSR__,      FSR__,      FSR__
     test_op2 add.s, f2, f3, f4, 0x7fbfffff, 0x7f800001, \
-        0x7fbfffff, 0x7fbfffff, 0x7fbfffff, 0x7fbfffff
+        0x7fbfffff, 0x7fbfffff, 0x7fbfffff, 0x7fbfffff, \
+             FSR__,      FSR__,      FSR__,      FSR__
+    /* QNaN1 + SNaN2 = QNaN1 */
     test_op2 add.s, f5, f6, f7, 0x7fc00001, 0x7fbfffff, \
-        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001
+        0x7fc00001, 0x7fc00001, 0x7fc00001, 0x7fc00001, \
+             FSR__,      FSR__,      FSR__,      FSR__
+    /* SNaN1 + QNaN2 = SNaN1 */
     test_op2 add.s, f8, f9, f10, 0x7fbfffff, 0x7fc00001, \
-        0x7fbfffff, 0x7fbfffff, 0x7fbfffff, 0x7fbfffff
+        0x7fbfffff, 0x7fbfffff, 0x7fbfffff, 0x7fbfffff, \
+             FSR__,      FSR__,      FSR__,      FSR__
 test_end
+#endif
 
 test sub_s
     test_op2 sub.s, f0, f1, f0, 0x3f800001, 0x33800000, \
-        0x3f800000, 0x3f800000, 0x3f800001, 0x3f800000
+        0x3f800000, 0x3f800000, 0x3f800001, 0x3f800000, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
     test_op2 sub.s, f0, f1, f1, 0x3f800002, 0x33800000, \
-        0x3f800002, 0x3f800001, 0x3f800002, 0x3f800001
+        0x3f800002, 0x3f800001, 0x3f800002, 0x3f800001, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
 
     /* norm - norm = denorm */
     test_op2 sub.s, f6, f7, f8, 0x00800001, 0x00800000, \
-        0x00000001, 0x00000001, 0x00000001, 0x00000001
+        0x00000001, 0x00000001, 0x00000001, 0x00000001, \
+             FSR__,      FSR__,      FSR__,      FSR__
 test_end
 
 test mul_s
     test_op2 mul.s, f0, f1, f2, 0x3f800001, 0x3f800001, \
-        0x3f800002, 0x3f800002, 0x3f800003, 0x3f800002
-
+        0x3f800002, 0x3f800002, 0x3f800003, 0x3f800002, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
     /* MAX_FLOAT/2 * MAX_FLOAT/2 = +inf/MAX_FLOAT  */
     test_op2 mul.s, f6, f7, f8, 0x7f000000, 0x7f000000, \
-        0x7f800000, 0x7f7fffff, 0x7f800000, 0x7f7fffff
+        0x7f800000, 0x7f7fffff, 0x7f800000, 0x7f7fffff, \
+            FSR_OI,     FSR_OI,     FSR_OI,     FSR_OI
     /* min norm * min norm = 0/denorm */
     test_op2 mul.s, f6, f7, f8, 0x00800001, 0x00800000, \
-        0x00000000, 0x00000000, 0x00000001, 0x00000000
-
+        0x00000000, 0x00000000, 0x00000001, 0x00000000, \
+            FSR_UI,     FSR_UI,     FSR_UI,     FSR_UI
     /* inf * 0 = default NaN */
     test_op2 mul.s, f6, f7, f8, 0x7f800000, 0x00000000, \
-        0x7fc00000, 0x7fc00000, 0x7fc00000, 0x7fc00000
+        0x7fc00000, 0x7fc00000, 0x7fc00000, 0x7fc00000, \
+             FSR_V,      FSR_V,      FSR_V,      FSR_V
 test_end
 
 test madd_s
     test_op3 madd.s, f0, f1, f2, f0, 0, 0x3f800001, 0x3f800001, \
-        0x3f800002, 0x3f800002, 0x3f800003, 0x3f800002
+        0x3f800002, 0x3f800002, 0x3f800003, 0x3f800002, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
 test_end
 
 test msub_s
     test_op3 msub.s, f0, f1, f2, f0, 0x3f800000, 0x3f800001, 0x3f800001, \
-        0xb4800000, 0xb4800000, 0xb4800000, 0xb4800001
+        0xb4800000, 0xb4800000, 0xb4800000, 0xb4800001, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
 test_end
 
 #endif
-- 
2.20.1

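A minimal C sketch of the two-operand NaN rules encoded by the new add.s
expectations, assuming the behaviour is exactly what the test vectors
show: on DFPU the second operand's NaN wins, the result is quieted and
FSR.V is set when any input is a signaling NaN; on FPU2000 the first
operand's NaN is returned bit for bit and FSR stays 0. The function
add_s_nan is hypothetical, and the DFPU case where only the first operand
is a NaN is not exercised by these tests, so its handling here is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSR__         0x00000000u
#define FSR_V         0x00000800u
#define F32_QUIET_BIT 0x00400000u

static bool f32_is_nan(uint32_t v)
{
    return (v & 0x7f800000u) == 0x7f800000u && (v & 0x007fffffu) != 0;
}

static bool f32_is_snan(uint32_t v)
{
    return f32_is_nan(v) && !(v & F32_QUIET_BIT);
}

/* NaN result of add.s a, b; call only when a or b is a NaN. */
static uint32_t add_s_nan(uint32_t a, uint32_t b, bool dfpu, uint32_t *fsr)
{
    if (dfpu) {
        uint32_t pick = f32_is_nan(b) ? b : a;   /* second operand wins */

        *fsr = (f32_is_snan(a) || f32_is_snan(b)) ? FSR_V : FSR__;
        return pick | F32_QUIET_BIT;             /* SNaN becomes QNaN */
    }
    *fsr = FSR__;                   /* FPU2000 tests expect FSR == 0 */
    return f32_is_nan(a) ? a : b;   /* first operand wins, bits kept */
}
```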



* [PATCH v4 14/22] tests/tcg/xtensa: expand madd tests
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Test that madd does not round the product before the addition.
Test the NaN propagation rules of the madd opcode for both FPU2000 and DFPU.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v2->v3:
- add more infzero tests for FPU2000 and DFPU

 tests/tcg/xtensa/test_fp0_arith.S | 104 ++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/tests/tcg/xtensa/test_fp0_arith.S b/tests/tcg/xtensa/test_fp0_arith.S
index df870eb7a013..7eefc1da409d 100644
--- a/tests/tcg/xtensa/test_fp0_arith.S
+++ b/tests/tcg/xtensa/test_fp0_arith.S
@@ -146,6 +146,110 @@ test madd_s
              FSR_I,      FSR_I,      FSR_I,      FSR_I
 test_end
 
+test madd_s_precision
+    test_op3 madd.s, f0, f1, f2, f0, 0xbf800002, 0x3f800001, 0x3f800001, \
+        0x28800000, 0x28800000, 0x28800000, 0x28800000, \
+             FSR__,      FSR__,      FSR__,      FSR__
+test_end
+
+#if DFPU
+test madd_s_nan_dfpu
+    /* DFPU madd/msub NaN1, NaN2, NaN3 priority: NaN1, NaN3, NaN2 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_1, F32_1, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_QNAN(2), F32_1, \
+        F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_1, F32_QNAN(3), \
+        F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_QNAN(2), F32_1, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_1, F32_QNAN(3), \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_QNAN(2), F32_QNAN(3), \
+        F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_QNAN(2), F32_QNAN(3), \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    /* inf * 0 = default NaN */
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_PINF, F32_0, \
+        F32_DNAN, F32_DNAN, F32_DNAN, F32_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+    /* inf * 0 + SNaN1 = QNaN1 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_SNAN(1), F32_PINF, F32_0, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    /* inf * 0 + QNaN1 = QNaN1 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_PINF, F32_0, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+
+    /* madd/msub SNaN turns to QNaN and sets Invalid flag */
+    test_op3 madd.s, f0, f1, f2, f0, F32_SNAN(1), F32_1, F32_1, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_SNAN(2), F32_1, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+test_end
+#else
+test madd_s_nan_fpu2k
+    /* FPU2000 madd/msub NaN1, NaN2, NaN3 priority: NaN2, NaN3, NaN1 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_1, F32_1, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_QNAN(2), F32_1, \
+        F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_1, F32_QNAN(3), \
+        F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_QNAN(2), F32_1, \
+        F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_1, F32_QNAN(3), \
+        F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), F32_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_QNAN(2), F32_QNAN(3), \
+        F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_QNAN(2), F32_QNAN(3), \
+        F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), F32_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    /* inf * 0 = default NaN */
+    test_op3 madd.s, f0, f1, f2, f0, F32_1, F32_PINF, F32_0, \
+        F32_DNAN, F32_DNAN, F32_DNAN, F32_DNAN, \
+           FSR__,    FSR__,    FSR__,    FSR__
+    /* inf * 0 + SNaN1 = SNaN1 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_SNAN(1), F32_PINF, F32_0, \
+        F32_SNAN(1), F32_SNAN(1), F32_SNAN(1), F32_SNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    /* inf * 0 + QNaN1 = QNaN1 */
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_PINF, F32_0, \
+        F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), F32_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    /* madd/msub SNaN is preserved */
+    test_op3 madd.s, f0, f1, f2, f0, F32_SNAN(1), F32_1, F32_1, \
+        F32_SNAN(1), F32_SNAN(1), F32_SNAN(1), F32_SNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.s, f0, f1, f2, f0, F32_QNAN(1), F32_SNAN(2), F32_1, \
+        F32_SNAN(2), F32_SNAN(2), F32_SNAN(2), F32_SNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+test_end
+#endif
+
 test msub_s
     test_op3 msub.s, f0, f1, f2, f0, 0x3f800000, 0x3f800001, 0x3f800001, \
         0xb4800000, 0xb4800000, 0xb4800000, 0xb4800001, \
-- 
2.20.1

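To see why madd_s_precision expects 0x28800000, here is a C sketch that
compares a product rounded to float before the addition with a product
kept exact. It is an illustration only: the helper names are
hypothetical, and madd_exact_product rounds the sum twice (to double,
then to float), so it is not a true single-rounding fused multiply-add in
general, although it is exact for this test vector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float f32_from_bits(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

static uint32_t f32_to_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* acc + m1 * m2 with the product rounded to float first (not madd). */
static uint32_t madd_rounded_product(uint32_t acc, uint32_t m1, uint32_t m2)
{
    /* volatile forces the float rounding and blocks FMA contraction */
    volatile float prod = f32_from_bits(m1) * f32_from_bits(m2);

    return f32_to_bits(f32_from_bits(acc) + prod);
}

/* acc + m1 * m2 with an exact product: 24 x 24 bits fits in a double. */
static uint32_t madd_exact_product(uint32_t acc, uint32_t m1, uint32_t m2)
{
    double prod = (double)f32_from_bits(m1) * (double)f32_from_bits(m2);

    return f32_to_bits((float)((double)f32_from_bits(acc) + prod));
}
```

(1 + 2^-23)^2 = 1 + 2^-22 + 2^-46. Rounding the product to float drops the
2^-46 term and the sum with -(1 + 2^-22) cancels to +0, while keeping it
leaves exactly 2^-46, whose single-precision encoding is 0x28800000.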

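The NaN priorities checked by madd_s_nan_dfpu and madd_s_nan_fpu2k can be
summarised with another hypothetical C sketch. Here acc, m1 and m2 are the
three madd.s inputs in test_op3 order (NaN1, NaN2, NaN3). The priorities,
the quieting and the FSR.V behaviour are taken from the expected values
in this patch; anything beyond those vectors is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F32_DNAN      0x7fc00000u
#define F32_QUIET_BIT 0x00400000u
#define FSR__         0x00000000u
#define FSR_V         0x00000800u

static bool f32_is_nan(uint32_t v)
{
    return (v & 0x7f800000u) == 0x7f800000u && (v & 0x007fffffu) != 0;
}

static bool f32_is_snan(uint32_t v)
{
    return f32_is_nan(v) && !(v & F32_QUIET_BIT);
}

static bool f32_is_inf(uint32_t v)
{
    return (v & 0x7fffffffu) == 0x7f800000u;
}

static bool f32_is_zero(uint32_t v)
{
    return (v & 0x7fffffffu) == 0;
}

/*
 * Model of madd.s acc, m1, m2 (acc + m1 * m2) for the NaN-producing cases.
 * Returns false when no NaN is produced; otherwise stores result and FSR.
 */
static bool madd_s_nan(uint32_t acc, uint32_t m1, uint32_t m2, bool dfpu,
                       uint32_t *res, uint32_t *fsr)
{
    bool infzero = (f32_is_inf(m1) && f32_is_zero(m2)) ||
                   (f32_is_zero(m1) && f32_is_inf(m2));
    bool any_snan = f32_is_snan(acc) || f32_is_snan(m1) || f32_is_snan(m2);
    /* DFPU priority: NaN1, NaN3, NaN2; FPU2000 priority: NaN2, NaN3, NaN1 */
    const uint32_t dfpu_order[3] = { acc, m2, m1 };
    const uint32_t fpu2k_order[3] = { m1, m2, acc };
    const uint32_t *order = dfpu ? dfpu_order : fpu2k_order;

    for (int i = 0; i < 3; i++) {
        if (f32_is_nan(order[i])) {
            /* DFPU quiets and flags SNaN or inf * 0; FPU2000 keeps bits */
            *res = dfpu ? (order[i] | F32_QUIET_BIT) : order[i];
            *fsr = (dfpu && (any_snan || infzero)) ? FSR_V : FSR__;
            return true;
        }
    }
    if (infzero) {
        *res = F32_DNAN;
        *fsr = dfpu ? FSR_V : FSR__;
        return true;
    }
    return false;
}
```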


* [PATCH v4 15/22] tests/tcg/xtensa: update test_fp0_conv for DFPU
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

DFPU conversion opcodes update FSR flags. Add FSR parameters and
expected FSR register values for the conversion tests.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/test_fp0_conv.S | 299 ++++++++++++++++---------------
 1 file changed, 155 insertions(+), 144 deletions(-)

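A rough C model of the round.s/trunc.s expectations in this patch,
assuming round.s rounds to nearest-even, the scale operand multiplies the
input by 2^scale, NaNs of either sign give 0x7fffffff, out-of-range values
saturate with FSR.V, and inexact in-range results set FSR.I. It models the
DFPU flag values only (the FPU2000 build still expects FSR == 0). The
function ftoi_s and enum ftoi_mode are hypothetical and only approximate
the behaviour the tests check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FSR__ 0x00000000u
#define FSR_I 0x00000080u
#define FSR_V 0x00000800u

enum ftoi_mode { FTOI_ROUND_NEAREST_EVEN, FTOI_TRUNC };

/* Convert the float in `bits`, scaled by 2^scale, to a saturated int32. */
static uint32_t ftoi_s(uint32_t bits, unsigned scale, enum ftoi_mode mode,
                       uint32_t *fsr)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    if (f != f) {                       /* any NaN, either sign */
        *fsr = FSR_V;
        return 0x7fffffffu;
    }

    double x = f;
    for (unsigned i = 0; i < scale; i++) {
        x *= 2.0;                       /* exact scaling by 2^scale */
    }

    /* Far outside int32 range (includes +-inf): saturate directly. */
    if (x >= 4294967296.0 || x <= -4294967296.0) {
        *fsr = FSR_V;
        return x > 0 ? 0x7fffffffu : 0x80000000u;
    }

    int64_t r = (int64_t)x;             /* truncates toward zero */
    double frac = x - (double)r;        /* exact for |x| < 2^32 */

    if (mode == FTOI_ROUND_NEAREST_EVEN) {
        double mag = frac < 0 ? -frac : frac;

        if (mag > 0.5 || (mag == 0.5 && (r & 1))) {
            r += x < 0 ? -1 : 1;        /* away from zero, ties to even */
        }
    }

    if (r > INT32_MAX) {
        *fsr = FSR_V;
        return 0x7fffffffu;
    }
    if (r < INT32_MIN) {
        *fsr = FSR_V;
        return 0x80000000u;
    }
    *fsr = ((double)r == x) ? FSR__ : FSR_I;
    return (uint32_t)(int32_t)r;
}
```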
diff --git a/tests/tcg/xtensa/test_fp0_conv.S b/tests/tcg/xtensa/test_fp0_conv.S
index 147e3d5062df..cfee6e51790c 100644
--- a/tests/tcg/xtensa/test_fp0_conv.S
+++ b/tests/tcg/xtensa/test_fp0_conv.S
@@ -1,4 +1,5 @@
 #include "macros.inc"
+#include "fpu.h"
 
 test_suite fp0_conv
 
@@ -9,7 +10,7 @@ test_suite fp0_conv
     wfr     \fr, a2
 .endm
 
-.macro test_ftoi_ex op, r0, fr0, v, c, r
+.macro test_ftoi_ex op, r0, fr0, v, c, r, sr
     movi    a2, 0
     wur     a2, fsr
     movfp   \fr0, \v
@@ -18,20 +19,25 @@ test_suite fp0_conv
     movi    a3, \r
     assert  eq, \r0, a3
     rur     a2, fsr
+#if DFPU
+    movi    a3, \sr
+    assert  eq, a2, a3
+#else
     assert  eqi, a2, 0
+#endif
 .endm
 
-.macro test_ftoi op, r0, fr0, v, c, r
+.macro test_ftoi op, r0, fr0, v, c, r, sr
     movi    a2, 0
     wur     a2, fcr
-    test_ftoi_ex \op, \r0, \fr0, \v, \c, \r
+    test_ftoi_ex \op, \r0, \fr0, \v, \c, \r, \sr
     movi    a2, 0x7c
     wur     a2, fcr
-    test_ftoi_ex \op, \r0, \fr0, \v, \c, \r
+    test_ftoi_ex \op, \r0, \fr0, \v, \c, \r, \sr
 .endm
 
 
-.macro test_itof_ex op, fr0, ar0, v, c, r
+.macro test_itof_ex op, fr0, ar0, v, c, r, sr
     movi    a2, 0
     wur     a2, fsr
     movi    \ar0, \v
@@ -42,23 +48,28 @@ test_suite fp0_conv
     movi    a3, \r
     assert  eq, a2, a3
     rur     a2, fsr
+#if DFPU
+    movi    a3, \sr
+    assert  eq, a2, a3
+#else
     assert  eqi, a2, 0
+#endif
 .endm
 
-.macro test_itof_rm op, fr0, ar0, v, c, rm, r
+.macro test_itof_rm op, fr0, ar0, v, c, rm, r, sr
     movi    a2, \rm
     wur     a2, fcr
-    test_itof_ex \op, \fr0, \ar0, \v, \c, \r
+    test_itof_ex \op, \fr0, \ar0, \v, \c, \r, \sr
     movi    a2, (\rm) | 0x7c
     wur     a2, fcr
-    test_itof_ex \op, \fr0, \ar0, \v, \c, \r
+    test_itof_ex \op, \fr0, \ar0, \v, \c, \r, \sr
 .endm
 
-.macro test_itof op, fr0, ar0, v, c, r0, r1, r2, r3
-    test_itof_rm \op, \fr0, \ar0, \v, \c, 0, \r0
-    test_itof_rm \op, \fr0, \ar0, \v, \c, 1, \r1
-    test_itof_rm \op, \fr0, \ar0, \v, \c, 2, \r2
-    test_itof_rm \op, \fr0, \ar0, \v, \c, 3, \r3
+.macro test_itof op, fr0, ar0, v, c, r0, r1, r2, r3, sr
+    test_itof_rm \op, \fr0, \ar0, \v, \c, 0, \r0, \sr
+    test_itof_rm \op, \fr0, \ar0, \v, \c, 1, \r1, \sr
+    test_itof_rm \op, \fr0, \ar0, \v, \c, 2, \r2, \sr
+    test_itof_rm \op, \fr0, \ar0, \v, \c, 3, \r3, \sr
 .endm
 
 test round_s
@@ -66,237 +77,237 @@ test round_s
     wsr     a2, cpenable
 
     /* NaN */
-    test_ftoi round.s, a2, f0, 0xffc00001, 0, 0x7fffffff
-    test_ftoi round.s, a2, f0, 0xff800001, 0, 0x7fffffff
+    test_ftoi round.s, a2, f0, 0xffc00001, 0, 0x7fffffff, FSR_V
+    test_ftoi round.s, a2, f0, 0xff800001, 0, 0x7fffffff, FSR_V
 
     /* -inf */
-    test_ftoi round.s, a2, f0, 0xff800000, 0, 0x80000000
+    test_ftoi round.s, a2, f0, 0xff800000, 0, 0x80000000, FSR_V
 
     /* negative overflow */
-    test_ftoi round.s, a2, f0, 0xceffffff, 1, 0x80000000
-    test_ftoi round.s, a2, f0, 0xcf000000, 0, 0x80000000
-    test_ftoi round.s, a2, f0, 0xceffffff, 0, 0x80000080
+    test_ftoi round.s, a2, f0, 0xceffffff, 1, 0x80000000, FSR_V
+    test_ftoi round.s, a2, f0, 0xcf000000, 0, 0x80000000, FSR__
+    test_ftoi round.s, a2, f0, 0xceffffff, 0, 0x80000080, FSR__
 
     /* negative */
-    test_ftoi round.s, a2, f0, 0xbfa00000, 1, -2 /* -1.25 * 2 */
-    test_ftoi round.s, a2, f0, 0xbfc00000, 0, -2 /* -1.5 */
-    test_ftoi round.s, a2, f0, 0xbf800000, 1, -2 /* -1 * 2 */
-    test_ftoi round.s, a2, f0, 0xbf800000, 0, -1 /* -1 */
-    test_ftoi round.s, a2, f0, 0xbf400000, 0, -1 /* -0.75 */
-    test_ftoi round.s, a2, f0, 0xbf000000, 0, 0  /* -0.5 */
+    test_ftoi round.s, a2, f0, 0xbfa00000, 1, -2, FSR_I  /* -1.25 * 2 */
+    test_ftoi round.s, a2, f0, 0xbfc00000, 0, -2, FSR_I  /* -1.5 */
+    test_ftoi round.s, a2, f0, 0xbf800000, 1, -2, FSR__  /* -1 * 2 */
+    test_ftoi round.s, a2, f0, 0xbf800000, 0, -1, FSR__  /* -1 */
+    test_ftoi round.s, a2, f0, 0xbf400000, 0, -1, FSR_I  /* -0.75 */
+    test_ftoi round.s, a2, f0, 0xbf000000, 0,  0, FSR_I  /* -0.5 */
 
     /* positive */
-    test_ftoi round.s, a2, f0, 0x3f000000, 0, 0 /* 0.5 */
-    test_ftoi round.s, a2, f0, 0x3f400000, 0, 1 /* 0.75 */
-    test_ftoi round.s, a2, f0, 0x3f800000, 0, 1 /* 1 */
-    test_ftoi round.s, a2, f0, 0x3f800000, 1, 2 /* 1 * 2 */
-    test_ftoi round.s, a2, f0, 0x3fc00000, 0, 2 /* 1.5 */
-    test_ftoi round.s, a2, f0, 0x3fa00000, 1, 2 /* 1.25 * 2 */
+    test_ftoi round.s, a2, f0, 0x3f000000, 0, 0, FSR_I /* 0.5 */
+    test_ftoi round.s, a2, f0, 0x3f400000, 0, 1, FSR_I /* 0.75 */
+    test_ftoi round.s, a2, f0, 0x3f800000, 0, 1, FSR__ /* 1 */
+    test_ftoi round.s, a2, f0, 0x3f800000, 1, 2, FSR__ /* 1 * 2 */
+    test_ftoi round.s, a2, f0, 0x3fc00000, 0, 2, FSR_I /* 1.5 */
+    test_ftoi round.s, a2, f0, 0x3fa00000, 1, 2, FSR_I /* 1.25 * 2 */
 
     /* positive overflow */
-    test_ftoi round.s, a2, f0, 0x4effffff, 0, 0x7fffff80
-    test_ftoi round.s, a2, f0, 0x4f000000, 0, 0x7fffffff
-    test_ftoi round.s, a2, f0, 0x4effffff, 1, 0x7fffffff
+    test_ftoi round.s, a2, f0, 0x4effffff, 0, 0x7fffff80, FSR__
+    test_ftoi round.s, a2, f0, 0x4f000000, 0, 0x7fffffff, FSR_V
+    test_ftoi round.s, a2, f0, 0x4effffff, 1, 0x7fffffff, FSR_V
 
     /* +inf */
-    test_ftoi round.s, a2, f0, 0x7f800000, 0, 0x7fffffff
+    test_ftoi round.s, a2, f0, 0x7f800000, 0, 0x7fffffff, FSR_V
 
     /* NaN */
-    test_ftoi round.s, a2, f0, 0x7f800001, 0, 0x7fffffff
-    test_ftoi round.s, a2, f0, 0x7fc00000, 0, 0x7fffffff
+    test_ftoi round.s, a2, f0, 0x7f800001, 0, 0x7fffffff, FSR_V
+    test_ftoi round.s, a2, f0, 0x7fc00000, 0, 0x7fffffff, FSR_V
 test_end
 
 test trunc_s
     /* NaN */
-    test_ftoi trunc.s, a2, f0, 0xffc00001, 0, 0x7fffffff
-    test_ftoi trunc.s, a2, f0, 0xff800001, 0, 0x7fffffff
+    test_ftoi trunc.s, a2, f0, 0xffc00001, 0, 0x7fffffff, FSR_V
+    test_ftoi trunc.s, a2, f0, 0xff800001, 0, 0x7fffffff, FSR_V
 
     /* -inf */
-    test_ftoi trunc.s, a2, f0, 0xff800000, 0, 0x80000000
+    test_ftoi trunc.s, a2, f0, 0xff800000, 0, 0x80000000, FSR_V
 
     /* negative overflow */
-    test_ftoi trunc.s, a2, f0, 0xceffffff, 1, 0x80000000
-    test_ftoi trunc.s, a2, f0, 0xcf000000, 0, 0x80000000
-    test_ftoi trunc.s, a2, f0, 0xceffffff, 0, 0x80000080
+    test_ftoi trunc.s, a2, f0, 0xceffffff, 1, 0x80000000, FSR_V
+    test_ftoi trunc.s, a2, f0, 0xcf000000, 0, 0x80000000, FSR__
+    test_ftoi trunc.s, a2, f0, 0xceffffff, 0, 0x80000080, FSR__
 
     /* negative */
-    test_ftoi trunc.s, a2, f0, 0xbfa00000, 1, -2 /* -1.25 * 2 */
-    test_ftoi trunc.s, a2, f0, 0xbfc00000, 0, -1 /* -1.5 */
-    test_ftoi trunc.s, a2, f0, 0xbf800000, 1, -2 /* -1 * 2 */
-    test_ftoi trunc.s, a2, f0, 0xbf800000, 0, -1 /* -1 */
-    test_ftoi trunc.s, a2, f0, 0xbf400000, 0, 0  /* -0.75 */
-    test_ftoi trunc.s, a2, f0, 0xbf000000, 0, 0  /* -0.5 */
+    test_ftoi trunc.s, a2, f0, 0xbfa00000, 1, -2, FSR_I /* -1.25 * 2 */
+    test_ftoi trunc.s, a2, f0, 0xbfc00000, 0, -1, FSR_I /* -1.5 */
+    test_ftoi trunc.s, a2, f0, 0xbf800000, 1, -2, FSR__ /* -1 * 2 */
+    test_ftoi trunc.s, a2, f0, 0xbf800000, 0, -1, FSR__ /* -1 */
+    test_ftoi trunc.s, a2, f0, 0xbf400000, 0,  0, FSR_I /* -0.75 */
+    test_ftoi trunc.s, a2, f0, 0xbf000000, 0,  0, FSR_I /* -0.5 */
 
     /* positive */
-    test_ftoi trunc.s, a2, f0, 0x3f000000, 0, 0 /* 0.5 */
-    test_ftoi trunc.s, a2, f0, 0x3f400000, 0, 0 /* 0.75 */
-    test_ftoi trunc.s, a2, f0, 0x3f800000, 0, 1 /* 1 */
-    test_ftoi trunc.s, a2, f0, 0x3f800000, 1, 2 /* 1 * 2 */
-    test_ftoi trunc.s, a2, f0, 0x3fc00000, 0, 1 /* 1.5 */
-    test_ftoi trunc.s, a2, f0, 0x3fa00000, 1, 2 /* 1.25 * 2 */
+    test_ftoi trunc.s, a2, f0, 0x3f000000, 0, 0, FSR_I /* 0.5 */
+    test_ftoi trunc.s, a2, f0, 0x3f400000, 0, 0, FSR_I /* 0.75 */
+    test_ftoi trunc.s, a2, f0, 0x3f800000, 0, 1, FSR__ /* 1 */
+    test_ftoi trunc.s, a2, f0, 0x3f800000, 1, 2, FSR__ /* 1 * 2 */
+    test_ftoi trunc.s, a2, f0, 0x3fc00000, 0, 1, FSR_I /* 1.5 */
+    test_ftoi trunc.s, a2, f0, 0x3fa00000, 1, 2, FSR_I /* 1.25 * 2 */
 
     /* positive overflow */
-    test_ftoi trunc.s, a2, f0, 0x4effffff, 0, 0x7fffff80
-    test_ftoi trunc.s, a2, f0, 0x4f000000, 0, 0x7fffffff
-    test_ftoi trunc.s, a2, f0, 0x4effffff, 1, 0x7fffffff
+    test_ftoi trunc.s, a2, f0, 0x4effffff, 0, 0x7fffff80, FSR__
+    test_ftoi trunc.s, a2, f0, 0x4f000000, 0, 0x7fffffff, FSR_V
+    test_ftoi trunc.s, a2, f0, 0x4effffff, 1, 0x7fffffff, FSR_V
 
     /* +inf */
-    test_ftoi trunc.s, a2, f0, 0x7f800000, 0, 0x7fffffff
+    test_ftoi trunc.s, a2, f0, 0x7f800000, 0, 0x7fffffff, FSR_V
 
     /* NaN */
-    test_ftoi trunc.s, a2, f0, 0x7f800001, 0, 0x7fffffff
-    test_ftoi trunc.s, a2, f0, 0x7fc00000, 0, 0x7fffffff
+    test_ftoi trunc.s, a2, f0, 0x7f800001, 0, 0x7fffffff, FSR_V
+    test_ftoi trunc.s, a2, f0, 0x7fc00000, 0, 0x7fffffff, FSR_V
 test_end
 
 test floor_s
     /* NaN */
-    test_ftoi floor.s, a2, f0, 0xffc00001, 0, 0x7fffffff
-    test_ftoi floor.s, a2, f0, 0xff800001, 0, 0x7fffffff
+    test_ftoi floor.s, a2, f0, 0xffc00001, 0, 0x7fffffff, FSR_V
+    test_ftoi floor.s, a2, f0, 0xff800001, 0, 0x7fffffff, FSR_V
 
     /* -inf */
-    test_ftoi floor.s, a2, f0, 0xff800000, 0, 0x80000000
+    test_ftoi floor.s, a2, f0, 0xff800000, 0, 0x80000000, FSR_V
 
     /* negative overflow */
-    test_ftoi floor.s, a2, f0, 0xceffffff, 1, 0x80000000
-    test_ftoi floor.s, a2, f0, 0xcf000000, 0, 0x80000000
-    test_ftoi floor.s, a2, f0, 0xceffffff, 0, 0x80000080
+    test_ftoi floor.s, a2, f0, 0xceffffff, 1, 0x80000000, FSR_V
+    test_ftoi floor.s, a2, f0, 0xcf000000, 0, 0x80000000, FSR__
+    test_ftoi floor.s, a2, f0, 0xceffffff, 0, 0x80000080, FSR__
 
     /* negative */
-    test_ftoi floor.s, a2, f0, 0xbfa00000, 1, -3 /* -1.25 * 2 */
-    test_ftoi floor.s, a2, f0, 0xbfc00000, 0, -2 /* -1.5 */
-    test_ftoi floor.s, a2, f0, 0xbf800000, 1, -2 /* -1 * 2 */
-    test_ftoi floor.s, a2, f0, 0xbf800000, 0, -1 /* -1 */
-    test_ftoi floor.s, a2, f0, 0xbf400000, 0, -1 /* -0.75 */
-    test_ftoi floor.s, a2, f0, 0xbf000000, 0, -1 /* -0.5 */
+    test_ftoi floor.s, a2, f0, 0xbfa00000, 1, -3, FSR_I /* -1.25 * 2 */
+    test_ftoi floor.s, a2, f0, 0xbfc00000, 0, -2, FSR_I /* -1.5 */
+    test_ftoi floor.s, a2, f0, 0xbf800000, 1, -2, FSR__ /* -1 * 2 */
+    test_ftoi floor.s, a2, f0, 0xbf800000, 0, -1, FSR__ /* -1 */
+    test_ftoi floor.s, a2, f0, 0xbf400000, 0, -1, FSR_I /* -0.75 */
+    test_ftoi floor.s, a2, f0, 0xbf000000, 0, -1, FSR_I /* -0.5 */
 
     /* positive */
-    test_ftoi floor.s, a2, f0, 0x3f000000, 0, 0 /* 0.5 */
-    test_ftoi floor.s, a2, f0, 0x3f400000, 0, 0 /* 0.75 */
-    test_ftoi floor.s, a2, f0, 0x3f800000, 0, 1 /* 1 */
-    test_ftoi floor.s, a2, f0, 0x3f800000, 1, 2 /* 1 * 2 */
-    test_ftoi floor.s, a2, f0, 0x3fc00000, 0, 1 /* 1.5 */
-    test_ftoi floor.s, a2, f0, 0x3fa00000, 1, 2 /* 1.25 * 2 */
+    test_ftoi floor.s, a2, f0, 0x3f000000, 0, 0, FSR_I /* 0.5 */
+    test_ftoi floor.s, a2, f0, 0x3f400000, 0, 0, FSR_I /* 0.75 */
+    test_ftoi floor.s, a2, f0, 0x3f800000, 0, 1, FSR__ /* 1 */
+    test_ftoi floor.s, a2, f0, 0x3f800000, 1, 2, FSR__ /* 1 * 2 */
+    test_ftoi floor.s, a2, f0, 0x3fc00000, 0, 1, FSR_I /* 1.5 */
+    test_ftoi floor.s, a2, f0, 0x3fa00000, 1, 2, FSR_I /* 1.25 * 2 */
 
     /* positive overflow */
-    test_ftoi floor.s, a2, f0, 0x4effffff, 0, 0x7fffff80
-    test_ftoi floor.s, a2, f0, 0x4f000000, 0, 0x7fffffff
-    test_ftoi floor.s, a2, f0, 0x4effffff, 1, 0x7fffffff
+    test_ftoi floor.s, a2, f0, 0x4effffff, 0, 0x7fffff80, FSR__
+    test_ftoi floor.s, a2, f0, 0x4f000000, 0, 0x7fffffff, FSR_V
+    test_ftoi floor.s, a2, f0, 0x4effffff, 1, 0x7fffffff, FSR_V
 
     /* +inf */
-    test_ftoi floor.s, a2, f0, 0x7f800000, 0, 0x7fffffff
+    test_ftoi floor.s, a2, f0, 0x7f800000, 0, 0x7fffffff, FSR_V
 
     /* NaN */
-    test_ftoi floor.s, a2, f0, 0x7f800001, 0, 0x7fffffff
-    test_ftoi floor.s, a2, f0, 0x7fc00000, 0, 0x7fffffff
+    test_ftoi floor.s, a2, f0, 0x7f800001, 0, 0x7fffffff, FSR_V
+    test_ftoi floor.s, a2, f0, 0x7fc00000, 0, 0x7fffffff, FSR_V
 test_end
 
 test ceil_s
     /* NaN */
-    test_ftoi ceil.s, a2, f0, 0xffc00001, 0, 0x7fffffff
-    test_ftoi ceil.s, a2, f0, 0xff800001, 0, 0x7fffffff
+    test_ftoi ceil.s, a2, f0, 0xffc00001, 0, 0x7fffffff, FSR_V
+    test_ftoi ceil.s, a2, f0, 0xff800001, 0, 0x7fffffff, FSR_V
 
     /* -inf */
-    test_ftoi ceil.s, a2, f0, 0xff800000, 0, 0x80000000
+    test_ftoi ceil.s, a2, f0, 0xff800000, 0, 0x80000000, FSR_V
 
     /* negative overflow */
-    test_ftoi ceil.s, a2, f0, 0xceffffff, 1, 0x80000000
-    test_ftoi ceil.s, a2, f0, 0xcf000000, 0, 0x80000000
-    test_ftoi ceil.s, a2, f0, 0xceffffff, 0, 0x80000080
+    test_ftoi ceil.s, a2, f0, 0xceffffff, 1, 0x80000000, FSR_V
+    test_ftoi ceil.s, a2, f0, 0xcf000000, 0, 0x80000000, FSR__
+    test_ftoi ceil.s, a2, f0, 0xceffffff, 0, 0x80000080, FSR__
 
     /* negative */
-    test_ftoi ceil.s, a2, f0, 0xbfa00000, 1, -2 /* -1.25 * 2 */
-    test_ftoi ceil.s, a2, f0, 0xbfc00000, 0, -1 /* -1.5 */
-    test_ftoi ceil.s, a2, f0, 0xbf800000, 1, -2 /* -1 * 2 */
-    test_ftoi ceil.s, a2, f0, 0xbf800000, 0, -1 /* -1 */
-    test_ftoi ceil.s, a2, f0, 0xbf400000, 0, 0  /* -0.75 */
-    test_ftoi ceil.s, a2, f0, 0xbf000000, 0, 0  /* -0.5 */
+    test_ftoi ceil.s, a2, f0, 0xbfa00000, 1, -2, FSR_I /* -1.25 * 2 */
+    test_ftoi ceil.s, a2, f0, 0xbfc00000, 0, -1, FSR_I /* -1.5 */
+    test_ftoi ceil.s, a2, f0, 0xbf800000, 1, -2, FSR__ /* -1 * 2 */
+    test_ftoi ceil.s, a2, f0, 0xbf800000, 0, -1, FSR__ /* -1 */
+    test_ftoi ceil.s, a2, f0, 0xbf400000, 0,  0, FSR_I /* -0.75 */
+    test_ftoi ceil.s, a2, f0, 0xbf000000, 0,  0, FSR_I /* -0.5 */
 
     /* positive */
-    test_ftoi ceil.s, a2, f0, 0x3f000000, 0, 1 /* 0.5 */
-    test_ftoi ceil.s, a2, f0, 0x3f400000, 0, 1 /* 0.75 */
-    test_ftoi ceil.s, a2, f0, 0x3f800000, 0, 1 /* 1 */
-    test_ftoi ceil.s, a2, f0, 0x3f800000, 1, 2 /* 1 * 2 */
-    test_ftoi ceil.s, a2, f0, 0x3fc00000, 0, 2 /* 1.5 */
-    test_ftoi ceil.s, a2, f0, 0x3fa00000, 1, 3 /* 1.25 * 2 */
+    test_ftoi ceil.s, a2, f0, 0x3f000000, 0, 1, FSR_I /* 0.5 */
+    test_ftoi ceil.s, a2, f0, 0x3f400000, 0, 1, FSR_I /* 0.75 */
+    test_ftoi ceil.s, a2, f0, 0x3f800000, 0, 1, FSR__ /* 1 */
+    test_ftoi ceil.s, a2, f0, 0x3f800000, 1, 2, FSR__ /* 1 * 2 */
+    test_ftoi ceil.s, a2, f0, 0x3fc00000, 0, 2, FSR_I /* 1.5 */
+    test_ftoi ceil.s, a2, f0, 0x3fa00000, 1, 3, FSR_I /* 1.25 * 2 */
 
     /* positive overflow */
-    test_ftoi ceil.s, a2, f0, 0x4effffff, 0, 0x7fffff80
-    test_ftoi ceil.s, a2, f0, 0x4f000000, 0, 0x7fffffff
-    test_ftoi ceil.s, a2, f0, 0x4effffff, 1, 0x7fffffff
+    test_ftoi ceil.s, a2, f0, 0x4effffff, 0, 0x7fffff80, FSR__
+    test_ftoi ceil.s, a2, f0, 0x4f000000, 0, 0x7fffffff, FSR_V
+    test_ftoi ceil.s, a2, f0, 0x4effffff, 1, 0x7fffffff, FSR_V
 
     /* +inf */
-    test_ftoi ceil.s, a2, f0, 0x7f800000, 0, 0x7fffffff
+    test_ftoi ceil.s, a2, f0, 0x7f800000, 0, 0x7fffffff, FSR_V
 
     /* NaN */
-    test_ftoi ceil.s, a2, f0, 0x7f800001, 0, 0x7fffffff
-    test_ftoi ceil.s, a2, f0, 0x7fc00000, 0, 0x7fffffff
+    test_ftoi ceil.s, a2, f0, 0x7f800001, 0, 0x7fffffff, FSR_V
+    test_ftoi ceil.s, a2, f0, 0x7fc00000, 0, 0x7fffffff, FSR_V
 test_end
 
 test utrunc_s
     /* NaN */
-    test_ftoi utrunc.s, a2, f0, 0xffc00001, 0, 0xffffffff
-    test_ftoi utrunc.s, a2, f0, 0xff800001, 0, 0xffffffff
+    test_ftoi utrunc.s, a2, f0, 0xffc00001, 0, 0xffffffff, FSR_V
+    test_ftoi utrunc.s, a2, f0, 0xff800001, 0, 0xffffffff, FSR_V
 
     /* -inf */
-    test_ftoi utrunc.s, a2, f0, 0xff800000, 0, 0x80000000
+    test_ftoi utrunc.s, a2, f0, 0xff800000, 0, 0x80000000, FSR_V
 
     /* negative overflow */
-    test_ftoi utrunc.s, a2, f0, 0xceffffff, 1, 0x80000000
-    test_ftoi utrunc.s, a2, f0, 0xcf000000, 0, 0x80000000
-    test_ftoi utrunc.s, a2, f0, 0xceffffff, 0, 0x80000080
+    test_ftoi utrunc.s, a2, f0, 0xceffffff, 1, 0x80000000, FSR_V
+    test_ftoi utrunc.s, a2, f0, 0xcf000000, 0, 0x80000000, FSR_V
+    test_ftoi utrunc.s, a2, f0, 0xceffffff, 0, 0x80000080, FSR_V
 
     /* negative */
-    test_ftoi utrunc.s, a2, f0, 0xbfa00000, 1, -2 /* -1.25 * 2 */
-    test_ftoi utrunc.s, a2, f0, 0xbfc00000, 0, -1 /* -1.5 */
-    test_ftoi utrunc.s, a2, f0, 0xbf800000, 1, -2 /* -1 * 2 */
-    test_ftoi utrunc.s, a2, f0, 0xbf800000, 0, -1 /* -1 */
-    test_ftoi utrunc.s, a2, f0, 0xbf400000, 0, 0  /* -0.75 */
-    test_ftoi utrunc.s, a2, f0, 0xbf000000, 0, 0  /* -0.5 */
+    test_ftoi utrunc.s, a2, f0, 0xbfa00000, 1, -2, FSR_V /* -1.25 * 2 */
+    test_ftoi utrunc.s, a2, f0, 0xbfc00000, 0, -1, FSR_V /* -1.5 */
+    test_ftoi utrunc.s, a2, f0, 0xbf800000, 1, -2, FSR_V /* -1 * 2 */
+    test_ftoi utrunc.s, a2, f0, 0xbf800000, 0, -1, FSR_V /* -1 */
+    test_ftoi utrunc.s, a2, f0, 0xbf400000, 0,  0, FSR_I /* -0.75 */
+    test_ftoi utrunc.s, a2, f0, 0xbf000000, 0,  0, FSR_I /* -0.5 */
 
     /* positive */
-    test_ftoi utrunc.s, a2, f0, 0x3f000000, 0, 0 /* 0.5 */
-    test_ftoi utrunc.s, a2, f0, 0x3f400000, 0, 0 /* 0.75 */
-    test_ftoi utrunc.s, a2, f0, 0x3f800000, 0, 1 /* 1 */
-    test_ftoi utrunc.s, a2, f0, 0x3f800000, 1, 2 /* 1 * 2 */
-    test_ftoi utrunc.s, a2, f0, 0x3fc00000, 0, 1 /* 1.5 */
-    test_ftoi utrunc.s, a2, f0, 0x3fa00000, 1, 2 /* 1.25 * 2 */
+    test_ftoi utrunc.s, a2, f0, 0x3f000000, 0, 0, FSR_I /* 0.5 */
+    test_ftoi utrunc.s, a2, f0, 0x3f400000, 0, 0, FSR_I /* 0.75 */
+    test_ftoi utrunc.s, a2, f0, 0x3f800000, 0, 1, FSR__ /* 1 */
+    test_ftoi utrunc.s, a2, f0, 0x3f800000, 1, 2, FSR__ /* 1 * 2 */
+    test_ftoi utrunc.s, a2, f0, 0x3fc00000, 0, 1, FSR_I /* 1.5 */
+    test_ftoi utrunc.s, a2, f0, 0x3fa00000, 1, 2, FSR_I /* 1.25 * 2 */
 
     /* positive overflow */
-    test_ftoi utrunc.s, a2, f0, 0x4effffff, 0, 0x7fffff80
-    test_ftoi utrunc.s, a2, f0, 0x4f000000, 0, 0x80000000
-    test_ftoi utrunc.s, a2, f0, 0x4effffff, 1, 0xffffff00
-    test_ftoi utrunc.s, a2, f0, 0x4f800000, 1, 0xffffffff
+    test_ftoi utrunc.s, a2, f0, 0x4effffff, 0, 0x7fffff80, FSR__
+    test_ftoi utrunc.s, a2, f0, 0x4f000000, 0, 0x80000000, FSR__
+    test_ftoi utrunc.s, a2, f0, 0x4effffff, 1, 0xffffff00, FSR__
+    test_ftoi utrunc.s, a2, f0, 0x4f800000, 1, 0xffffffff, FSR_V
 
     /* +inf */
-    test_ftoi utrunc.s, a2, f0, 0x7f800000, 0, 0xffffffff
+    test_ftoi utrunc.s, a2, f0, 0x7f800000, 0, 0xffffffff, FSR_V
 
     /* NaN */
-    test_ftoi utrunc.s, a2, f0, 0x7f800001, 0, 0xffffffff
-    test_ftoi utrunc.s, a2, f0, 0x7fc00000, 0, 0xffffffff
+    test_ftoi utrunc.s, a2, f0, 0x7f800001, 0, 0xffffffff, FSR_V
+    test_ftoi utrunc.s, a2, f0, 0x7fc00000, 0, 0xffffffff, FSR_V
 test_end
 
 test float_s
     test_itof float.s, f0, a2, -1, 0, \
-        0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
-    test_itof float.s, f0, a2, 0, 0, 0, 0, 0, 0
+        0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, FSR__
+    test_itof float.s, f0, a2, 0, 0, 0, 0, 0, 0, FSR__
     test_itof float.s, f0, a2, 1, 1, \
-        0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, FSR__
     test_itof float.s, f0, a2, 1, 0, \
-        0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, FSR__
     test_itof float.s, f0, a2, 0x7fffffff, 0, \
-        0x4f000000, 0x4effffff, 0x4f000000, 0x4effffff
+        0x4f000000, 0x4effffff, 0x4f000000, 0x4effffff, FSR_I
 test_end
 
 test ufloat_s
-    test_itof ufloat.s, f0, a2, 0, 0, 0, 0, 0, 0
+    test_itof ufloat.s, f0, a2, 0, 0, 0, 0, 0, 0, FSR__
     test_itof ufloat.s, f0, a2, 1, 1, \
-        0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, FSR__
     test_itof ufloat.s, f0, a2, 1, 0, \
-        0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, FSR__
     test_itof ufloat.s, f0, a2, 0x7fffffff, 0, \
-        0x4f000000, 0x4effffff, 0x4f000000, 0x4effffff
+        0x4f000000, 0x4effffff, 0x4f000000, 0x4effffff, FSR_I
     test_itof ufloat.s, f0, a2, 0xffffffff, 0, \
-        0x4f800000, 0x4f7fffff, 0x4f800000, 0x4f7fffff
+        0x4f800000, 0x4f7fffff, 0x4f800000, 0x4f7fffff, FSR_I
 test_end
 
 #endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread
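The signed conversion rows of test_fp0_conv above (round.s, trunc.s, floor.s, ceil.s) all follow one pattern. The input is scaled by 2^scale, rounded in the opcode's direction (round.s rounds to nearest, ties to even), and saturated to the int32 range. FSR_V is expected for NaNs of either sign, infinities and out-of-range results, FSR_I when an in-range result is inexact, and FSR__ otherwise. The C sketch below is a reference model of that pattern, written only to explain the expected values. It is not the QEMU implementation: the function name and flag constants are hypothetical, and utrunc.s, which follows different rules for negative inputs, is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values standing in for FSR__, FSR_I and FSR_V;
 * they do not reproduce the real FSR bit layout. */
enum { FLAG_NONE = 0, FLAG_I = 1, FLAG_V = 2 };

typedef enum { MODE_ROUND, MODE_TRUNC, MODE_FLOOR, MODE_CEIL } ftoi_mode;

/* Model of a signed conversion of the binary32 value `bits` * 2^scale. */
static int32_t ftoi_model(uint32_t bits, int scale, ftoi_mode mode,
                          unsigned *flags)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    double x = f;                      /* exact: every float is a double */
    for (int i = 0; i < scale; i++)
        x *= 2.0;                      /* exact scaling by a power of two */

    if (x != x) {                      /* NaN of either sign -> INT32_MAX */
        *flags = FLAG_V;
        return INT32_MAX;
    }
    if (x >= 0x1p62 || x <= -0x1p62) { /* infinities and huge values */
        *flags = FLAG_V;
        return x > 0 ? INT32_MAX : INT32_MIN;
    }

    int64_t t = (int64_t)x;            /* C cast truncates toward zero */
    double frac = x - (double)t;       /* exact fractional part, sign of x */
    int64_t r = t;

    switch (mode) {
    case MODE_TRUNC:
        break;
    case MODE_FLOOR:
        if (frac < 0) r -= 1;
        break;
    case MODE_CEIL:
        if (frac > 0) r += 1;
        break;
    case MODE_ROUND:                   /* to nearest, ties to even */
        if (frac > 0.5 || (frac == 0.5 && t % 2 != 0)) r += 1;
        else if (frac < -0.5 || (frac == -0.5 && t % 2 != 0)) r -= 1;
        break;
    }

    if (r > INT32_MAX) { *flags = FLAG_V; return INT32_MAX; }
    if (r < INT32_MIN) { *flags = FLAG_V; return INT32_MIN; }
    *flags = frac != 0 ? FLAG_I : FLAG_NONE;
    return (int32_t)r;
}
```

For example, 0xbfa00000 (-1.25) with scale 1 gives -2 under round.s and -3 under floor.s, both inexact, matching the FSR_I rows above.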

* [PATCH v4 16/22] tests/tcg/xtensa: update test_fp1 for DFPU
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (14 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 15/22] tests/tcg/xtensa: update test_fp0_conv for DFPU Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 17/22] tests/tcg/xtensa: update test_lsc " Max Filippov
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

DFPU sets the Invalid flag in FSR when at least one argument of an FP
comparison opcode is a NaN: a signaling NaN for most opcodes, any NaN
for olt/ole. Add checks of FSR against the expected values.
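The rule above can be written as a small C predicate. This is a sketch for reading the expected FSR values in test_ord_all, not QEMU code: the helper names and flag values are hypothetical, and it assumes the usual IEEE 754 binary32 encoding, in which a NaN with the top mantissa bit clear is signaling (0xff800001, 0x7f800001) and one with that bit set is quiet (0xffc00001, 0x7fc00000).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; not the real FSR bit layout. */
enum { FLAG_NONE = 0, FLAG_V = 2 };

static bool f32_is_nan(uint32_t b)
{
    return (b & 0x7f800000u) == 0x7f800000u && (b & 0x007fffffu) != 0;
}

/* binary32 NaN with the top mantissa bit clear is signaling. */
static bool f32_is_snan(uint32_t b)
{
    return f32_is_nan(b) && (b & 0x00400000u) == 0;
}

/* Invalid flag expected from comparing a and b: olt/ole raise it for any
 * NaN operand, the other comparison opcodes only for a signaling NaN. */
static unsigned cmp_flags(uint32_t a, uint32_t b, bool is_olt_or_ole)
{
    if (f32_is_snan(a) || f32_is_snan(b))
        return FLAG_V;
    if (is_olt_or_ole && (f32_is_nan(a) || f32_is_nan(b)))
        return FLAG_V;
    return FLAG_NONE;
}
```

Infinities are not NaNs, so rows such as +INF vs +INF expect no flag for every opcode.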

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/test_fp1.S | 62 ++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/tests/tcg/xtensa/test_fp1.S b/tests/tcg/xtensa/test_fp1.S
index 6e182e5964bd..77336a3fcf2c 100644
--- a/tests/tcg/xtensa/test_fp1.S
+++ b/tests/tcg/xtensa/test_fp1.S
@@ -1,4 +1,5 @@
 #include "macros.inc"
+#include "fpu.h"
 
 test_suite fp1
 
@@ -9,7 +10,7 @@ test_suite fp1
     wfr     \fr, a2
 .endm
 
-.macro test_ord_ex op, br, fr0, fr1, v0, v1, r
+.macro test_ord_ex op, br, fr0, fr1, v0, v1, r, sr
     movi    a2, 0
     wur     a2, fsr
     movfp   \fr0, \v0
@@ -20,65 +21,70 @@ test_suite fp1
     movt    a2, a3, \br
     assert  eqi, a2, \r
     rur     a2, fsr
+#if DFPU
+    movi    a3, \sr
+    assert  eq, a2, a3
+#else
     assert  eqi, a2, 0
+#endif
 .endm
 
-.macro test_ord op, br, fr0, fr1, v0, v1, r
+.macro test_ord op, br, fr0, fr1, v0, v1, r, sr
     movi    a2, 0
     wur     a2, fcr
-    test_ord_ex \op, \br, \fr0, \fr1, \v0, \v1, \r
+    test_ord_ex \op, \br, \fr0, \fr1, \v0, \v1, \r, \sr
     movi    a2, 0x7c
     wur     a2, fcr
-    test_ord_ex \op, \br, \fr0, \fr1, \v0, \v1, \r
+    test_ord_ex \op, \br, \fr0, \fr1, \v0, \v1, \r, \sr
 .endm
 
-.macro test_ord_all op, aa, ab, ba, aPI, PIa, aN, Na, II, IN, NI
-    test_ord \op  b0,  f0,  f1, 0x3f800000, 0x3f800000, \aa
-    test_ord \op  b1,  f2,  f3, 0x3f800000, 0x3fc00000, \ab
-    test_ord \op  b2,  f4,  f5, 0x3fc00000, 0x3f800000, \ba
-    test_ord \op  b3,  f6,  f7, 0x3f800000, 0x7f800000, \aPI
-    test_ord \op  b4,  f8,  f9, 0x7f800000, 0x3f800000, \PIa
-    test_ord \op  b5, f10, f11, 0x3f800000, 0xffc00001, \aN
-    test_ord \op  b6, f12, f13, 0x3f800000, 0xff800001, \aN
-    test_ord \op  b7, f14, f15, 0x3f800000, 0x7f800001, \aN
-    test_ord \op  b8,  f0,  f1, 0x3f800000, 0x7fc00000, \aN
-    test_ord \op  b9,  f2,  f3, 0xffc00001, 0x3f800000, \Na
-    test_ord \op b10,  f4,  f5, 0xff800001, 0x3f800000, \Na
-    test_ord \op b11,  f6,  f7, 0x7f800001, 0x3f800000, \Na
-    test_ord \op b12,  f8,  f9, 0x7fc00000, 0x3f800000, \Na
-    test_ord \op b13, f10, f11, 0x7f800000, 0x7f800000, \II
-    test_ord \op b14, f12, f13, 0x7f800000, 0x7fc00000, \IN
-    test_ord \op b15, f14, f15, 0x7fc00000, 0x7f800000, \NI
+.macro test_ord_all op, aa, ab, ba, aPI, PIa, aN, Na, II, IN, NI, qnan_sr
+    test_ord \op  b0,  f0,  f1, 0x3f800000, 0x3f800000, \aa,  FSR__    /*   ord == ord */
+    test_ord \op  b1,  f2,  f3, 0x3f800000, 0x3fc00000, \ab,  FSR__    /*   ord <  ord */
+    test_ord \op  b2,  f4,  f5, 0x3fc00000, 0x3f800000, \ba,  FSR__    /*   ord >  ord */
+    test_ord \op  b3,  f6,  f7, 0x3f800000, 0x7f800000, \aPI, FSR__    /*   ord   +INF */
+    test_ord \op  b4,  f8,  f9, 0x7f800000, 0x3f800000, \PIa, FSR__    /*  +INF    ord */
+    test_ord \op  b5, f10, f11, 0x3f800000, 0xffc00001, \aN,  \qnan_sr /*   ord  -QNaN */
+    test_ord \op  b6, f12, f13, 0x3f800000, 0xff800001, \aN,  FSR_V    /*   ord  -SNaN */
+    test_ord \op  b7, f14, f15, 0x3f800000, 0x7f800001, \aN,  FSR_V    /*   ord  +SNaN */
+    test_ord \op  b8,  f0,  f1, 0x3f800000, 0x7fc00000, \aN,  \qnan_sr /*   ord  +QNaN */
+    test_ord \op  b9,  f2,  f3, 0xffc00001, 0x3f800000, \Na,  \qnan_sr /* -QNaN    ord */
+    test_ord \op b10,  f4,  f5, 0xff800001, 0x3f800000, \Na,  FSR_V    /* -SNaN    ord */
+    test_ord \op b11,  f6,  f7, 0x7f800001, 0x3f800000, \Na,  FSR_V    /* +SNaN    ord */
+    test_ord \op b12,  f8,  f9, 0x7fc00000, 0x3f800000, \Na,  \qnan_sr /* +QNaN    ord */
+    test_ord \op b13, f10, f11, 0x7f800000, 0x7f800000, \II,  FSR__    /*  +INF   +INF */
+    test_ord \op b14, f12, f13, 0x7f800000, 0x7fc00000, \IN,  \qnan_sr /*  +INF  +QNaN */
+    test_ord \op b15, f14, f15, 0x7fc00000, 0x7f800000, \NI,  \qnan_sr /* +QNaN   +INF */
 .endm
 
 test un_s
     movi    a2, 1
     wsr     a2, cpenable
-    test_ord_all un.s, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1
+    test_ord_all un.s, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, FSR__
 test_end
 
 test oeq_s
-    test_ord_all oeq.s, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0
+    test_ord_all oeq.s, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, FSR__
 test_end
 
 test ueq_s
-    test_ord_all ueq.s, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1
+    test_ord_all ueq.s, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, FSR__
 test_end
 
 test olt_s
-    test_ord_all olt.s, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0
+    test_ord_all olt.s, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, FSR_V
 test_end
 
 test ult_s
-    test_ord_all ult.s, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1
+    test_ord_all ult.s, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, FSR__
 test_end
 
 test ole_s
-    test_ord_all ole.s, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0
+    test_ord_all ole.s, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, FSR_V
 test_end
 
 test ule_s
-    test_ord_all ule.s, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1
+    test_ord_all ule.s, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, FSR__
 test_end
 
 .macro test_cond op, fr0, fr1, cr, v0, v1, r
-- 
2.20.1




* [PATCH v4 17/22] tests/tcg/xtensa: update test_lsc for DFPU
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (15 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 16/22] tests/tcg/xtensa: update test_fp1 " Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 18/22] tests/tcg/xtensa: add fp0 div and sqrt tests Max Filippov
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

DFPU doesn't have pre-increment FP load/store opcodes; it has
post-increment opcodes instead. Test whichever increment opcodes are
present in the current configuration.
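The two addressing forms differ only in which address the access uses. With pre-increment (lsiu/ssiu, lsxu/ssxu) the base is advanced first and the updated address is accessed; with post-increment (lsip/ssip, lsxp/ssxp) the original base is accessed and then advanced. That is why the DFPU branch of the tests handles the offset-8 element with a plain lsi/ssi/lsx/ssx before the post-increment opcode moves the base, while both branches end with a2 == 1f + 8. The C sketch below models addresses as element indices into an array rather than byte addresses; the function names are hypothetical.

```c
#include <stddef.h>

/* Pre-increment: advance the base by the offset, then access the new address. */
static float load_pre_inc(const float *mem, size_t *base, size_t off)
{
    *base += off;
    return mem[*base];
}

/* Post-increment: access the original base, then advance it. */
static float load_post_inc(const float *mem, size_t *base, size_t off)
{
    float v = mem[*base];
    *base += off;
    return v;
}
```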

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/test_lsc.S | 47 +++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/tests/tcg/xtensa/test_lsc.S b/tests/tcg/xtensa/test_lsc.S
index 0578bf19e72e..9d59c1815a9e 100644
--- a/tests/tcg/xtensa/test_lsc.S
+++ b/tests/tcg/xtensa/test_lsc.S
@@ -1,4 +1,5 @@
 #include "macros.inc"
+#include "fpu.h"
 
 test_suite lsc
 
@@ -9,9 +10,14 @@ test lsi
     wsr     a2, cpenable
 
     movi    a2, 1f
-    lsi     f0, a2, 0
     lsi     f1, a2, 4
+#if DFPU
+    lsi     f2, a2, 8
+    lsip    f0, a2, 8
+#else
+    lsi     f0, a2, 0
     lsiu    f2, a2, 8
+#endif
     movi    a3, 1f + 8
     assert  eq, a2, a3
     rfr     a2, f0
@@ -34,13 +40,18 @@ test ssi
     movi    a2, 1f
     movi    a3, 0x40800000
     wfr     f3, a3
-    ssi     f3, a2, 0
     movi    a3, 0x40a00000
     wfr     f4, a3
-    ssi     f4, a2, 4
     movi    a3, 0x40c00000
     wfr     f5, a3
+    ssi     f4, a2, 4
+#if DFPU
+    ssi     f5, a2, 8
+    ssip    f3, a2, 8
+#else
+    ssi     f3, a2, 0
     ssiu    f5, a2, 8
+#endif
     movi    a3, 1f + 8
     assert  eq, a2, a3
     l32i    a4, a2, -8
@@ -62,11 +73,16 @@ test_end
 test lsx
     movi    a2, 1f
     movi    a3, 0
+    movi    a4, 4
+    movi    a5, 8
+    lsx     f7, a2, a4
+#if DFPU
+    lsx     f8, a2, a5
+    lsxp    f6, a2, a5
+#else
     lsx     f6, a2, a3
-    movi    a3, 4
-    lsx     f7, a2, a3
-    movi    a3, 8
-    lsxu    f8, a2, a3
+    lsxu    f8, a2, a5
+#endif
     movi    a3, 1f + 8
     assert  eq, a2, a3
     rfr     a2, f6
@@ -87,18 +103,23 @@ test_end
 
 test ssx
     movi    a2, 1f
-    movi    a3, 0
     movi    a4, 0x41200000
     wfr     f9, a4
-    ssx     f9, a2, a3
-    movi    a3, 4
     movi    a4, 0x41300000
     wfr     f10, a4
-    ssx     f10, a2, a3
-    movi    a3, 8
     movi    a4, 0x41400000
     wfr     f11, a4
-    ssxu    f11, a2, a3
+    movi    a3, 0
+    movi    a4, 4
+    movi    a5, 8
+    ssx     f10, a2, a4
+#if DFPU
+    ssx     f11, a2, a5
+    ssxp    f9, a2, a5
+#else
+    ssx     f9, a2, a3
+    ssxu    f11, a2, a5
+#endif
     movi    a3, 1f + 8
     assert  eq, a2, a3
     l32i    a4, a2, -8
-- 
2.20.1




* [PATCH v4 18/22] tests/tcg/xtensa: add fp0 div and sqrt tests
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (16 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 17/22] tests/tcg/xtensa: update test_lsc " Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 19/22] tests/tcg/xtensa: test double precision load/store Max Filippov
  2020-07-11 11:06 ` [PATCH v4 20/22] tests/tcg/xtensa: add DFP0 arithmetic tests Max Filippov
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Test exact division/sqrt DFPU sequences.
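Sequences like divs_seq in test_fp0_div.S follow the general pattern of division by iterative refinement. They start from a rough reciprocal estimate, refine it with Newton-Raphson steps built from multiply-adds, and finally correct the quotient using the residual a - b*q. The C sketch below shows that textbook pattern only, not the DFPU opcode semantics. The magic-constant estimate is an assumption standing in for div0.s, double arithmetic stands in for fused single-precision multiply-adds, and only normal, finite, non-zero divisors are handled.

```c
#include <stdint.h>
#include <string.h>

/* Rough reciprocal estimate from the bit pattern (a well-known
 * magic-constant trick, within about 12% for normal finite b). */
static double recip_estimate(float b)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &b, sizeof bits);
    bits = 0x7ef311c3u - bits;
    memcpy(&y, &bits, sizeof y);
    return y;
}

static float div_newton(float a, float b)
{
    double y = recip_estimate(b);
    /* Newton-Raphson for 1/b: the relative error roughly squares per step. */
    for (int i = 0; i < 4; i++) {
        double e = 1.0 - (double)b * y;   /* reciprocal error term */
        y = y + y * e;
    }
    double q = (double)a * y;             /* quotient estimate */
    double r = (double)a - (double)b * q; /* residual */
    return (float)(q + r * y);            /* correct, then round once to float */
}
```

The expected values in the tests check more than this sketch does: each case lists a result for every rounding mode together with the FSR flags it must raise.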

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/test_fp0_div.S  | 82 ++++++++++++++++++++++++++++++++
 tests/tcg/xtensa/test_fp0_sqrt.S | 76 +++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+)
 create mode 100644 tests/tcg/xtensa/test_fp0_div.S
 create mode 100644 tests/tcg/xtensa/test_fp0_sqrt.S

diff --git a/tests/tcg/xtensa/test_fp0_div.S b/tests/tcg/xtensa/test_fp0_div.S
new file mode 100644
index 000000000000..c3e7ad7bb5b3
--- /dev/null
+++ b/tests/tcg/xtensa/test_fp0_div.S
@@ -0,0 +1,82 @@
+#include "macros.inc"
+#include "fpu.h"
+
+test_suite fp0_div
+
+#if XCHAL_HAVE_FP_DIV
+
+.macro  divs_seq q, a, b, r, y, y0, an, bn, e, ex
+    div0.s      \y0, \b
+    nexp01.s    \bn, \b
+    const.s     \e, 1
+    maddn.s     \e, \bn, \y0
+    mov.s       \y, \y0
+    mov.s       \ex, \b
+    nexp01.s    \an, \a
+    maddn.s     \y, \e, \y0
+    const.s     \e, 1
+    const.s     \q, 0
+    neg.s       \r, \an
+    maddn.s     \e, \bn, \y
+    maddn.s     \q, \r, \y0
+    mkdadj.s    \ex, \a
+    maddn.s     \y, \e, \y
+    maddn.s     \r, \bn, \q
+    const.s     \e, 1
+    maddn.s     \e, \bn, \y
+    maddn.s     \q, \r, \y
+    neg.s       \r, \an
+    maddn.s     \y, \e, \y
+    maddn.s     \r, \bn, \q
+    addexpm.s   \q, \ex
+    addexp.s    \y, \ex
+    divn.s      \q, \r, \y
+.endm
+
+.macro div_s fr0, fr1, fr2
+    divs_seq    \fr0, \fr1, \fr2, f9, f10, f11, f12, f13, f14, f15
+.endm
+
+.macro movfp fr, v
+    movi        a2, \v
+    wfr         \fr, a2
+.endm
+
+.macro check_res fr, r, sr
+    rfr         a2, \fr
+    dump        a2
+    movi        a3, \r
+    assert      eq, a2, a3
+    rur         a2, fsr
+    movi        a3, \sr
+    assert      eq, a2, a3
+.endm
+
+test div_s
+    movi        a2, 1
+    wsr         a2, cpenable
+
+    test_op2    div_s, f0, f1, f2, 0x40000000, 0x40400000, \
+        0x3f2aaaab, 0x3f2aaaaa, 0x3f2aaaab, 0x3f2aaaaa, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
+    test_op2    div_s, f3, f4, f5, F32_1, F32_0, \
+        F32_PINF, F32_PINF, F32_PINF, F32_PINF, \
+           FSR_Z,    FSR_Z,    FSR_Z,    FSR_Z
+    test_op2    div_s, f6, f7, f8, F32_0, F32_0, \
+        F32_DNAN, F32_DNAN, F32_DNAN, F32_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+
+    /* MAX_FLOAT / 0.5 = +inf/MAX_FLOAT  */
+    test_op2    div_s, f0, f1, f2, F32_MAX, F32_0_5, \
+        F32_PINF, F32_MAX, F32_PINF, F32_MAX, \
+          FSR_OI,  FSR_OI,   FSR_OI,  FSR_OI
+
+    /* 0.5 / MAX_FLOAT = denorm  */
+    test_op2    div_s, f0, f1, f2, F32_0_5, F32_MAX, \
+        0x00100000, 0x00100000, 0x00100001, 0x00100000, \
+            FSR_UI,     FSR_UI,     FSR_UI,     FSR_UI
+test_end
+
+#endif
+
+test_suite_end
diff --git a/tests/tcg/xtensa/test_fp0_sqrt.S b/tests/tcg/xtensa/test_fp0_sqrt.S
new file mode 100644
index 000000000000..585973dce6bc
--- /dev/null
+++ b/tests/tcg/xtensa/test_fp0_sqrt.S
@@ -0,0 +1,76 @@
+#include "macros.inc"
+#include "fpu.h"
+
+test_suite fp0_sqrt
+
+#if XCHAL_HAVE_FP_SQRT
+
+.macro  sqrt_seq r, a, y, t1, hn, h2, t5, h
+    sqrt0.s     \y, \a
+    const.s     \t1, 0
+    maddn.s     \t1, \y, \y
+    nexp01.s    \hn, \a
+    const.s     \r, 3
+    addexp.s    \hn, \r
+    maddn.s     \r, \t1, \hn
+    nexp01.s    \t1, \a
+    neg.s       \h2, \t1
+    maddn.s     \y, \r, \y
+    const.s     \r, 0
+    const.s     \t5, 0
+    const.s     \h, 0
+    maddn.s     \r, \h2, \y
+    maddn.s     \t5, \y, \hn
+    const.s     \hn, 3
+    maddn.s     \h, \hn, \y
+    maddn.s     \t1, \r, \r
+    maddn.s     \hn, \t5, \y
+    neg.s       \y, \h
+    maddn.s     \r, \t1, \y
+    maddn.s     \h, \hn, \h
+    mksadj.s    \y, \a
+    nexp01.s    \a, \a
+    maddn.s     \a, \r, \r
+    neg.s       \t1, \h
+    addexpm.s   \r, \y
+    addexp.s    \t1, \y
+    divn.s      \r, \a, \t1
+.endm
+
+.macro sqrt_s fr0, fr1
+    sqrt_seq    \fr0, \fr1, f10, f11, f12, f13, f14, f15
+.endm
+
+.macro movfp fr, v
+    movi        a2, \v
+    wfr         \fr, a2
+.endm
+
+.macro check_res fr, r, sr
+    rfr         a2, \fr
+    dump        a2
+    movi        a3, \r
+    assert      eq, a2, a3
+    rur         a2, fsr
+    movi        a3, \sr
+    assert      eq, a2, a3
+.endm
+
+test sqrt_s
+    movi        a2, 1
+    wsr         a2, cpenable
+
+    test_op1    sqrt_s, f0, f1, 0x40000000, \
+        0x3fb504f3, 0x3fb504f3, 0x3fb504f4, 0x3fb504f3, \
+             FSR_I,      FSR_I,      FSR_I,      FSR_I
+    test_op1    sqrt_s, f3, f4, F32_1, \
+        F32_1, F32_1, F32_1, F32_1, \
+        FSR__, FSR__, FSR__, FSR__
+    test_op1    sqrt_s, f6, f7, F32_MINUS | F32_1, \
+        F32_DNAN, F32_DNAN, F32_DNAN, F32_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+test_end
+
+#endif
+
+test_suite_end
-- 
2.20.1




* [PATCH v4 19/22] tests/tcg/xtensa: test double precision load/store
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (17 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 18/22] tests/tcg/xtensa: add fp0 div and sqrt tests Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  2020-07-11 11:06 ` [PATCH v4 20/22] tests/tcg/xtensa: add DFP0 arithmetic tests Max Filippov
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Add ldi[p]/sdi[p]/ldx[p]/sdx[p] opcode tests to test_lsc.
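The sdi/sdx checks read back only the most significant 32-bit word of each stored double, which holds the sign, exponent and top mantissa bits. F64_HIGH_OFF selects its byte offset: 0 on big-endian and 4 on little-endian configurations. The same selection can be sketched in C on the host, assuming the double's word order follows the integer byte order (true on common hosts); the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of the high 32-bit word of a double in memory:
 * 4 on little-endian hosts, 0 on big-endian ones. */
static size_t f64_high_off(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first ? 4 : 0;   /* first byte == 1 means little-endian */
}

/* Read the high word of d the way the tests do with l32i at F64_HIGH_OFF. */
static uint32_t f64_high_word(double d)
{
    unsigned char bytes[sizeof d];
    uint32_t hi;
    memcpy(bytes, &d, sizeof d);
    memcpy(&hi, bytes + f64_high_off(), sizeof hi);
    return hi;
}
```

For example, the high word of 1.0 is 0x3ff00000 and that of 9.0 is 0x40220000, the values the ldi and ldx tests expect.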

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 tests/tcg/xtensa/test_lsc.S | 123 ++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/tests/tcg/xtensa/test_lsc.S b/tests/tcg/xtensa/test_lsc.S
index 9d59c1815a9e..348822bdd359 100644
--- a/tests/tcg/xtensa/test_lsc.S
+++ b/tests/tcg/xtensa/test_lsc.S
@@ -140,4 +140,127 @@ test_end
 
 #endif
 
+#if XCHAL_HAVE_DFP
+
+#if XCHAL_HAVE_BE
+#define F64_HIGH_OFF 0
+#else
+#define F64_HIGH_OFF 4
+#endif
+
+.macro movdf fr, hi, lo
+    movi    a2, \hi
+    movi    a3, \lo
+    wfrd    \fr, a2, a3
+.endm
+
+test ldi
+    movi    a2, 1
+    wsr     a2, cpenable
+
+    movi    a2, 1f
+    ldi     f1, a2, 8
+    ldi     f2, a2, 16
+    ldip    f0, a2, 16
+    movi    a3, 1f + 16
+    assert  eq, a2, a3
+    rfrd    a2, f0
+    movi    a3, 0x3ff00000
+    assert  eq, a2, a3
+    rfrd    a2, f1
+    movi    a3, 0x40000000
+    assert  eq, a2, a3
+    rfrd    a2, f2
+    movi    a3, 0x40080000
+    assert  eq, a2, a3
+.data
+    .align  8
+1:
+.double 1, 2, 3
+.text
+test_end
+
+test sdi
+    movdf   f3, 0x40800000, 0
+    movdf   f4, 0x40a00000, 0
+    movdf   f5, 0x40c00000, 0
+    movi    a2, 1f
+    sdi     f4, a2, 8
+    sdi     f5, a2, 16
+    sdip    f3, a2, 16
+    movi    a3, 1f + 16
+    assert  eq, a2, a3
+    l32i    a4, a2, -16 + F64_HIGH_OFF
+    movi    a3, 0x40800000
+    assert  eq, a4, a3
+    l32i    a4, a2, -8 + F64_HIGH_OFF
+    movi    a3, 0x40a00000
+    assert  eq, a4, a3
+    l32i    a4, a2, F64_HIGH_OFF
+    movi    a3, 0x40c00000
+    assert  eq, a4, a3
+.data
+    .align  8
+1:
+.double 0, 0, 0
+.text
+test_end
+
+test ldx
+    movi    a2, 1f
+    movi    a3, 0
+    movi    a4, 8
+    movi    a5, 16
+    ldx     f7, a2, a4
+    ldx     f8, a2, a5
+    ldxp    f6, a2, a5
+    movi    a3, 1f + 16
+    assert  eq, a2, a3
+    rfrd    a2, f6
+    movi    a3, 0x401c0000
+    assert  eq, a2, a3
+    rfrd    a2, f7
+    movi    a3, 0x40200000
+    assert  eq, a2, a3
+    rfrd    a2, f8
+    movi    a3, 0x40220000
+    assert  eq, a2, a3
+.data
+    .align  8
+1:
+.double 7, 8, 9
+.text
+test_end
+
+test sdx
+    movdf   f9, 0x41200000, 0
+    movdf   f10, 0x41300000, 0
+    movdf   f11, 0x41400000, 0
+    movi    a2, 1f
+    movi    a3, 0
+    movi    a4, 8
+    movi    a5, 16
+    sdx     f10, a2, a4
+    sdx     f11, a2, a5
+    sdxp    f9, a2, a5
+    movi    a3, 1f + 16
+    assert  eq, a2, a3
+    l32i    a4, a2, -16 + F64_HIGH_OFF
+    movi    a3, 0x41200000
+    assert  eq, a4, a3
+    l32i    a4, a2, -8 + F64_HIGH_OFF
+    movi    a3, 0x41300000
+    assert  eq, a4, a3
+    l32i    a4, a2, F64_HIGH_OFF
+    movi    a3, 0x41400000
+    assert  eq, a4, a3
+.data
+    .align  8
+1:
+.double 0, 0, 0
+.text
+test_end
+
+#endif
+
 test_suite_end
-- 
2.20.1
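The constants asserted after rfrd in the ldi and ldx tests are the high 32 bits of the binary64 encodings of 1.0, 2.0, 3.0 and 7.0, 8.0, 9.0. F64_HIGH_OFF selects where that high word sits inside an 8-byte slot in memory, which is what the sdi/sdx tests read back with l32i. The C sketch below reproduces both on the host. The helper names are hypothetical. It also assumes that the host stores doubles in the same byte order as its 32-bit integers, which holds on mainstream hosts but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: most significant 32 bits of the IEEE 754
 * binary64 encoding of d, i.e. the word rfrd returns in test ldi/ldx. */
static uint32_t f64_high_word(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Hypothetical helper mirroring F64_HIGH_OFF: byte offset of the high
 * word inside an 8-byte double slot, 0 on big-endian, 4 on little-endian.
 * Assumes double word order follows integer byte order on this host. */
static int f64_high_off(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* lowest-addressed byte of probe */
    return first == 1 ? 4 : 0;      /* low byte first => little-endian */
}

/* Hypothetical helper: read the high word of *slot the way the sdi/sdx
 * tests do with "l32i a4, a2, off + F64_HIGH_OFF". */
static uint32_t load_high_word(const double *slot)
{
    uint32_t w;
    memcpy(&w, (const unsigned char *)slot + f64_high_off(), sizeof w);
    return w;
}
```

For example, f64_high_word(7.0) returns 0x401c0000, the value checked after "rfrd a2, f6" in test ldx, and load_high_word() applied to an element of a double array fetches the same word that l32i reads at offset F64_HIGH_OFF.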




* [PATCH v4 20/22] tests/tcg/xtensa: add DFP0 arithmetic tests
  2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
                   ` (18 preceding siblings ...)
  2020-07-11 11:06 ` [PATCH v4 19/22] tests/tcg/xtensa: test double precision load/store Max Filippov
@ 2020-07-11 11:06 ` Max Filippov
  19 siblings, 0 replies; 21+ messages in thread
From: Max Filippov @ 2020-07-11 11:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov, Richard Henderson

Add tests for basic double precision opcode properties.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
Changes v2->v3:
- add more infzero tests for DFPU
- fix test names in test_dfp0_arith.S

 tests/tcg/xtensa/test_dfp0_arith.S | 162 +++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100644 tests/tcg/xtensa/test_dfp0_arith.S

diff --git a/tests/tcg/xtensa/test_dfp0_arith.S b/tests/tcg/xtensa/test_dfp0_arith.S
new file mode 100644
index 000000000000..53bf8122d082
--- /dev/null
+++ b/tests/tcg/xtensa/test_dfp0_arith.S
@@ -0,0 +1,162 @@
+#include "macros.inc"
+#include "fpu.h"
+
+test_suite fp0_arith
+
+#if XCHAL_HAVE_DFP
+
+.macro movfp fr, v
+    movi    a2, ((\v) >> 32) & 0xffffffff
+    movi    a3, ((\v) & 0xffffffff)
+    wfrd    \fr, a2, a3
+.endm
+
+.macro check_res fr, r, sr
+    rfrd    a2, \fr
+    dump    a2
+    movi    a3, ((\r) >> 32) & 0xffffffff
+    assert  eq, a2, a3
+    rfr     a2, \fr
+    dump    a2
+    movi    a3, ((\r) & 0xffffffff)
+    assert  eq, a2, a3
+    rur     a2, fsr
+    movi    a3, \sr
+    assert  eq, a2, a3
+.endm
+
+test add_d
+    movi    a2, 1
+    wsr     a2, cpenable
+
+    /* MAX_FLOAT + MAX_FLOAT = +inf/MAX_FLOAT  */
+    test_op2 add.d, f6, f7, f8, F64_MAX, F64_MAX, \
+        F64_PINF, F64_MAX, F64_PINF, F64_MAX, \
+          FSR_OI,  FSR_OI,   FSR_OI,  FSR_OI
+test_end
+
+test add_d_inf
+    /* 1 + +inf = +inf  */
+    test_op2 add.d, f6, f7, f8, F64_1, F64_PINF, \
+        F64_PINF, F64_PINF, F64_PINF, F64_PINF, \
+           FSR__,    FSR__,    FSR__,    FSR__
+
+    /* +inf + -inf = default NaN */
+    test_op2 add.d, f0, f1, f2, F64_PINF, F64_NINF, \
+        F64_DNAN, F64_DNAN, F64_DNAN, F64_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+test_end
+
+test add_d_nan_dfpu
+    /* 1 + QNaN = QNaN  */
+    test_op2 add.d, f9, f10, f11, F64_1, F64_QNAN(1), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    /* 1 + SNaN = QNaN  */
+    test_op2 add.d, f12, f13, f14, F64_1, F64_SNAN(1), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+
+    /* SNaN1 + SNaN2 = QNaN1 */
+    test_op2 add.d, f15, f0, f1, F64_SNAN(1), F64_SNAN(2), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    /* QNaN1 + SNaN2 = QNaN1 */
+    test_op2 add.d, f5, f6, f7, F64_QNAN(1), F64_SNAN(2), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    /* SNaN1 + QNaN2 = QNaN1 */
+    test_op2 add.d, f8, f9, f10, F64_SNAN(1), F64_QNAN(2), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+test_end
+
+test sub_d
+    /* norm - norm = denorm */
+    test_op2 sub.d, f6, f7, f8, F64_MIN_NORM | 1, F64_MIN_NORM, \
+        0x00000001, 0x00000001, 0x00000001, 0x00000001, \
+             FSR__,      FSR__,      FSR__,      FSR__
+test_end
+
+test mul_d
+    test_op2 mul.d, f0, f1, f2, F64_1 | 1, F64_1 | 1, \
+        F64_1 | 2, F64_1 | 2, F64_1 | 3, F64_1 | 2, \
+            FSR_I,     FSR_I,     FSR_I,     FSR_I
+    /* MAX_FLOAT/2 * MAX_FLOAT/2 = +inf/MAX_FLOAT  */
+    test_op2 mul.d, f6, f7, f8, F64_MAX_2, F64_MAX_2, \
+        F64_PINF, F64_MAX, F64_PINF, F64_MAX, \
+          FSR_OI,  FSR_OI,   FSR_OI,  FSR_OI
+    /* min norm * min norm = 0/denorm */
+    test_op2 mul.d, f6, f7, f8, F64_MIN_NORM, F64_MIN_NORM, \
+         F64_0,  F64_0, 0x00000001,  F64_0, \
+        FSR_UI, FSR_UI,     FSR_UI, FSR_UI
+    /* inf * 0 = default NaN */
+    test_op2 mul.d, f6, f7, f8, F64_PINF, F64_0, \
+        F64_DNAN, F64_DNAN, F64_DNAN, F64_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+test_end
+
+test madd_d
+    test_op3 madd.d, f0, f1, f2, f0, F64_0, F64_1 | 1, F64_1 | 1, \
+        F64_1 | 2, F64_1 | 2, F64_1 | 3, F64_1 | 2, \
+            FSR_I,     FSR_I,     FSR_I,     FSR_I
+test_end
+
+test madd_d_precision
+    test_op3 madd.d, f0, f1, f2, f0, \
+        F64_MINUS | F64_1 | 2, F64_1 | 1, F64_1 | 1, \
+        0x3970000000000000, 0x3970000000000000, 0x3970000000000000, 0x3970000000000000, \
+             FSR__,      FSR__,      FSR__,      FSR__
+test_end
+
+test madd_d_nan_dfpu
+    /* DFPU madd/msub NaN1, NaN2, NaN3 priority: NaN1, NaN3, NaN2 */
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_1, F64_1, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.d, f0, f1, f2, f0, F64_1, F64_QNAN(2), F64_1, \
+        F64_QNAN(2), F64_QNAN(2), F64_QNAN(2), F64_QNAN(2), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.d, f0, f1, f2, f0, F64_1, F64_1, F64_QNAN(3), \
+        F64_QNAN(3), F64_QNAN(3), F64_QNAN(3), F64_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_QNAN(2), F64_1, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_1, F64_QNAN(3), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+    test_op3 madd.d, f0, f1, f2, f0, F64_1, F64_QNAN(2), F64_QNAN(3), \
+        F64_QNAN(3), F64_QNAN(3), F64_QNAN(3), F64_QNAN(3), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_QNAN(2), F64_QNAN(3), \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR__,       FSR__,       FSR__,       FSR__
+
+    /* 1 + inf * 0 = default NaN */
+    test_op3 madd.d, f0, f1, f2, f0, F64_1, F64_PINF, F64_0, \
+        F64_DNAN, F64_DNAN, F64_DNAN, F64_DNAN, \
+           FSR_V,    FSR_V,    FSR_V,    FSR_V
+    /* inf * 0 + SNaN1 = QNaN1 */
+    test_op3 madd.d, f0, f1, f2, f0, F64_SNAN(1), F64_PINF, F64_0, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    /* inf * 0 + QNaN1 = QNaN1 */
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_PINF, F64_0, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+
+    /* madd/msub SNaN turns to QNaN and sets Invalid flag */
+    test_op3 madd.d, f0, f1, f2, f0, F64_SNAN(1), F64_1, F64_1, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+    test_op3 madd.d, f0, f1, f2, f0, F64_QNAN(1), F64_SNAN(2), F64_1, \
+        F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), F64_QNAN(1), \
+              FSR_V,       FSR_V,       FSR_V,       FSR_V
+test_end
+
+#endif
+
+test_suite_end
-- 
2.20.1
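The expected value 0x3970000000000000 in madd_d_precision can be checked by hand. The derivation below assumes, since fpu.h is not part of this section, that F64_1 is the encoding of 1.0 (so F64_1 | 1 and F64_1 | 2 are 1 + u and 1 + 2u with u = 2^-52), that F64_MINUS is the sign bit, and that madd.d computes f0 + f1 * f2 with the first value operand of test_op3 loaded into the accumulator f0. That operand order is the one consistent with the results of test madd_d.

```latex
\begin{aligned}
a &= -(1+2u), \qquad b = c = 1+u, \qquad u = 2^{-52} \\
a + b\,c &= -(1+2u) + (1 + 2u + u^2) = u^2 = 2^{-104} \\
\text{biased exponent} &= 1023 - 104 = 919 = \texttt{0x397}, \qquad \text{fraction} = 0 \\
\text{bits} &= \texttt{0x397} \ll 52 = \texttt{0x3970000000000000}
\end{aligned}
```

Because u^2 is exactly representable, the result is the same in all four result columns and no flag is raised (FSR__). An unfused multiply followed by an add would first round b c = 1 + 2u + u^2 to 1 + 2u under round-to-nearest (u^2 is less than half an ulp of 1.0) and then return 0, so this test distinguishes a single-rounding madd.d.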

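The NaN expectations in add_d_nan_dfpu and madd_d_nan_dfpu follow a small set of rules. For add.d, the first NaN operand wins. For madd.d the priority is NaN1 (the accumulator), then NaN3, then NaN2. A signaling NaN input, or an inf * 0 product, raises Invalid, and inf * 0 with no NaN addend yields the default NaN. The C model below encodes those rules exactly as they can be read off the expected values. It is a hypothetical behavioral sketch, not QEMU's pickNaNMulAdd. The enum, the PICK_* codes and the function names are invented for illustration, and operands are classified abstractly rather than decoded from their bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical abstract operand classes; real code inspects the bits. */
enum opclass { OP_NUM, OP_QNAN, OP_SNAN };

#define PICK_NONE    (-2)  /* no NaN input: result computed normally   */
#define PICK_DEFAULT (-1)  /* result is the default NaN (F64_DNAN)     */

static bool is_nan(enum opclass c)
{
    return c != OP_NUM;
}

/* add.d model: the first NaN operand (quieted) is returned; any SNaN
 * input sets *invalid. */
static int dfpu_pick_nan_add(const enum opclass op[2], bool *invalid)
{
    *invalid = op[0] == OP_SNAN || op[1] == OP_SNAN;
    if (is_nan(op[0]))
        return 0;
    if (is_nan(op[1]))
        return 1;
    return PICK_NONE;
}

/* madd.d model: op[0] is the accumulator (NaN1), op[1] and op[2] the
 * multiplicands (NaN2, NaN3). Priority is NaN1, NaN3, NaN2. infzero
 * means one multiplicand is infinity and the other is zero. Returns the
 * index of the operand whose quieted NaN is the result, or a PICK_* code. */
static int dfpu_pick_nan_madd(const enum opclass op[3], bool infzero,
                              bool *invalid)
{
    static const int order[3] = {0, 2, 1};

    *invalid = infzero;
    for (int i = 0; i < 3; i++) {
        if (op[i] == OP_SNAN)
            *invalid = true;
    }
    for (int i = 0; i < 3; i++) {
        if (is_nan(op[order[i]]))
            return order[i];
    }
    return infzero ? PICK_DEFAULT : PICK_NONE;
}
```

For example, the case "F64_1, F64_QNAN(2), F64_QNAN(3)" maps to {OP_NUM, OP_QNAN, OP_QNAN} and selects index 2 (QNaN3) without Invalid, while "F64_QNAN(1), F64_PINF, F64_0" selects index 0 with Invalid set by the inf * 0 product.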



end of thread, other threads:[~2020-07-11 11:20 UTC | newest]

Thread overview: 21+ messages
2020-07-11 11:06 [PATCH v4 00/22] target/xtensa: implement double precision FPU Max Filippov
2020-07-11 11:06 ` [PATCH v4 01/22] softfloat: make NO_SIGNALING_NANS runtime property Max Filippov
2020-07-11 11:06 ` [PATCH v4 02/22] softfloat: pass float_status pointer to pickNaN Max Filippov
2020-07-11 11:06 ` [PATCH v4 03/22] softfloat: add xtensa specialization for pickNaNMulAdd Max Filippov
2020-07-11 11:06 ` [PATCH v4 04/22] target/xtensa: add geometry to xtensa_get_regfile_by_name Max Filippov
2020-07-11 11:06 ` [PATCH v4 05/22] target/xtensa: support copying registers up to 64 bits wide Max Filippov
2020-07-11 11:06 ` [PATCH v4 06/22] target/xtensa: rename FPU2000 translators and helpers Max Filippov
2020-07-11 11:06 ` [PATCH v4 07/22] target/xtensa: move FSR/FCR register accessors Max Filippov
2020-07-11 11:06 ` [PATCH v4 08/22] target/xtensa: don't access BR regfile directly Max Filippov
2020-07-11 11:06 ` [PATCH v4 09/22] target/xtensa: add DFPU option Max Filippov
2020-07-11 11:06 ` [PATCH v4 10/22] target/xtensa: add DFPU registers and opcodes Max Filippov
2020-07-11 11:06 ` [PATCH v4 11/22] target/xtensa: implement FPU division and square root Max Filippov
2020-07-11 11:06 ` [PATCH v4 12/22] tests/tcg/xtensa: fix test execution on ISS Max Filippov
2020-07-11 11:06 ` [PATCH v4 13/22] tests/tcg/xtensa: update test_fp0_arith for DFPU Max Filippov
2020-07-11 11:06 ` [PATCH v4 14/22] tests/tcg/xtensa: expand madd tests Max Filippov
2020-07-11 11:06 ` [PATCH v4 15/22] tests/tcg/xtensa: update test_fp0_conv for DFPU Max Filippov
2020-07-11 11:06 ` [PATCH v4 16/22] tests/tcg/xtensa: update test_fp1 " Max Filippov
2020-07-11 11:06 ` [PATCH v4 17/22] tests/tcg/xtensa: update test_lsc " Max Filippov
2020-07-11 11:06 ` [PATCH v4 18/22] tests/tcg/xtensa: add fp0 div and sqrt tests Max Filippov
2020-07-11 11:06 ` [PATCH v4 19/22] tests/tcg/xtensa: test double precision load/store Max Filippov
2020-07-11 11:06 ` [PATCH v4 20/22] tests/tcg/xtensa: add DFP0 arithmetic tests Max Filippov
