* [Qemu-devel] [PATCH v6 00/13] hardfloat
@ 2018-11-24 23:55 Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
                   ` (15 more replies)
  0 siblings, 16 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html

Changes since v5:

- Rebase on rth/tcg-next-for-4.0

- Use QEMU_FLATTEN instead of __attribute__((flatten))

- Merge rth's cleanups (thanks!). With this, we now use a union to
  hold {float|float32} or {double|float64} types, which gets
  rid of most macros. I added a few optimizations (i.e. likely
  hints in some branches, and not using temp variables to hold
  the result of fpclassify) to roughly match (and sometimes
  surpass) v5's performance.

- float64_sqrt: use fpclassify, which gives a 1.5x speedup.

This series introduces no regressions to fp-test. You can test
hardfloat by passing "-f x" to fp-test (so that the inexact flag
is set before each operation) and using even rounding (fp-test's
default). Note that hardfloat does not affect operations with
other rounding modes.
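
As a rough illustration of the condition described above (not code from this series), the host-FPU path would only be eligible when the inexact flag is already set and rounding is nearest-even. Every name below (`sketch_status`, `SKETCH_FLAG_INEXACT`, `sketch_can_use_host_fpu`) is a hypothetical stand-in and not a QEMU identifier:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the rounding mode and exception flags. */
enum sketch_round {
    SKETCH_ROUND_NEAREST_EVEN,
    SKETCH_ROUND_TO_ZERO,
    SKETCH_ROUND_DOWN,
    SKETCH_ROUND_UP,
};

#define SKETCH_FLAG_INEXACT 0x01u

struct sketch_status {
    enum sketch_round rounding;
    unsigned flags;
};

/*
 * Assumption drawn from the cover letter: the fast path applies only
 * when inexact is already raised and rounding is nearest-even.
 * Anything else would fall back to the soft implementation.
 */
static bool sketch_can_use_host_fpu(const struct sketch_status *s)
{
    return (s->flags & SKETCH_FLAG_INEXACT) != 0 &&
           s->rounding == SKETCH_ROUND_NEAREST_EVEN;
}
```

With such a gate, a status with inexact clear, or with any rounding mode other than nearest-even, would always take the soft path, which is why fp-test needs "-f x" to exercise hardfloat.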

Perf numbers for fp-bench running on several host machines are in
each commit log; numbers for several benchmarks (NBench, SPEC06fp)
are in the last patch's commit log. These numbers are a bit
outdated (they're from v2 or so), but I've decided to keep them
because they give a good idea of the speedups to expect, and I don't
have time to re-run them =)

I did re-run the numbers for sqrt and cmp, though, since the
implementation has changed quite a bit since v5. I didn't
re-run these on Aarch64 and PPC hosts due to lack of time,
but I doubt they'd change significantly.

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v6

Thanks,

		Emilio


* [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-03 12:13   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

This gets rid of the muladd errors due to not raising the invalid flag.

- Before:
Errors found in f64_mulAdd, rounding near_even, tininess before rounding:
+000.0000000000000  +7FF.0000000000000  +7FF.FFFFFFFFFFFFF
        => +7FF.FFFFFFFFFFFFF .....  expected -7FF.FFFFFFFFFFFFF v....
[...]

- After:
In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding.
[...]

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/fp/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index d649a5a1db..49cdcd1bd2 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -29,6 +29,9 @@ QEMU_INCLUDES += -I$(TF_SOURCE_DIR)
 
 # work around TARGET_* poisoning
 QEMU_CFLAGS += -DHW_POISON_H
+# define a target to match testfloat's implementation-defined choices, such as
+# whether to raise the invalid flag when dealing with NaNs in muladd.
+QEMU_CFLAGS += -DTARGET_ARM
 
 # capstone has a platform.h file that clashes with softfloat's
 QEMU_CFLAGS := $(filter-out %capstone, $(QEMU_CFLAGS))
-- 
2.17.1


* [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

This paves the way for upcoming work.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/fpu/softfloat.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8fd9f9bbae..9eeccd88a5 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -464,6 +464,16 @@ static inline int float32_is_zero_or_denormal(float32 a)
     return (float32_val(a) & 0x7f800000) == 0;
 }
 
+static inline bool float32_is_normal(float32 a)
+{
+    return ((float32_val(a) + 0x00800000) & 0x7fffffff) >= 0x01000000;
+}
+
+static inline bool float32_is_denormal(float32 a)
+{
+    return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
     return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -605,6 +615,16 @@ static inline int float64_is_zero_or_denormal(float64 a)
     return (float64_val(a) & 0x7ff0000000000000LL) == 0;
 }
 
+static inline bool float64_is_normal(float64 a)
+{
+    return ((float64_val(a) + (1ULL << 52)) & -1ULL >> 1) >= 1ULL << 53;
+}
+
+static inline bool float64_is_denormal(float64 a)
+{
+    return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
     return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.17.1
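
The float32_is_normal() check above relies on a small bit trick: adding 1 to the exponent field makes the all-ones exponent (inf/NaN) wrap to 0 once the sign bit is masked off, and makes a zero exponent (zero/denormal) become 1, so only true normals end up with an exponent field of 2 or more. A standalone sketch of the same arithmetic on raw bit patterns, using hypothetical `sketch_` names and plain `uint32_t` instead of QEMU's float32 type:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same expression as in the patch, applied to a raw binary32 pattern. */
static bool sketch_f32_is_normal(uint32_t bits)
{
    /*
     * exponent 0x00       -> 0x01        -> below 0x01000000 (rejected)
     * exponent 0x01..0xfe -> 0x02..0xff  -> at or above 0x01000000
     * exponent 0xff       -> carries into the sign bit, which the mask
     *                        drops, leaving exponent 0x00 (rejected)
     */
    return ((bits + 0x00800000u) & 0x7fffffffu) >= 0x01000000u;
}

/* Denormal: exponent field is zero but the value itself is not zero. */
static bool sketch_f32_is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x7fffffffu) != 0;
}
```

For example, 0x00800000 (the smallest normal) and 0x7f7fffff (the largest finite value) are accepted, while 0x007fffff (the largest denormal) and 0x7f800000 (infinity) are rejected.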


* [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/tricore/fpu_helper.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index df162902d6..31df462e4a 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -44,11 +44,6 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
               | float_flag_inexact);
 }
 
-static inline bool f_is_denormal(float32 arg)
-{
-    return float32_is_zero_or_denormal(arg) && !float32_is_zero(arg);
-}
-
 static inline float32 f_maddsub_nan_result(float32 arg1, float32 arg2,
                                            float32 arg3, float32 result,
                                            uint32_t muladd_negate_c)
@@ -260,8 +255,8 @@ uint32_t helper_fcmp(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
     set_flush_inputs_to_zero(0, &env->fp_status);
 
     result = 1 << (float32_compare_quiet(arg1, arg2, &env->fp_status) + 1);
-    result |= f_is_denormal(arg1) << 4;
-    result |= f_is_denormal(arg2) << 5;
+    result |= float32_is_denormal(arg1) << 4;
+    result |= float32_is_denormal(arg2) << 5;
 
     flags = f_get_excp_flags(env);
     if (flags) {
-- 
2.17.1
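
To make the helper_fcmp() encoding in the hunk above easier to follow, here is a minimal sketch of how the result word is assembled. It assumes the comparison result uses the values -1 (less), 0 (equal), 1 (greater) and 2 (unordered); the `sketch_` names are hypothetical and operate on raw bit patterns instead of QEMU's float32 type:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed relation values: less, equal, greater, unordered. */
enum sketch_relation {
    SKETCH_LESS = -1,
    SKETCH_EQUAL = 0,
    SKETCH_GREATER = 1,
    SKETCH_UNORDERED = 2,
};

static bool sketch_f32_is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x7fffffffu) != 0;
}

/*
 * Bits 0..3: one-hot relation (less, equal, greater, unordered).
 * Bit 4: first operand is denormal. Bit 5: second operand is denormal.
 */
static uint32_t sketch_fcmp_result(int relation, uint32_t arg1, uint32_t arg2)
{
    uint32_t result = 1u << (relation + 1);

    result |= (uint32_t)sketch_f32_is_denormal(arg1) << 4;
    result |= (uint32_t)sketch_f32_is_denormal(arg2) << 5;
    return result;
}
```

The patch does not change this encoding; it only replaces the target-local f_is_denormal() with the shared float32_is_denormal().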


* [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (2 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-03 14:16   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).

Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().

Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e1eef954e6..ecdc00c633 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
 #include "softfloat-specialize.h"
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
-static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
-                               float_status *status)
+static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+                                  float_status *status)
 {
     if (part.exp == parm->exp_max && !parm->arm_althp) {
         if (part.frac == 0) {
@@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
 static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
                                             const FloatFmt *params)
 {
-    return canonicalize(float16_unpack_raw(f), params, s);
+    return sf_canonicalize(float16_unpack_raw(f), params, s);
 }
 
 static FloatParts float16_unpack_canonical(float16 f, float_status *s)
@@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float32_unpack_canonical(float32 f, float_status *s)
 {
-    return canonicalize(float32_unpack_raw(f), &float32_params, s);
+    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
 }
 
 static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
@@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float64_unpack_canonical(float64 f, float_status *s)
 {
-    return canonicalize(float64_unpack_raw(f), &float64_params, s);
+    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
 }
 
 static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
-- 
2.17.1


* [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (3 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-03 14:16   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

These will gain some users very soon.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/fpu/softfloat.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 9eeccd88a5..38a5e99cf3 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -474,6 +474,11 @@ static inline bool float32_is_denormal(float32 a)
     return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
 }
 
+static inline bool float32_is_zero_or_normal(float32 a)
+{
+    return float32_is_normal(a) || float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
     return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -625,6 +630,11 @@ static inline bool float64_is_denormal(float64 a)
     return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
 }
 
+static inline bool float64_is_zero_or_normal(float64 a)
+{
+    return float64_is_normal(a) || float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
     return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.17.1
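
A standalone sketch of the 64-bit variant may help when checking the new predicate by hand. It uses the same expressions as the helpers in include/fpu/softfloat.h, but on raw `uint64_t` bit patterns and under hypothetical `sketch_` names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Normal: exponent field is neither all zeros nor all ones. */
static bool sketch_f64_is_normal(uint64_t bits)
{
    return ((bits + (1ULL << 52)) & (~0ULL >> 1)) >= (1ULL << 53);
}

/* Zero: every bit except the sign is clear. */
static bool sketch_f64_is_zero(uint64_t bits)
{
    return (bits & 0x7fffffffffffffffULL) == 0;
}

/* Accepts +/-0 and normals; rejects denormals, infinities and NaNs. */
static bool sketch_f64_is_zero_or_normal(uint64_t bits)
{
    return sketch_f64_is_normal(bits) || sketch_f64_is_zero(bits);
}
```

So 0x3ff0000000000000 (1.0) and 0x8000000000000000 (-0.0) pass, while 0x0000000000000001 (the smallest denormal) and 0x7ff0000000000000 (infinity) do not.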


* [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (4 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-03 14:29   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

These microbenchmarks will allow us to measure the performance impact of
FP emulation optimizations. Note that we can measure either the direct
impact on the softfloat functions (with "-t soft") or the impact on an
emulated workload (with "-t host", running fp-bench under qemu user-mode).
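
For reference, fp-bench reports throughput as completed operations per elapsed nanosecond, scaled to millions of operations per second. A minimal sketch of that arithmetic, where the helper name `sketch_mflops` is hypothetical and the expression follows pr_stats() in the patch:

```c
#include <stdint.h>

/*
 * ops / ns is operations per nanosecond; multiplying by 1e3 converts
 * that to millions of operations per second.
 */
static double sketch_mflops(uint64_t n_completed_ops, int64_t ns_elapsed)
{
    return (double)n_completed_ops / ns_elapsed * 1e3;
}
```

For instance, one batch of 50000 operations (OPS_PER_ITER) completing in one millisecond corresponds to roughly 50 MFlops.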

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/fp/fp-bench.c | 630 ++++++++++++++++++++++++++++++++++++++++++++
 tests/fp/.gitignore |   1 +
 tests/fp/Makefile   |   5 +-
 3 files changed, 635 insertions(+), 1 deletion(-)
 create mode 100644 tests/fp/fp-bench.c

diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
new file mode 100644
index 0000000000..f5bc5edebf
--- /dev/null
+++ b/tests/fp/fp-bench.c
@@ -0,0 +1,630 @@
+/*
+ * fp-bench.c - A collection of simple floating point microbenchmarks.
+ *
+ * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include <math.h>
+#include <fenv.h>
+#include "qemu/timer.h"
+#include "fpu/softfloat.h"
+
+/* amortize the computation of random inputs */
+#define OPS_PER_ITER     50000
+
+#define MAX_OPERANDS 3
+
+#define SEED_A 0xdeadfacedeadface
+#define SEED_B 0xbadc0feebadc0fee
+#define SEED_C 0xbeefdeadbeefdead
+
+enum op {
+    OP_ADD,
+    OP_SUB,
+    OP_MUL,
+    OP_DIV,
+    OP_FMA,
+    OP_SQRT,
+    OP_CMP,
+    OP_MAX_NR,
+};
+
+static const char * const op_names[] = {
+    [OP_ADD] = "add",
+    [OP_SUB] = "sub",
+    [OP_MUL] = "mul",
+    [OP_DIV] = "div",
+    [OP_FMA] = "mulAdd",
+    [OP_SQRT] = "sqrt",
+    [OP_CMP] = "cmp",
+    [OP_MAX_NR] = NULL,
+};
+
+enum precision {
+    PREC_SINGLE,
+    PREC_DOUBLE,
+    PREC_FLOAT32,
+    PREC_FLOAT64,
+    PREC_MAX_NR,
+};
+
+enum rounding {
+    ROUND_EVEN,
+    ROUND_ZERO,
+    ROUND_DOWN,
+    ROUND_UP,
+    ROUND_TIEAWAY,
+    N_ROUND_MODES,
+};
+
+static const char * const round_names[] = {
+    [ROUND_EVEN] = "even",
+    [ROUND_ZERO] = "zero",
+    [ROUND_DOWN] = "down",
+    [ROUND_UP] = "up",
+    [ROUND_TIEAWAY] = "tieaway",
+};
+
+enum tester {
+    TESTER_SOFT,
+    TESTER_HOST,
+    TESTER_MAX_NR,
+};
+
+static const char * const tester_names[] = {
+    [TESTER_SOFT] = "soft",
+    [TESTER_HOST] = "host",
+    [TESTER_MAX_NR] = NULL,
+};
+
+union fp {
+    float f;
+    double d;
+    float32 f32;
+    float64 f64;
+    uint64_t u64;
+};
+
+struct op_state;
+
+typedef float (*float_func_t)(const struct op_state *s);
+typedef double (*double_func_t)(const struct op_state *s);
+
+union fp_func {
+    float_func_t float_func;
+    double_func_t double_func;
+};
+
+typedef void (*bench_func_t)(void);
+
+struct op_desc {
+    const char * const name;
+};
+
+#define DEFAULT_DURATION_SECS 1
+
+static uint64_t random_ops[MAX_OPERANDS] = {
+    SEED_A, SEED_B, SEED_C,
+};
+static float_status soft_status;
+static enum precision precision;
+static enum op operation;
+static enum tester tester;
+static uint64_t n_completed_ops;
+static unsigned int duration = DEFAULT_DURATION_SECS;
+static int64_t ns_elapsed;
+/* disable optimizations with volatile */
+static volatile union fp res;
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+    x ^= x >> 12; /* a */
+    x ^= x << 25; /* b */
+    x ^= x >> 27; /* c */
+    return x * UINT64_C(2685821657736338717);
+}
+
+static void update_random_ops(int n_ops, enum precision prec)
+{
+    int i;
+
+    for (i = 0; i < n_ops; i++) {
+        uint64_t r = random_ops[i];
+
+        if (prec == PREC_SINGLE || prec == PREC_FLOAT32) {
+            do {
+                r = xorshift64star(r);
+            } while (!float32_is_normal(r));
+        } else if (prec == PREC_DOUBLE || prec == PREC_FLOAT64) {
+            do {
+                r = xorshift64star(r);
+            } while (!float64_is_normal(r));
+        } else {
+            g_assert_not_reached();
+        }
+        random_ops[i] = r;
+    }
+}
+
+static void fill_random(union fp *ops, int n_ops, enum precision prec,
+                        bool no_neg)
+{
+    int i;
+
+    for (i = 0; i < n_ops; i++) {
+        switch (prec) {
+        case PREC_SINGLE:
+        case PREC_FLOAT32:
+            ops[i].f32 = make_float32(random_ops[i]);
+            if (no_neg && float32_is_neg(ops[i].f32)) {
+                ops[i].f32 = float32_chs(ops[i].f32);
+            }
+            /* raise the exponent to limit the frequency of denormal results */
+            ops[i].f32 |= 0x40000000;
+            break;
+        case PREC_DOUBLE:
+        case PREC_FLOAT64:
+            ops[i].f64 = make_float64(random_ops[i]);
+            if (no_neg && float64_is_neg(ops[i].f64)) {
+                ops[i].f64 = float64_chs(ops[i].f64);
+            }
+            /* raise the exponent to limit the frequency of denormal results */
+            ops[i].f64 |= LIT64(0x4000000000000000);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+}
+
+/*
+ * The main benchmark function. Instead of (ab)using macros, we rely
+ * on the compiler to unfold this at compile-time.
+ */
+static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
+{
+    int64_t tf = get_clock() + duration * 1000000000LL;
+
+    while (get_clock() < tf) {
+        union fp ops[MAX_OPERANDS];
+        int64_t t0;
+        int i;
+
+        update_random_ops(n_ops, prec);
+        switch (prec) {
+        case PREC_SINGLE:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float a = ops[0].f;
+                float b = ops[1].f;
+                float c = ops[2].f;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f = a + b;
+                    break;
+                case OP_SUB:
+                    res.f = a - b;
+                    break;
+                case OP_MUL:
+                    res.f = a * b;
+                    break;
+                case OP_DIV:
+                    res.f = a / b;
+                    break;
+                case OP_FMA:
+                    res.f = fmaf(a, b, c);
+                    break;
+                case OP_SQRT:
+                    res.f = sqrtf(a);
+                    break;
+                case OP_CMP:
+                    res.u64 = isgreater(a, b);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_DOUBLE:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                double a = ops[0].d;
+                double b = ops[1].d;
+                double c = ops[2].d;
+
+                switch (op) {
+                case OP_ADD:
+                    res.d = a + b;
+                    break;
+                case OP_SUB:
+                    res.d = a - b;
+                    break;
+                case OP_MUL:
+                    res.d = a * b;
+                    break;
+                case OP_DIV:
+                    res.d = a / b;
+                    break;
+                case OP_FMA:
+                    res.d = fma(a, b, c);
+                    break;
+                case OP_SQRT:
+                    res.d = sqrt(a);
+                    break;
+                case OP_CMP:
+                    res.u64 = isgreater(a, b);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_FLOAT32:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float32 a = ops[0].f32;
+                float32 b = ops[1].f32;
+                float32 c = ops[2].f32;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f32 = float32_add(a, b, &soft_status);
+                    break;
+                case OP_SUB:
+                    res.f32 = float32_sub(a, b, &soft_status);
+                    break;
+                case OP_MUL:
+                    res.f32 = float32_mul(a, b, &soft_status);
+                    break;
+                case OP_DIV:
+                    res.f32 = float32_div(a, b, &soft_status);
+                    break;
+                case OP_FMA:
+                    res.f32 = float32_muladd(a, b, c, 0, &soft_status);
+                    break;
+                case OP_SQRT:
+                    res.f32 = float32_sqrt(a, &soft_status);
+                    break;
+                case OP_CMP:
+                    res.u64 = float32_compare_quiet(a, b, &soft_status);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_FLOAT64:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float64 a = ops[0].f64;
+                float64 b = ops[1].f64;
+                float64 c = ops[2].f64;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f64 = float64_add(a, b, &soft_status);
+                    break;
+                case OP_SUB:
+                    res.f64 = float64_sub(a, b, &soft_status);
+                    break;
+                case OP_MUL:
+                    res.f64 = float64_mul(a, b, &soft_status);
+                    break;
+                case OP_DIV:
+                    res.f64 = float64_div(a, b, &soft_status);
+                    break;
+                case OP_FMA:
+                    res.f64 = float64_muladd(a, b, c, 0, &soft_status);
+                    break;
+                case OP_SQRT:
+                    res.f64 = float64_sqrt(a, &soft_status);
+                    break;
+                case OP_CMP:
+                    res.u64 = float64_compare_quiet(a, b, &soft_status);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ns_elapsed += get_clock() - t0;
+        n_completed_ops += OPS_PER_ITER;
+    }
+}
+
+#define GEN_BENCH(name, type, prec, op, n_ops)          \
+    static void __attribute__((flatten)) name(void)     \
+    {                                                   \
+        bench(prec, op, n_ops, false);                  \
+    }
+
+#define GEN_BENCH_NO_NEG(name, type, prec, op, n_ops)   \
+    static void __attribute__((flatten)) name(void)     \
+    {                                                   \
+        bench(prec, op, n_ops, true);                   \
+    }
+
+#define GEN_BENCH_ALL_TYPES(opname, op, n_ops)                          \
+    GEN_BENCH(bench_ ## opname ## _float, float, PREC_SINGLE, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _double, double, PREC_DOUBLE, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _float32, float32, PREC_FLOAT32, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops)
+
+GEN_BENCH_ALL_TYPES(add, OP_ADD, 2)
+GEN_BENCH_ALL_TYPES(sub, OP_SUB, 2)
+GEN_BENCH_ALL_TYPES(mul, OP_MUL, 2)
+GEN_BENCH_ALL_TYPES(div, OP_DIV, 2)
+GEN_BENCH_ALL_TYPES(fma, OP_FMA, 3)
+GEN_BENCH_ALL_TYPES(cmp, OP_CMP, 2)
+#undef GEN_BENCH_ALL_TYPES
+
+#define GEN_BENCH_ALL_TYPES_NO_NEG(name, op, n)                         \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float, float, PREC_SINGLE, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _double, double, PREC_DOUBLE, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float32, float32, PREC_FLOAT32, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n)
+
+GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
+#undef GEN_BENCH_ALL_TYPES_NO_NEG
+
+#undef GEN_BENCH_NO_NEG
+#undef GEN_BENCH
+
+#define GEN_BENCH_FUNCS(opname, op)                             \
+    [op] = {                                                    \
+        [PREC_SINGLE]    = bench_ ## opname ## _float,          \
+        [PREC_DOUBLE]    = bench_ ## opname ## _double,         \
+        [PREC_FLOAT32]   = bench_ ## opname ## _float32,        \
+        [PREC_FLOAT64]   = bench_ ## opname ## _float64,        \
+    }
+
+static const bench_func_t bench_funcs[OP_MAX_NR][PREC_MAX_NR] = {
+    GEN_BENCH_FUNCS(add, OP_ADD),
+    GEN_BENCH_FUNCS(sub, OP_SUB),
+    GEN_BENCH_FUNCS(mul, OP_MUL),
+    GEN_BENCH_FUNCS(div, OP_DIV),
+    GEN_BENCH_FUNCS(fma, OP_FMA),
+    GEN_BENCH_FUNCS(sqrt, OP_SQRT),
+    GEN_BENCH_FUNCS(cmp, OP_CMP),
+};
+
+#undef GEN_BENCH_FUNCS
+
+static void run_bench(void)
+{
+    bench_func_t f;
+
+    f = bench_funcs[operation][precision];
+    g_assert(f);
+    f();
+}
+
+/* @arr must be NULL-terminated */
+static int find_name(const char * const *arr, const char *name)
+{
+    int i;
+
+    for (i = 0; arr[i] != NULL; i++) {
+        if (strcmp(name, arr[i]) == 0) {
+            return i;
+        }
+    }
+    return -1;
+}
+
+static void usage_complete(int argc, char *argv[])
+{
+    gchar *op_list = g_strjoinv(", ", (gchar **)op_names);
+    gchar *tester_list = g_strjoinv(", ", (gchar **)tester_names);
+
+    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+    fprintf(stderr, "options:\n");
+    fprintf(stderr, " -d = duration, in seconds. Default: %d\n",
+            DEFAULT_DURATION_SECS);
+    fprintf(stderr, " -h = show this help message.\n");
+    fprintf(stderr, " -o = floating point operation (%s). Default: %s\n",
+            op_list, op_names[0]);
+    fprintf(stderr, " -p = floating point precision (single, double). "
+            "Default: single\n");
+    fprintf(stderr, " -r = rounding mode (even, zero, down, up, tieaway). "
+            "Default: even\n");
+    fprintf(stderr, " -t = tester (%s). Default: %s\n",
+            tester_list, tester_names[0]);
+    fprintf(stderr, " -z = flush inputs to zero (soft tester only). "
+            "Default: disabled\n");
+    fprintf(stderr, " -Z = flush output to zero (soft tester only). "
+            "Default: disabled\n");
+
+    g_free(tester_list);
+    g_free(op_list);
+}
+
+static int round_name_to_mode(const char *name)
+{
+    int i;
+
+    for (i = 0; i < N_ROUND_MODES; i++) {
+        if (!strcmp(round_names[i], name)) {
+            return i;
+        }
+    }
+    return -1;
+}
+
+static void QEMU_NORETURN die_host_rounding(enum rounding rounding)
+{
+    fprintf(stderr, "fatal: '%s' rounding not supported on this host\n",
+            round_names[rounding]);
+    exit(EXIT_FAILURE);
+}
+
+static void set_host_precision(enum rounding rounding)
+{
+    int rhost;
+
+    switch (rounding) {
+    case ROUND_EVEN:
+        rhost = FE_TONEAREST;
+        break;
+    case ROUND_ZERO:
+        rhost = FE_TOWARDZERO;
+        break;
+    case ROUND_DOWN:
+        rhost = FE_DOWNWARD;
+        break;
+    case ROUND_UP:
+        rhost = FE_UPWARD;
+        break;
+    case ROUND_TIEAWAY:
+        die_host_rounding(rounding);
+        return;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (fesetround(rhost)) {
+        die_host_rounding(rounding);
+    }
+}
+
+static void set_soft_precision(enum rounding rounding)
+{
+    signed char mode;
+
+    switch (rounding) {
+    case ROUND_EVEN:
+        mode = float_round_nearest_even;
+        break;
+    case ROUND_ZERO:
+        mode = float_round_to_zero;
+        break;
+    case ROUND_DOWN:
+        mode = float_round_down;
+        break;
+    case ROUND_UP:
+        mode = float_round_up;
+        break;
+    case ROUND_TIEAWAY:
+        mode = float_round_ties_away;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    soft_status.float_rounding_mode = mode;
+}
+
+static void parse_args(int argc, char *argv[])
+{
+    int c;
+    int val;
+    int rounding = ROUND_EVEN;
+
+    for (;;) {
+        c = getopt(argc, argv, "d:ho:p:r:t:zZ");
+        if (c < 0) {
+            break;
+        }
+        switch (c) {
+        case 'd':
+            duration = atoi(optarg);
+            break;
+        case 'h':
+            usage_complete(argc, argv);
+            exit(EXIT_SUCCESS);
+        case 'o':
+            val = find_name(op_names, optarg);
+            if (val < 0) {
+                fprintf(stderr, "Unsupported op '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            operation = val;
+            break;
+        case 'p':
+            if (!strcmp(optarg, "single")) {
+                precision = PREC_SINGLE;
+            } else if (!strcmp(optarg, "double")) {
+                precision = PREC_DOUBLE;
+            } else {
+                fprintf(stderr, "Unsupported precision '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            break;
+        case 'r':
+            rounding = round_name_to_mode(optarg);
+            if (rounding < 0) {
+                fprintf(stderr, "fatal: invalid rounding mode '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            break;
+        case 't':
+            val = find_name(tester_names, optarg);
+            if (val < 0) {
+                fprintf(stderr, "Unsupported tester '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            tester = val;
+            break;
+        case 'z':
+            soft_status.flush_inputs_to_zero = 1;
+            break;
+        case 'Z':
+            soft_status.flush_to_zero = 1;
+            break;
+        }
+    }
+
+    /* set precision and rounding mode based on the tester */
+    switch (tester) {
+    case TESTER_HOST:
+        set_host_precision(rounding);
+        break;
+    case TESTER_SOFT:
+        set_soft_precision(rounding);
+        switch (precision) {
+        case PREC_SINGLE:
+            precision = PREC_FLOAT32;
+            break;
+        case PREC_DOUBLE:
+            precision = PREC_FLOAT64;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void pr_stats(void)
+{
+    printf("%.2f MFlops\n", (double)n_completed_ops / ns_elapsed * 1e3);
+}
+
+int main(int argc, char *argv[])
+{
+    parse_args(argc, argv);
+    run_bench();
+    pr_stats();
+    return 0;
+}
diff --git a/tests/fp/.gitignore b/tests/fp/.gitignore
index 8d45d18ac4..704fd42992 100644
--- a/tests/fp/.gitignore
+++ b/tests/fp/.gitignore
@@ -1 +1,2 @@
 fp-test
+fp-bench
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index 49cdcd1bd2..5019dcdca0 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -553,7 +553,7 @@ TF_OBJS_LIB += $(TF_OBJS_WRITECASE)
 TF_OBJS_LIB += testLoops_common.o
 TF_OBJS_LIB += $(TF_OBJS_TEST)
 
-BINARIES := fp-test$(EXESUF)
+BINARIES := fp-test$(EXESUF) fp-bench$(EXESUF)
 
 # everything depends on config-host.h because platform.h includes it
 all: $(BUILD_DIR)/config-host.h
@@ -590,10 +590,13 @@ $(TF_OBJS_LIB) slowfloat.o: %.o: $(TF_SOURCE_DIR)/%.c
 
 libtestfloat.a: $(TF_OBJS_LIB)
 
+fp-bench$(EXESUF): fp-bench.o $(QEMU_SOFTFLOAT_OBJ) $(LIBQEMUUTIL)
+
 clean:
 	rm -f *.o *.d $(BINARIES)
 	rm -f *.gcno *.gcda *.gcov
 	rm -f fp-test$(EXESUF)
+	rm -f fp-bench$(EXESUF)
 	rm -f libsoftfloat.a
 	rm -f libtestfloat.a
 
-- 
2.17.1
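
Note on the figure printed by pr_stats() above: dividing the operation
count by the elapsed nanoseconds gives operations per nanosecond, and
scaling by 1e3 turns that into millions of operations per second. A
minimal sketch of that arithmetic follows; the helper name mflops() is
illustrative only and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ops / ns is "operations per nanosecond"; multiplying by 1e3 yields
 * operations per microsecond, i.e. millions of operations per second.
 */
static double mflops(uint64_t n_ops, uint64_t ns_elapsed)
{
    assert(ns_elapsed != 0);
    return (double)n_ops / ns_elapsed * 1e3;
}
```

For instance, 2,000,000 operations completed in 1,000,000 ns (1 ms)
correspond to 2000 MFlops.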


* [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (5 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-11-25  0:25   ` Aleksandar Markovic
  2018-12-04 12:28   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

This patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.

The approach followed here avoids checking the FP exception flags register.
See the added comment for details.

This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags such as -ffast-math. We control the flags so this should
be easy to enforce though.

This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.

Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 315 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ecdc00c633..306a12fa8d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,6 +83,7 @@ this code that are retained.
  * target-dependent and needs the TARGET_* macros.
  */
 #include "qemu/osdep.h"
+#include <math.h>
 #include "qemu/bitops.h"
 #include "fpu/softfloat.h"
 
@@ -95,6 +96,320 @@ this code that are retained.
 *----------------------------------------------------------------------------*/
 #include "fpu/softfloat-macros.h"
 
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is neither fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we expand on the idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the result is < the minimum normal.
+ */
+#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t)                          \
+    static inline void name(soft_t *a, float_status *s)                 \
+    {                                                                   \
+        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
+            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
+                                     soft_t ## _is_neg(*a));            \
+            s->float_exception_flags |= float_flag_input_denormal;      \
+        }                                                               \
+    }
+
+GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
+GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
+#undef GEN_INPUT_FLUSH__NOCHECK
+
+#define GEN_INPUT_FLUSH1(name, soft_t)                  \
+    static inline void name(soft_t *a, float_status *s) \
+    {                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {         \
+            return;                                     \
+        }                                               \
+        soft_t ## _input_flush__nocheck(a, s);          \
+    }
+
+GEN_INPUT_FLUSH1(float32_input_flush1, float32)
+GEN_INPUT_FLUSH1(float64_input_flush1, float64)
+#undef GEN_INPUT_FLUSH1
+
+#define GEN_INPUT_FLUSH2(name, soft_t)                                  \
+    static inline void name(soft_t *a, soft_t *b, float_status *s)      \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+    }
+
+GEN_INPUT_FLUSH2(float32_input_flush2, float32)
+GEN_INPUT_FLUSH2(float64_input_flush2, float64)
+#undef GEN_INPUT_FLUSH2
+
+#define GEN_INPUT_FLUSH3(name, soft_t)                                  \
+    static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+        soft_t ## _input_flush__nocheck(c, s);                          \
+    }
+
+GEN_INPUT_FLUSH3(float32_input_flush3, float32)
+GEN_INPUT_FLUSH3(float64_input_flush3, float64)
+#undef GEN_INPUT_FLUSH3
+
+/*
+ * Choose whether to use fpclassify or float32/64_* primitives in the generated
+ * hardfloat functions. Each combination of number of inputs and float size
+ * gets its own value.
+ */
+#if defined(__x86_64__)
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 1
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 1
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 1
+#else
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 0
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 0
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 0
+#endif
+
+/*
+ * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
+ * float{32,64}_is_infinity when !USE_FP.
+ * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
+ * On power64 however, using isinf() reduces fp-bench performance by up to 50%.
+ */
+#if defined(__x86_64__) || defined(__aarch64__)
+# define QEMU_HARDFLOAT_USE_ISINF   1
+#else
+# define QEMU_HARDFLOAT_USE_ISINF   0
+#endif
+
+/*
+ * Some targets clear the FP flags before most FP operations. This prevents
+ * the use of hardfloat, since hardfloat relies on the inexact flag being
+ * already set.
+ */
+#if defined(TARGET_PPC)
+# define QEMU_NO_HARDFLOAT 1
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
+#else
+# define QEMU_NO_HARDFLOAT 0
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN __attribute__((noinline))
+#endif
+
+static inline bool can_use_fpu(const float_status *s)
+{
+    if (QEMU_NO_HARDFLOAT) {
+        return false;
+    }
+    return likely(s->float_exception_flags & float_flag_inexact &&
+                  s->float_rounding_mode == float_round_nearest_even);
+}
+
+/*
+ * Hardfloat generation functions. Each operation can have two flavors:
+ * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
+ * most condition checks, or native ones (e.g. fpclassify).
+ *
+ * The flavor is chosen by the callers. Instead of using macros, we rely on the
+ * compiler to propagate constants and inline everything into the callers.
+ *
+ * We only generate functions for operations with two inputs, since only
+ * these are common enough to justify consolidating them into common code.
+ */
+
+typedef union {
+    float32 s;
+    float h;
+} union_float32;
+
+typedef union {
+    float64 s;
+    double h;
+} union_float64;
+
+typedef bool (*f32_check_fn)(union_float32 a, union_float32 b);
+typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);
+
+typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
+typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float   (*hard_f32_op2_fn)(float a, float b);
+typedef double  (*hard_f64_op2_fn)(double a, double b);
+
+/* 2-input is-zero-or-normal */
+static inline bool f32_is_zon2(union_float32 a, union_float32 b)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        /*
+         * Not using a temp variable for consecutive fpclassify calls ends up
+         * generating faster code.
+         */
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+    }
+    return float32_is_zero_or_normal(a.s) &&
+           float32_is_zero_or_normal(b.s);
+}
+
+static inline bool f64_is_zon2(union_float64 a, union_float64 b)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+    }
+    return float64_is_zero_or_normal(a.s) &&
+           float64_is_zero_or_normal(b.s);
+}
+
+/* 3-input is-zero-or-normal */
+static inline
+bool f32_is_zon3(union_float32 a, union_float32 b, union_float32 c)
+{
+    if (QEMU_HARDFLOAT_3F32_USE_FP) {
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+               (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+    }
+    return float32_is_zero_or_normal(a.s) &&
+           float32_is_zero_or_normal(b.s) &&
+           float32_is_zero_or_normal(c.s);
+}
+
+static inline
+bool f64_is_zon3(union_float64 a, union_float64 b, union_float64 c)
+{
+    if (QEMU_HARDFLOAT_3F64_USE_FP) {
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+               (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+    }
+    return float64_is_zero_or_normal(a.s) &&
+           float64_is_zero_or_normal(b.s) &&
+           float64_is_zero_or_normal(c.s);
+}
+
+static inline bool f32_is_inf(union_float32 a)
+{
+    if (QEMU_HARDFLOAT_USE_ISINF) {
+        return isinf(a.h);
+    }
+    return float32_is_infinity(a.s);
+}
+
+static inline bool f64_is_inf(union_float64 a)
+{
+    if (QEMU_HARDFLOAT_USE_ISINF) {
+        return isinf(a.h);
+    }
+    return float64_is_infinity(a.s);
+}
+
+/* Note: @fast_test and @post can be NULL */
+static inline float32
+float32_gen2(float32 xa, float32 xb, float_status *s,
+             hard_f32_op2_fn hard, soft_f32_op2_fn soft,
+             f32_check_fn pre, f32_check_fn post,
+             f32_check_fn fast_test, soft_f32_op2_fn fast_op)
+{
+    union_float32 ua, ub, ur;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+
+    float32_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(!pre(ua, ub))) {
+        goto soft;
+    }
+    if (fast_test && fast_test(ua, ub)) {
+        return fast_op(ua.s, ub.s, s);
+    }
+
+    ur.h = hard(ua.h, ub.h);
+    if (unlikely(f32_is_inf(ur))) {
+        s->float_exception_flags |= float_flag_overflow;
+    } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+        if (post == NULL || post(ua, ub)) {
+            goto soft;
+        }
+    }
+    return ur.s;
+
+ soft:
+    return soft(ua.s, ub.s, s);
+}
+
+static inline float64
+float64_gen2(float64 xa, float64 xb, float_status *s,
+             hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+             f64_check_fn pre, f64_check_fn post,
+             f64_check_fn fast_test, soft_f64_op2_fn fast_op)
+{
+    union_float64 ua, ub, ur;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+
+    float64_input_flush2(&ua.s, &ub.s, s);
+    if (unlikely(!pre(ua, ub))) {
+        goto soft;
+    }
+    if (fast_test && fast_test(ua, ub)) {
+        return fast_op(ua.s, ub.s, s);
+    }
+
+    ur.h = hard(ua.h, ub.h);
+    if (unlikely(f64_is_inf(ur))) {
+        s->float_exception_flags |= float_flag_overflow;
+    } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
+        if (post == NULL || post(ua, ub)) {
+            goto soft;
+        }
+    }
+    return ur.s;
+
+ soft:
+    return soft(ua.s, ub.s, s);
+}
+
 /*----------------------------------------------------------------------------
 | Returns the fraction bits of the half-precision floating-point value `a'.
 *----------------------------------------------------------------------------*/
-- 
2.17.1
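
The dispatch logic described in the commit message and in the Hardfloat
comment can be sketched outside QEMU roughly as follows. This is a
simplified illustration under stated assumptions, not the patch's code:
toy_status, the TOY_FLAG_* bits and toy_soft_add() are hypothetical
stand-ins for float_status, the float_flag_* bits and the softfloat
slow path, and the placeholder slow path does not model exception
flags at all.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's float_status and its flag bits. */
enum { TOY_FLAG_INEXACT = 1, TOY_FLAG_OVERFLOW = 2 };

typedef struct {
    int flags;
    bool round_nearest_even;
} toy_status;

/* Counts slow-path calls so the dispatch decision can be observed. */
static int toy_soft_calls;

/* Placeholder for the software path; it does not model flags. */
static double toy_soft_add(double a, double b, toy_status *s)
{
    (void)s;
    toy_soft_calls++;
    return a + b;
}

static bool toy_is_zero_or_normal(double x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

static double toy_add(double a, double b, toy_status *s)
{
    double r;

    assert(s != NULL);
    /* Host FPU only if inexact is already set and rounding is the default. */
    if (!(s->flags & TOY_FLAG_INEXACT) || !s->round_nearest_even) {
        return toy_soft_add(a, b, s);
    }
    /* Denormal, infinite or NaN inputs are left to the software path. */
    if (!toy_is_zero_or_normal(a) || !toy_is_zero_or_normal(b)) {
        return toy_soft_add(a, b, s);
    }
    r = a + b;
    if (isinf(r)) {
        /* Finite inputs produced an infinity: record overflow directly. */
        s->flags |= TOY_FLAG_OVERFLOW;
    } else if (r <= DBL_MIN && r >= -DBL_MIN && !(a == 0.0 && b == 0.0)) {
        /* Tiny result that is not a trivially exact zero: redo in software. */
        return toy_soft_add(a, b, s);
    }
    return r;
}
```

The point of the sketch is that no host exception-flag register is
read: because inexact is already set and sticky, the fast path only
needs to catch overflow (via the result) and possible underflow (via
the magnitude test), and everything else is deferred.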


* [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (6 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-04 18:34   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results (single and double precision) for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
add-single: 135.07 MFlops
add-double: 131.60 MFlops
sub-single: 130.04 MFlops
sub-double: 133.01 MFlops
- after:
add-single: 443.04 MFlops
add-double: 301.95 MFlops
sub-single: 411.36 MFlops
sub-double: 293.15 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
add-single: 44.79 MFlops
add-double: 49.20 MFlops
sub-single: 44.55 MFlops
sub-double: 49.06 MFlops
- after:
add-single: 93.28 MFlops
add-double: 88.27 MFlops
sub-single: 91.47 MFlops
sub-double: 88.27 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
add-single: 72.59 MFlops
add-double: 72.27 MFlops
sub-single: 75.33 MFlops
sub-double: 70.54 MFlops
- after:
add-single: 112.95 MFlops
add-double: 201.11 MFlops
sub-single: 116.80 MFlops
sub-double: 188.72 MFlops

Note that the IBM and ARM machines benefit from having
QEMU_HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
can suffer significantly ([1] = USE_FP set to 1, [0] = set to 0):
- IBM Power8:
add-single: [1] 54.94 vs [0] 116.37 MFlops
add-double: [1] 58.92 vs [0] 201.44 MFlops
- Aarch64 A57:
add-single: [1] 80.72 vs [0] 93.24 MFlops
add-double: [1] 82.10 vs [0] 88.18 MFlops

On the Intel machine, having 2F64 set to 1 pays off, but it
doesn't for 2F32:
- Intel i7-6700K:
add-single: [1] 285.79 vs [0] 426.70 MFlops
add-double: [1] 302.15 vs [0] 278.82 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 117 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 98 insertions(+), 19 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 306a12fa8d..cc500b1618 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1050,49 +1050,128 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 QEMU_FLATTEN float32_add(float32 a, float32 b, float_status *status)
+float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+{
+    FloatParts pa = float16_unpack_canonical(a, status);
+    FloatParts pb = float16_unpack_canonical(b, status);
+    FloatParts pr = addsub_floats(pa, pb, true, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, false, status);
+    FloatParts pr = addsub_floats(pa, pb, subtract, status);
 
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 QEMU_FLATTEN float64_add(float64 a, float64 b, float_status *status)
+static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
+{
+    return soft_f32_addsub(a, b, false, status);
+}
+
+static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
+{
+    return soft_f32_addsub(a, b, true, status);
+}
+
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, false, status);
+    FloatParts pr = addsub_floats(pa, pb, subtract, status);
 
     return float64_round_pack_canonical(pr, status);
 }
 
-float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, true, status);
+    return soft_f64_addsub(a, b, false, status);
+}
 
-    return float16_round_pack_canonical(pr, status);
+static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
+{
+    return soft_f64_addsub(a, b, true, status);
 }
 
-float32 QEMU_FLATTEN float32_sub(float32 a, float32 b, float_status *status)
+static float hard_f32_add(float a, float b)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, true, status);
+    return a + b;
+}
 
-    return float32_round_pack_canonical(pr, status);
+static float hard_f32_sub(float a, float b)
+{
+    return a - b;
 }
 
-float64 QEMU_FLATTEN float64_sub(float64 a, float64 b, float_status *status)
+static double hard_f64_add(double a, double b)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, true, status);
+    return a + b;
+}
 
-    return float64_round_pack_canonical(pr, status);
+static double hard_f64_sub(double a, double b)
+{
+    return a - b;
+}
+
+static bool f32_addsub_post(union_float32 a, union_float32 b)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+    }
+    return !(float32_is_zero(a.s) && float32_is_zero(b.s));
+}
+
+static bool f64_addsub_post(union_float64 a, union_float64 b)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+    } else {
+        return !(float64_is_zero(a.s) && float64_is_zero(b.s));
+    }
+}
+
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
+                              hard_f32_op2_fn hard, soft_f32_op2_fn soft)
+{
+    return float32_gen2(a, b, s, hard, soft,
+                        f32_is_zon2, f32_addsub_post, NULL, NULL);
+}
+
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
+                              hard_f64_op2_fn hard, soft_f64_op2_fn soft)
+{
+    return float64_gen2(a, b, s, hard, soft,
+                        f64_is_zon2, f64_addsub_post, NULL, NULL);
+}
+
+float32 QEMU_FLATTEN
+float32_add(float32 a, float32 b, float_status *s)
+{
+    return float32_addsub(a, b, s, hard_f32_add, soft_f32_add);
+}
+
+float32 QEMU_FLATTEN
+float32_sub(float32 a, float32 b, float_status *s)
+{
+    return float32_addsub(a, b, s, hard_f32_sub, soft_f32_sub);
+}
+
+float64 QEMU_FLATTEN
+float64_add(float64 a, float64 b, float_status *s)
+{
+    return float64_addsub(a, b, s, hard_f64_add, soft_f64_add);
+}
+
+float64 QEMU_FLATTEN
+float64_sub(float64 a, float64 b, float_status *s)
+{
+    return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub);
 }
 
 /*
-- 
2.17.1
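
The USE_FP numbers above compare two ways of asking the same question,
"is this operand zero or normal?". The sketch below shows the two
flavors side by side for binary32. It is an illustration only: the
function names are made up here, and the bit-level variant is a
generic IEEE 754 test rather than QEMU's float32_is_zero_or_normal()
implementation.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flavor 1: classify the host float with the C library macro. */
static bool zon_fpclassify(float x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Flavor 2: inspect the IEEE 754 binary32 encoding directly. */
static bool zon_bits(float x)
{
    uint32_t bits, exp, frac;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof(bits));
    exp = (bits >> 23) & 0xff;
    frac = bits & 0x7fffff;
    if (exp == 0) {
        return frac == 0;   /* zero is accepted, denormals are not */
    }
    return exp != 0xff;     /* an all-ones exponent means inf or NaN */
}
```

Both functions accept the same set of values; which one is faster
depends on the host and the compiler, which is what the per-host
settings in the patch encode.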


* [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (7 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-05 10:10   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
mul-single: 126.91 MFlops
mul-double: 118.28 MFlops
- after:
mul-single: 258.02 MFlops
mul-double: 197.96 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
mul-single: 37.42 MFlops
mul-double: 38.77 MFlops
- after:
mul-single: 73.41 MFlops
mul-double: 76.93 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
mul-single: 58.40 MFlops
mul-double: 59.33 MFlops
- after:
mul-single: 60.25 MFlops
mul-double: 94.79 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index cc500b1618..58e67d9b80 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1232,7 +1232,8 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_mul(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1241,7 +1242,8 @@ float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_mul(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1250,6 +1252,54 @@ float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+static float hard_f32_mul(float a, float b)
+{
+    return a * b;
+}
+
+static double hard_f64_mul(double a, double b)
+{
+    return a * b;
+}
+
+static bool f32_mul_fast_test(union_float32 a, union_float32 b)
+{
+    return float32_is_zero(a.s) || float32_is_zero(b.s);
+}
+
+static bool f64_mul_fast_test(union_float64 a, union_float64 b)
+{
+    return float64_is_zero(a.s) || float64_is_zero(b.s);
+}
+
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
+{
+    bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
+
+    return float32_set_sign(float32_zero, signbit);
+}
+
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
+{
+    bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
+
+    return float64_set_sign(float64_zero, signbit);
+}
+
+float32 QEMU_FLATTEN
+float32_mul(float32 a, float32 b, float_status *s)
+{
+    return float32_gen2(a, b, s, hard_f32_mul, soft_f32_mul,
+                        f32_is_zon2, NULL, f32_mul_fast_test, f32_mul_fast_op);
+}
+
+float64 QEMU_FLATTEN
+float64_mul(float64 a, float64 b, float_status *s)
+{
+    return float64_gen2(a, b, s, hard_f64_mul, soft_f64_mul,
+                        f64_is_zon2, NULL, f64_mul_fast_test, f64_mul_fast_op);
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b' then adding 'c', with no intermediate rounding step after the
-- 
2.17.1
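
A note on the fast path in this patch, as I read the code: multiplication
passes NULL as the post check, so any host result with magnitude at or
below FLT_MIN/DBL_MIN would be recomputed in software, and that includes
exact zeros. When one operand is zero and the other is zero or normal,
the product is an exact zero whose sign is the XOR of the operand signs,
so it can be returned without touching the FPU. A minimal sketch with a
hypothetical helper name:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/*
 * Product of two finite operands when at least one of them is zero.
 * The caller must have excluded inf and NaN already, as the pre-check
 * in the patch does.
 */
static float zero_product(float a, float b)
{
    bool neg;

    assert(a == 0.0f || b == 0.0f);
    /* signbit() returns a nonzero value, not necessarily 1, when set. */
    neg = (signbit(a) != 0) != (signbit(b) != 0);
    return neg ? -0.0f : 0.0f;
}
```

The result matches what the host multiply would produce, for example
-0.0f * 3.0f is -0.0f and -0.0f * -3.0f is +0.0f.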


* [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (8 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-05 10:11   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
div-single: 34.84 MFlops
div-double: 34.04 MFlops
- after:
div-single: 275.23 MFlops
div-double: 216.38 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
div-single: 9.33 MFlops
div-double: 9.30 MFlops
- after:
div-single: 51.55 MFlops
div-double: 15.09 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
div-single: 25.65 MFlops
div-double: 24.91 MFlops
- after:
div-single: 96.83 MFlops
div-double: 31.01 MFlops

Here setting QEMU_HARDFLOAT_2F64_USE_FP to 1 pays off for x86_64:
[1] 215.97 vs [0] 62.15 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 58e67d9b80..e35ebfaae7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1624,7 +1624,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 float32_div(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_div(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1633,7 +1634,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 float64_div(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_div(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1642,6 +1644,64 @@ float64 float64_div(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+static float hard_f32_div(float a, float b)
+{
+    return a / b;
+}
+
+static double hard_f64_div(double a, double b)
+{
+    return a / b;
+}
+
+static bool f32_div_pre(union_float32 a, union_float32 b)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               fpclassify(b.h) == FP_NORMAL;
+    }
+    return float32_is_zero_or_normal(a.s) && float32_is_normal(b.s);
+}
+
+static bool f64_div_pre(union_float64 a, union_float64 b)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+               fpclassify(b.h) == FP_NORMAL;
+    }
+    return float64_is_zero_or_normal(a.s) && float64_is_normal(b.s);
+}
+
+static bool f32_div_post(union_float32 a, union_float32 b)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return fpclassify(a.h) != FP_ZERO;
+    }
+    return !float32_is_zero(a.s);
+}
+
+static bool f64_div_post(union_float64 a, union_float64 b)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return fpclassify(a.h) != FP_ZERO;
+    }
+    return !float64_is_zero(a.s);
+}
+
+float32 QEMU_FLATTEN
+float32_div(float32 a, float32 b, float_status *s)
+{
+    return float32_gen2(a, b, s, hard_f32_div, soft_f32_div,
+                        f32_div_pre, f32_div_post, NULL, NULL);
+}
+
+float64 QEMU_FLATTEN
+float64_div(float64 a, float64 b, float_status *s)
+{
+    return float64_gen2(a, b, s, hard_f64_div, soft_f64_div,
+                        f64_div_pre, f64_div_post, NULL, NULL);
+}
+
 /*
  * Float to Float conversions
  *
-- 
2.17.1
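
The pre and post checks for division differ from those for add/sub and
mul: the divisor must be normal (a zero divisor would raise
divide-by-zero or invalid), and a tiny quotient is only trusted when
the dividend is zero, since 0 divided by a normal number is an exact
zero. The sketch below mirrors that decision for doubles. The names
are hypothetical and the function only reports which path would be
taken; it does not compute flags.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

enum toy_path { TOY_HARD, TOY_SOFT };

static bool toy_zero_or_normal(double x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Reports whether a / b could be served by the host FPU in this scheme. */
static enum toy_path toy_div_path(double a, double b)
{
    double r;

    /* Pre-check: dividend zero or normal, divisor strictly normal. */
    if (!toy_zero_or_normal(a) || fpclassify(b) != FP_NORMAL) {
        return TOY_SOFT;
    }
    r = a / b;
    if (isinf(r)) {
        return TOY_HARD;    /* overflow can be flagged from the result */
    }
    /* Post-check: a tiny quotient is safe only if the dividend is zero. */
    if (r <= DBL_MIN && r >= -DBL_MIN && a != 0.0) {
        return TOY_SOFT;
    }
    return TOY_HARD;
}

/* Usage example: a few representative operand pairs. */
static void toy_div_demo(void)
{
    assert(toy_div_path(1.0, 2.0) == TOY_HARD);
    assert(toy_div_path(1.0, 0.0) == TOY_SOFT);
    assert(toy_div_path(0.0, 2.0) == TOY_HARD);
    assert(toy_div_path(DBL_MIN, 4.0) == TOY_SOFT);
}
```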


* [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (9 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-05 12:25   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
fma-single: 74.73 MFlops
fma-double: 74.54 MFlops
- after:
fma-single: 203.37 MFlops
fma-double: 169.37 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
fma-single: 23.24 MFlops
fma-double: 23.70 MFlops
- after:
fma-single: 66.14 MFlops
fma-double: 63.10 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
fma-single: 37.26 MFlops
fma-double: 37.29 MFlops
- after:
fma-single: 48.90 MFlops
fma-double: 59.51 MFlops

Here, setting QEMU_HARDFLOAT_3F64_USE_FP to 1 pays off on x86_64 hosts
(fma-double): 170.15 MFlops with USE_FP=1 vs. 153.12 MFlops with USE_FP=0.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 128 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e35ebfaae7..e03feafb6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1514,8 +1514,9 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
-                                                int flags, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
+                float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1525,8 +1526,9 @@ float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
-                                                int flags, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
+                float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1536,6 +1538,128 @@ float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
     return float64_round_pack_canonical(pr, status);
 }
 
+float32 QEMU_FLATTEN
+float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
+{
+    union_float32 ua, ub, uc, ur;
+
+    ua.s = xa;
+    ub.s = xb;
+    uc.s = xc;
+
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+    if (unlikely(flags & float_muladd_halve_result)) {
+        goto soft;
+    }
+
+    float32_input_flush3(&ua.s, &ub.s, &uc.s, s);
+    if (unlikely(!f32_is_zon3(ua, ub, uc))) {
+        goto soft;
+    }
+    /*
+     * When (a || b) == 0, there's no need to check for under/over flow,
+     * since we know the addend is (normal || 0) and the product is 0.
+     */
+    if (float32_is_zero(ua.s) || float32_is_zero(ub.s)) {
+        union_float32 up;
+        bool prod_sign;
+
+        prod_sign = float32_is_neg(ua.s) ^ float32_is_neg(ub.s);
+        prod_sign ^= !!(flags & float_muladd_negate_product);
+        up.s = float32_set_sign(float32_zero, prod_sign);
+
+        if (flags & float_muladd_negate_c) {
+            uc.h = -uc.h;
+        }
+        ur.h = up.h + uc.h;
+    } else {
+        if (flags & float_muladd_negate_product) {
+            ua.h = -ua.h;
+        }
+        if (flags & float_muladd_negate_c) {
+            uc.h = -uc.h;
+        }
+
+        ur.h = fmaf(ua.h, ub.h, uc.h);
+
+        if (unlikely(f32_is_inf(ur))) {
+            s->float_exception_flags |= float_flag_overflow;
+        } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+            goto soft;
+        }
+    }
+    if (flags & float_muladd_negate_result) {
+        return float32_chs(ur.s);
+    }
+    return ur.s;
+
+ soft:
+    return soft_f32_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
+float64 QEMU_FLATTEN
+float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
+{
+    union_float64 ua, ub, uc, ur;
+
+    ua.s = xa;
+    ub.s = xb;
+    uc.s = xc;
+
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+    if (unlikely(flags & float_muladd_halve_result)) {
+        goto soft;
+    }
+
+    float64_input_flush3(&ua.s, &ub.s, &uc.s, s);
+    if (unlikely(!f64_is_zon3(ua, ub, uc))) {
+        goto soft;
+    }
+    /*
+     * When (a || b) == 0, there's no need to check for under/over flow,
+     * since we know the addend is (normal || 0) and the product is 0.
+     */
+    if (float64_is_zero(ua.s) || float64_is_zero(ub.s)) {
+        union_float64 up;
+        bool prod_sign;
+
+        prod_sign = float64_is_neg(ua.s) ^ float64_is_neg(ub.s);
+        prod_sign ^= !!(flags & float_muladd_negate_product);
+        up.s = float64_set_sign(float64_zero, prod_sign);
+
+        if (flags & float_muladd_negate_c) {
+            uc.h = -uc.h;
+        }
+        ur.h = up.h + uc.h;
+    } else {
+        if (flags & float_muladd_negate_product) {
+            ua.h = -ua.h;
+        }
+        if (flags & float_muladd_negate_c) {
+            uc.h = -uc.h;
+        }
+
+        ur.h = fma(ua.h, ub.h, uc.h);
+
+        if (unlikely(f64_is_inf(ur))) {
+            s->float_exception_flags |= float_flag_overflow;
+        } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
+            goto soft;
+        }
+    }
+    if (flags & float_muladd_negate_result) {
+        return float64_chs(ur.s);
+    }
+    return ur.s;
+
+ soft:
+    return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
 /*
  * Returns the result of dividing the floating-point value `a' by the
  * corresponding value `b'. The operation is performed according to
-- 
2.17.1

* [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (10 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-05 12:26   ` Alex Bennée
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results for fp-bench:

Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 42.30 MFlops
sqrt-double: 22.97 MFlops
- after:
sqrt-single: 311.42 MFlops
sqrt-double: 311.08 MFlops

Here QEMU_HARDFLOAT_1F64_USE_FP makes a large difference for float64,
with sqrt-double throughput going from ~200 MFlops to ~300 MFlops.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e03feafb6f..4c6ecd1883 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3040,20 +3040,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_sqrt(float32 a, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pr = sqrt_float(pa, status, &float32_params);
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_sqrt(float64 a, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pr = sqrt_float(pa, status, &float64_params);
     return float64_round_pack_canonical(pr, status);
 }
 
+float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
+{
+    union_float32 ua, ur;
+
+    ua.s = xa;
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+
+    float32_input_flush1(&ua.s, s);
+    if (QEMU_HARDFLOAT_1F32_USE_FP) {
+        if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+                       fpclassify(ua.h) == FP_ZERO) ||
+                     signbit(ua.h))) {
+            goto soft;
+        }
+    } else if (unlikely(!float32_is_zero_or_normal(ua.s) ||
+                        float32_is_neg(ua.s))) {
+        goto soft;
+    }
+    ur.h = sqrtf(ua.h);
+    return ur.s;
+
+ soft:
+    return soft_f32_sqrt(ua.s, s);
+}
+
+float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
+{
+    union_float64 ua, ur;
+
+    ua.s = xa;
+    if (unlikely(!can_use_fpu(s))) {
+        goto soft;
+    }
+
+    float64_input_flush1(&ua.s, s);
+    if (QEMU_HARDFLOAT_1F64_USE_FP) {
+        if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+                       fpclassify(ua.h) == FP_ZERO) ||
+                     signbit(ua.h))) {
+            goto soft;
+        }
+    } else if (unlikely(!float64_is_zero_or_normal(ua.s) ||
+                        float64_is_neg(ua.s))) {
+        goto soft;
+    }
+    ur.h = sqrt(ua.h);
+    return ur.s;
+
+ soft:
+    return soft_f64_sqrt(ua.s, s);
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated NaN.
 *----------------------------------------------------------------------------*/
-- 
2.17.1

* [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (11 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
  2018-12-05 12:36   ` Alex Bennée
  2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson

Performance results for fp-bench:

Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 110.98 MFlops
cmp-double: 107.12 MFlops
- after:
cmp-single: 506.28 MFlops
cmp-double: 524.77 MFlops

Note that flattening both the eq and eq_signaling versions
would give us extra performance (695 vs. 506 MFlops for single
and 615 vs. 524 MFlops for double), but this would emit two
essentially identical functions for each eq/signaling pair,
which is a waste.

Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]

1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

                   qemu-aarch64 NBench score; higher is better
                 Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

  [ASCII bar chart not reproduced. Y axis: NBench score, 0 to 16.
   X axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean.
   Legend (partly overwritten in the ASCII rendering): before, add/sub,
   mul, div, fma, sqrt, cmp. See the PNG charts linked above.]

                              qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
                                      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
                                            error bars: 95% confidence interval

  [ASCII bar chart not reproduced. Y axis: speedup, 0 to 4.5.
   X axis: 410.bwaves, 416.gamess, 433.milc, 434.zeusmp, 435.gromacs,
   436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex,
   453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 470.lbm,
   482.sphinx3, geomean.
   Legend (partly overwritten in the ASCII rendering): add/sub, mul, div,
   fma, sqrt, cmp. See the PNG charts linked above.]

2. Host: ARM Aarch64 A57 @ 2.4GHz

                    qemu-aarch64 NBench score; higher is better
                 Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz

  [ASCII bar chart not reproduced. Y axis: NBench score, 0 to 5.
   X axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean.
   Legend (partly overwritten in the ASCII rendering): before, add/sub,
   mul, div, fma, sqrt, cmp. See the PNG charts linked above.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 109 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 95 insertions(+), 14 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4c6ecd1883..b29a2b6714 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2899,28 +2899,109 @@ static int compare_floats(FloatParts a, FloatParts b, bool is_quiet,
     }
 }
 
-#define COMPARE(sz)                                                     \
-int float ## sz ## _compare(float ## sz a, float ## sz b,               \
-                            float_status *s)                            \
+#define COMPARE(name, attr, sz)                                         \
+static int attr                                                         \
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
 {                                                                       \
     FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
     FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, false, s);                            \
-}                                                                       \
-int float ## sz ## _compare_quiet(float ## sz a, float ## sz b,         \
-                                  float_status *s)                      \
-{                                                                       \
-    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, true, s);                             \
+    return compare_floats(pa, pb, is_quiet, s);                         \
 }
 
-COMPARE(16)
-COMPARE(32)
-COMPARE(64)
+COMPARE(soft_f16_compare, QEMU_FLATTEN, 16)
+COMPARE(soft_f32_compare, QEMU_SOFTFLOAT_ATTR, 32)
+COMPARE(soft_f64_compare, QEMU_SOFTFLOAT_ATTR, 64)
 
 #undef COMPARE
 
+int float16_compare(float16 a, float16 b, float_status *s)
+{
+    return soft_f16_compare(a, b, false, s);
+}
+
+int float16_compare_quiet(float16 a, float16 b, float_status *s)
+{
+    return soft_f16_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
+{
+    union_float32 ua, ub;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+
+    float32_input_flush2(&ua.s, &ub.s, s);
+    if (isgreaterequal(ua.h, ub.h)) {
+        if (isgreater(ua.h, ub.h)) {
+            return float_relation_greater;
+        }
+        return float_relation_equal;
+    }
+    if (likely(isless(ua.h, ub.h))) {
+        return float_relation_less;
+    }
+    /* The only condition remaining is unordered.
+     * Fall through to set flags.
+     */
+ soft:
+    return soft_f32_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float32_compare(float32 a, float32 b, float_status *s)
+{
+    return f32_compare(a, b, false, s);
+}
+
+int float32_compare_quiet(float32 a, float32 b, float_status *s)
+{
+    return f32_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
+{
+    union_float64 ua, ub;
+
+    ua.s = xa;
+    ub.s = xb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+
+    float64_input_flush2(&ua.s, &ub.s, s);
+    if (isgreaterequal(ua.h, ub.h)) {
+        if (isgreater(ua.h, ub.h)) {
+            return float_relation_greater;
+        }
+        return float_relation_equal;
+    }
+    if (likely(isless(ua.h, ub.h))) {
+        return float_relation_less;
+    }
+    /* The only condition remaining is unordered.
+     * Fall through to set flags.
+     */
+ soft:
+    return soft_f64_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float64_compare(float64 a, float64 b, float_status *s)
+{
+    return f64_compare(a, b, false, s);
+}
+
+int float64_compare_quiet(float64 a, float64 b, float_status *s)
+{
+    return f64_compare(a, b, true, s);
+}
+
 /* Multiply A by 2 raised to the power N.  */
 static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
 {
-- 
2.17.1
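
The comparison fast path relies on the C99 quiet comparison macros, which do not raise the invalid exception for quiet NaN operands. A minimal standalone sketch of the same decision tree follows; `enum demo_relation` and `demo_f32_compare` are hypothetical names with values chosen for this example, and where the patch falls back to softfloat to set flags, the sketch simply reports "unordered".

```c
#include <assert.h>
#include <math.h>

/* Hypothetical result codes for this sketch. */
enum demo_relation {
    DEMO_LESS = -1,
    DEMO_EQUAL = 0,
    DEMO_GREATER = 1,
    DEMO_UNORDERED = 2,
};

static int demo_f32_compare(float a, float b)
{
    if (isgreaterequal(a, b)) {
        if (isgreater(a, b)) {
            return DEMO_GREATER;
        }
        return DEMO_EQUAL;
    }
    if (isless(a, b)) {
        return DEMO_LESS;
    }
    /* Neither >=, nor <: at least one operand is a NaN. The patch hands
     * this case to the software implementation so that flags get set. */
    assert(isunordered(a, b));
    return DEMO_UNORDERED;
}
```

Because only the unordered case needs flag handling, the quiet and signaling entry points can share this fast path and differ only in what the fallback does.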

* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
@ 2018-11-25  0:25   ` Aleksandar Markovic
  2018-11-25  1:25     ` Emilio G. Cota
  2018-12-04 12:28   ` Alex Bennée
  1 sibling, 1 reply; 37+ messages in thread
From: Aleksandar Markovic @ 2018-11-25  0:25 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: Richard Henderson, Alex Bennée, qemu-devel

Hi, Emilio.

> Note: some architectures (at least PPC, there might be others) clear
> the status flags passed to softfloat before most FP operations. This
> precludes the use of hardfloat, so to avoid introducing a performance
> regression for those targets, we add a flag to disable hardfloat.
> In the long run though it would be good to fix the targets so that
> at least the inexact flag passed to softfloat is indeed sticky.

Can you elaborate more on this paragraph?

Thanks,
Aleksandar Markovic
On Nov 25, 2018 1:08 AM, "Emilio G. Cota" <cota@braap.org> wrote:

> The appended patch paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the added comment for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.
>
> This patch just adds common code. Some operations will be migrated
> to hardfloat in subsequent patches to ease bisection.
>
> Note: some architectures (at least PPC, there might be others) clear
> the status flags passed to softfloat before most FP operations. This
> precludes the use of hardfloat, so to avoid introducing a performance
> regression for those targets, we add a flag to disable hardfloat.
> In the long run though it would be good to fix the targets so that
> at least the inexact flag passed to softfloat is indeed sticky.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  fpu/softfloat.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 315 insertions(+)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index ecdc00c633..306a12fa8d 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -83,6 +83,7 @@ this code that are retained.
>   * target-dependent and needs the TARGET_* macros.
>   */
>  #include "qemu/osdep.h"
> +#include <math.h>
>  #include "qemu/bitops.h"
>  #include "fpu/softfloat.h"
>
> @@ -95,6 +96,320 @@ this code that are retained.
>  *----------------------------------------------------------------------------*/
>  #include "fpu/softfloat-macros.h"
>
> +/*
> + * Hardfloat
> + *
> + * Fast emulation of guest FP instructions is challenging for two reasons.
> + * First, FP instruction semantics are similar but not identical, particularly
> + * when handling NaNs. Second, emulating at reasonable speed the guest FP
> + * exception flags is not trivial: reading the host's flags register with a
> + * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
> + * and trapping on every FP exception is not fast nor pleasant to work with.
> + *
> + * We address these challenges by leveraging the host FPU for a subset of the
> + * operations. To do this we expand on the idea presented in this paper:
> + *
> + * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
> + * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
> + *
> + * The idea is thus to leverage the host FPU to (1) compute FP operations
> + * and (2) identify whether FP exceptions occurred while avoiding
> + * expensive exception flag register accesses.
> + *
> + * An important optimization shown in the paper is that given that exception
> + * flags are rarely cleared by the guest, we can avoid recomputing some flags.
> + * This is particularly useful for the inexact flag, which is very frequently
> + * raised in floating-point workloads.
> + *
> + * We optimize the code further by deferring to soft-fp whenever FP exception
> + * detection might get hairy. Two examples: (1) when at least one operand is
> + * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
> + * and the result is < the minimum normal.
> + */
> +#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t)                          \
> +    static inline void name(soft_t *a, float_status *s)                 \
> +    {                                                                   \
> +        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
> +            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
> +                                     soft_t ## _is_neg(*a));            \
> +            s->float_exception_flags |= float_flag_input_denormal;      \
> +        }                                                               \
> +    }
> +
> +GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
> +GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
> +#undef GEN_INPUT_FLUSH__NOCHECK
> +
> +#define GEN_INPUT_FLUSH1(name, soft_t)                  \
> +    static inline void name(soft_t *a, float_status *s) \
> +    {                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {         \
> +            return;                                     \
> +        }                                               \
> +        soft_t ## _input_flush__nocheck(a, s);          \
> +    }
> +
> +GEN_INPUT_FLUSH1(float32_input_flush1, float32)
> +GEN_INPUT_FLUSH1(float64_input_flush1, float64)
> +#undef GEN_INPUT_FLUSH1
> +
> +#define GEN_INPUT_FLUSH2(name, soft_t)                                  \
> +    static inline void name(soft_t *a, soft_t *b, float_status *s)      \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +    }
> +
> +GEN_INPUT_FLUSH2(float32_input_flush2, float32)
> +GEN_INPUT_FLUSH2(float64_input_flush2, float64)
> +#undef GEN_INPUT_FLUSH2
> +
> +#define GEN_INPUT_FLUSH3(name, soft_t)                                  \
> +    static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +        soft_t ## _input_flush__nocheck(c, s);                          \
> +    }
> +
> +GEN_INPUT_FLUSH3(float32_input_flush3, float32)
> +GEN_INPUT_FLUSH3(float64_input_flush3, float64)
> +#undef GEN_INPUT_FLUSH3
> +
> +/*
> + * Choose whether to use fpclassify or float32/64_* primitives in the generated
> + * hardfloat functions. Each combination of number of inputs and float size
> + * gets its own value.
> + */
> +#if defined(__x86_64__)
> +# define QEMU_HARDFLOAT_1F32_USE_FP 0
> +# define QEMU_HARDFLOAT_1F64_USE_FP 1
> +# define QEMU_HARDFLOAT_2F32_USE_FP 0
> +# define QEMU_HARDFLOAT_2F64_USE_FP 1
> +# define QEMU_HARDFLOAT_3F32_USE_FP 0
> +# define QEMU_HARDFLOAT_3F64_USE_FP 1
> +#else
> +# define QEMU_HARDFLOAT_1F32_USE_FP 0
> +# define QEMU_HARDFLOAT_1F64_USE_FP 0
> +# define QEMU_HARDFLOAT_2F32_USE_FP 0
> +# define QEMU_HARDFLOAT_2F64_USE_FP 0
> +# define QEMU_HARDFLOAT_3F32_USE_FP 0
> +# define QEMU_HARDFLOAT_3F64_USE_FP 0
> +#endif
> +
> +/*
> + * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
> + * float{32,64}_is_infinity when !USE_FP.
> + * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
> + * On power64 however, using isinf() reduces fp-bench performance by up to 50%.
> + */
> +#if defined(__x86_64__) || defined(__aarch64__)
> +# define QEMU_HARDFLOAT_USE_ISINF   1
> +#else
> +# define QEMU_HARDFLOAT_USE_ISINF   0
> +#endif
> +
> +/*
> + * Some targets clear the FP flags before most FP operations. This prevents
> + * the use of hardfloat, since hardfloat relies on the inexact flag being
> + * already set.
> + */
> +#if defined(TARGET_PPC)
> +# define QEMU_NO_HARDFLOAT 1
> +# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
> +#else
> +# define QEMU_NO_HARDFLOAT 0
> +# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN __attribute__((noinline))
> +#endif
> +
> +static inline bool can_use_fpu(const float_status *s)
> +{
> +    if (QEMU_NO_HARDFLOAT) {
> +        return false;
> +    }
> +    return likely(s->float_exception_flags & float_flag_inexact &&
> +                  s->float_rounding_mode == float_round_nearest_even);
> +}
> +
> +/*
> + * Hardfloat generation functions. Each operation can have two flavors:
> + * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
> + * most condition checks, or native ones (e.g. fpclassify).
> + *
> + * The flavor is chosen by the callers. Instead of using macros, we rely on the
> + * compiler to propagate constants and inline everything into the callers.
> + *
> + * We only generate functions for operations with two inputs, since only
> + * these are common enough to justify consolidating them into common code.
> + */
> +
> +typedef union {
> +    float32 s;
> +    float h;
> +} union_float32;
> +
> +typedef union {
> +    float64 s;
> +    double h;
> +} union_float64;
> +
> +typedef bool (*f32_check_fn)(union_float32 a, union_float32 b);
> +typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);
> +
> +typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
> +typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
> +typedef float   (*hard_f32_op2_fn)(float a, float b);
> +typedef double  (*hard_f64_op2_fn)(double a, double b);
> +
> +/* 2-input is-zero-or-normal */
> +static inline bool f32_is_zon2(union_float32 a, union_float32 b)
> +{
> +    if (QEMU_HARDFLOAT_2F32_USE_FP) {
> +        /*
> +         * Not using a temp variable for consecutive fpclassify calls ends up
> +         * generating faster code.
> +         */
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
> +    }
> +    return float32_is_zero_or_normal(a.s) &&
> +           float32_is_zero_or_normal(b.s);
> +}
> +
> +static inline bool f64_is_zon2(union_float64 a, union_float64 b)
> +{
> +    if (QEMU_HARDFLOAT_2F64_USE_FP) {
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
> +    }
> +    return float64_is_zero_or_normal(a.s) &&
> +           float64_is_zero_or_normal(b.s);
> +}
> +
> +/* 3-input is-zero-or-normal */
> +static inline
> +bool f32_is_zon3(union_float32 a, union_float32 b, union_float32 c)
> +{
> +    if (QEMU_HARDFLOAT_3F32_USE_FP) {
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
> +               (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
> +    }
> +    return float32_is_zero_or_normal(a.s) &&
> +           float32_is_zero_or_normal(b.s) &&
> +           float32_is_zero_or_normal(c.s);
> +}
> +
> +static inline
> +bool f64_is_zon3(union_float64 a, union_float64 b, union_float64 c)
> +{
> +    if (QEMU_HARDFLOAT_3F64_USE_FP) {
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
> +               (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
> +    }
> +    return float64_is_zero_or_normal(a.s) &&
> +           float64_is_zero_or_normal(b.s) &&
> +           float64_is_zero_or_normal(c.s);
> +}
> +
> +static inline bool f32_is_inf(union_float32 a)
> +{
> +    if (QEMU_HARDFLOAT_USE_ISINF) {
> +        return isinff(a.h);
> +    }
> +    return float32_is_infinity(a.s);
> +}
> +
> +static inline bool f64_is_inf(union_float64 a)
> +{
> +    if (QEMU_HARDFLOAT_USE_ISINF) {
> +        return isinf(a.h);
> +    }
> +    return float64_is_infinity(a.s);
> +}
> +
> +/* Note: @fast_test and @post can be NULL */
> +static inline float32
> +float32_gen2(float32 xa, float32 xb, float_status *s,
> +             hard_f32_op2_fn hard, soft_f32_op2_fn soft,
> +             f32_check_fn pre, f32_check_fn post,
> +             f32_check_fn fast_test, soft_f32_op2_fn fast_op)
> +{
> +    union_float32 ua, ub, ur;
> +
> +    ua.s = xa;
> +    ub.s = xb;
> +
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +
> +    float32_input_flush2(&ua.s, &ub.s, s);
> +    if (unlikely(!pre(ua, ub))) {
> +        goto soft;
> +    }
> +    if (fast_test && fast_test(ua, ub)) {
> +        return fast_op(ua.s, ub.s, s);
> +    }
> +
> +    ur.h = hard(ua.h, ub.h);
> +    if (unlikely(f32_is_inf(ur))) {
> +        s->float_exception_flags |= float_flag_overflow;
> +    } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
> +        if (post == NULL || post(ua, ub)) {
> +            goto soft;
> +        }
> +    }
> +    return ur.s;
> +
> + soft:
> +    return soft(ua.s, ub.s, s);
> +}
> +
> +static inline float64
> +float64_gen2(float64 xa, float64 xb, float_status *s,
> +             hard_f64_op2_fn hard, soft_f64_op2_fn soft,
> +             f64_check_fn pre, f64_check_fn post,
> +             f64_check_fn fast_test, soft_f64_op2_fn fast_op)
> +{
> +    union_float64 ua, ub, ur;
> +
> +    ua.s = xa;
> +    ub.s = xb;
> +
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +
> +    float64_input_flush2(&ua.s, &ub.s, s);
> +    if (unlikely(!pre(ua, ub))) {
> +        goto soft;
> +    }
> +    if (fast_test && fast_test(ua, ub)) {
> +        return fast_op(ua.s, ub.s, s);
> +    }
> +
> +    ur.h = hard(ua.h, ub.h);
> +    if (unlikely(f64_is_inf(ur))) {
> +        s->float_exception_flags |= float_flag_overflow;
> +    } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
> +        if (post == NULL || post(ua, ub)) {
> +            goto soft;
> +        }
> +    }
> +    return ur.s;
> +
> + soft:
> +    return soft(ua.s, ub.s, s);
> +}
> +
>  /*----------------------------------------------------------------------------
>  | Returns the fraction bits of the half-precision floating-point value `a'.
>  *----------------------------------------------------------------------------*/
> --
> 2.17.1
>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-11-25  0:25   ` Aleksandar Markovic
@ 2018-11-25  1:25     ` Emilio G. Cota
  0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-25  1:25 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: Richard Henderson, Alex Bennée, qemu-devel

On Sun, Nov 25, 2018 at 01:25:25 +0100, Aleksandar Markovic wrote:
> > Note: some architectures (at least PPC, there might be others) clear
> > the status flags passed to softfloat before most FP operations. This
> > precludes the use of hardfloat, so to avoid introducing a performance
> > regression for those targets, we add a flag to disable hardfloat.
> > In the long run though it would be good to fix the targets so that
> > at least the inexact flag passed to softfloat is indeed sticky.
> 
> Can you elaborate more on this paragraph?

Sure. We only use hardfloat when the inexact flag is already
set. If it isn't, we defer to softfloat. This is done for two
reasons:

- Computing the inexact flag requires duplicating
  most of what softfloat does, so it's not worth doing. Note
  that clearing and reading the host's fp flags is even
  slower, so that's not an option.

- The inexact flag is raised *very* frequently. The flag
  remains set (in the guest) unless guest code explicitly
  clears it, which few guest workloads do.

It therefore makes sense for hardfloat to only kick in once
the inexact flag has already been set.
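
As a rough illustration of the gating described above, here is a minimal
sketch. It is not the code from this series: the demo_* names, the status
struct, and the crude soft path are all assumptions made for the example.
The soft stand-in ignores the rounding mode and only detects inexactness by
widening to double, which is enough to show the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not QEMU's definitions. */
enum { DEMO_FLAG_INEXACT = 1 << 0 };
enum demo_rounding { DEMO_ROUND_NEAREST_EVEN, DEMO_ROUND_TO_ZERO };

struct demo_status {
    enum demo_rounding rounding_mode;
    unsigned exception_flags;   /* sticky: only the guest clears these */
};

/* The gate: use the host FPU only if inexact is already set and the
 * rounding mode is round-to-nearest-even. */
static bool demo_can_use_fpu(const struct demo_status *s)
{
    assert(s != NULL);
    return (s->exception_flags & DEMO_FLAG_INEXACT) &&
           s->rounding_mode == DEMO_ROUND_NEAREST_EVEN;
}

/* Crude stand-in for the soft path: it computes the sum in double and
 * raises inexact if narrowing to float loses information. A real soft
 * implementation would also honor the rounding mode and other flags. */
static float demo_soft_add(float a, float b, struct demo_status *s)
{
    double wide = (double)a + (double)b;
    float r = (float)wide;

    if ((double)r != wide) {
        s->exception_flags |= DEMO_FLAG_INEXACT;
    }
    return r;
}

/* Dispatch: the host FPU path never has to compute the inexact flag,
 * because it only runs when that flag is already set. */
static float demo_add(float a, float b, struct demo_status *s, bool *used_host)
{
    if (demo_can_use_fpu(s)) {
        *used_host = true;
        return a + b;
    }
    *used_host = false;
    return demo_soft_add(a, b, s);
}
```

With a status that starts with no flags set, the first inexact operation
goes through the soft path and sets the flag; later operations take the
host path until the guest clears the flag or changes the rounding mode.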

Most targets directly keep the guest's FP flags in the same
struct (float_status) that is passed to softfloat ops.
PPC, however, keeps the state of the guest FP flags in one
place, and passes a pristine float_status to softfloat code
every time it calls it. Thus, given that hardfloat is
entirely implemented in softfloat.c, PPC targets cannot
currently take advantage of it.

Changing this in the PPC target is not impossible, but it will
require additional work that I'm not doing in this series, hence
my note. So for now, PPC targets just have hardfloat disabled
at compile time, which avoids adding overhead for a feature
that they cannot use.
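
To make the difference between the two flag-keeping styles concrete, here
is a small sketch. Everything in it is hypothetical (the toy_* names, the
single flag bit, the op that always raises inexact); it models the idea
only and is not how any QEMU target is actually written.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_FLAG_INEXACT = 1 << 0 };   /* hypothetical flag bit */

struct toy_status {
    unsigned exception_flags;
};

/* The fast path is allowed only when inexact is already set. */
static bool toy_gate_open(const struct toy_status *s)
{
    return (s->exception_flags & TOY_FLAG_INEXACT) != 0;
}

/* Stand-in FP op: reports whether the fast path was taken, then raises
 * inexact, as the vast majority of real FP operations would. */
static bool toy_fp_op(struct toy_status *s)
{
    bool fast = toy_gate_open(s);

    s->exception_flags |= TOY_FLAG_INEXACT;
    return fast;
}

/* Style A: the guest's flags live in the status passed to every op,
 * so the inexact flag is sticky across calls. */
static int toy_count_fast_sticky(int n_ops)
{
    struct toy_status guest = { 0 };
    int fast = 0;

    for (int i = 0; i < n_ops; i++) {
        fast += toy_fp_op(&guest);
    }
    return fast;
}

/* Style B: the guest's flags are kept elsewhere and every op receives a
 * freshly cleared status, so the gate never sees inexact set. */
static int toy_count_fast_pristine(int n_ops)
{
    unsigned guest_flags = 0;
    int fast = 0;

    for (int i = 0; i < n_ops; i++) {
        struct toy_status tmp = { 0 };

        fast += toy_fp_op(&tmp);
        guest_flags |= tmp.exception_flags;   /* merged back afterwards */
    }
    (void)guest_flags;
    return fast;
}
```

In this model the sticky style takes the fast path for every op after the
first, while the pristine style never does, which is why the gate check is
pure overhead for such a target.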

Let me know if anything is unclear. Cheers,

		Emilio

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (12 preceding siblings ...)
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
@ 2018-11-27 17:24 ` no-reply
  2018-11-27 17:52   ` Emilio G. Cota
  2018-11-27 17:32 ` no-reply
  2018-12-05 12:41 ` Alex Bennée
  15 siblings, 1 reply; 37+ messages in thread
From: no-reply @ 2018-11-27 17:24 UTC (permalink / raw)
  To: cota; +Cc: famz, qemu-devel, richard.henderson, alex.bennee

Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Message-id: 20181124235553.17371-1-cota@braap.org
Subject: [Qemu-devel] [PATCH v6 00/13] hardfloat
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fe0cee3 hardfloat: implement float32/64 comparison
ac5968b hardfloat: implement float32/64 square root
0f10937 hardfloat: implement float32/64 fused multiply-add
de38097 hardfloat: implement float32/64 division
fbeab45 hardfloat: implement float32/64 multiplication
8894a16 hardfloat: implement float32/64 addition and subtraction
834d403 fpu: introduce hardfloat
94b3f9b tests/fp: add fp-bench
fe2ef78 softfloat: add float{32, 64}_is_zero_or_normal
a343567 softfloat: rename canonicalize to sf_canonicalize
73e6c0d target/tricore: use float32_is_denormal
be09b31 softfloat: add float{32, 64}_is_{de, }normal
319042a fp-test: pick TARGET_ARM to get its specialization

=== OUTPUT BEGIN ===
  BUILD   fedora
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-spofu4kn/src'
  GEN     /var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055/qemu.tar.vroot'...
done.
Checking out files: 100% (6464/6464), done.
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPY    RUNNER
    RUN test-mingw in qemu:fedora 
Packages installed:
SDL2-devel-2.0.9-1.fc28.x86_64
bc-1.07.1-5.fc28.x86_64
bison-3.0.4-9.fc28.x86_64
bluez-libs-devel-5.50-1.fc28.x86_64
brlapi-devel-0.6.7-19.fc28.x86_64
bzip2-1.0.6-26.fc28.x86_64
bzip2-devel-1.0.6-26.fc28.x86_64
ccache-3.4.2-2.fc28.x86_64
clang-6.0.1-2.fc28.x86_64
device-mapper-multipath-devel-0.7.4-3.git07e7bd5.fc28.x86_64
findutils-4.6.0-19.fc28.x86_64
flex-2.6.1-7.fc28.x86_64
gcc-8.2.1-5.fc28.x86_64
gcc-c++-8.2.1-5.fc28.x86_64
gettext-0.19.8.1-14.fc28.x86_64
git-2.17.2-1.fc28.x86_64
glib2-devel-2.56.3-2.fc28.x86_64
glusterfs-api-devel-4.1.5-1.fc28.x86_64
gnutls-devel-3.6.4-1.fc28.x86_64
gtk3-devel-3.22.30-1.fc28.x86_64
hostname-3.20-3.fc28.x86_64
libaio-devel-0.3.110-11.fc28.x86_64
libasan-8.2.1-5.fc28.x86_64
libattr-devel-2.4.48-3.fc28.x86_64
libcap-devel-2.25-9.fc28.x86_64
libcap-ng-devel-0.7.9-4.fc28.x86_64
libcurl-devel-7.59.0-8.fc28.x86_64
libfdt-devel-1.4.7-1.fc28.x86_64
libpng-devel-1.6.34-6.fc28.x86_64
librbd-devel-12.2.8-1.fc28.x86_64
libssh2-devel-1.8.0-7.fc28.x86_64
libubsan-8.2.1-5.fc28.x86_64
libusbx-devel-1.0.22-1.fc28.x86_64
libxml2-devel-2.9.8-4.fc28.x86_64
llvm-6.0.1-8.fc28.x86_64
lzo-devel-2.08-12.fc28.x86_64
make-4.2.1-6.fc28.x86_64
mingw32-SDL2-2.0.9-1.fc28.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.57.0-1.fc28.noarch
mingw32-glib2-2.56.1-1.fc28.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.6.3-1.fc28.noarch
mingw32-gtk3-3.22.30-1.fc28.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc28.noarch
mingw32-nettle-3.4-1.fc28.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL2-2.0.9-1.fc28.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.57.0-1.fc28.noarch
mingw64-glib2-2.56.1-1.fc28.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.6.3-1.fc28.noarch
mingw64-gtk3-3.22.30-1.fc28.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc28.noarch
mingw64-nettle-3.4-1.fc28.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
ncurses-devel-6.1-5.20180224.fc28.x86_64
nettle-devel-3.4-2.fc28.x86_64
nss-devel-3.39.0-1.0.fc28.x86_64
numactl-devel-2.0.11-8.fc28.x86_64
package PyYAML is not installed
package libjpeg-devel is not installed
perl-5.26.2-414.fc28.x86_64
pixman-devel-0.34.0-8.fc28.x86_64
python3-3.6.6-1.fc28.x86_64
snappy-devel-1.1.7-5.fc28.x86_64
sparse-0.5.2-1.fc28.x86_64
spice-server-devel-0.14.0-4.fc28.x86_64
systemtap-sdt-devel-4.0-1.fc28.x86_64
tar-1.30-3.fc28.x86_64
usbredir-devel-0.8.0-1.fc28.x86_64
virglrenderer-devel-0.6.0-4.20170210git76b3da97b.fc28.x86_64
vte3-devel-0.36.5-6.fc28.x86_64
which-2.21-8.fc28.x86_64
xen-devel-4.10.2-2.fc28.x86_64
zlib-devel-1.2.11-8.fc28.x86_64

Environment variables:
TARGET_LIST=
PACKAGES=bc     bison     bluez-libs-devel     brlapi-devel     bzip2     bzip2-devel     ccache     clang     device-mapper-multipath-devel     findutils     flex     gcc     gcc-c++     gettext     git     glib2-devel     glusterfs-api-devel     gnutls-devel     gtk3-devel     hostname     libaio-devel     libasan     libattr-devel     libcap-devel     libcap-ng-devel     libcurl-devel     libfdt-devel     libjpeg-devel     libpng-devel     librbd-devel     libssh2-devel     libubsan     libusbx-devel     libxml2-devel     llvm     lzo-devel     make     mingw32-bzip2     mingw32-curl     mingw32-glib2     mingw32-gmp     mingw32-gnutls     mingw32-gtk3     mingw32-libjpeg-turbo     mingw32-libpng     mingw32-libssh2     mingw32-libtasn1     mingw32-nettle     mingw32-pixman     mingw32-pkg-config     mingw32-SDL2     mingw64-bzip2     mingw64-curl     mingw64-glib2     mingw64-gmp     mingw64-gnutls     mingw64-gtk3     mingw64-libjpeg-turbo     mingw64-libpng     mingw64-libssh2     mingw64-libtasn1     mingw64-nettle     mingw64-pixman     mingw64-pkg-config     mingw64-SDL2     ncurses-devel     nettle-devel     nss-devel     numactl-devel     perl     pixman-devel     python3     PyYAML     SDL2-devel     snappy-devel     sparse     spice-server-devel     systemtap-sdt-devel     tar     usbredir-devel     virglrenderer-devel     vte3-devel     which     xen-devel     zlib-devel
J=8
V=
HOSTNAME=2ccc3ec89689
DEBUG=
SHOW_ENV=1
PWD=/
HOME=/
CCACHE_DIR=/var/tmp/ccache
FBR=f28
DISTTAG=f28container
QEMU_CONFIGURE_OPTS=--python=/usr/bin/python3
FGC=f28
TEST_DIR=/tmp/qemu-test
SHLVL=1
FEATURES=mingw clang pyyaml asan dtc
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAKEFLAGS= -j8
EXTRA_CONFIGURE_OPTS=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install --python=/usr/bin/python3 --cross-prefix=x86_64-w64-mingw32- --enable-trace-backends=simple --enable-gnutls --enable-nettle --enable-curl --enable-vnc --enable-bzip2 --enable-guest-agent --with-sdlabi=2.0
Install prefix    /tmp/qemu-test/install
BIOS directory    /tmp/qemu-test/install
firmware path     /tmp/qemu-test/install/share/qemu-firmware
binary directory  /tmp/qemu-test/install
library directory /tmp/qemu-test/install/lib
module directory  /tmp/qemu-test/install/lib
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory  /tmp/qemu-test/install
local state directory   queried at runtime
Windows SDK       no
Source path       /tmp/qemu-test/src
GIT binary        git
GIT submodules    
C compiler        x86_64-w64-mingw32-gcc
Host C compiler   cc
C++ compiler      x86_64-w64-mingw32-g++
Objective-C compiler clang
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/pixman-1  -I$(SRC_PATH)/dtc/libfdt -Werror -DHAS_LIBSSH2_SFTP_FSYNC -I/usr/x86_64-w64-mingw32/sys-root/mingw/include  -mms-bitfields -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0 -I/usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include  -m64 -mcx16 -mthreads -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/p11-kit-1  -I/usr/x86_64-w64-mingw32/sys-root/mingw/include   -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/libpng16 
LDFLAGS           -Wl,--nxcompat -Wl,--no-seh -Wl,--dynamicbase -Wl,--warn-common -m64 -g 
QEMU_LDFLAGS      -L$(BUILD_DIR)/dtc/libfdt 
make              make
install           install
python            /usr/bin/python3 -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
SDL support       yes (2.0.9)
GTK support       yes (3.22.30)
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    yes
libgcrypt         no
nettle            yes (3.4)
libtasn1          yes
curses support    no
virgl support     no 
curl support      yes
mingw32 support   yes
Audio drivers     dsound
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
Multipath support no
VNC support       yes
VNC SASL support  no
VNC JPEG support  yes
VNC PNG support   yes
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               no
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support no
Install blobs     yes
KVM support       no
HAX support       yes
HVF support       no
WHPX support      no
TCG support       yes
TCG debug enabled no
TCG interpreter   no
malloc trim support no
RDMA support      no
PVRDMA support    no
fdt support       git
membarrier        no
preadv support    no
fdatasync         no
madvise           no
posix_madvise     no
posix_memalign    no
libcap-ng support no
vhost-net support no
vhost-crypto support no
vhost-scsi support no
vhost-vsock support no
vhost-user support no
Trace backends    simple
Trace output file trace-<pid>
spice support     no 
rbd support       no
xfsctl support    no
smartcard support no
libusb            no
usb net redir     no
OpenGL support    no
OpenGL dmabufs    no
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info yes
QGA MSI support   no
seccomp support   no
coroutine backend win32
coroutine pool    yes
debug stack usage no
mutex debugging   no
crypto afalg      no
GlusterFS support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   yes
TPM passthrough   no
TPM emulator      no
QOM debugging     yes
Live block migration yes
lzo support       no
snappy support    no
bzip2 support     yes
NUMA host support no
libxml2           no
tcmalloc support  no
jemalloc support  no
avx2 optimization yes
replication support yes
VxHS block device no
bochs support     yes
cloop support     yes
dmg support       yes
qcow v1 support   yes
vdi support       yes
vvfat support     yes
qed support       yes
parallels support yes
sheepdog support  yes
capstone          no
docker            no
libpmem support   no
libudev           no

NOTE: cross-compilers enabled:  'x86_64-w64-mingw32-gcc'
  GEN     config-host.h
  GEN     x86_64-softmmu/config-devices.mak.tmp
  GEN     aarch64-softmmu/config-devices.mak.tmp
  GEN     qemu-options.def
  GEN     qapi-gen
  GEN     trace/generated-tcg-tracers.h
  GEN     trace/generated-helpers-wrappers.h
  GEN     trace/generated-helpers.h
  GEN     trace/generated-helpers.c
  GEN     aarch64-softmmu/config-devices.mak
  GEN     x86_64-softmmu/config-devices.mak
  GEN     module_block.h
  GEN     ui/input-keymap-atset1-to-qcode.c
  GEN     ui/input-keymap-linux-to-qcode.c
  GEN     ui/input-keymap-qcode-to-atset1.c
  GEN     ui/input-keymap-qcode-to-atset2.c
  GEN     ui/input-keymap-qcode-to-atset3.c
  GEN     ui/input-keymap-qcode-to-linux.c
  GEN     ui/input-keymap-qcode-to-qnum.c
  GEN     ui/input-keymap-qcode-to-sun.c
  GEN     ui/input-keymap-qnum-to-qcode.c
  GEN     ui/input-keymap-usb-to-qcode.c
  GEN     ui/input-keymap-win32-to-qcode.c
  GEN     ui/input-keymap-x11-to-qcode.c
  GEN     ui/input-keymap-xorgevdev-to-qcode.c
  GEN     ui/input-keymap-xorgkbd-to-qcode.c
  GEN     ui/input-keymap-xorgxquartz-to-qcode.c
  GEN     ui/input-keymap-xorgxwin-to-qcode.c
  GEN     ui/input-keymap-osx-to-qcode.c
  GEN     trace-root.h
  GEN     tests/test-qapi-gen
  GEN     accel/kvm/trace.h
  GEN     accel/tcg/trace.h
  GEN     audio/trace.h
  GEN     block/trace.h
  GEN     chardev/trace.h
  GEN     crypto/trace.h
  GEN     hw/9pfs/trace.h
  GEN     hw/acpi/trace.h
  GEN     hw/alpha/trace.h
  GEN     hw/arm/trace.h
  GEN     hw/audio/trace.h
  GEN     hw/block/trace.h
  GEN     hw/block/dataplane/trace.h
  GEN     hw/char/trace.h
  GEN     hw/display/trace.h
  GEN     hw/dma/trace.h
  GEN     hw/hppa/trace.h
  GEN     hw/i2c/trace.h
  GEN     hw/i386/trace.h
  GEN     hw/i386/xen/trace.h
  GEN     hw/ide/trace.h
  GEN     hw/input/trace.h
  GEN     hw/intc/trace.h
  GEN     hw/isa/trace.h
  GEN     hw/mem/trace.h
  GEN     hw/misc/trace.h
  GEN     hw/misc/macio/trace.h
  GEN     hw/net/trace.h
  GEN     hw/nvram/trace.h
  GEN     hw/pci/trace.h
  GEN     hw/pci-host/trace.h
  GEN     hw/ppc/trace.h
  GEN     hw/rdma/trace.h
  GEN     hw/rdma/vmw/trace.h
  GEN     hw/s390x/trace.h
  GEN     hw/scsi/trace.h
  GEN     hw/sd/trace.h
  GEN     hw/sparc/trace.h
  GEN     hw/sparc64/trace.h
  GEN     hw/timer/trace.h
  GEN     hw/tpm/trace.h
  GEN     hw/usb/trace.h
  GEN     hw/vfio/trace.h
  GEN     hw/virtio/trace.h
  GEN     hw/watchdog/trace.h
  GEN     hw/xen/trace.h
  GEN     io/trace.h
  GEN     linux-user/trace.h
  GEN     migration/trace.h
  GEN     nbd/trace.h
  GEN     net/trace.h
  GEN     qapi/trace.h
  GEN     qom/trace.h
  GEN     scsi/trace.h
  GEN     target/arm/trace.h
  GEN     target/i386/trace.h
  GEN     target/mips/trace.h
  GEN     target/ppc/trace.h
  GEN     target/s390x/trace.h
  GEN     target/sparc/trace.h
  GEN     ui/trace.h
  GEN     util/trace.h
  GEN     trace-root.c
  GEN     accel/kvm/trace.c
  GEN     accel/tcg/trace.c
  GEN     audio/trace.c
  GEN     block/trace.c
  GEN     chardev/trace.c
  GEN     crypto/trace.c
  GEN     hw/9pfs/trace.c
  GEN     hw/acpi/trace.c
  GEN     hw/alpha/trace.c
  GEN     hw/arm/trace.c
  GEN     hw/audio/trace.c
  GEN     hw/block/trace.c
  GEN     hw/block/dataplane/trace.c
  GEN     hw/char/trace.c
  GEN     hw/display/trace.c
  GEN     hw/dma/trace.c
  GEN     hw/hppa/trace.c
  GEN     hw/i2c/trace.c
  GEN     hw/i386/trace.c
  GEN     hw/i386/xen/trace.c
  GEN     hw/ide/trace.c
  GEN     hw/input/trace.c
  GEN     hw/intc/trace.c
  GEN     hw/isa/trace.c
  GEN     hw/mem/trace.c
  GEN     hw/misc/trace.c
  GEN     hw/misc/macio/trace.c
  GEN     hw/net/trace.c
  GEN     hw/nvram/trace.c
  GEN     hw/pci/trace.c
  GEN     hw/pci-host/trace.c
  GEN     hw/ppc/trace.c
  GEN     hw/rdma/trace.c
  GEN     hw/rdma/vmw/trace.c
  GEN     hw/s390x/trace.c
  GEN     hw/scsi/trace.c
  GEN     hw/sd/trace.c
  GEN     hw/sparc/trace.c
  GEN     hw/sparc64/trace.c
  GEN     hw/timer/trace.c
  GEN     hw/tpm/trace.c
  GEN     hw/usb/trace.c
  GEN     hw/vfio/trace.c
  GEN     hw/virtio/trace.c
  GEN     hw/watchdog/trace.c
  GEN     hw/xen/trace.c
  GEN     io/trace.c
  GEN     linux-user/trace.c
  GEN     migration/trace.c
  GEN     nbd/trace.c
  GEN     net/trace.c
  GEN     qapi/trace.c
  GEN     qom/trace.c
  GEN     scsi/trace.c
  GEN     target/arm/trace.c
  GEN     target/i386/trace.c
  GEN     target/mips/trace.c
  GEN     target/ppc/trace.c
  GEN     target/s390x/trace.c
  GEN     target/sparc/trace.c
  GEN     ui/trace.c
  GEN     util/trace.c
  GEN     config-all-devices.mak
	 DEP /tmp/qemu-test/src/dtc/tests/dumptrees.c
	 DEP /tmp/qemu-test/src/dtc/tests/trees.S
	 DEP /tmp/qemu-test/src/dtc/tests/testutils.c
	 DEP /tmp/qemu-test/src/dtc/tests/value-labels.c
	 DEP /tmp/qemu-test/src/dtc/tests/asm_tree_dump.c
	 DEP /tmp/qemu-test/src/dtc/tests/truncated_string.c
	 DEP /tmp/qemu-test/src/dtc/tests/truncated_memrsv.c
	 DEP /tmp/qemu-test/src/dtc/tests/truncated_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/check_full.c
	 DEP /tmp/qemu-test/src/dtc/tests/check_header.c
	 DEP /tmp/qemu-test/src/dtc/tests/check_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay_bad_fixup.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/property_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/integer-expressions.c
	 DEP /tmp/qemu-test/src/dtc/tests/utilfdt_test.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset_aliases.c
	 DEP /tmp/qemu-test/src/dtc/tests/add_subnode_with_nops.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_unordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtb_reverse.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_ordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/extra-terminating-null.c
	 DEP /tmp/qemu-test/src/dtc/tests/incbin.c
	 DEP /tmp/qemu-test/src/dtc/tests/boot-cpuid.c
	 DEP /tmp/qemu-test/src/dtc/tests/phandle_format.c
	 DEP /tmp/qemu-test/src/dtc/tests/path-references.c
	 DEP /tmp/qemu-test/src/dtc/tests/references.c
	 DEP /tmp/qemu-test/src/dtc/tests/string_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/propname_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop2.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop1.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/set_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/rw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/open_pack.c
	 DEP /tmp/qemu-test/src/dtc/tests/nopulate.c
	 DEP /tmp/qemu-test/src/dtc/tests/mangle-layout.c
	 DEP /tmp/qemu-test/src/dtc/tests/move_and_save.c
	 DEP /tmp/qemu-test/src/dtc/tests/sw_states.c
	 DEP /tmp/qemu-test/src/dtc/tests/sw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop_inplace.c
	 DEP /tmp/qemu-test/src/dtc/tests/stringlist.c
	 DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells2.c
	 DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/notfound.c
	 DEP /tmp/qemu-test/src/dtc/tests/sized_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/char_literal.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_alias.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_check_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_prop_value.c
	 DEP /tmp/qemu-test/src/dtc/tests/parent_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/supernode_atdepth_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/getprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/find_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/root_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_mem_rsv.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_overlay.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_addresses.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_empty_tree.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_strerror.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_rw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_sw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_wip.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_ro.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt.c
	 DEP /tmp/qemu-test/src/dtc/util.c
	 DEP /tmp/qemu-test/src/dtc/fdtoverlay.c
	 DEP /tmp/qemu-test/src/dtc/fdtput.c
	 DEP /tmp/qemu-test/src/dtc/fdtget.c
	 DEP /tmp/qemu-test/src/dtc/fdtdump.c
	 LEX convert-dtsv0-lexer.lex.c
	 DEP /tmp/qemu-test/src/dtc/srcpos.c
	 BISON dtc-parser.tab.c
	 LEX dtc-lexer.lex.c
	 DEP /tmp/qemu-test/src/dtc/treesource.c
	 DEP /tmp/qemu-test/src/dtc/livetree.c
	 DEP /tmp/qemu-test/src/dtc/fstree.c
	 DEP /tmp/qemu-test/src/dtc/flattree.c
	 DEP /tmp/qemu-test/src/dtc/dtc.c
	 DEP /tmp/qemu-test/src/dtc/data.c
	 DEP /tmp/qemu-test/src/dtc/checks.c
	 DEP convert-dtsv0-lexer.lex.c
	 DEP dtc-parser.tab.c
	 DEP dtc-lexer.lex.c
	CHK version_gen.h
	UPD version_gen.h
	 DEP /tmp/qemu-test/src/dtc/util.c
	 CC libfdt/fdt.o
	 CC libfdt/fdt_ro.o
	 CC libfdt/fdt_wip.o
	 CC libfdt/fdt_sw.o
	 CC libfdt/fdt_rw.o
	 CC libfdt/fdt_strerror.o
	 CC libfdt/fdt_empty_tree.o
	 CC libfdt/fdt_addresses.o
	 CC libfdt/fdt_overlay.o
	 AR libfdt/libfdt.a
x86_64-w64-mingw32-ar: creating libfdt/libfdt.a
a - libfdt/fdt.o
a - libfdt/fdt_ro.o
a - libfdt/fdt_wip.o
a - libfdt/fdt_sw.o
a - libfdt/fdt_rw.o
a - libfdt/fdt_strerror.o
a - libfdt/fdt_empty_tree.o
a - libfdt/fdt_addresses.o
a - libfdt/fdt_overlay.o
  RC      version.o
  GEN     qga/qapi-generated/qapi-gen
  CC      qapi/qapi-builtin-types.o
  CC      qapi/qapi-types.o
  CC      qapi/qapi-types-block-core.o
  CC      qapi/qapi-types-block.o
  CC      qapi/qapi-types-char.o
  CC      qapi/qapi-types-common.o
  CC      qapi/qapi-types-crypto.o
  CC      qapi/qapi-types-introspect.o
  CC      qapi/qapi-types-job.o
  CC      qapi/qapi-types-migration.o
  CC      qapi/qapi-types-misc.o
  CC      qapi/qapi-types-net.o
  CC      qapi/qapi-types-rocker.o
  CC      qapi/qapi-types-run-state.o
  CC      qapi/qapi-types-sockets.o
  CC      qapi/qapi-types-tpm.o
  CC      qapi/qapi-types-trace.o
  CC      qapi/qapi-types-transaction.o
  CC      qapi/qapi-types-ui.o
  CC      qapi/qapi-builtin-visit.o
  CC      qapi/qapi-visit.o
  CC      qapi/qapi-visit-block-core.o
  CC      qapi/qapi-visit-block.o
  CC      qapi/qapi-visit-char.o
  CC      qapi/qapi-visit-common.o
  CC      qapi/qapi-visit-crypto.o
  CC      qapi/qapi-visit-introspect.o
  CC      qapi/qapi-visit-job.o
  CC      qapi/qapi-visit-migration.o
  CC      qapi/qapi-visit-misc.o
  CC      qapi/qapi-visit-net.o
  CC      qapi/qapi-visit-rocker.o
  CC      qapi/qapi-visit-run-state.o
  CC      qapi/qapi-visit-sockets.o
  CC      qapi/qapi-visit-tpm.o
  CC      qapi/qapi-visit-trace.o
  CC      qapi/qapi-visit-transaction.o
  CC      qapi/qapi-visit-ui.o
  CC      qapi/qapi-events.o
  CC      qapi/qapi-events-block-core.o
  CC      qapi/qapi-events-char.o
  CC      qapi/qapi-events-block.o
  CC      qapi/qapi-events-common.o
  CC      qapi/qapi-events-crypto.o
  CC      qapi/qapi-events-introspect.o
  CC      qapi/qapi-events-job.o
  CC      qapi/qapi-events-migration.o
  CC      qapi/qapi-events-misc.o
  CC      qapi/qapi-events-net.o
  CC      qapi/qapi-events-rocker.o
  CC      qapi/qapi-events-run-state.o
  CC      qapi/qapi-events-sockets.o
  CC      qapi/qapi-events-tpm.o
  CC      qapi/qapi-events-trace.o
  CC      qapi/qapi-events-transaction.o
  CC      qapi/qapi-events-ui.o
  CC      qapi/qapi-introspect.o
  CC      qapi/qapi-visit-core.o
  CC      qapi/qapi-dealloc-visitor.o
  CC      qapi/qobject-input-visitor.o
  CC      qapi/qobject-output-visitor.o
  CC      qapi/qmp-registry.o
  CC      qapi/qmp-dispatch.o
  CC      qapi/string-input-visitor.o
  CC      qapi/string-output-visitor.o
  CC      qapi/opts-visitor.o
  CC      qapi/qapi-clone-visitor.o
  CC      qapi/qmp-event.o
  CC      qobject/qnull.o
  CC      qapi/qapi-util.o
  CC      qobject/qstring.o
  CC      qobject/qnum.o
  CC      qobject/qdict.o
  CC      qobject/qlist.o
  CC      qobject/qbool.o
  CC      qobject/qlit.o
  CC      qobject/qobject.o
  CC      qobject/qjson.o
  CC      qobject/json-lexer.o
  CC      qobject/json-streamer.o
  CC      qobject/json-parser.o
  CC      qobject/block-qdict.o
  CC      trace/simple.o
  CC      trace/control.o
  CC      trace/qmp.o
  CC      util/osdep.o
  CC      util/cutils.o
  CC      util/unicode.o
  CC      util/qemu-timer-common.o
  CC      util/bufferiszero.o
  CC      util/lockcnt.o
  CC      util/aiocb.o
  CC      util/async.o
  CC      util/aio-wait.o
  CC      util/thread-pool.o
  CC      util/qemu-timer.o
  CC      util/main-loop.o
  CC      util/iohandler.o
  CC      util/aio-win32.o
  CC      util/event_notifier-win32.o
  CC      util/oslib-win32.o
  CC      util/qemu-thread-win32.o
  CC      util/envlist.o
  CC      util/path.o
  CC      util/module.o
  CC      util/host-utils.o
  CC      util/bitmap.o
  CC      util/bitops.o
  CC      util/hbitmap.o
  CC      util/fifo8.o
  CC      util/acl.o
  CC      util/cacheinfo.o
  CC      util/error.o
  CC      util/qemu-error.o
  CC      util/id.o
  CC      util/iov.o
  CC      util/qemu-config.o
  CC      util/qemu-sockets.o
  CC      util/uri.o
  CC      util/notify.o
  CC      util/qemu-option.o
  CC      util/qemu-progress.o
  CC      util/keyval.o
  CC      util/hexdump.o
  CC      util/crc32c.o
  CC      util/uuid.o
  CC      util/throttle.o
  CC      util/getauxval.o
  CC      util/readline.o
  CC      util/rcu.o
  CC      util/qemu-coroutine.o
  CC      util/qemu-coroutine-lock.o
  CC      util/qemu-coroutine-io.o
  CC      util/qemu-coroutine-sleep.o
  CC      util/coroutine-win32.o
  CC      util/buffer.o
  CC      util/timed-average.o
  CC      util/base64.o
  CC      util/log.o
  CC      util/pagesize.o
  CC      util/qdist.o
  CC      util/qht.o
  CC      util/qsp.o
  CC      util/range.o
  CC      util/stats64.o
  CC      util/systemd.o
  CC      util/iova-tree.o
  CC      trace-root.o
  CC      accel/kvm/trace.o
  CC      accel/tcg/trace.o
  CC      audio/trace.o
  CC      block/trace.o
  CC      chardev/trace.o
  CC      crypto/trace.o
  CC      hw/9pfs/trace.o
  CC      hw/acpi/trace.o
  CC      hw/alpha/trace.o
  CC      hw/arm/trace.o
  CC      hw/audio/trace.o
  CC      hw/block/trace.o
  CC      hw/display/trace.o
  CC      hw/block/dataplane/trace.o
  CC      hw/char/trace.o
  CC      hw/dma/trace.o
  CC      hw/hppa/trace.o
  CC      hw/i2c/trace.o
  CC      hw/i386/trace.o
  CC      hw/i386/xen/trace.o
  CC      hw/ide/trace.o
  CC      hw/input/trace.o
  CC      hw/intc/trace.o
  CC      hw/isa/trace.o
  CC      hw/mem/trace.o
  CC      hw/misc/trace.o
  CC      hw/misc/macio/trace.o
  CC      hw/net/trace.o
  CC      hw/nvram/trace.o
  CC      hw/pci/trace.o
  CC      hw/pci-host/trace.o
  CC      hw/rdma/trace.o
  CC      hw/ppc/trace.o
  CC      hw/rdma/vmw/trace.o
  CC      hw/s390x/trace.o
  CC      hw/scsi/trace.o
  CC      hw/sd/trace.o
  CC      hw/sparc/trace.o
  CC      hw/sparc64/trace.o
  CC      hw/timer/trace.o
  CC      hw/tpm/trace.o
  CC      hw/usb/trace.o
  CC      hw/vfio/trace.o
  CC      hw/virtio/trace.o
  CC      hw/watchdog/trace.o
  CC      hw/xen/trace.o
  CC      io/trace.o
  CC      linux-user/trace.o
  CC      migration/trace.o
  CC      nbd/trace.o
  CC      net/trace.o
  CC      qapi/trace.o
  CC      qom/trace.o
  CC      scsi/trace.o
  CC      target/arm/trace.o
  CC      target/i386/trace.o
  CC      target/mips/trace.o
  CC      target/ppc/trace.o
  CC      target/s390x/trace.o
  CC      target/sparc/trace.o
  CC      ui/trace.o
  CC      util/trace.o
  CC      crypto/pbkdf-stub.o
  CC      stubs/arch-query-cpu-def.o
  CC      stubs/arch-query-cpu-model-expansion.o
  CC      stubs/arch-query-cpu-model-comparison.o
  CC      stubs/bdrv-next-monitor-owned.o
  CC      stubs/arch-query-cpu-model-baseline.o
  CC      stubs/blk-commit-all.o
  CC      stubs/blockdev-close-all-bdrv-states.o
  CC      stubs/clock-warp.o
  CC      stubs/cpu-get-clock.o
  CC      stubs/cpu-get-icount.o
  CC      stubs/dump.o
  CC      stubs/gdbstub.o
  CC      stubs/error-printf.o
  CC      stubs/fdset.o
  CC      stubs/get-vm-name.o
  CC      stubs/iothread.o
  CC      stubs/iothread-lock.o
  CC      stubs/is-daemonized.o
  CC      stubs/migr-blocker.o
  CC      stubs/machine-init-done.o
  CC      stubs/change-state-handler.o
  CC      stubs/monitor.o
  CC      stubs/qtest.o
  CC      stubs/notify-event.o
  CC      stubs/replay.o
  CC      stubs/runstate-check.o
  CC      stubs/set-fd-handler.o
  CC      stubs/slirp.o
  CC      stubs/sysbus.o
  CC      stubs/tpm.o
  CC      stubs/trace-control.o
  CC      stubs/uuid.o
  CC      stubs/vm-stop.o
  CC      stubs/vmstate.o
  CC      stubs/fd-register.o
  CC      stubs/qmp_memory_device.o
  CC      stubs/target-monitor-defs.o
  CC      stubs/target-get-monitor-def.o
  CC      stubs/pc_madt_cpu_entry.o
  CC      stubs/vmgenid.o
  CC      stubs/xen-common.o
  CC      stubs/xen-hvm.o
  CC      stubs/pci-host-piix.o
  CC      stubs/ram-block.o
  CC      stubs/ramfb.o
  GEN     qemu-img-cmds.h
  CC      block.o
  CC      blockjob.o
  CC      job.o
  CC      qemu-io-cmds.o
  CC      replication.o
  CC      block/raw-format.o
  CC      block/vmdk.o
  CC      block/vpc.o
  CC      block/qcow.o
  CC      block/vdi.o
  CC      block/cloop.o
  CC      block/bochs.o
  CC      block/vvfat.o
  CC      block/dmg.o
  CC      block/qcow2.o
  CC      block/qcow2-snapshot.o
  CC      block/qcow2-refcount.o
  CC      block/qcow2-cluster.o
  CC      block/qcow2-cache.o
  CC      block/qcow2-bitmap.o
  CC      block/qed.o
  CC      block/qed-l2-cache.o
  CC      block/qed-table.o
  CC      block/qed-cluster.o
  CC      block/qed-check.o
  CC      block/vhdx.o
  CC      block/vhdx-endian.o
  CC      block/vhdx-log.o
  CC      block/quorum.o
  CC      block/blkdebug.o
  CC      block/blkverify.o
  CC      block/blkreplay.o
  CC      block/parallels.o
  CC      block/blklogwrites.o
  CC      block/block-backend.o
  CC      block/snapshot.o
  CC      block/qapi.o
  CC      block/file-win32.o
  CC      block/win32-aio.o
  CC      block/null.o
  CC      block/mirror.o
  CC      block/commit.o
  CC      block/io.o
  CC      block/create.o
  CC      block/throttle-groups.o
  CC      block/nbd.o
  CC      block/sheepdog.o
  CC      block/nbd-client.o
  CC      block/accounting.o
  CC      block/dirty-bitmap.o
  CC      block/write-threshold.o
  CC      block/backup.o
  CC      block/replication.o
  CC      block/throttle.o
  CC      block/copy-on-read.o
  CC      block/crypto.o
  CC      nbd/server.o
  CC      nbd/client.o
  CC      nbd/common.o
  CC      scsi/utils.o
  CC      scsi/pr-manager-stub.o
  CC      block/curl.o
  CC      block/ssh.o
  CC      block/dmg-bz2.o
  CC      crypto/init.o
  CC      crypto/hash.o
  CC      crypto/hash-nettle.o
  CC      crypto/hmac.o
  CC      crypto/hmac-nettle.o
  CC      crypto/aes.o
  CC      crypto/desrfb.o
  CC      crypto/cipher.o
  CC      crypto/tlscreds.o
  CC      crypto/tlscredsanon.o
  CC      crypto/tlscredspsk.o
  CC      crypto/tlscredsx509.o
  CC      crypto/tlssession.o
  CC      crypto/secret.o
  CC      crypto/random-gnutls.o
  CC      crypto/pbkdf.o
  CC      crypto/pbkdf-nettle.o
  CC      crypto/ivgen.o
  CC      crypto/ivgen-essiv.o
  CC      crypto/ivgen-plain.o
  CC      crypto/ivgen-plain64.o
  CC      crypto/afsplit.o
  CC      crypto/xts.o
  CC      crypto/block.o
  CC      crypto/block-qcow.o
  CC      crypto/block-luks.o
  CC      io/channel.o
  CC      io/channel-buffer.o
  CC      io/channel-command.o
  CC      io/channel-file.o
  CC      io/channel-socket.o
  CC      io/channel-tls.o
  CC      io/channel-watch.o
  CC      io/channel-websock.o
  CC      io/channel-util.o
  CC      io/dns-resolver.o
  CC      io/net-listener.o
  CC      io/task.o
  CC      qom/object.o
  CC      qom/container.o
  CC      qom/qom-qobject.o
  CC      qom/object_interfaces.o
  CC      qemu-io.o
  CC      qemu-edid.o
  CC      hw/display/edid-generate.o
  CC      blockdev.o
  CC      blockdev-nbd.o
  CC      bootdevice.o
  CC      iothread.o
  CC      job-qmp.o
  CC      qdev-monitor.o
  CC      device-hotplug.o
  CC      os-win32.o
  CC      bt-host.o
  CC      bt-vhci.o
  CC      dma-helpers.o
  CC      vl.o
  CC      device_tree.o
  CC      tpm.o
  CC      qapi/qapi-commands.o
  CC      qapi/qapi-commands-block.o
  CC      qapi/qapi-commands-block-core.o
  CC      qapi/qapi-commands-char.o
  CC      qapi/qapi-commands-common.o
  CC      qapi/qapi-commands-crypto.o
  CC      qapi/qapi-commands-introspect.o
  CC      qapi/qapi-commands-migration.o
  CC      qapi/qapi-commands-job.o
  CC      qapi/qapi-commands-net.o
  CC      qapi/qapi-commands-misc.o
  CC      qapi/qapi-commands-rocker.o
  CC      qapi/qapi-commands-run-state.o
  CC      qapi/qapi-commands-sockets.o
  CC      qapi/qapi-commands-tpm.o
  CC      qapi/qapi-commands-trace.o
  CC      qapi/qapi-commands-transaction.o
  CC      qapi/qapi-commands-ui.o
  CC      qmp.o
  CC      hmp.o
  CC      cpus-common.o
  CC      audio/audio.o
  CC      audio/noaudio.o
  CC      audio/wavaudio.o
  CC      audio/mixeng.o
  CC      audio/dsoundaudio.o
  CC      audio/audio_win_int.o
  CC      audio/wavcapture.o
  CC      backends/rng.o
  CC      backends/rng-egd.o
  CC      backends/hostmem.o
  CC      backends/tpm.o
  CC      backends/hostmem-ram.o
  CC      backends/cryptodev.o
  CC      backends/cryptodev-builtin.o
  CC      backends/cryptodev-vhost.o
  CC      block/stream.o
  CC      chardev/msmouse.o
  CC      chardev/wctablet.o
  CC      chardev/testdev.o
  CC      disas/arm.o
  CC      disas/i386.o
  CXX     disas/arm-a64.o
  CXX     disas/libvixl/vixl/utils.o
  CXX     disas/libvixl/vixl/a64/instructions-a64.o
  CXX     disas/libvixl/vixl/compiler-intrinsics.o
  CXX     disas/libvixl/vixl/a64/decoder-a64.o
  CXX     disas/libvixl/vixl/a64/disasm-a64.o
  CC      hw/acpi/core.o
  CC      hw/acpi/piix4.o
  CC      hw/acpi/pcihp.o
  CC      hw/acpi/ich9.o
  CC      hw/acpi/tco.o
  CC      hw/acpi/cpu_hotplug.o
  CC      hw/acpi/memory_hotplug.o
  CC      hw/acpi/cpu.o
  CC      hw/acpi/nvdimm.o
  CC      hw/acpi/vmgenid.o
  CC      hw/acpi/acpi_interface.o
  CC      hw/acpi/bios-linker-loader.o
  CC      hw/acpi/aml-build.o
  CC      hw/acpi/ipmi.o
  CC      hw/acpi/acpi-stub.o
  CC      hw/acpi/ipmi-stub.o
  CC      hw/audio/sb16.o
  CC      hw/audio/es1370.o
  CC      hw/audio/ac97.o
  CC      hw/audio/fmopl.o
  CC      hw/audio/adlib.o
  CC      hw/audio/gus.o
  CC      hw/audio/gusemu_hal.o
  CC      hw/audio/gusemu_mixer.o
  CC      hw/audio/cs4231a.o
  CC      hw/audio/intel-hda.o
  CC      hw/audio/hda-codec.o
  CC      hw/audio/pcspk.o
  CC      hw/audio/wm8750.o
  CC      hw/audio/pl041.o
  CC      hw/audio/lm4549.o
  CC      hw/audio/marvell_88w8618.o
  CC      hw/audio/soundhw.o
  CC      hw/block/block.o
  CC      hw/block/cdrom.o
  CC      hw/block/hd-geometry.o
  CC      hw/block/fdc.o
  CC      hw/block/m25p80.o
  CC      hw/block/nand.o
  CC      hw/block/pflash_cfi01.o
  CC      hw/block/pflash_cfi02.o
  CC      hw/block/onenand.o
  CC      hw/block/ecc.o
  CC      hw/block/nvme.o
  CC      hw/bt/core.o
  CC      hw/bt/l2cap.o
  CC      hw/bt/sdp.o
  CC      hw/bt/hci.o
  CC      hw/bt/hid.o
  CC      hw/bt/hci-csr.o
  CC      hw/char/ipoctal232.o
  CC      hw/char/nrf51_uart.o
  CC      hw/char/parallel.o
  CC      hw/char/parallel-isa.o
  CC      hw/char/pl011.o
  CC      hw/char/serial.o
  CC      hw/char/serial-isa.o
  CC      hw/char/serial-pci.o
  CC      hw/char/virtio-console.o
  CC      hw/char/cadence_uart.o
  CC      hw/char/cmsdk-apb-uart.o
  CC      hw/char/debugcon.o
  CC      hw/char/imx_serial.o
  CC      hw/core/qdev.o
  CC      hw/core/qdev-properties.o
  CC      hw/core/bus.o
  CC      hw/core/reset.o
  CC      hw/core/qdev-fw.o
  CC      hw/core/fw-path-provider.o
  CC      hw/core/irq.o
  CC      hw/core/hotplug.o
  CC      hw/core/nmi.o
  CC      hw/core/stream.o
  CC      hw/core/ptimer.o
  CC      hw/core/sysbus.o
  CC      hw/core/machine.o
  CC      hw/core/loader.o
  CC      hw/core/qdev-properties-system.o
  CC      hw/core/register.o
  CC      hw/core/or-irq.o
  CC      hw/core/split-irq.o
  CC      hw/core/platform-bus.o
  CC      hw/core/generic-loader.o
  CC      hw/core/null-machine.o
  CC      hw/cpu/core.o
  CC      hw/display/ramfb.o
  CC      hw/display/ramfb-standalone.o
  CC      hw/display/ads7846.o
  CC      hw/display/cirrus_vga.o
  CC      hw/display/cirrus_vga_isa.o
  CC      hw/display/pl110.o
  CC      hw/display/sii9022.o
  CC      hw/display/ssd0303.o
  CC      hw/display/ssd0323.o
  CC      hw/display/vga-pci.o
  CC      hw/display/edid-region.o
  CC      hw/display/vga-isa.o
  CC      hw/display/vmware_vga.o
  CC      hw/display/bochs-display.o
  CC      hw/display/blizzard.o
  CC      hw/display/exynos4210_fimd.o
  CC      hw/display/framebuffer.o
  CC      hw/display/tc6393xb.o
  CC      hw/dma/pl080.o
  CC      hw/dma/pl330.o
  CC      hw/dma/i8257.o
  CC      hw/dma/xilinx_axidma.o
  CC      hw/dma/xlnx-zynq-devcfg.o
  CC      hw/dma/xlnx-zdma.o
  CC      hw/gpio/max7310.o
  CC      hw/gpio/pl061.o
  CC      hw/gpio/zaurus.o
  CC      hw/gpio/gpio_key.o
  CC      hw/i2c/core.o
  CC      hw/i2c/smbus.o
  CC      hw/i2c/smbus_eeprom.o
  CC      hw/i2c/i2c-ddc.o
  CC      hw/i2c/smbus_ich9.o
  CC      hw/i2c/versatile_i2c.o
  CC      hw/i2c/pm_smbus.o
  CC      hw/i2c/bitbang_i2c.o
  CC      hw/i2c/exynos4210_i2c.o
  CC      hw/i2c/imx_i2c.o
  CC      hw/i2c/aspeed_i2c.o
  CC      hw/ide/core.o
  CC      hw/ide/atapi.o
  CC      hw/ide/qdev.o
  CC      hw/ide/pci.o
  CC      hw/ide/isa.o
  CC      hw/ide/piix.o
  CC      hw/ide/microdrive.o
  CC      hw/ide/ahci.o
  CC      hw/ide/ich.o
  CC      hw/ide/ahci-allwinner.o
  CC      hw/input/hid.o
  CC      hw/input/lm832x.o
  CC      hw/input/pckbd.o
  CC      hw/input/pl050.o
  CC      hw/input/ps2.o
  CC      hw/input/stellaris_input.o
  CC      hw/input/tsc2005.o
  CC      hw/input/virtio-input.o
  CC      hw/input/virtio-input-hid.o
  CC      hw/intc/i8259_common.o
  CC      hw/intc/i8259.o
  CC      hw/intc/pl190.o
  CC      hw/intc/xlnx-pmu-iomod-intc.o
  CC      hw/intc/xlnx-zynqmp-ipi.o
  CC      hw/intc/imx_avic.o
  CC      hw/intc/imx_gpcv2.o
  CC      hw/intc/realview_gic.o
  CC      hw/intc/ioapic_common.o
  CC      hw/intc/arm_gic_common.o
  CC      hw/intc/arm_gic.o
  CC      hw/intc/arm_gicv2m.o
  CC      hw/intc/arm_gicv3_common.o
  CC      hw/intc/arm_gicv3.o
  CC      hw/intc/arm_gicv3_dist.o
  CC      hw/intc/arm_gicv3_redist.o
  CC      hw/intc/arm_gicv3_its_common.o
  CC      hw/intc/intc.o
  CC      hw/ipack/ipack.o
  CC      hw/ipack/tpci200.o
  CC      hw/ipmi/ipmi.o
  CC      hw/ipmi/ipmi_bmc_sim.o
  CC      hw/ipmi/ipmi_bmc_extern.o
  CC      hw/ipmi/isa_ipmi_kcs.o
  CC      hw/ipmi/isa_ipmi_bt.o
  CC      hw/isa/isa-bus.o
  CC      hw/isa/isa-superio.o
  CC      hw/isa/apm.o
  CC      hw/mem/pc-dimm.o
  CC      hw/mem/memory-device.o
  CC      hw/mem/nvdimm.o
  CC      hw/misc/applesmc.o
  CC      hw/misc/max111x.o
  CC      hw/misc/tmp105.o
  CC      hw/misc/tmp421.o
  CC      hw/misc/debugexit.o
  CC      hw/misc/pc-testdev.o
  CC      hw/misc/sga.o
  CC      hw/misc/pci-testdev.o
  CC      hw/misc/edu.o
  CC      hw/misc/pca9552.o
  CC      hw/misc/unimp.o
  CC      hw/misc/vmcoreinfo.o
  CC      hw/misc/arm_l2x0.o
  CC      hw/misc/arm_integrator_debug.o
  CC      hw/misc/a9scu.o
  CC      hw/misc/arm11scu.o
  CC      hw/net/ne2000.o
  CC      hw/net/eepro100.o
  CC      hw/net/pcnet-pci.o
  CC      hw/net/pcnet.o
  CC      hw/net/e1000.o
  CC      hw/net/e1000x_common.o
  CC      hw/net/net_tx_pkt.o
  CC      hw/net/net_rx_pkt.o
  CC      hw/net/e1000e.o
  CC      hw/net/e1000e_core.o
  CC      hw/net/rtl8139.o
  CC      hw/net/vmxnet3.o
  CC      hw/net/smc91c111.o
  CC      hw/net/lan9118.o
  CC      hw/net/ne2000-isa.o
  CC      hw/net/xgmac.o
  CC      hw/net/xilinx_axienet.o
  CC      hw/net/allwinner_emac.o
  CC      hw/net/imx_fec.o
  CC      hw/net/cadence_gem.o
  CC      hw/net/stellaris_enet.o
  CC      hw/net/ftgmac100.o
  CC      hw/net/rocker/rocker.o
  CC      hw/net/rocker/rocker_fp.o
  CC      hw/net/rocker/rocker_desc.o
  CC      hw/net/rocker/rocker_world.o
  CC      hw/net/rocker/rocker_of_dpa.o
  CC      hw/net/can/can_sja1000.o
  CC      hw/net/can/can_kvaser_pci.o
  CC      hw/net/can/can_pcm3680_pci.o
  CC      hw/net/can/can_mioe3680_pci.o
  CC      hw/nvram/eeprom93xx.o
  CC      hw/nvram/fw_cfg.o
  CC      hw/nvram/chrp_nvram.o
  CC      hw/pci-bridge/pci_bridge_dev.o
  CC      hw/pci-bridge/pcie_root_port.o
  CC      hw/pci-bridge/gen_pcie_root_port.o
  CC      hw/pci-bridge/pcie_pci_bridge.o
  CC      hw/pci-bridge/pci_expander_bridge.o
  CC      hw/pci-bridge/xio3130_upstream.o
  CC      hw/pci-bridge/xio3130_downstream.o
  CC      hw/pci-bridge/ioh3420.o
  CC      hw/pci-bridge/i82801b11.o
  CC      hw/pci-host/pam.o
  CC      hw/pci-host/versatile.o
  CC      hw/pci-host/piix.o
  CC      hw/pci-host/q35.o
  CC      hw/pci-host/gpex.o
  CC      hw/pci-host/designware.o
  CC      hw/pci/pci.o
  CC      hw/pci/pci_bridge.o
  CC      hw/pci/msix.o
  CC      hw/pci/msi.o
  CC      hw/pci/shpc.o
  CC      hw/pci/slotid_cap.o
  CC      hw/pci/pci_host.o
  CC      hw/pci/pcie_host.o
  CC      hw/pci/pcie.o
  CC      hw/pci/pcie_aer.o
  CC      hw/pci/pcie_port.o
  CC      hw/pci/pci-stub.o
  CC      hw/pcmcia/pcmcia.o
  CC      hw/scsi/scsi-disk.o
  CC      hw/scsi/emulation.o
  CC      hw/scsi/scsi-generic.o
  CC      hw/scsi/scsi-bus.o
  CC      hw/scsi/lsi53c895a.o
  CC      hw/scsi/mptsas.o
  CC      hw/scsi/mptconfig.o
  CC      hw/scsi/mptendian.o
  CC      hw/scsi/megasas.o
  CC      hw/scsi/vmw_pvscsi.o
  CC      hw/scsi/esp.o
  CC      hw/scsi/esp-pci.o
  CC      hw/sd/pl181.o
  CC      hw/sd/ssi-sd.o
  CC      hw/sd/sd.o
  CC      hw/sd/core.o
  CC      hw/sd/sdmmc-internal.o
  CC      hw/smbios/smbios.o
  CC      hw/sd/sdhci.o
  CC      hw/smbios/smbios_type_38.o
  CC      hw/smbios/smbios-stub.o
  CC      hw/smbios/smbios_type_38-stub.o
  CC      hw/ssi/pl022.o
  CC      hw/ssi/ssi.o
  CC      hw/ssi/xilinx_spips.o
  CC      hw/ssi/aspeed_smc.o
  CC      hw/ssi/stm32f2xx_spi.o
  CC      hw/ssi/mss-spi.o
  CC      hw/timer/arm_timer.o
  CC      hw/timer/arm_mptimer.o
  CC      hw/timer/armv7m_systick.o
  CC      hw/timer/a9gtimer.o
  CC      hw/timer/cadence_ttc.o
  CC      hw/timer/ds1338.o
  CC      hw/timer/i8254_common.o
  CC      hw/timer/hpet.o
  CC      hw/timer/i8254.o
  CC      hw/timer/pl031.o
  CC      hw/timer/twl92230.o
  CC      hw/timer/imx_epit.o
  CC      hw/timer/imx_gpt.o
  CC      hw/timer/stm32f2xx_timer.o
  CC      hw/timer/xlnx-zynqmp-rtc.o
  CC      hw/timer/aspeed_timer.o
  CC      hw/timer/cmsdk-apb-timer.o
  CC      hw/timer/cmsdk-apb-dualtimer.o
  CC      hw/timer/mss-timer.o
  CC      hw/tpm/tpm_util.o
  CC      hw/tpm/tpm_tis.o
  CC      hw/tpm/tpm_crb.o
  CC      hw/usb/core.o
  CC      hw/usb/combined-packet.o
  CC      hw/usb/bus.o
  CC      hw/usb/libhw.o
  CC      hw/usb/desc.o
  CC      hw/usb/desc-msos.o
  CC      hw/usb/hcd-uhci.o
  CC      hw/usb/hcd-ohci.o
  CC      hw/usb/hcd-ehci.o
  CC      hw/usb/hcd-ehci-sysbus.o
  CC      hw/usb/hcd-ehci-pci.o
  CC      hw/usb/hcd-xhci.o
  CC      hw/usb/hcd-xhci-nec.o
  CC      hw/usb/hcd-musb.o
  CC      hw/usb/dev-hub.o
  CC      hw/usb/dev-hid.o
  CC      hw/usb/dev-wacom.o
  CC      hw/usb/dev-storage.o
  CC      hw/usb/dev-uas.o
  CC      hw/usb/dev-audio.o
  CC      hw/usb/dev-serial.o
  CC      hw/usb/dev-network.o
  CC      hw/usb/dev-bluetooth.o
  CC      hw/usb/dev-smartcard-reader.o
  CC      hw/usb/host-stub.o
  CC      hw/virtio/virtio-bus.o
  CC      hw/virtio/virtio-rng.o
  CC      hw/virtio/virtio-pci.o
  CC      hw/virtio/virtio-mmio.o
  CC      hw/virtio/vhost-stub.o
  CC      hw/watchdog/watchdog.o
  CC      hw/watchdog/cmsdk-apb-watchdog.o
  CC      hw/watchdog/wdt_i6300esb.o
  CC      hw/watchdog/wdt_ib700.o
  CC      hw/watchdog/wdt_aspeed.o
  CC      migration/migration.o
  CC      migration/socket.o
  CC      migration/fd.o
  CC      migration/exec.o
  CC      migration/tls.o
  CC      migration/channel.o
  CC      migration/savevm.o
  CC      migration/colo.o
  CC      migration/colo-failover.o
  CC      migration/vmstate.o
  CC      migration/vmstate-types.o
  CC      migration/page_cache.o
  CC      migration/qemu-file.o
  CC      migration/global_state.o
  CC      migration/qemu-file-channel.o
  CC      migration/xbzrle.o
  CC      migration/postcopy-ram.o
  CC      migration/qjson.o
  CC      migration/block-dirty-bitmap.o
  CC      migration/block.o
  CC      net/net.o
  CC      net/queue.o
  CC      net/checksum.o
  CC      net/util.o
  CC      net/hub.o
  CC      net/socket.o
  CC      net/dump.o
  CC      net/eth.o
  CC      net/slirp.o
  CC      net/filter.o
  CC      net/filter-buffer.o
  CC      net/filter-mirror.o
  CC      net/colo-compare.o
  CC      net/colo.o
  CC      net/filter-rewriter.o
  CC      net/filter-replay.o
  CC      net/tap-win32.o
  CC      net/can/can_core.o
  CC      net/can/can_host.o
  CC      qom/cpu.o
  CC      replay/replay.o
  CC      replay/replay-internal.o
  CC      replay/replay-events.o
  CC      replay/replay-time.o
  CC      replay/replay-input.o
  CC      replay/replay-char.o
  CC      replay/replay-snapshot.o
  CC      replay/replay-net.o
  CC      replay/replay-audio.o
  CC      slirp/cksum.o
  CC      slirp/if.o
  CC      slirp/ip_icmp.o
  CC      slirp/ip6_icmp.o
  CC      slirp/ip6_input.o
  CC      slirp/ip6_output.o
  CC      slirp/ip_input.o
  CC      slirp/ip_output.o
  CC      slirp/dnssearch.o
  CC      slirp/dhcpv6.o
  CC      slirp/slirp.o
  CC      slirp/mbuf.o
  CC      slirp/misc.o
  CC      slirp/sbuf.o
  CC      slirp/socket.o
  CC      slirp/tcp_input.o
  CC      slirp/tcp_output.o
  CC      slirp/tcp_subr.o
  CC      slirp/tcp_timer.o
  CC      slirp/udp.o
  CC      slirp/udp6.o
  CC      slirp/bootp.o
  CC      slirp/tftp.o
  CC      slirp/arp_table.o
  CC      slirp/ndp_table.o
  CC      slirp/ncsi.o
  CC      ui/keymaps.o
  CC      ui/console.o
  CC      ui/cursor.o
  CC      ui/qemu-pixman.o
  CC      ui/input.o
  CC      ui/input-keymap.o
  CC      ui/input-legacy.o
  CC      ui/vnc.o
  CC      ui/vnc-enc-zlib.o
  CC      ui/vnc-enc-hextile.o
  CC      ui/vnc-enc-tight.o
  CC      ui/vnc-palette.o
  CC      ui/vnc-enc-zrle.o
  CC      ui/vnc-auth-vencrypt.o
  CC      ui/vnc-ws.o
  CC      ui/vnc-jobs.o
  CC      ui/sdl2.o
  CC      ui/sdl2-input.o
  CC      ui/sdl2-2d.o
  CC      ui/gtk.o
  CC      chardev/char.o
  CC      chardev/char-console.o
  CC      chardev/char-fe.o
  CC      chardev/char-file.o
  CC      chardev/char-io.o
  CC      chardev/char-mux.o
  CC      chardev/char-null.o
  CC      chardev/char-pipe.o
  CC      chardev/char-ringbuf.o
  CC      chardev/char-serial.o
  CC      chardev/char-socket.o
  CC      chardev/char-stdio.o
  CC      chardev/char-udp.o
  CC      chardev/char-win.o
  CC      chardev/char-win-stdio.o
  CC      qga/commands.o
  CC      qga/guest-agent-command-state.o
  CC      qga/main.o
  CC      qga/commands-win32.o
  CC      qga/channel-win32.o
  AS      optionrom/multiboot.o
  AS      optionrom/linuxboot.o
  CC      qga/service-win32.o
  CC      optionrom/linuxboot_dma.o
  AS      optionrom/kvmvapic.o
  BUILD   optionrom/multiboot.img
  BUILD   optionrom/linuxboot.img
  CC      qga/vss-win32.o
  BUILD   optionrom/multiboot.raw
  CC      qga/qapi-generated/qga-qapi-types.o
  CC      qga/qapi-generated/qga-qapi-visit.o
  CC      qga/qapi-generated/qga-qapi-commands.o
  AR      libqemuutil.a
  CC      qemu-img.o
  BUILD   optionrom/linuxboot.raw
  BUILD   optionrom/linuxboot_dma.img
  BUILD   optionrom/kvmvapic.img
  SIGN    optionrom/multiboot.bin
  SIGN    optionrom/linuxboot.bin
  BUILD   optionrom/linuxboot_dma.raw
  BUILD   optionrom/kvmvapic.raw
  SIGN    optionrom/linuxboot_dma.bin
  SIGN    optionrom/kvmvapic.bin
  LINK    qemu-io.exe
  LINK    qemu-edid.exe
  LINK    qemu-img.exe
  LINK    qemu-ga.exe
  GEN     x86_64-softmmu/config-target.h
  GEN     x86_64-softmmu/hmp-commands.h
  GEN     x86_64-softmmu/hmp-commands-info.h
  CC      x86_64-softmmu/exec.o
  CC      x86_64-softmmu/tcg/tcg-op-gvec.o
  CC      x86_64-softmmu/tcg/tcg.o
  CC      x86_64-softmmu/tcg/tcg-op.o
  CC      x86_64-softmmu/tcg/tcg-op-vec.o
  CC      x86_64-softmmu/tcg/tcg-common.o
  CC      x86_64-softmmu/tcg/optimize.o
  CC      x86_64-softmmu/fpu/softfloat.o
  CC      x86_64-softmmu/disas.o
  GEN     x86_64-softmmu/gdbstub-xml.c
  GEN     aarch64-softmmu/hmp-commands.h
  GEN     aarch64-softmmu/hmp-commands-info.h
  GEN     aarch64-softmmu/config-target.h
  CC      x86_64-softmmu/arch_init.o
  CC      x86_64-softmmu/cpus.o
  CC      aarch64-softmmu/exec.o
  CC      x86_64-softmmu/monitor.o
  CC      x86_64-softmmu/gdbstub.o
  CC      x86_64-softmmu/balloon.o
  CC      x86_64-softmmu/ioport.o
  CC      aarch64-softmmu/tcg/tcg.o
  CC      aarch64-softmmu/tcg/tcg-op.o
  CC      aarch64-softmmu/tcg/tcg-op-vec.o
  CC      x86_64-softmmu/numa.o
  CC      aarch64-softmmu/tcg/tcg-op-gvec.o
  CC      x86_64-softmmu/qtest.o
  CC      x86_64-softmmu/memory.o
  CC      x86_64-softmmu/memory_mapping.o
  CC      x86_64-softmmu/dump.o
  CC      x86_64-softmmu/win_dump.o
  CC      x86_64-softmmu/migration/ram.o
  CC      x86_64-softmmu/accel/accel.o
  CC      aarch64-softmmu/tcg/tcg-common.o
  CC      x86_64-softmmu/accel/stubs/hvf-stub.o
  CC      aarch64-softmmu/tcg/optimize.o
  CC      x86_64-softmmu/accel/stubs/whpx-stub.o
  CC      aarch64-softmmu/fpu/softfloat.o
  CC      x86_64-softmmu/accel/stubs/kvm-stub.o
  CC      aarch64-softmmu/disas.o
  CC      x86_64-softmmu/accel/tcg/tcg-all.o
  GEN     aarch64-softmmu/gdbstub-xml.c
  CC      x86_64-softmmu/accel/tcg/cputlb.o
  CC      x86_64-softmmu/accel/tcg/tcg-runtime.o
  CC      aarch64-softmmu/arch_init.o
  CC      x86_64-softmmu/accel/tcg/tcg-runtime-gvec.o
  CC      aarch64-softmmu/cpus.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec.o
  CC      aarch64-softmmu/monitor.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec-common.o
  CC      aarch64-softmmu/gdbstub.o
  CC      x86_64-softmmu/accel/tcg/translate-all.o
  CC      aarch64-softmmu/balloon.o
  CC      x86_64-softmmu/accel/tcg/translator.o
  CC      aarch64-softmmu/ioport.o
  CC      x86_64-softmmu/hw/block/virtio-blk.o
  CC      aarch64-softmmu/numa.o
  CC      x86_64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      aarch64-softmmu/qtest.o
  CC      x86_64-softmmu/hw/char/virtio-serial-bus.o
  CC      x86_64-softmmu/hw/display/vga.o
  CC      aarch64-softmmu/memory.o
  CC      x86_64-softmmu/hw/display/virtio-gpu.o
  CC      aarch64-softmmu/memory_mapping.o
  CC      x86_64-softmmu/hw/display/virtio-gpu-3d.o
  CC      aarch64-softmmu/dump.o
  CC      aarch64-softmmu/migration/ram.o
  CC      aarch64-softmmu/accel/accel.o
  CC      aarch64-softmmu/accel/stubs/hax-stub.o
  CC      aarch64-softmmu/accel/stubs/hvf-stub.o
  CC      aarch64-softmmu/accel/stubs/whpx-stub.o
  CC      aarch64-softmmu/accel/stubs/kvm-stub.o
  CC      aarch64-softmmu/accel/tcg/tcg-all.o
  CC      aarch64-softmmu/accel/tcg/cputlb.o
  CC      x86_64-softmmu/hw/display/virtio-gpu-pci.o
  CC      aarch64-softmmu/accel/tcg/tcg-runtime.o
  CC      x86_64-softmmu/hw/display/virtio-vga.o
  CC      aarch64-softmmu/accel/tcg/tcg-runtime-gvec.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec.o
  CC      x86_64-softmmu/hw/intc/apic.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec-common.o
  CC      aarch64-softmmu/accel/tcg/translate-all.o
  CC      x86_64-softmmu/hw/intc/apic_common.o
  CC      aarch64-softmmu/accel/tcg/translator.o
  CC      aarch64-softmmu/hw/adc/stm32f2xx_adc.o
  CC      x86_64-softmmu/hw/intc/ioapic.o
  CC      aarch64-softmmu/hw/block/virtio-blk.o
  CC      x86_64-softmmu/hw/isa/lpc_ich9.o
  CC      x86_64-softmmu/hw/misc/pvpanic.o
  CC      aarch64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      aarch64-softmmu/hw/char/exynos4210_uart.o
  CC      x86_64-softmmu/hw/net/virtio-net.o
  CC      aarch64-softmmu/hw/char/omap_uart.o
  CC      x86_64-softmmu/hw/net/vhost_net.o
  CC      aarch64-softmmu/hw/char/digic-uart.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi.o
  CC      aarch64-softmmu/hw/char/stm32f2xx_usart.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      x86_64-softmmu/hw/timer/mc146818rtc.o
  CC      aarch64-softmmu/hw/char/bcm2835_aux.o
  CC      x86_64-softmmu/hw/virtio/virtio.o
  CC      aarch64-softmmu/hw/char/virtio-serial-bus.o
  CC      aarch64-softmmu/hw/cpu/arm11mpcore.o
  CC      x86_64-softmmu/hw/virtio/virtio-balloon.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      aarch64-softmmu/hw/cpu/realview_mpcore.o
  CC      aarch64-softmmu/hw/cpu/a9mpcore.o
  CC      x86_64-softmmu/hw/i386/multiboot.o
  CC      aarch64-softmmu/hw/cpu/a15mpcore.o
  CC      x86_64-softmmu/hw/i386/pc.o
  CC      aarch64-softmmu/hw/display/omap_dss.o
  CC      x86_64-softmmu/hw/i386/pc_piix.o
  CC      x86_64-softmmu/hw/i386/pc_q35.o
  CC      x86_64-softmmu/hw/i386/pc_sysfw.o
  CC      x86_64-softmmu/hw/i386/x86-iommu.o
  CC      aarch64-softmmu/hw/display/pxa2xx_lcd.o
  CC      aarch64-softmmu/hw/display/omap_lcdc.o
  CC      x86_64-softmmu/hw/i386/intel_iommu.o
  CC      x86_64-softmmu/hw/i386/amd_iommu.o
  CC      x86_64-softmmu/hw/i386/vmport.o
  CC      aarch64-softmmu/hw/display/bcm2835_fb.o
  CC      x86_64-softmmu/hw/i386/vmmouse.o
  CC      aarch64-softmmu/hw/display/vga.o
  CC      x86_64-softmmu/hw/i386/kvmvapic.o
  CC      aarch64-softmmu/hw/display/virtio-gpu.o
  CC      x86_64-softmmu/hw/i386/acpi-build.o
  CC      x86_64-softmmu/target/i386/helper.o
  CC      x86_64-softmmu/target/i386/cpu.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-3d.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-pci.o
  CC      x86_64-softmmu/target/i386/gdbstub.o
  CC      aarch64-softmmu/hw/display/dpcd.o
  CC      x86_64-softmmu/target/i386/xsave_helper.o
  CC      aarch64-softmmu/hw/display/xlnx_dp.o
  CC      x86_64-softmmu/target/i386/translate.o
  CC      x86_64-softmmu/target/i386/bpt_helper.o
  CC      aarch64-softmmu/hw/dma/xlnx_dpdma.o
  CC      aarch64-softmmu/hw/dma/omap_dma.o
  CC      x86_64-softmmu/target/i386/cc_helper.o
  CC      x86_64-softmmu/target/i386/excp_helper.o
  CC      aarch64-softmmu/hw/dma/soc_dma.o
  CC      x86_64-softmmu/target/i386/fpu_helper.o
  CC      x86_64-softmmu/target/i386/int_helper.o
  CC      aarch64-softmmu/hw/dma/pxa2xx_dma.o
  CC      x86_64-softmmu/target/i386/mem_helper.o
  CC      aarch64-softmmu/hw/dma/bcm2835_dma.o
  CC      aarch64-softmmu/hw/gpio/omap_gpio.o
  CC      x86_64-softmmu/target/i386/misc_helper.o
  CC      aarch64-softmmu/hw/gpio/imx_gpio.o
  CC      aarch64-softmmu/hw/gpio/bcm2835_gpio.o
  CC      x86_64-softmmu/target/i386/mpx_helper.o
  CC      aarch64-softmmu/hw/i2c/omap_i2c.o
  CC      x86_64-softmmu/target/i386/seg_helper.o
  CC      aarch64-softmmu/hw/input/pxa2xx_keypad.o
  CC      x86_64-softmmu/target/i386/smm_helper.o
  CC      aarch64-softmmu/hw/input/tsc210x.o
  CC      x86_64-softmmu/target/i386/svm_helper.o
  CC      x86_64-softmmu/target/i386/machine.o
  CC      x86_64-softmmu/target/i386/arch_memory_mapping.o
  CC      aarch64-softmmu/hw/intc/armv7m_nvic.o
  CC      aarch64-softmmu/hw/intc/exynos4210_gic.o
  CC      aarch64-softmmu/hw/intc/exynos4210_combiner.o
  CC      x86_64-softmmu/target/i386/arch_dump.o
  CC      x86_64-softmmu/target/i386/monitor.o
  CC      aarch64-softmmu/hw/intc/omap_intc.o
  CC      aarch64-softmmu/hw/intc/bcm2835_ic.o
  CC      x86_64-softmmu/target/i386/kvm-stub.o
  CC      aarch64-softmmu/hw/intc/bcm2836_control.o
  CC      x86_64-softmmu/target/i386/hyperv-stub.o
  CC      aarch64-softmmu/hw/intc/allwinner-a10-pic.o
  CC      x86_64-softmmu/target/i386/hax-all.o
  CC      aarch64-softmmu/hw/intc/aspeed_vic.o
  CC      aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
  CC      aarch64-softmmu/hw/misc/arm_sysctl.o
  CC      aarch64-softmmu/hw/misc/cbus.o
  CC      aarch64-softmmu/hw/misc/exynos4210_pmu.o
  CC      x86_64-softmmu/target/i386/hax-mem.o
  CC      aarch64-softmmu/hw/misc/exynos4210_clk.o
  CC      aarch64-softmmu/hw/misc/exynos4210_rng.o
  CC      x86_64-softmmu/target/i386/hax-windows.o
  CC      aarch64-softmmu/hw/misc/imx_ccm.o
  CC      x86_64-softmmu/target/i386/sev-stub.o
  CC      aarch64-softmmu/hw/misc/imx31_ccm.o
  CC      aarch64-softmmu/hw/misc/imx25_ccm.o
  CC      aarch64-softmmu/hw/misc/imx6_ccm.o
  CC      aarch64-softmmu/hw/misc/imx6ul_ccm.o
  CC      aarch64-softmmu/hw/misc/imx6_src.o
  GEN     trace/generated-helpers.c
  CC      aarch64-softmmu/hw/misc/imx7_ccm.o
  CC      aarch64-softmmu/hw/misc/imx2_wdt.o
  CC      x86_64-softmmu/trace/control-target.o
  CC      aarch64-softmmu/hw/misc/imx7_snvs.o
  CC      aarch64-softmmu/hw/misc/imx7_gpr.o
  CC      aarch64-softmmu/hw/misc/mst_fpga.o
  CC      aarch64-softmmu/hw/misc/omap_clk.o
  CC      aarch64-softmmu/hw/misc/omap_gpmc.o
  CC      aarch64-softmmu/hw/misc/omap_l4.o
  CC      x86_64-softmmu/gdbstub-xml.o
  CC      aarch64-softmmu/hw/misc/omap_sdrc.o
  CC      aarch64-softmmu/hw/misc/omap_tap.o
  CC      aarch64-softmmu/hw/misc/bcm2835_mbox.o
  CC      aarch64-softmmu/hw/misc/bcm2835_property.o
  CC      aarch64-softmmu/hw/misc/bcm2835_rng.o
  CC      aarch64-softmmu/hw/misc/zynq_slcr.o
  CC      x86_64-softmmu/trace/generated-helpers.o
  CC      aarch64-softmmu/hw/misc/zynq-xadc.o
  CC      aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
  CC      aarch64-softmmu/hw/misc/mps2-fpgaio.o
  CC      aarch64-softmmu/hw/misc/mps2-scc.o
  CC      aarch64-softmmu/hw/misc/tz-mpc.o
  CC      aarch64-softmmu/hw/misc/tz-msc.o
  CC      aarch64-softmmu/hw/misc/tz-ppc.o
  CC      aarch64-softmmu/hw/misc/iotkit-secctl.o
  CC      aarch64-softmmu/hw/misc/iotkit-sysctl.o
  CC      aarch64-softmmu/hw/misc/iotkit-sysinfo.o
  CC      aarch64-softmmu/hw/misc/auxbus.o
  CC      aarch64-softmmu/hw/misc/aspeed_scu.o
  CC      aarch64-softmmu/hw/misc/aspeed_sdmc.o
  CC      aarch64-softmmu/hw/misc/msf2-sysreg.o
  CC      aarch64-softmmu/hw/net/virtio-net.o
  CC      aarch64-softmmu/hw/net/vhost_net.o
  CC      aarch64-softmmu/hw/pcmcia/pxa2xx.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      aarch64-softmmu/hw/sd/omap_mmc.o
  CC      aarch64-softmmu/hw/sd/pxa2xx_mmci.o
  CC      aarch64-softmmu/hw/sd/bcm2835_sdhost.o
  CC      aarch64-softmmu/hw/ssi/omap_spi.o
  CC      aarch64-softmmu/hw/ssi/imx_spi.o
  CC      aarch64-softmmu/hw/timer/exynos4210_mct.o
  CC      aarch64-softmmu/hw/timer/exynos4210_pwm.o
  CC      aarch64-softmmu/hw/timer/exynos4210_rtc.o
  CC      aarch64-softmmu/hw/timer/omap_gptimer.o
  CC      aarch64-softmmu/hw/timer/omap_synctimer.o
  CC      aarch64-softmmu/hw/timer/pxa2xx_timer.o
  CC      aarch64-softmmu/hw/timer/digic-timer.o
  CC      aarch64-softmmu/hw/timer/allwinner-a10-pit.o
  CC      aarch64-softmmu/hw/usb/tusb6010.o
  CC      aarch64-softmmu/hw/usb/chipidea.o
  CC      aarch64-softmmu/hw/virtio/virtio.o
  CC      aarch64-softmmu/hw/virtio/virtio-balloon.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      aarch64-softmmu/hw/arm/boot.o
  CC      aarch64-softmmu/hw/arm/virt.o
  CC      aarch64-softmmu/hw/arm/sysbus-fdt.o
  CC      aarch64-softmmu/hw/arm/virt-acpi-build.o
  CC      aarch64-softmmu/hw/arm/digic_boards.o
  CC      aarch64-softmmu/hw/arm/exynos4_boards.o
  CC      aarch64-softmmu/hw/arm/highbank.o
  CC      aarch64-softmmu/hw/arm/integratorcp.o
  CC      aarch64-softmmu/hw/arm/mainstone.o
  CC      aarch64-softmmu/hw/arm/musicpal.o
  CC      aarch64-softmmu/hw/arm/netduino2.o
  CC      aarch64-softmmu/hw/arm/nseries.o
  CC      aarch64-softmmu/hw/arm/omap_sx1.o
  CC      aarch64-softmmu/hw/arm/palm.o
  CC      aarch64-softmmu/hw/arm/gumstix.o
  CC      aarch64-softmmu/hw/arm/spitz.o
  CC      aarch64-softmmu/hw/arm/tosa.o
  CC      aarch64-softmmu/hw/arm/z2.o
  CC      aarch64-softmmu/hw/arm/realview.o
  CC      aarch64-softmmu/hw/arm/stellaris.o
  CC      aarch64-softmmu/hw/arm/collie.o
  CC      aarch64-softmmu/hw/arm/vexpress.o
  CC      aarch64-softmmu/hw/arm/versatilepb.o
  CC      aarch64-softmmu/hw/arm/xilinx_zynq.o
  CC      aarch64-softmmu/hw/arm/armv7m.o
  CC      aarch64-softmmu/hw/arm/exynos4210.o
  CC      aarch64-softmmu/hw/arm/pxa2xx.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_gpio.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_pic.o
  CC      aarch64-softmmu/hw/arm/digic.o
  CC      aarch64-softmmu/hw/arm/omap1.o
  CC      aarch64-softmmu/hw/arm/omap2.o
  CC      aarch64-softmmu/hw/arm/strongarm.o
  CC      aarch64-softmmu/hw/arm/allwinner-a10.o
  CC      aarch64-softmmu/hw/arm/cubieboard.o
  CC      aarch64-softmmu/hw/arm/bcm2835_peripherals.o
  CC      aarch64-softmmu/hw/arm/bcm2836.o
  CC      aarch64-softmmu/hw/arm/raspi.o
  CC      aarch64-softmmu/hw/arm/stm32f205_soc.o
  CC      aarch64-softmmu/hw/arm/xlnx-zynqmp.o
  CC      aarch64-softmmu/hw/arm/xlnx-zcu102.o
  CC      aarch64-softmmu/hw/arm/xlnx-versal.o
  CC      aarch64-softmmu/hw/arm/xlnx-versal-virt.o
  CC      aarch64-softmmu/hw/arm/fsl-imx25.o
  CC      aarch64-softmmu/hw/arm/imx25_pdk.o
  CC      aarch64-softmmu/hw/arm/fsl-imx31.o
  CC      aarch64-softmmu/hw/arm/kzm.o
  CC      aarch64-softmmu/hw/arm/fsl-imx6.o
  CC      aarch64-softmmu/hw/arm/sabrelite.o
  CC      aarch64-softmmu/hw/arm/aspeed_soc.o
  CC      aarch64-softmmu/hw/arm/aspeed.o
  CC      aarch64-softmmu/hw/arm/mps2.o
  CC      aarch64-softmmu/hw/arm/mps2-tz.o
  CC      aarch64-softmmu/hw/arm/msf2-soc.o
  CC      aarch64-softmmu/hw/arm/msf2-som.o
  CC      aarch64-softmmu/hw/arm/iotkit.o
  CC      aarch64-softmmu/hw/arm/fsl-imx7.o
  CC      aarch64-softmmu/hw/arm/mcimx7d-sabre.o
  CC      aarch64-softmmu/hw/arm/smmu-common.o
  CC      aarch64-softmmu/hw/arm/smmuv3.o
  CC      aarch64-softmmu/hw/arm/fsl-imx6ul.o
  CC      aarch64-softmmu/hw/arm/mcimx6ul-evk.o
  CC      aarch64-softmmu/hw/arm/nrf51_soc.o
  CC      aarch64-softmmu/hw/arm/microbit.o
  CC      aarch64-softmmu/target/arm/arm-semi.o
  CC      aarch64-softmmu/target/arm/machine.o
  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/arch_dump.o
  CC      aarch64-softmmu/target/arm/monitor.o
  CC      aarch64-softmmu/target/arm/kvm-stub.o
  CC      aarch64-softmmu/target/arm/translate.o
  CC      aarch64-softmmu/target/arm/op_helper.o
  CC      aarch64-softmmu/target/arm/helper.o
  CC      aarch64-softmmu/target/arm/cpu.o
  CC      aarch64-softmmu/target/arm/neon_helper.o
  CC      aarch64-softmmu/target/arm/iwmmxt_helper.o
  CC      aarch64-softmmu/target/arm/vec_helper.o
  CC      aarch64-softmmu/target/arm/gdbstub.o
  CC      aarch64-softmmu/target/arm/cpu64.o
  CC      aarch64-softmmu/target/arm/translate-a64.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
/tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
         return isinff(a.h);
                ^~~~~~
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
cc1: all warnings being treated as errors
/tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
         return isinff(a.h);
                ^~~~~~
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:483: subdir-x86_64-softmmu] Error 2
make: *** Waiting for unfinished jobs....
make: *** [Makefile:483: subdir-aarch64-softmmu] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 563, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 560, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 306, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 274, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 181, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=0f08c190f26911e8a02468b59973b7d0', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-spofu4kn/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real	1m54.170s
user	0m17.471s
sys	0m3.686s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (13 preceding siblings ...)
  2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
@ 2018-11-27 17:32 ` no-reply
  2018-12-05 12:41 ` Alex Bennée
  15 siblings, 0 replies; 37+ messages in thread
From: no-reply @ 2018-11-27 17:32 UTC (permalink / raw)
  To: cota; +Cc: famz, qemu-devel, richard.henderson, alex.bennee

Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20181124235553.17371-1-cota@braap.org
Subject: [Qemu-devel] [PATCH v6 00/13] hardfloat
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fe0cee3 hardfloat: implement float32/64 comparison
ac5968b hardfloat: implement float32/64 square root
0f10937 hardfloat: implement float32/64 fused multiply-add
de38097 hardfloat: implement float32/64 division
fbeab45 hardfloat: implement float32/64 multiplication
8894a16 hardfloat: implement float32/64 addition and subtraction
834d403 fpu: introduce hardfloat
94b3f9b tests/fp: add fp-bench
fe2ef78 softfloat: add float{32, 64}_is_zero_or_normal
a343567 softfloat: rename canonicalize to sf_canonicalize
73e6c0d target/tricore: use float32_is_denormal
be09b31 softfloat: add float{32, 64}_is_{de, }normal
319042a fp-test: pick TARGET_ARM to get its specialization

=== OUTPUT BEGIN ===
Checking PATCH 1/13: fp-test: pick TARGET_ARM to get its specialization...
Checking PATCH 2/13: softfloat: add float{32, 64}_is_{de, }normal...
Checking PATCH 3/13: target/tricore: use float32_is_denormal...
Checking PATCH 4/13: softfloat: rename canonicalize to sf_canonicalize...
Checking PATCH 5/13: softfloat: add float{32, 64}_is_zero_or_normal...
Checking PATCH 6/13: tests/fp: add fp-bench...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#49: 
new file mode 100644

total: 0 errors, 1 warnings, 653 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 7/13: fpu: introduce hardfloat...
ERROR: spaces required around that '*' (ctx:WxV)
#82: FILE: fpu/softfloat.c:130:
+    static inline void name(soft_t *a, float_status *s)                 \
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#96: FILE: fpu/softfloat.c:144:
+    static inline void name(soft_t *a, float_status *s) \
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#109: FILE: fpu/softfloat.c:157:
+    static inline void name(soft_t *a, soft_t *b, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#123: FILE: fpu/softfloat.c:171:
+    static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
                                                                           ^

WARNING: architecture specific defines should be avoided
#142: FILE: fpu/softfloat.c:190:
+#if defined(__x86_64__)

WARNING: architecture specific defines should be avoided
#164: FILE: fpu/softfloat.c:212:
+#if defined(__x86_64__) || defined(__aarch64__)

ERROR: spaces required around that '*' (ctx:WxV)
#183: FILE: fpu/softfloat.c:231:
+static inline bool can_use_fpu(const float_status *s)
                                                   ^

total: 5 errors, 2 warnings, 327 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 8/13: hardfloat: implement float32/64 addition and subtraction...
ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: fpu/softfloat.c:1063:
+soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#109: FILE: fpu/softfloat.c:1072:
+static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
                                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#114: FILE: fpu/softfloat.c:1077:
+static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
                                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#120: FILE: fpu/softfloat.c:1083:
+soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#130: FILE: fpu/softfloat.c:1092:
+static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
                                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#135: FILE: fpu/softfloat.c:1097:
+static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
                                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#177: FILE: fpu/softfloat.c:1139:
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
                                                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#184: FILE: fpu/softfloat.c:1146:
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
                                                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#192: FILE: fpu/softfloat.c:1154:
+float32_add(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#198: FILE: fpu/softfloat.c:1160:
+float32_sub(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#204: FILE: fpu/softfloat.c:1166:
+float64_add(float64 a, float64 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#210: FILE: fpu/softfloat.c:1172:
+float64_sub(float64 a, float64 b, float_status *s)
                                                ^

total: 12 errors, 0 warnings, 149 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 9/13: hardfloat: implement float32/64 multiplication...
ERROR: spaces required around that '*' (ctx:WxV)
#45: FILE: fpu/softfloat.c:1236:
+soft_f32_mul(float32 a, float32 b, float_status *status)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#55: FILE: fpu/softfloat.c:1246:
+soft_f64_mul(float64 a, float64 b, float_status *status)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#83: FILE: fpu/softfloat.c:1275:
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#90: FILE: fpu/softfloat.c:1282:
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: fpu/softfloat.c:1290:
+float32_mul(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#105: FILE: fpu/softfloat.c:1297:
+float64_mul(float64 a, float64 b, float_status *s)
                                                ^

total: 6 errors, 0 warnings, 72 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 10/13: hardfloat: implement float32/64 division...
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: fpu/softfloat.c:1628:
+soft_f32_div(float32 a, float32 b, float_status *status)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#58: FILE: fpu/softfloat.c:1638:
+soft_f64_div(float64 a, float64 b, float_status *status)
                                                 ^

ERROR: spaces required around that '*' (ctx:WxV)
#111: FILE: fpu/softfloat.c:1692:
+float32_div(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#118: FILE: fpu/softfloat.c:1699:
+float64_div(float64 a, float64 b, float_status *s)
                                                ^

total: 4 errors, 0 warnings, 82 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 11/13: hardfloat: implement float32/64 fused multiply-add...
ERROR: spaces required around that '*' (ctx:WxV)
#50: FILE: fpu/softfloat.c:1519:
+                float_status *status)
                              ^

ERROR: spaces required around that '*' (ctx:WxV)
#62: FILE: fpu/softfloat.c:1531:
+                float_status *status)
                              ^

ERROR: spaces required around that '*' (ctx:WxV)
#71: FILE: fpu/softfloat.c:1542:
+float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
                                                                            ^

ERROR: spaces required around that '*' (ctx:WxV)
#132: FILE: fpu/softfloat.c:1603:
+float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
                                                                            ^

total: 4 errors, 0 warnings, 150 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 12/13: hardfloat: implement float32/64 square root...
ERROR: spaces required around that '*' (ctx:WxV)
#32: FILE: fpu/softfloat.c:3044:
+soft_f32_sqrt(float32 a, float_status *status)
                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#41: FILE: fpu/softfloat.c:3052:
+soft_f64_sqrt(float64 a, float_status *status)
                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: fpu/softfloat.c:3059:
+float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
                                                            ^

ERROR: spaces required around that '*' (ctx:WxV)
#75: FILE: fpu/softfloat.c:3086:
+float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
                                                            ^

total: 4 errors, 0 warnings, 78 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 13/13: hardfloat: implement float32/64 comparison...
ERROR: spaces required around that '*' (ctx:WxV)
#87: FILE: fpu/softfloat.c:2904:
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#111: FILE: fpu/softfloat.c:2917:
+int float16_compare(float16 a, float16 b, float_status *s)
                                                        ^

total: 2 errors, 0 warnings, 123 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
  2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
@ 2018-11-27 17:52   ` Emilio G. Cota
  0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-27 17:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: famz, richard.henderson, alex.bennee

On Tue, Nov 27, 2018 at 09:24:21 -0800, no-reply@patchew.org wrote:
> /tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
>          return isinff(a.h);
>                 ^~~~~~
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
> cc1: all warnings being treated as errors
> /tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
>          return isinff(a.h);
>                 ^~~~~~
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
> cc1: all warnings being treated as errors
> make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
> make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1

This is the offender:

+static inline bool f32_is_inf(union_float32 a)
+{
+    if (QEMU_HARDFLOAT_USE_ISINF) {
+        return isinff(a.h);
+    }
+    return float32_is_infinity(a.s);
+}

I've fixed up the branch on github to use isinf here instead
of isinff.
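For anyone hitting the same mingw failure: isinff() is not part of ISO C,
so some libcs do not declare it, whereas isinf() is a type-generic macro in
C99 <math.h> that accepts float directly. A minimal sketch of the difference
(the demo_* names are made up for illustration and are not in the series):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper, loosely modelled on f32_is_inf() from the log above. */
static bool demo_f32_is_inf(float x)
{
    /* isinf() is a C99 type-generic macro: no float-specific variant needed. */
    return isinf(x) != 0;
}

/* The same macro also handles double, so one spelling covers both widths. */
static bool demo_f64_is_inf(double x)
{
    return isinf(x) != 0;
}
```

Both helpers return true for +/-infinity and false for finite values and NaNs.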

		Emilio


* Re: [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
@ 2018-12-03 12:13   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 12:13 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> This gets rid of the muladd errors due to not raising the invalid flag.
>
> - Before:
> Errors found in f64_mulAdd, rounding near_even, tininess before rounding:
> +000.0000000000000  +7FF.0000000000000  +7FF.FFFFFFFFFFFFF
>         => +7FF.FFFFFFFFFFFFF .....  expected -7FF.FFFFFFFFFFFFF v....
> [...]
>
> - After:
> In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding.
> [...]
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tests/fp/Makefile | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tests/fp/Makefile b/tests/fp/Makefile
> index d649a5a1db..49cdcd1bd2 100644
> --- a/tests/fp/Makefile
> +++ b/tests/fp/Makefile
> @@ -29,6 +29,9 @@ QEMU_INCLUDES += -I$(TF_SOURCE_DIR)
>
>  # work around TARGET_* poisoning
>  QEMU_CFLAGS += -DHW_POISON_H
> +# define a target to match testfloat's implementation-defined choices, such as
> +# whether to raise the invalid flag when dealing with NaNs in muladd.
> +QEMU_CFLAGS += -DTARGET_ARM
>
>  # capstone has a platform.h file that clashes with softfloat's
>  QEMU_CFLAGS := $(filter-out %capstone, $(QEMU_CFLAGS))


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-12-03 14:16   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:16 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> glibc >= 2.25 defines canonicalize in commit eaf5ad0
> (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
>
> Given that we'll be including <math.h> soon, prepare
> for this by prefixing our canonicalize() with sf_ to avoid
> clashing with the libc's canonicalize().
>
> Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  fpu/softfloat.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e1eef954e6..ecdc00c633 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
>  #include "softfloat-specialize.h"
>
>  /* Canonicalize EXP and FRAC, setting CLS.  */
> -static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
> -                               float_status *status)
> +static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
> +                                  float_status *status)
>  {
>      if (part.exp == parm->exp_max && !parm->arm_althp) {
>          if (part.frac == 0) {
> @@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
>  static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
>                                              const FloatFmt *params)
>  {
> -    return canonicalize(float16_unpack_raw(f), params, s);
> +    return sf_canonicalize(float16_unpack_raw(f), params, s);
>  }
>
>  static FloatParts float16_unpack_canonical(float16 f, float_status *s)
> @@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
>
>  static FloatParts float32_unpack_canonical(float32 f, float_status *s)
>  {
> -    return canonicalize(float32_unpack_raw(f), &float32_params, s);
> +    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
>  }
>
>  static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
> @@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
>
>  static FloatParts float64_unpack_canonical(float64 f, float_status *s)
>  {
> -    return canonicalize(float64_unpack_raw(f), &float64_params, s);
> +    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
>  }
>
>  static float64 float64_round_pack_canonical(FloatParts p, float_status *s)


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
@ 2018-12-03 14:16   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:16 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> These will gain some users very soon.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/fpu/softfloat.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 9eeccd88a5..38a5e99cf3 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -474,6 +474,11 @@ static inline bool float32_is_denormal(float32 a)
>      return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
>  }
>
> +static inline bool float32_is_zero_or_normal(float32 a)
> +{
> +    return float32_is_normal(a) || float32_is_zero(a);
> +}
> +
>  static inline float32 float32_set_sign(float32 a, int sign)
>  {
>      return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
> @@ -625,6 +630,11 @@ static inline bool float64_is_denormal(float64 a)
>      return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
>  }
>
> +static inline bool float64_is_zero_or_normal(float64 a)
> +{
> +    return float64_is_normal(a) || float64_is_zero(a);
> +}
> +
>  static inline float64 float64_set_sign(float64 a, int sign)
>  {
>      return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
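
To make the predicate concrete, here is a standalone sketch that works on raw
IEEE 754 binary32 bit patterns. The bits32_* names are hypothetical and only
stand in for the float32 helpers in the patch; the exponent test is my own
spelling of "normal", not a copy of QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zero: every bit except the sign bit is clear. */
static bool bits32_is_zero(uint32_t a)
{
    return (a & 0x7fffffffu) == 0;
}

/* Normal: biased exponent is neither 0 (zero/denormal) nor 0xff (inf/NaN). */
static bool bits32_is_normal(uint32_t a)
{
    uint32_t exp = (a >> 23) & 0xffu;
    return exp != 0 && exp != 0xffu;
}

/* Same composition as the helper added in the patch. */
static bool bits32_is_zero_or_normal(uint32_t a)
{
    return bits32_is_normal(a) || bits32_is_zero(a);
}
```

Denormals, infinities and NaNs are the inputs this rejects.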


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
@ 2018-12-03 14:29   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:29 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> These microbenchmarks will allow us to measure the performance impact of
> FP emulation optimizations. Note that we can measure either the direct impact
> on the softfloat functions (with "-t soft") or the impact on an
> emulated workload (call with "-t host" and run under qemu user-mode).

It would be nice to be able to cross-build this later so we can build
easily for non-x86. However, that's no reason to hold things up:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> +
> +/*
> + * The main benchmark function. Instead of (ab)using macros, we rely
> + * on the compiler to unfold this at compile-time.
> + */

\o/

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
  2018-11-25  0:25   ` Aleksandar Markovic
@ 2018-12-04 12:28   ` Alex Bennée
  2018-12-04 13:33     ` Richard Henderson
  1 sibling, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 12:28 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> The appended patch paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the added comment for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.

We don't currently enforce this, though. Then again, maybe that would be
too much hand-holding for compiler ricers hell-bent on not understanding
the flags they use.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-12-04 12:28   ` Alex Bennée
@ 2018-12-04 13:33     ` Richard Henderson
  2018-12-04 13:52       ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2018-12-04 13:33 UTC (permalink / raw)
  To: Alex Bennée, Emilio G. Cota; +Cc: qemu-devel

On 12/4/18 6:28 AM, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
>> This assumes that QEMU is running on an IEEE754-compliant FPU and
>> that the rounding is set to the default (to nearest). The
>> implementation-dependent specifics of the FPU should not matter; things
>> like tininess detection and snan representation are still dealt with in
>> soft-fp. However, this approach will break on most hosts if we compile
>> QEMU with flags such as -ffast-math. We control the flags so this should
>> be easy to enforce though.
> 
> We don't currently enforce this, though. Then again, maybe that would be
> too much hand-holding for compiler ricers hell-bent on not understanding
> the flags they use.

We could always

#ifdef __FAST_MATH__
#error "Silliness like this will get you nowhere"
#endif


r~


* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-12-04 13:33     ` Richard Henderson
@ 2018-12-04 13:52       ` Alex Bennée
  2018-12-04 17:31         ` Emilio G. Cota
  0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 13:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Emilio G. Cota, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 12/4/18 6:28 AM, Alex Bennée wrote:
>> Emilio G. Cota <cota@braap.org> writes:
>>> This assumes that QEMU is running on an IEEE754-compliant FPU and
>>> that the rounding is set to the default (to nearest). The
>>> implementation-dependent specifics of the FPU should not matter; things
>>> like tininess detection and snan representation are still dealt with in
>>> soft-fp. However, this approach will break on most hosts if we compile
>>> QEMU with flags such as -ffast-math. We control the flags so this should
>>> be easy to enforce though.
>>
>> We don't currently enforce this, though. Then again, maybe that would be
>> too much hand-holding for compiler ricers hell-bent on not understanding
>> the flags they use.
>
> We could always
>
> #ifdef __FAST_MATH__
> #error "Silliness like this will get you nowhere"
> #endif

Emilio, are you happy to add that guard with a suitable pithy comment?

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-12-04 13:52       ` Alex Bennée
@ 2018-12-04 17:31         ` Emilio G. Cota
  2018-12-04 19:08           ` Alex Bennée
  0 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-04 17:31 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

On Tue, Dec 04, 2018 at 13:52:16 +0000, Alex Bennée wrote:
> > We could always
> >
> > #ifdef __FAST_MATH__
> > #error "Silliness like this will get you nowhere"
> > #endif
> 
> Emilio, are you happy to add that guard with a suitable pithy comment?

Isn't it better to just disable hardfloat then?

--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -220,7 +220,7 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
  * the use of hardfloat, since hardfloat relies on the inexact flag being
  * already set.
  */
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(__FAST_MATH__)
 # define QEMU_NO_HARDFLOAT 1
 # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
 #else

Or perhaps disable it, as well as issue a #warning?

		E.


* Re: [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
@ 2018-12-04 18:34   ` Alex Bennée
  2018-12-04 20:07     ` Emilio G. Cota
  0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 18:34 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results (single and double precision) for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> add-single: 135.07 MFlops
> add-double: 131.60 MFlops
> sub-single: 130.04 MFlops
> sub-double: 133.01 MFlops
> - after:
> add-single: 443.04 MFlops
> add-double: 301.95 MFlops
> sub-single: 411.36 MFlops
> sub-double: 293.15 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> add-single: 44.79 MFlops
> add-double: 49.20 MFlops
> sub-single: 44.55 MFlops
> sub-double: 49.06 MFlops
> - after:
> add-single: 93.28 MFlops
> add-double: 88.27 MFlops
> sub-single: 91.47 MFlops
> sub-double: 88.27 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> add-single: 72.59 MFlops
> add-double: 72.27 MFlops
> sub-single: 75.33 MFlops
> sub-double: 70.54 MFlops
> - after:
> add-single: 112.95 MFlops
> add-double: 201.11 MFlops
> sub-single: 116.80 MFlops
> sub-double: 188.72 MFlops
>
> Note that the IBM and ARM machines benefit from having
> HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
> can suffer significantly:

Is this just the latency of pushing the number into a SIMD register and
checking the flags compared to a bitmask check?

> - IBM Power8:
> add-single: [1] 54.94 vs [0] 116.37 MFlops
> add-double: [1] 58.92 vs [0] 201.44 MFlops
> - Aarch64 A57:
> add-single: [1] 80.72 vs [0] 93.24 MFlops
> add-double: [1] 82.10 vs [0] 88.18 MFlops
>
> On the Intel machine, having 2F64 set to 1 pays off, but it
> doesn't for 2F32:
> - Intel i7-6700K:
> add-single: [1] 285.79 vs [0] 426.70 MFlops
> add-double: [1] 302.15 vs [0] 278.82 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  fpu/softfloat.c | 117 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 98 insertions(+), 19 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 306a12fa8d..cc500b1618 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1050,49 +1050,128 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 QEMU_FLATTEN float32_add(float32 a, float32 b, float_status *status)
> +float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
> +{
> +    FloatParts pa = float16_unpack_canonical(a, status);
> +    FloatParts pb = float16_unpack_canonical(b, status);
> +    FloatParts pr = addsub_floats(pa, pb, true, status);
> +
> +    return float16_round_pack_canonical(pr, status);
> +}

Hmm the diff is confusing but the changes look fine in the final code:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
  2018-12-04 17:31         ` Emilio G. Cota
@ 2018-12-04 19:08           ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 19:08 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: Richard Henderson, qemu-devel


Emilio G. Cota <cota@braap.org> writes:

> On Tue, Dec 04, 2018 at 13:52:16 +0000, Alex Bennée wrote:
>> > We could always
>> >
>> > #ifdef __FAST_MATH__
>> > #error "Silliness like this will get you nowhere"
>> > #endif
>>
>> Emilio, are you happy to add that guard with a suitable pithy comment?
>
> Isn't it better to just disable hardfloat then?
>
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -220,7 +220,7 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
>   * the use of hardfloat, since hardfloat relies on the inexact flag being
>   * already set.
>   */
> -#if defined(TARGET_PPC)
> +#if defined(TARGET_PPC) || defined(__FAST_MATH__)
>  # define QEMU_NO_HARDFLOAT 1
>  # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
>  #else
>
> Or perhaps disable it, as well as issue a #warning?

Issuing the warning is only to tell the user they are being stupid but
yeah certainly disable. Maybe we'll be around when someone comes asking
why maths didn't get faster ;-)

>
> 		E.


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
  2018-12-04 18:34   ` Alex Bennée
@ 2018-12-04 20:07     ` Emilio G. Cota
  0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-04 20:07 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On Tue, Dec 04, 2018 at 18:34:18 +0000, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
(snip)
> > Note that the IBM and ARM machines benefit from having
> > HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
> > can suffer significantly:
> 
> Is this just the latency of pushing the number into a SIMD register and
> checking the flags compared to a bitmask check?

That's the case in the generated x86 assembly, so I presume
the same is happening in the other ISAs (I didn't check
the assembly there).
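
To make the two kinds of check concrete, here is a minimal sketch of a
"zero or normal" test done both ways: once through the host FP path
(fpclassify) and once as an integer test on the raw binary32 bits. The
demo_* names are hypothetical helpers for illustration, not functions
from the series; the two variants are expected to agree on every input,
so the choice between them is purely a matter of host performance.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* FP-based check: the value is classified via the host's FP path. */
static bool demo_f32_is_zon_fp(float f)
{
    int c = fpclassify(f);

    return c == FP_NORMAL || c == FP_ZERO;
}

/* Integer check: inspect the raw IEEE 754 binary32 encoding. */
static bool demo_f32_is_zon_bits(float f)
{
    uint32_t bits, exp;

    memcpy(&bits, &f, sizeof(bits));
    if ((bits & 0x7fffffffu) == 0) {
        return true;                    /* +0 or -0 */
    }
    exp = (bits >> 23) & 0xff;
    return exp != 0 && exp != 0xff;     /* neither denormal nor inf/NaN */
}
```

Which variant is faster depends on how cheaply the host moves a value
between its integer and FP register files, which is consistent with the
per-host numbers quoted above.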

(snip)
> 
> Hmm the diff is confusing but the changes look fine in the final code:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Thanks!

		E.


* Re: [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
@ 2018-12-05 10:10   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 10:10 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> mul-single: 126.91 MFlops
> mul-double: 118.28 MFlops
> - after:
> mul-single: 258.02 MFlops
> mul-double: 197.96 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> mul-single: 37.42 MFlops
> mul-double: 38.77 MFlops
> - after:
> mul-single: 73.41 MFlops
> mul-double: 76.93 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> mul-single: 58.40 MFlops
> mul-double: 59.33 MFlops
> - after:
> mul-single: 60.25 MFlops
> mul-double: 94.79 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index cc500b1618..58e67d9b80 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1232,7 +1232,8 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_mul(float32 a, float32 b, float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1241,7 +1242,8 @@ float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_mul(float64 a, float64 b, float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1250,6 +1252,54 @@ float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
>      return float64_round_pack_canonical(pr, status);
>  }
>
> +static float hard_f32_mul(float a, float b)
> +{
> +    return a * b;
> +}
> +
> +static double hard_f64_mul(double a, double b)
> +{
> +    return a * b;
> +}
> +
> +static bool f32_mul_fast_test(union_float32 a, union_float32 b)
> +{
> +    return float32_is_zero(a.s) || float32_is_zero(b.s);
> +}
> +
> +static bool f64_mul_fast_test(union_float64 a, union_float64 b)
> +{
> +    return float64_is_zero(a.s) || float64_is_zero(b.s);
> +}
> +
> +static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
> +{
> +    bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
> +
> +    return float32_set_sign(float32_zero, signbit);
> +}
> +
> +static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
> +{
> +    bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
> +
> +    return float64_set_sign(float64_zero, signbit);
> +}
> +
> +float32 QEMU_FLATTEN
> +float32_mul(float32 a, float32 b, float_status *s)
> +{
> +    return float32_gen2(a, b, s, hard_f32_mul, soft_f32_mul,
> +                        f32_is_zon2, NULL, f32_mul_fast_test, f32_mul_fast_op);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_mul(float64 a, float64 b, float_status *s)
> +{
> +    return float64_gen2(a, b, s, hard_f64_mul, soft_f64_mul,
> +                        f64_is_zon2, NULL, f64_mul_fast_test, f64_mul_fast_op);
> +}
> +
>  /*
>   * Returns the result of multiplying the floating-point values `a' and
>   * `b' then adding 'c', with no intermediate rounding step after the


--
Alex Bennée
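
The zero fast path in the quoted patch can be illustrated in isolation.
The sketch below is a stand-in written for this note, not QEMU code:
demo_f32_bits, demo_f32_is_zero and demo_mul_zero_fast are hypothetical
helpers that work on raw binary32 bits. It shows that when one operand
is zero and the other is finite, the product is a zero whose sign is the
XOR of the operand signs, so no multiplication has to be issued.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 binary32 encoding of f. */
static uint32_t demo_f32_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));
    return u;
}

/* True for +0 and -0. */
static bool demo_f32_is_zero(float f)
{
    return (demo_f32_bits(f) & 0x7fffffffu) == 0;
}

/*
 * Product of a and b, valid only when at least one of them is zero and
 * the other is finite: a zero carrying the XOR of the two sign bits.
 */
static float demo_mul_zero_fast(float a, float b)
{
    uint32_t sign = (demo_f32_bits(a) ^ demo_f32_bits(b)) & 0x80000000u;
    float r;

    memcpy(&r, &sign, sizeof(r));
    return r;
}
```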


* Re: [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
@ 2018-12-05 10:11   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 10:11 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> div-single: 34.84 MFlops
> div-double: 34.04 MFlops
> - after:
> div-single: 275.23 MFlops
> div-double: 216.38 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> div-single: 9.33 MFlops
> div-double: 9.30 MFlops
> - after:
> div-single: 51.55 MFlops
> div-double: 15.09 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> div-single: 25.65 MFlops
> div-double: 24.91 MFlops
> - after:
> div-single: 96.83 MFlops
> div-double: 31.01 MFlops
>
> Here setting QEMU_HARDFLOAT_2F64_USE_FP to 1 pays off for x86_64:
> [1] 215.97 vs [0] 62.15 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  fpu/softfloat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 58e67d9b80..e35ebfaae7 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1624,7 +1624,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 float32_div(float32 a, float32 b, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_div(float32 a, float32 b, float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1633,7 +1634,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 float64_div(float64 a, float64 b, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_div(float64 a, float64 b, float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1642,6 +1644,64 @@ float64 float64_div(float64 a, float64 b, float_status *status)
>      return float64_round_pack_canonical(pr, status);
>  }
>
> +static float hard_f32_div(float a, float b)
> +{
> +    return a / b;
> +}
> +
> +static double hard_f64_div(double a, double b)
> +{
> +    return a / b;
> +}
> +
> +static bool f32_div_pre(union_float32 a, union_float32 b)
> +{
> +    if (QEMU_HARDFLOAT_2F32_USE_FP) {
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               fpclassify(b.h) == FP_NORMAL;
> +    }
> +    return float32_is_zero_or_normal(a.s) && float32_is_normal(b.s);
> +}
> +
> +static bool f64_div_pre(union_float64 a, union_float64 b)
> +{
> +    if (QEMU_HARDFLOAT_2F64_USE_FP) {
> +        return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> +               fpclassify(b.h) == FP_NORMAL;
> +    }
> +    return float64_is_zero_or_normal(a.s) && float64_is_normal(b.s);
> +}
> +
> +static bool f32_div_post(union_float32 a, union_float32 b)
> +{
> +    if (QEMU_HARDFLOAT_2F32_USE_FP) {
> +        return fpclassify(a.h) != FP_ZERO;
> +    }
> +    return !float32_is_zero(a.s);
> +}
> +
> +static bool f64_div_post(union_float64 a, union_float64 b)
> +{
> +    if (QEMU_HARDFLOAT_2F64_USE_FP) {
> +        return fpclassify(a.h) != FP_ZERO;
> +    }
> +    return !float64_is_zero(a.s);
> +}
> +
> +float32 QEMU_FLATTEN
> +float32_div(float32 a, float32 b, float_status *s)
> +{
> +    return float32_gen2(a, b, s, hard_f32_div, soft_f32_div,
> +                        f32_div_pre, f32_div_post, NULL, NULL);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_div(float64 a, float64 b, float_status *s)
> +{
> +    return float64_gen2(a, b, s, hard_f64_div, soft_f64_div,
> +                        f64_div_pre, f64_div_post, NULL, NULL);
> +}
> +
>  /*
>   * Float to Float conversions
>   *


--
Alex Bennée
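
For readers without patch 07 at hand, the role of the pre and post hooks
can be sketched as below. This is an assumption-laden illustration, not
the series' code: it assumes the generic two-operand wrapper (not quoted
in this message) runs the pre-check, divides on the host, flags an
infinite result as overflow, and defers to soft-float when the result is
tiny and the post-check says the dividend was non-zero. All demo_* names
are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Pre-check: dividend is zero or normal, divisor is normal. */
static bool demo_div_pre(float a, float b)
{
    int ca = fpclassify(a);

    return (ca == FP_NORMAL || ca == FP_ZERO) && fpclassify(b) == FP_NORMAL;
}

/* Post-check: a tiny result is only suspicious if the dividend was non-zero. */
static bool demo_div_post(float a)
{
    return fpclassify(a) != FP_ZERO;
}

/*
 * Returns true and stores the quotient in *res when the host result can
 * be used; returns false when the caller should take the soft-float path.
 * *overflow is set when the host result overflowed to infinity.
 */
static bool demo_div_try_host(float a, float b, float *res, bool *overflow)
{
    float r, mag;

    *overflow = false;
    if (!demo_div_pre(a, b)) {
        return false;
    }
    r = a / b;
    mag = r < 0 ? -r : r;
    if (mag > FLT_MAX) {
        *overflow = true;               /* inputs were finite: overflow */
    } else if (mag <= FLT_MIN && demo_div_post(a)) {
        return false;                   /* possible underflow: defer */
    }
    *res = r;
    return true;
}
```

A zero dividend gives an exact zero quotient, which is why the post
hook lets that case stay on the host path.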


* Re: [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
@ 2018-12-05 12:25   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:25 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> fma-single: 74.73 MFlops
> fma-double: 74.54 MFlops
> - after:
> fma-single: 203.37 MFlops
> fma-double: 169.37 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> fma-single: 23.24 MFlops
> fma-double: 23.70 MFlops
> - after:
> fma-single: 66.14 MFlops
> fma-double: 63.10 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> fma-single: 37.26 MFlops
> fma-double: 37.29 MFlops
> - after:
> fma-single: 48.90 MFlops
> fma-double: 59.51 MFlops
>
> Here having 3FP64 set to 1 pays off for x86_64:
> [1] 170.15 vs [0] 153.12 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> ---
>  fpu/softfloat.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 128 insertions(+), 4 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e35ebfaae7..e03feafb6f 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1514,8 +1514,9 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
> -                                                int flags, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
> +                float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1525,8 +1526,9 @@ float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
> -                                                int flags, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
> +                float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1536,6 +1538,128 @@ float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
>      return float64_round_pack_canonical(pr, status);
>  }
>
> +float32 QEMU_FLATTEN
> +float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
> +{
> +    union_float32 ua, ub, uc, ur;
> +
> +    ua.s = xa;
> +    ub.s = xb;
> +    uc.s = xc;
> +
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +    if (unlikely(flags & float_muladd_halve_result)) {
> +        goto soft;
> +    }
> +
> +    float32_input_flush3(&ua.s, &ub.s, &uc.s, s);
> +    if (unlikely(!f32_is_zon3(ua, ub, uc))) {
> +        goto soft;
> +    }
> +    /*
> +     * When (a || b) == 0, there's no need to check for under/over flow,
> +     * since we know the addend is (normal || 0) and the product is 0.
> +     */
> +    if (float32_is_zero(ua.s) || float32_is_zero(ub.s)) {
> +        union_float32 up;
> +        bool prod_sign;
> +
> +        prod_sign = float32_is_neg(ua.s) ^ float32_is_neg(ub.s);
> +        prod_sign ^= !!(flags & float_muladd_negate_product);
> +        up.s = float32_set_sign(float32_zero, prod_sign);
> +
> +        if (flags & float_muladd_negate_c) {
> +            uc.h = -uc.h;
> +        }
> +        ur.h = up.h + uc.h;
> +    } else {
> +        if (flags & float_muladd_negate_product) {
> +            ua.h = -ua.h;
> +        }
> +        if (flags & float_muladd_negate_c) {
> +            uc.h = -uc.h;
> +        }
> +
> +        ur.h = fmaf(ua.h, ub.h, uc.h);
> +
> +        if (unlikely(f32_is_inf(ur))) {
> +            s->float_exception_flags |= float_flag_overflow;
> +        } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
> +            goto soft;
> +        }
> +    }
> +    if (flags & float_muladd_negate_result) {
> +        return float32_chs(ur.s);
> +    }
> +    return ur.s;
> +
> + soft:
> +    return soft_f32_muladd(ua.s, ub.s, uc.s, flags, s);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
> +{
> +    union_float64 ua, ub, uc, ur;
> +
> +    ua.s = xa;
> +    ub.s = xb;
> +    uc.s = xc;
> +
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +    if (unlikely(flags & float_muladd_halve_result)) {
> +        goto soft;
> +    }
> +
> +    float64_input_flush3(&ua.s, &ub.s, &uc.s, s);
> +    if (unlikely(!f64_is_zon3(ua, ub, uc))) {
> +        goto soft;
> +    }
> +    /*
> +     * When (a || b) == 0, there's no need to check for under/over flow,
> +     * since we know the addend is (normal || 0) and the product is 0.
> +     */
> +    if (float64_is_zero(ua.s) || float64_is_zero(ub.s)) {
> +        union_float64 up;
> +        bool prod_sign;
> +
> +        prod_sign = float64_is_neg(ua.s) ^ float64_is_neg(ub.s);
> +        prod_sign ^= !!(flags & float_muladd_negate_product);
> +        up.s = float64_set_sign(float64_zero, prod_sign);
> +
> +        if (flags & float_muladd_negate_c) {
> +            uc.h = -uc.h;
> +        }
> +        ur.h = up.h + uc.h;
> +    } else {
> +        if (flags & float_muladd_negate_product) {
> +            ua.h = -ua.h;
> +        }
> +        if (flags & float_muladd_negate_c) {
> +            uc.h = -uc.h;
> +        }
> +
> +        ur.h = fma(ua.h, ub.h, uc.h);
> +
> +        if (unlikely(f64_is_inf(ur))) {
> +            s->float_exception_flags |= float_flag_overflow;
> +        } else if (unlikely(fabs(ur.h) <= FLT_MIN)) {
> +            goto soft;
> +        }
> +    }
> +    if (flags & float_muladd_negate_result) {
> +        return float64_chs(ur.s);
> +    }
> +    return ur.s;
> +
> + soft:
> +    return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
> +}
> +
>  /*
>   * Returns the result of dividing the floating-point value `a' by the
>   * corresponding value `b'. The operation is performed according to


--
Alex Bennée
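
The zero-product branch of the quoted muladd code is worth a small
worked example, since the sign of the zero product changes the result
when the addend is itself a zero. The sketch below uses hypothetical
demo_* helpers (not QEMU functions) and ignores the negate flags; it
assumes the host is in the default round-to-nearest mode, where
(-0) + (-0) is -0 but (+0) + (-0) is +0.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 binary32 encoding of f. */
static uint32_t demo_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Build +0 or -0 from a sign flag. */
static float demo_signed_zero(bool neg)
{
    uint32_t u = neg ? 0x80000000u : 0;
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * a * b + c for the case where a or b is zero and the other is finite:
 * the product is a zero with the XOR of the operand signs, and a plain
 * host addition then yields the fused result with no rounding concerns.
 */
static float demo_fma_zero_product(float a, float b, float c)
{
    bool prod_sign = ((demo_bits(a) ^ demo_bits(b)) >> 31) & 1;

    return demo_signed_zero(prod_sign) + c;
}
```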


* Re: [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
@ 2018-12-05 12:26   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:26 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results for fp-bench:
>
> Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> sqrt-single: 42.30 MFlops
> sqrt-double: 22.97 MFlops
> - after:
> sqrt-single: 311.42 MFlops
> sqrt-double: 311.08 MFlops
>
> Here USE_FP makes a huge difference for f64's, with throughput
> going from ~200 MFlops to ~300 MFlops.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> ---
>  fpu/softfloat.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e03feafb6f..4c6ecd1883 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -3040,20 +3040,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
>      return float16_round_pack_canonical(pr, status);
>  }
>
> -float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_sqrt(float32 a, float_status *status)
>  {
>      FloatParts pa = float32_unpack_canonical(a, status);
>      FloatParts pr = sqrt_float(pa, status, &float32_params);
>      return float32_round_pack_canonical(pr, status);
>  }
>
> -float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_sqrt(float64 a, float_status *status)
>  {
>      FloatParts pa = float64_unpack_canonical(a, status);
>      FloatParts pr = sqrt_float(pa, status, &float64_params);
>      return float64_round_pack_canonical(pr, status);
>  }
>
> +float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
> +{
> +    union_float32 ua, ur;
> +
> +    ua.s = xa;
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +
> +    float32_input_flush1(&ua.s, s);
> +    if (QEMU_HARDFLOAT_1F32_USE_FP) {
> +        if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
> +                       fpclassify(ua.h) == FP_ZERO) ||
> +                     signbit(ua.h))) {
> +            goto soft;
> +        }
> +    } else if (unlikely(!float32_is_zero_or_normal(ua.s) ||
> +                        float32_is_neg(ua.s))) {
> +        goto soft;
> +    }
> +    ur.h = sqrtf(ua.h);
> +    return ur.s;
> +
> + soft:
> +    return soft_f32_sqrt(ua.s, s);
> +}
> +
> +float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
> +{
> +    union_float64 ua, ur;
> +
> +    ua.s = xa;
> +    if (unlikely(!can_use_fpu(s))) {
> +        goto soft;
> +    }
> +
> +    float64_input_flush1(&ua.s, s);
> +    if (QEMU_HARDFLOAT_1F64_USE_FP) {
> +        if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
> +                       fpclassify(ua.h) == FP_ZERO) ||
> +                     signbit(ua.h))) {
> +            goto soft;
> +        }
> +    } else if (unlikely(!float64_is_zero_or_normal(ua.s) ||
> +                        float64_is_neg(ua.s))) {
> +        goto soft;
> +    }
> +    ur.h = sqrt(ua.h);
> +    return ur.s;
> +
> + soft:
> +    return soft_f64_sqrt(ua.s, s);
> +}
> +
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated NaN.
>  *----------------------------------------------------------------------------*/


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison
  2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
@ 2018-12-05 12:36   ` Alex Bennée
  0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:36 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> Performance results for fp-bench:
>
> Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> cmp-single: 110.98 MFlops
> cmp-double: 107.12 MFlops
> - after:
> cmp-single: 506.28 MFlops
> cmp-double: 524.77 MFlops
>
> Note that flattening both eq and eq_signaling versions
> would give us extra performance (695 vs. 506 and 615 vs. 524 MFlops
> for single and double, respectively) but this would emit two
> essentially identical functions for each eq/signaling pair,
> which is a waste.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
  2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
                   ` (14 preceding siblings ...)
  2018-11-27 17:32 ` no-reply
@ 2018-12-05 12:41 ` Alex Bennée
  2018-12-05 16:47   ` Emilio G. Cota
  15 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:41 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html
>
> Changes since v5:
>
> - Rebase on rth/tcg-next-for-4.0
<snip>

Awesome work - the series is looking really good now and I think we are
ready for a merge once the tree re-opens. I think there were a few
wording changes and the #ifdef fix to apply, so if you are happy to do
that I'll slurp up v7 and prepare a PR once it's ready.

Going forward I think we want to wire up the fp-test code so we can run
it in CI via the rest of make check (or check-tcg?) but no need to hold
up the merge for that.

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
  2018-12-05 12:41 ` Alex Bennée
@ 2018-12-05 16:47   ` Emilio G. Cota
  0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-05 16:47 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On Wed, Dec 05, 2018 at 12:41:15 +0000, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html
> >
> > Changes since v5:
> >
> > - Rebase on rth/tcg-next-for-4.0
> <snip>
> 
> Awesome work - the series is looking really good now and I think we are
> ready for a merge once the tree re-opens. I think there were a few
> wording changes and the #ifdef fix to apply, so if you are happy to do
> that I'll slurp up v7 and prepare a PR once it's ready.

Great, thanks for reviewing!

I just pushed v7. The changes are tiny (v6->v7 diff shown below)
so unless you want me to, I won't send it to the list.

  https://github.com/cota/qemu/tree/hardfloat-v7

I added your R-b's and the __FAST_MATH__ check:

$ git diff hardfloat-v6..hardfloat-v7
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 494422faab..59eac97d10 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -220,7 +220,11 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
  * the use of hardfloat, since hardfloat relies on the inexact flag being
  * already set.
  */
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(__FAST_MATH__)
+# if defined(__FAST_MATH__)
+#  warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
+    IEEE implementation
+# endif
 # define QEMU_NO_HARDFLOAT 1
 # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
 #else

I also updated patch 7's commit message:
    [..]
    However, this approach will break on most hosts if we compile
    QEMU with flags that break IEEE compatibility. There is no way to detect
    all of these flags at compilation time, but at least we check for
    -ffast-math (which defines __FAST_MATH__) and disable hardfloat
    (plus emit a #warning) when it is set.
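
For readers who want the idea in isolation, here is a minimal sketch of the
gating described above. It is not QEMU code: DEMO_NO_HARDFLOAT, struct
demo_status and demo_can_use_hardfloat() are hypothetical names made up for
illustration. It assumes, as the cover letter says, that the fast path is
only taken when the inexact flag is already set and the rounding mode is
round-to-nearest-even, and it adds the compile-time __FAST_MATH__ opt-out
from the diff.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for illustration only; these are NOT
 * QEMU's real macros or types.
 */
#if defined(__FAST_MATH__)
/* -ffast-math breaks IEEE semantics, so never take the host-FPU path. */
# define DEMO_NO_HARDFLOAT 1
#else
# define DEMO_NO_HARDFLOAT 0
#endif

/* Simplified model of the guest FP status consulted before each op. */
struct demo_status {
    bool inexact_set;           /* guest inexact flag already raised */
    bool round_to_nearest_even; /* guest rounding mode is nearest-even */
};

/*
 * Returns true only when the host FPU result could be used directly:
 * the build has not opted out, the inexact flag is already set (so the
 * host operation cannot change it), and rounding is nearest-even.
 */
static inline bool demo_can_use_hardfloat(const struct demo_status *s)
{
    if (DEMO_NO_HARDFLOAT) {
        return false;
    }
    return s->inexact_set && s->round_to_nearest_even;
}
```

Building the same file with -ffast-math makes the helper return false
unconditionally, which mirrors the effect of defining QEMU_NO_HARDFLOAT in
the diff; other IEEE-breaking flags would go undetected, as noted above.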

> Going forward I think we want to wire up the fp-test code so we can run
> it in CI via the rest of make check (or check-tcg?) but no need to hold
> up the merge for that.

Yes, I think starting with `make check' would make sense. We should
test with and without `-f x', to make sure that both soft and
hardfloat are tested. There are still some f80/f128 known errors
though, so we might want to disable the testing of those for now.

Thanks,

		Emilio


end of thread, other threads:[~2018-12-05 16:47 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
2018-12-03 12:13   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
2018-12-03 14:16   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
2018-12-03 14:16   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
2018-12-03 14:29   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
2018-11-25  0:25   ` Aleksandar Markovic
2018-11-25  1:25     ` Emilio G. Cota
2018-12-04 12:28   ` Alex Bennée
2018-12-04 13:33     ` Richard Henderson
2018-12-04 13:52       ` Alex Bennée
2018-12-04 17:31         ` Emilio G. Cota
2018-12-04 19:08           ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
2018-12-04 18:34   ` Alex Bennée
2018-12-04 20:07     ` Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
2018-12-05 10:10   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
2018-12-05 10:11   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
2018-12-05 12:25   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
2018-12-05 12:26   ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
2018-12-05 12:36   ` Alex Bennée
2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
2018-11-27 17:52   ` Emilio G. Cota
2018-11-27 17:32 ` no-reply
2018-12-05 12:41 ` Alex Bennée
2018-12-05 16:47   ` Emilio G. Cota
