* [Qemu-devel] [PATCH v6 00/13] hardfloat
@ 2018-11-24 23:55 Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
` (15 more replies)
0 siblings, 16 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html
Changes since v5:
- Rebase on rth/tcg-next-for-4.0
- Use QEMU_FLATTEN instead of __attribute__((flatten))
- Merge rth's cleanups (thanks!). With this, we now use a union to
hold {float|float32} or {double|float64} types, which gets
rid of most macros. I added a few optimizations (e.g. likely()
hints on some branches, and not using temp variables to hold
the result of fpclassify) to roughly match (and sometimes
surpass) v5's performance.
- float64_sqrt: use fpclassify, which gives a 1.5x speedup.
This series introduces no regressions to fp-test. You can test
hardfloat by passing "-f x" to fp-test (so that the inexact flag
is set before each operation) and using even rounding (fp-test's
default). Note that hardfloat does not affect operations with
other rounding modes.
Perf numbers for fp-bench running on several host machines are in
each commit log; numbers for several benchmarks (NBench, SPEC06fp)
are in the last patch's commit log. These numbers are a bit
outdated (they're from v2 or so), but I've decided to keep them
because they give a good idea of the speedups to expect, and I don't
have time to re-run them =)
I did re-run the numbers for sqrt and cmp, though, since the
implementation has changed quite a bit since v5. I didn't
re-run these on AArch64 and PPC hosts due to lack of time,
but I doubt they'd change significantly.
You can fetch this series from:
https://github.com/cota/qemu/tree/hardfloat-v6
Thanks,
Emilio
* [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-03 12:13 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
` (14 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
This gets rid of the muladd errors due to not raising the invalid flag.
- Before:
Errors found in f64_mulAdd, rounding near_even, tininess before rounding:
+000.0000000000000 +7FF.0000000000000 +7FF.FFFFFFFFFFFFF
=> +7FF.FFFFFFFFFFFFF ..... expected -7FF.FFFFFFFFFFFFF v....
[...]
- After:
In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding.
[...]
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tests/fp/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index d649a5a1db..49cdcd1bd2 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -29,6 +29,9 @@ QEMU_INCLUDES += -I$(TF_SOURCE_DIR)
# work around TARGET_* poisoning
QEMU_CFLAGS += -DHW_POISON_H
+# define a target to match testfloat's implementation-defined choices, such as
+# whether to raise the invalid flag when dealing with NaNs in muladd.
+QEMU_CFLAGS += -DTARGET_ARM
# capstone has a platform.h file that clashes with softfloat's
QEMU_CFLAGS := $(filter-out %capstone, $(QEMU_CFLAGS))
--
2.17.1
* [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
` (13 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
This paves the way for upcoming work.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/fpu/softfloat.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8fd9f9bbae..9eeccd88a5 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -464,6 +464,16 @@ static inline int float32_is_zero_or_denormal(float32 a)
return (float32_val(a) & 0x7f800000) == 0;
}
+static inline bool float32_is_normal(float32 a)
+{
+ return ((float32_val(a) + 0x00800000) & 0x7fffffff) >= 0x01000000;
+}
+
+static inline bool float32_is_denormal(float32 a)
+{
+ return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
+}
+
static inline float32 float32_set_sign(float32 a, int sign)
{
return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -605,6 +615,16 @@ static inline int float64_is_zero_or_denormal(float64 a)
return (float64_val(a) & 0x7ff0000000000000LL) == 0;
}
+static inline bool float64_is_normal(float64 a)
+{
+ return ((float64_val(a) + (1ULL << 52)) & -1ULL >> 1) >= 1ULL << 53;
+}
+
+static inline bool float64_is_denormal(float64 a)
+{
+ return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
+}
+
static inline float64 float64_set_sign(float64 a, int sign)
{
return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
--
2.17.1
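The bit trick in float{32,64}_is_normal() may not be obvious. Adding one unit to the biased exponent field maps exponent 0 (zero/denormal) to 1, and makes an all-ones exponent (Inf/NaN) carry into (or, for negative values, out through) the sign bit, which the mask discards; only originally normal exponents end up at 2 or above. A standalone sketch, assuming plain uint32_t/uint64_t bit patterns in place of QEMU's float32/float64 types (the helper names are hypothetical), checks the trick against C99 fpclassify():

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers on raw bit patterns (what float32/float64 hold). */
static bool f32_bits_is_normal(uint32_t v)
{
    /*
     * Adding 1 << 23 bumps the biased exponent: 0 (zero/denormal) becomes 1,
     * and 0xff (Inf/NaN) carries into or past bit 31, which the mask drops.
     * Only exponents 1..254 end up at 2..255, i.e. >= 0x01000000.
     */
    return ((v + 0x00800000u) & 0x7fffffffu) >= 0x01000000u;
}

static bool f64_bits_is_normal(uint64_t v)
{
    /* Same idea with an 11-bit exponent starting at bit 52. */
    return ((v + (1ULL << 52)) & (~0ULL >> 1)) >= (1ULL << 53);
}

static uint32_t f32_bits(float f)
{
    uint32_t v;

    memcpy(&v, &f, sizeof(v));
    return v;
}

static uint64_t f64_bits(double d)
{
    uint64_t v;

    memcpy(&v, &d, sizeof(v));
    return v;
}

/* True if the bit tricks agree with fpclassify() on a set of edge cases. */
static bool is_normal_matches_fpclassify(void)
{
    const float fs[] = { 0.0f, -0.0f, 1.0f, -2.5f, FLT_MIN, -FLT_MAX,
                         FLT_MIN / 2, -FLT_MIN / 1024, INFINITY, -INFINITY,
                         NAN };
    const double ds[] = { 0.0, -0.0, 1.0, -2.5, DBL_MIN, -DBL_MAX,
                          DBL_MIN / 2, -DBL_MIN / 1024, INFINITY, -INFINITY,
                          NAN };
    size_t i;

    for (i = 0; i < sizeof(fs) / sizeof(fs[0]); i++) {
        if (f32_bits_is_normal(f32_bits(fs[i])) !=
            (fpclassify(fs[i]) == FP_NORMAL)) {
            return false;
        }
    }
    for (i = 0; i < sizeof(ds) / sizeof(ds[0]); i++) {
        if (f64_bits_is_normal(f64_bits(ds[i])) !=
            (fpclassify(ds[i]) == FP_NORMAL)) {
            return false;
        }
    }
    return true;
}
```

Usage: assert(is_normal_matches_fpclassify()); a single addition and a mask replace the usual "extract exponent, compare against 0 and max" pair of tests.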
* [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
` (12 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
target/tricore/fpu_helper.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index df162902d6..31df462e4a 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -44,11 +44,6 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
| float_flag_inexact);
}
-static inline bool f_is_denormal(float32 arg)
-{
- return float32_is_zero_or_denormal(arg) && !float32_is_zero(arg);
-}
-
static inline float32 f_maddsub_nan_result(float32 arg1, float32 arg2,
float32 arg3, float32 result,
uint32_t muladd_negate_c)
@@ -260,8 +255,8 @@ uint32_t helper_fcmp(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
set_flush_inputs_to_zero(0, &env->fp_status);
result = 1 << (float32_compare_quiet(arg1, arg2, &env->fp_status) + 1);
- result |= f_is_denormal(arg1) << 4;
- result |= f_is_denormal(arg2) << 5;
+ result |= float32_is_denormal(arg1) << 4;
+ result |= float32_is_denormal(arg2) << 5;
flags = f_get_excp_flags(env);
if (flags) {
--
2.17.1
* [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (2 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-03 14:16 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
` (11 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
glibc >= 2.25 defines canonicalize() since commit eaf5ad0
("Add canonicalize, canonicalizef, canonicalizel.", 2016-10-26).
Since we'll be including <math.h> soon, prepare for this by
prefixing our canonicalize() with sf_ to avoid clashing with
libc's canonicalize().
Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e1eef954e6..ecdc00c633 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
#include "softfloat-specialize.h"
/* Canonicalize EXP and FRAC, setting CLS. */
-static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
- float_status *status)
+static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+ float_status *status)
{
if (part.exp == parm->exp_max && !parm->arm_althp) {
if (part.frac == 0) {
@@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
{
- return canonicalize(float16_unpack_raw(f), params, s);
+ return sf_canonicalize(float16_unpack_raw(f), params, s);
}
static FloatParts float16_unpack_canonical(float16 f, float_status *s)
@@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
static FloatParts float32_unpack_canonical(float32 f, float_status *s)
{
- return canonicalize(float32_unpack_raw(f), &float32_params, s);
+ return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
}
static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
@@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
static FloatParts float64_unpack_canonical(float64 f, float_status *s)
{
- return canonicalize(float64_unpack_raw(f), &float64_params, s);
+ return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
}
static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
--
2.17.1
* [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (3 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-03 14:16 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
` (10 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
These will gain some users very soon.
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/fpu/softfloat.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 9eeccd88a5..38a5e99cf3 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -474,6 +474,11 @@ static inline bool float32_is_denormal(float32 a)
return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
}
+static inline bool float32_is_zero_or_normal(float32 a)
+{
+ return float32_is_normal(a) || float32_is_zero(a);
+}
+
static inline float32 float32_set_sign(float32 a, int sign)
{
return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -625,6 +630,11 @@ static inline bool float64_is_denormal(float64 a)
return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
}
+static inline bool float64_is_zero_or_normal(float64 a)
+{
+ return float64_is_normal(a) || float64_is_zero(a);
+}
+
static inline float64 float64_set_sign(float64 a, int sign)
{
return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
--
2.17.1
* [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (4 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-03 14:29 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
` (9 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
These microbenchmarks will allow us to measure the performance impact of
FP emulation optimizations. Note that we can measure either the impact on
the softfloat functions directly (with "-t soft"), or the impact on an
emulated workload (with "-t host", running under QEMU user-mode).
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tests/fp/fp-bench.c | 630 ++++++++++++++++++++++++++++++++++++++++++++
tests/fp/.gitignore | 1 +
tests/fp/Makefile | 5 +-
3 files changed, 635 insertions(+), 1 deletion(-)
create mode 100644 tests/fp/fp-bench.c
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
new file mode 100644
index 0000000000..f5bc5edebf
--- /dev/null
+++ b/tests/fp/fp-bench.c
@@ -0,0 +1,630 @@
+/*
+ * fp-bench.c - A collection of simple floating point microbenchmarks.
+ *
+ * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include <math.h>
+#include <fenv.h>
+#include "qemu/timer.h"
+#include "fpu/softfloat.h"
+
+/* amortize the computation of random inputs */
+#define OPS_PER_ITER 50000
+
+#define MAX_OPERANDS 3
+
+#define SEED_A 0xdeadfacedeadface
+#define SEED_B 0xbadc0feebadc0fee
+#define SEED_C 0xbeefdeadbeefdead
+
+enum op {
+ OP_ADD,
+ OP_SUB,
+ OP_MUL,
+ OP_DIV,
+ OP_FMA,
+ OP_SQRT,
+ OP_CMP,
+ OP_MAX_NR,
+};
+
+static const char * const op_names[] = {
+ [OP_ADD] = "add",
+ [OP_SUB] = "sub",
+ [OP_MUL] = "mul",
+ [OP_DIV] = "div",
+ [OP_FMA] = "mulAdd",
+ [OP_SQRT] = "sqrt",
+ [OP_CMP] = "cmp",
+ [OP_MAX_NR] = NULL,
+};
+
+enum precision {
+ PREC_SINGLE,
+ PREC_DOUBLE,
+ PREC_FLOAT32,
+ PREC_FLOAT64,
+ PREC_MAX_NR,
+};
+
+enum rounding {
+ ROUND_EVEN,
+ ROUND_ZERO,
+ ROUND_DOWN,
+ ROUND_UP,
+ ROUND_TIEAWAY,
+ N_ROUND_MODES,
+};
+
+static const char * const round_names[] = {
+ [ROUND_EVEN] = "even",
+ [ROUND_ZERO] = "zero",
+ [ROUND_DOWN] = "down",
+ [ROUND_UP] = "up",
+ [ROUND_TIEAWAY] = "tieaway",
+};
+
+enum tester {
+ TESTER_SOFT,
+ TESTER_HOST,
+ TESTER_MAX_NR,
+};
+
+static const char * const tester_names[] = {
+ [TESTER_SOFT] = "soft",
+ [TESTER_HOST] = "host",
+ [TESTER_MAX_NR] = NULL,
+};
+
+union fp {
+ float f;
+ double d;
+ float32 f32;
+ float64 f64;
+ uint64_t u64;
+};
+
+struct op_state;
+
+typedef float (*float_func_t)(const struct op_state *s);
+typedef double (*double_func_t)(const struct op_state *s);
+
+union fp_func {
+ float_func_t float_func;
+ double_func_t double_func;
+};
+
+typedef void (*bench_func_t)(void);
+
+struct op_desc {
+ const char * const name;
+};
+
+#define DEFAULT_DURATION_SECS 1
+
+static uint64_t random_ops[MAX_OPERANDS] = {
+ SEED_A, SEED_B, SEED_C,
+};
+static float_status soft_status;
+static enum precision precision;
+static enum op operation;
+static enum tester tester;
+static uint64_t n_completed_ops;
+static unsigned int duration = DEFAULT_DURATION_SECS;
+static int64_t ns_elapsed;
+/* disable optimizations with volatile */
+static volatile union fp res;
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+ x ^= x >> 12; /* a */
+ x ^= x << 25; /* b */
+ x ^= x >> 27; /* c */
+ return x * UINT64_C(2685821657736338717);
+}
+
+static void update_random_ops(int n_ops, enum precision prec)
+{
+ int i;
+
+ for (i = 0; i < n_ops; i++) {
+ uint64_t r = random_ops[i];
+
+ if (prec == PREC_SINGLE || prec == PREC_FLOAT32) {
+ do {
+ r = xorshift64star(r);
+ } while (!float32_is_normal(r));
+ } else if (prec == PREC_DOUBLE || prec == PREC_FLOAT64) {
+ do {
+ r = xorshift64star(r);
+ } while (!float64_is_normal(r));
+ } else {
+ g_assert_not_reached();
+ }
+ random_ops[i] = r;
+ }
+}
+
+static void fill_random(union fp *ops, int n_ops, enum precision prec,
+ bool no_neg)
+{
+ int i;
+
+ for (i = 0; i < n_ops; i++) {
+ switch (prec) {
+ case PREC_SINGLE:
+ case PREC_FLOAT32:
+ ops[i].f32 = make_float32(random_ops[i]);
+ if (no_neg && float32_is_neg(ops[i].f32)) {
+ ops[i].f32 = float32_chs(ops[i].f32);
+ }
+ /* raise the exponent to limit the frequency of denormal results */
+ ops[i].f32 |= 0x40000000;
+ break;
+ case PREC_DOUBLE:
+ case PREC_FLOAT64:
+ ops[i].f64 = make_float64(random_ops[i]);
+ if (no_neg && float64_is_neg(ops[i].f64)) {
+ ops[i].f64 = float64_chs(ops[i].f64);
+ }
+ /* raise the exponent to limit the frequency of denormal results */
+ ops[i].f64 |= LIT64(0x4000000000000000);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+}
+
+/*
+ * The main benchmark function. Instead of (ab)using macros, we rely
+ * on the compiler to unfold this at compile-time.
+ */
+static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
+{
+ int64_t tf = get_clock() + duration * 1000000000LL;
+
+ while (get_clock() < tf) {
+ union fp ops[MAX_OPERANDS];
+ int64_t t0;
+ int i;
+
+ update_random_ops(n_ops, prec);
+ switch (prec) {
+ case PREC_SINGLE:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float a = ops[0].f;
+ float b = ops[1].f;
+ float c = ops[2].f;
+
+ switch (op) {
+ case OP_ADD:
+ res.f = a + b;
+ break;
+ case OP_SUB:
+ res.f = a - b;
+ break;
+ case OP_MUL:
+ res.f = a * b;
+ break;
+ case OP_DIV:
+ res.f = a / b;
+ break;
+ case OP_FMA:
+ res.f = fmaf(a, b, c);
+ break;
+ case OP_SQRT:
+ res.f = sqrtf(a);
+ break;
+ case OP_CMP:
+ res.u64 = isgreater(a, b);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_DOUBLE:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ double a = ops[0].d;
+ double b = ops[1].d;
+ double c = ops[2].d;
+
+ switch (op) {
+ case OP_ADD:
+ res.d = a + b;
+ break;
+ case OP_SUB:
+ res.d = a - b;
+ break;
+ case OP_MUL:
+ res.d = a * b;
+ break;
+ case OP_DIV:
+ res.d = a / b;
+ break;
+ case OP_FMA:
+ res.d = fma(a, b, c);
+ break;
+ case OP_SQRT:
+ res.d = sqrt(a);
+ break;
+ case OP_CMP:
+ res.u64 = isgreater(a, b);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_FLOAT32:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float32 a = ops[0].f32;
+ float32 b = ops[1].f32;
+ float32 c = ops[2].f32;
+
+ switch (op) {
+ case OP_ADD:
+ res.f32 = float32_add(a, b, &soft_status);
+ break;
+ case OP_SUB:
+ res.f32 = float32_sub(a, b, &soft_status);
+ break;
+ case OP_MUL:
+ res.f32 = float32_mul(a, b, &soft_status);
+ break;
+ case OP_DIV:
+ res.f32 = float32_div(a, b, &soft_status);
+ break;
+ case OP_FMA:
+ res.f32 = float32_muladd(a, b, c, 0, &soft_status);
+ break;
+ case OP_SQRT:
+ res.f32 = float32_sqrt(a, &soft_status);
+ break;
+ case OP_CMP:
+ res.u64 = float32_compare_quiet(a, b, &soft_status);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_FLOAT64:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float64 a = ops[0].f64;
+ float64 b = ops[1].f64;
+ float64 c = ops[2].f64;
+
+ switch (op) {
+ case OP_ADD:
+ res.f64 = float64_add(a, b, &soft_status);
+ break;
+ case OP_SUB:
+ res.f64 = float64_sub(a, b, &soft_status);
+ break;
+ case OP_MUL:
+ res.f64 = float64_mul(a, b, &soft_status);
+ break;
+ case OP_DIV:
+ res.f64 = float64_div(a, b, &soft_status);
+ break;
+ case OP_FMA:
+ res.f64 = float64_muladd(a, b, c, 0, &soft_status);
+ break;
+ case OP_SQRT:
+ res.f64 = float64_sqrt(a, &soft_status);
+ break;
+ case OP_CMP:
+ res.u64 = float64_compare_quiet(a, b, &soft_status);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ ns_elapsed += get_clock() - t0;
+ n_completed_ops += OPS_PER_ITER;
+ }
+}
+
+#define GEN_BENCH(name, type, prec, op, n_ops) \
+ static void __attribute__((flatten)) name(void) \
+ { \
+ bench(prec, op, n_ops, false); \
+ }
+
+#define GEN_BENCH_NO_NEG(name, type, prec, op, n_ops) \
+ static void __attribute__((flatten)) name(void) \
+ { \
+ bench(prec, op, n_ops, true); \
+ }
+
+#define GEN_BENCH_ALL_TYPES(opname, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float, float, PREC_SINGLE, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _double, double, PREC_DOUBLE, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float32, float32, PREC_FLOAT32, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops)
+
+GEN_BENCH_ALL_TYPES(add, OP_ADD, 2)
+GEN_BENCH_ALL_TYPES(sub, OP_SUB, 2)
+GEN_BENCH_ALL_TYPES(mul, OP_MUL, 2)
+GEN_BENCH_ALL_TYPES(div, OP_DIV, 2)
+GEN_BENCH_ALL_TYPES(fma, OP_FMA, 3)
+GEN_BENCH_ALL_TYPES(cmp, OP_CMP, 2)
+#undef GEN_BENCH_ALL_TYPES
+
+#define GEN_BENCH_ALL_TYPES_NO_NEG(name, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float, float, PREC_SINGLE, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _double, double, PREC_DOUBLE, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float32, float32, PREC_FLOAT32, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n)
+
+GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
+#undef GEN_BENCH_ALL_TYPES_NO_NEG
+
+#undef GEN_BENCH_NO_NEG
+#undef GEN_BENCH
+
+#define GEN_BENCH_FUNCS(opname, op) \
+ [op] = { \
+ [PREC_SINGLE] = bench_ ## opname ## _float, \
+ [PREC_DOUBLE] = bench_ ## opname ## _double, \
+ [PREC_FLOAT32] = bench_ ## opname ## _float32, \
+ [PREC_FLOAT64] = bench_ ## opname ## _float64, \
+ }
+
+static const bench_func_t bench_funcs[OP_MAX_NR][PREC_MAX_NR] = {
+ GEN_BENCH_FUNCS(add, OP_ADD),
+ GEN_BENCH_FUNCS(sub, OP_SUB),
+ GEN_BENCH_FUNCS(mul, OP_MUL),
+ GEN_BENCH_FUNCS(div, OP_DIV),
+ GEN_BENCH_FUNCS(fma, OP_FMA),
+ GEN_BENCH_FUNCS(sqrt, OP_SQRT),
+ GEN_BENCH_FUNCS(cmp, OP_CMP),
+};
+
+#undef GEN_BENCH_FUNCS
+
+static void run_bench(void)
+{
+ bench_func_t f;
+
+ f = bench_funcs[operation][precision];
+ g_assert(f);
+ f();
+}
+
+/* @arr must be NULL-terminated */
+static int find_name(const char * const *arr, const char *name)
+{
+ int i;
+
+ for (i = 0; arr[i] != NULL; i++) {
+ if (strcmp(name, arr[i]) == 0) {
+ return i;
+ }
+ }
+ return -1;
+}
+
+static void usage_complete(int argc, char *argv[])
+{
+ gchar *op_list = g_strjoinv(", ", (gchar **)op_names);
+ gchar *tester_list = g_strjoinv(", ", (gchar **)tester_names);
+
+ fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+ fprintf(stderr, "options:\n");
+ fprintf(stderr, " -d = duration, in seconds. Default: %d\n",
+ DEFAULT_DURATION_SECS);
+ fprintf(stderr, " -h = show this help message.\n");
+ fprintf(stderr, " -o = floating point operation (%s). Default: %s\n",
+ op_list, op_names[0]);
+ fprintf(stderr, " -p = floating point precision (single, double). "
+ "Default: single\n");
+ fprintf(stderr, " -r = rounding mode (even, zero, down, up, tieaway). "
+ "Default: even\n");
+ fprintf(stderr, " -t = tester (%s). Default: %s\n",
+ tester_list, tester_names[0]);
+ fprintf(stderr, " -z = flush inputs to zero (soft tester only). "
+ "Default: disabled\n");
+ fprintf(stderr, " -Z = flush output to zero (soft tester only). "
+ "Default: disabled\n");
+
+ g_free(tester_list);
+ g_free(op_list);
+}
+
+static int round_name_to_mode(const char *name)
+{
+ int i;
+
+ for (i = 0; i < N_ROUND_MODES; i++) {
+ if (!strcmp(round_names[i], name)) {
+ return i;
+ }
+ }
+ return -1;
+}
+
+static void QEMU_NORETURN die_host_rounding(enum rounding rounding)
+{
+ fprintf(stderr, "fatal: '%s' rounding not supported on this host\n",
+ round_names[rounding]);
+ exit(EXIT_FAILURE);
+}
+
+static void set_host_precision(enum rounding rounding)
+{
+ int rhost;
+
+ switch (rounding) {
+ case ROUND_EVEN:
+ rhost = FE_TONEAREST;
+ break;
+ case ROUND_ZERO:
+ rhost = FE_TOWARDZERO;
+ break;
+ case ROUND_DOWN:
+ rhost = FE_DOWNWARD;
+ break;
+ case ROUND_UP:
+ rhost = FE_UPWARD;
+ break;
+ case ROUND_TIEAWAY:
+ die_host_rounding(rounding);
+ return;
+ default:
+ g_assert_not_reached();
+ }
+
+ if (fesetround(rhost)) {
+ die_host_rounding(rounding);
+ }
+}
+
+static void set_soft_precision(enum rounding rounding)
+{
+ signed char mode;
+
+ switch (rounding) {
+ case ROUND_EVEN:
+ mode = float_round_nearest_even;
+ break;
+ case ROUND_ZERO:
+ mode = float_round_to_zero;
+ break;
+ case ROUND_DOWN:
+ mode = float_round_down;
+ break;
+ case ROUND_UP:
+ mode = float_round_up;
+ break;
+ case ROUND_TIEAWAY:
+ mode = float_round_ties_away;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ soft_status.float_rounding_mode = mode;
+}
+
+static void parse_args(int argc, char *argv[])
+{
+ int c;
+ int val;
+ int rounding = ROUND_EVEN;
+
+ for (;;) {
+ c = getopt(argc, argv, "d:ho:p:r:t:zZ");
+ if (c < 0) {
+ break;
+ }
+ switch (c) {
+ case 'd':
+ duration = atoi(optarg);
+ break;
+ case 'h':
+ usage_complete(argc, argv);
+ exit(EXIT_SUCCESS);
+ case 'o':
+ val = find_name(op_names, optarg);
+ if (val < 0) {
+ fprintf(stderr, "Unsupported op '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ operation = val;
+ break;
+ case 'p':
+ if (!strcmp(optarg, "single")) {
+ precision = PREC_SINGLE;
+ } else if (!strcmp(optarg, "double")) {
+ precision = PREC_DOUBLE;
+ } else {
+ fprintf(stderr, "Unsupported precision '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ rounding = round_name_to_mode(optarg);
+ if (rounding < 0) {
+ fprintf(stderr, "fatal: invalid rounding mode '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 't':
+ val = find_name(tester_names, optarg);
+ if (val < 0) {
+ fprintf(stderr, "Unsupported tester '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ tester = val;
+ break;
+ case 'z':
+ soft_status.flush_inputs_to_zero = 1;
+ break;
+ case 'Z':
+ soft_status.flush_to_zero = 1;
+ break;
+ }
+ }
+
+ /* set precision and rounding mode based on the tester */
+ switch (tester) {
+ case TESTER_HOST:
+ set_host_precision(rounding);
+ break;
+ case TESTER_SOFT:
+ set_soft_precision(rounding);
+ switch (precision) {
+ case PREC_SINGLE:
+ precision = PREC_FLOAT32;
+ break;
+ case PREC_DOUBLE:
+ precision = PREC_FLOAT64;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+}
+
+static void pr_stats(void)
+{
+ printf("%.2f MFlops\n", (double)n_completed_ops / ns_elapsed * 1e3);
+}
+
+int main(int argc, char *argv[])
+{
+ parse_args(argc, argv);
+ run_bench();
+ pr_stats();
+ return 0;
+}
diff --git a/tests/fp/.gitignore b/tests/fp/.gitignore
index 8d45d18ac4..704fd42992 100644
--- a/tests/fp/.gitignore
+++ b/tests/fp/.gitignore
@@ -1 +1,2 @@
fp-test
+fp-bench
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index 49cdcd1bd2..5019dcdca0 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -553,7 +553,7 @@ TF_OBJS_LIB += $(TF_OBJS_WRITECASE)
TF_OBJS_LIB += testLoops_common.o
TF_OBJS_LIB += $(TF_OBJS_TEST)
-BINARIES := fp-test$(EXESUF)
+BINARIES := fp-test$(EXESUF) fp-bench$(EXESUF)
# everything depends on config-host.h because platform.h includes it
all: $(BUILD_DIR)/config-host.h
@@ -590,10 +590,13 @@ $(TF_OBJS_LIB) slowfloat.o: %.o: $(TF_SOURCE_DIR)/%.c
libtestfloat.a: $(TF_OBJS_LIB)
+fp-bench$(EXESUF): fp-bench.o $(QEMU_SOFTFLOAT_OBJ) $(LIBQEMUUTIL)
+
clean:
rm -f *.o *.d $(BINARIES)
rm -f *.gcno *.gcda *.gcov
rm -f fp-test$(EXESUF)
+ rm -f fp-bench$(EXESUF)
rm -f libsoftfloat.a
rm -f libtestfloat.a
--
2.17.1
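fp-bench's timing scheme can be reduced to a small standalone sketch: operands are refreshed outside the timed region once per batch (amortizing their generation, as OPS_PER_ITER does), only the batch of operations is timed, and throughput is reported the way pr_stats() does it, as ops / ns * 1e3 = MFlops. This is an illustration under stated assumptions, not fp-bench itself: it uses POSIX clock_gettime(CLOCK_MONOTONIC) in place of QEMU's get_clock(), times host double addition (roughly what "-t host" measures), and all names are hypothetical. Volatile operands and a volatile sink keep the compiler from hoisting or discarding the additions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define BATCH_OPS 50000 /* hypothetical; plays the role of OPS_PER_ITER */

static volatile double op_a, op_b; /* volatile: reloaded on every iteration */
static volatile double sink;       /* volatile: every result is stored */

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Runs n_batches timed batches of double additions; returns MFlops. */
static double bench_add_double(int n_batches)
{
    uint64_t n_ops = 0;
    int64_t ns = 0;

    for (int i = 0; i < n_batches; i++) {
        /* untimed: refresh the operands once per batch */
        op_a = 1.0 + i;
        op_b = 0.5 * (i + 1);

        int64_t t0 = now_ns();
        for (int j = 0; j < BATCH_OPS; j++) {
            sink = op_a + op_b;
        }
        ns += now_ns() - t0;
        n_ops += BATCH_OPS;
    }
    /* ops / ns * 1e3 == millions of ops per second, as in pr_stats() */
    return ns > 0 ? (double)n_ops / ns * 1e3 : 0.0;
}
```

Printing bench_add_double(20) with "%.2f MFlops\n" mirrors fp-bench's output format; the absolute figure depends entirely on the host.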
* [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (5 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-11-25 0:25 ` Aleksandar Markovic
2018-12-04 12:28 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
` (8 subsequent siblings)
15 siblings, 2 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
The appended patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (in which FP flags are
typically never cleared, inexact occurs often, and rounding is set to the
default [to nearest]), this will yield sizable performance speedups.
The approach followed here avoids checking the FP exception flags register.
See the added comment for details.
This assumes that QEMU is running on an IEEE 754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and sNaN representation are still handled by
soft-fp. However, this approach will break on most hosts if QEMU is
compiled with flags such as -ffast-math. Since we control the build
flags, this should be easy to enforce.
This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.
Note: some architectures (at least PPC; there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run, though, it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.
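A minimal sketch of the fast path described above, for single-precision addition only. Every name in it (hf_add, slow_add, struct fp_status, FLAG_*) is hypothetical rather than QEMU's API, and slow_add is a placeholder for soft-fp. The reasoning it encodes: with zero or normal inputs under round-to-nearest, an addition can only raise inexact (already sticky, so nothing to record), overflow (visible as an infinite result) or underflow (only possible for a tiny result, which is handed to the slow path); invalid cannot occur because Inf - Inf is excluded by the input check.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical sticky flag bits, loosely modelled on softfloat's float_flag_*. */
enum {
    FLAG_OVERFLOW = 1 << 0,
    FLAG_INEXACT  = 1 << 1,
};

/* Hypothetical guest FP state; QEMU's real type is float_status. */
struct fp_status {
    int flags;          /* sticky: guests rarely clear them */
    bool round_nearest; /* guest rounding mode is round-to-nearest-even */
};

static int slow_calls; /* instrumentation for this sketch only */

/*
 * Placeholder for the soft-fp path. A real implementation computes the
 * result and every exception flag exactly, honouring the guest rounding
 * mode; this stand-in only counts calls and adds on the host.
 */
static float slow_add(float a, float b, struct fp_status *s)
{
    (void)s;
    slow_calls++;
    return a + b;
}

static bool zero_or_normal(float x)
{
    int c = fpclassify(x);

    return c == FP_ZERO || c == FP_NORMAL;
}

/* Assumes the host FPU is IEEE 754 and in its default round-to-nearest mode. */
static float hf_add(float a, float b, struct fp_status *s)
{
    float r;

    /* Without a sticky inexact flag we would have to detect inexact: bail. */
    if (!(s->flags & FLAG_INEXACT) || !s->round_nearest ||
        !zero_or_normal(a) || !zero_or_normal(b)) {
        return slow_add(a, b, s);
    }
    r = a + b;
    if (isinf(r)) {
        /* finite inputs but an infinite result: overflow (inexact already set) */
        s->flags |= FLAG_OVERFLOW;
    } else if (r <= FLT_MIN && r >= -FLT_MIN && !(a == 0.0f && b == 0.0f)) {
        /* zero or tiny result from nonzero inputs: slow path handles underflow */
        return slow_add(a, b, s);
    }
    return r;
}
```

Note that the common case never touches the host's flag register: the only bookkeeping is a classification of the inputs and a range check on the result.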
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 315 insertions(+)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ecdc00c633..306a12fa8d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,6 +83,7 @@ this code that are retained.
* target-dependent and needs the TARGET_* macros.
*/
#include "qemu/osdep.h"
+#include <math.h>
#include "qemu/bitops.h"
#include "fpu/softfloat.h"
@@ -95,6 +96,320 @@ this code that are retained.
*----------------------------------------------------------------------------*/
#include "fpu/softfloat-macros.h"
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating the guest FP exception flags at
+ * reasonable speed is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is neither fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we expand on the idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that, since exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the magnitude of the result is <= the minimum normal.
+ */
+#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t) \
+ static inline void name(soft_t *a, float_status *s) \
+ { \
+ if (unlikely(soft_t ## _is_denormal(*a))) { \
+ *a = soft_t ## _set_sign(soft_t ## _zero, \
+ soft_t ## _is_neg(*a)); \
+ s->float_exception_flags |= float_flag_input_denormal; \
+ } \
+ }
+
+GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
+GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
+#undef GEN_INPUT_FLUSH__NOCHECK
+
+#define GEN_INPUT_FLUSH1(name, soft_t) \
+ static inline void name(soft_t *a, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ }
+
+GEN_INPUT_FLUSH1(float32_input_flush1, float32)
+GEN_INPUT_FLUSH1(float64_input_flush1, float64)
+#undef GEN_INPUT_FLUSH1
+
+#define GEN_INPUT_FLUSH2(name, soft_t) \
+ static inline void name(soft_t *a, soft_t *b, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ soft_t ## _input_flush__nocheck(b, s); \
+ }
+
+GEN_INPUT_FLUSH2(float32_input_flush2, float32)
+GEN_INPUT_FLUSH2(float64_input_flush2, float64)
+#undef GEN_INPUT_FLUSH2
+
+#define GEN_INPUT_FLUSH3(name, soft_t) \
+ static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ soft_t ## _input_flush__nocheck(b, s); \
+ soft_t ## _input_flush__nocheck(c, s); \
+ }
+
+GEN_INPUT_FLUSH3(float32_input_flush3, float32)
+GEN_INPUT_FLUSH3(float64_input_flush3, float64)
+#undef GEN_INPUT_FLUSH3
+
+/*
+ * Choose whether to use fpclassify or float32/64_* primitives in the generated
+ * hardfloat functions. Each combination of number of inputs and float size
+ * gets its own value.
+ */
+#if defined(__x86_64__)
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 1
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 1
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 1
+#else
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 0
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 0
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 0
+#endif
+
+/*
+ * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
+ * float{32,64}_is_infinity when !USE_FP.
+ * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
+ * On power64, however, isinf() reduces fp-bench performance by up to 50%.
+ */
+#if defined(__x86_64__) || defined(__aarch64__)
+# define QEMU_HARDFLOAT_USE_ISINF 1
+#else
+# define QEMU_HARDFLOAT_USE_ISINF 0
+#endif
+
+/*
+ * Some targets clear the FP flags before most FP operations. This prevents
+ * the use of hardfloat, since hardfloat relies on the inexact flag being
+ * already set.
+ */
+#if defined(TARGET_PPC)
+# define QEMU_NO_HARDFLOAT 1
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
+#else
+# define QEMU_NO_HARDFLOAT 0
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN __attribute__((noinline))
+#endif
+
+static inline bool can_use_fpu(const float_status *s)
+{
+ if (QEMU_NO_HARDFLOAT) {
+ return false;
+ }
+ return likely(s->float_exception_flags & float_flag_inexact &&
+ s->float_rounding_mode == float_round_nearest_even);
+}
+
+/*
+ * Hardfloat generation functions. Each operation can have two flavors:
+ * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
+ * most condition checks, or native ones (e.g. fpclassify).
+ *
+ * The flavor is chosen by the callers. Instead of using macros, we rely on the
+ * compiler to propagate constants and inline everything into the callers.
+ *
+ * We only generate functions for operations with two inputs, since only
+ * these are common enough to justify consolidating them into common code.
+ */
+
+typedef union {
+ float32 s;
+ float h;
+} union_float32;
+
+typedef union {
+ float64 s;
+ double h;
+} union_float64;
+
+typedef bool (*f32_check_fn)(union_float32 a, union_float32 b);
+typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);
+
+typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
+typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float (*hard_f32_op2_fn)(float a, float b);
+typedef double (*hard_f64_op2_fn)(double a, double b);
+
+/* 2-input is-zero-or-normal */
+static inline bool f32_is_zon2(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ /*
+ * Not using a temp variable for consecutive fpclassify calls ends up
+ * generating faster code.
+ */
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+ }
+ return float32_is_zero_or_normal(a.s) &&
+ float32_is_zero_or_normal(b.s);
+}
+
+static inline bool f64_is_zon2(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+ }
+ return float64_is_zero_or_normal(a.s) &&
+ float64_is_zero_or_normal(b.s);
+}
+
+/* 3-input is-zero-or-normal */
+static inline
+bool f32_is_zon3(union_float32 a, union_float32 b, union_float32 c)
+{
+ if (QEMU_HARDFLOAT_3F32_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+ (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+ }
+ return float32_is_zero_or_normal(a.s) &&
+ float32_is_zero_or_normal(b.s) &&
+ float32_is_zero_or_normal(c.s);
+}
+
+static inline
+bool f64_is_zon3(union_float64 a, union_float64 b, union_float64 c)
+{
+ if (QEMU_HARDFLOAT_3F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+ (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+ }
+ return float64_is_zero_or_normal(a.s) &&
+ float64_is_zero_or_normal(b.s) &&
+ float64_is_zero_or_normal(c.s);
+}
+
+static inline bool f32_is_inf(union_float32 a)
+{
+ if (QEMU_HARDFLOAT_USE_ISINF) {
+ return isinff(a.h);
+ }
+ return float32_is_infinity(a.s);
+}
+
+static inline bool f64_is_inf(union_float64 a)
+{
+ if (QEMU_HARDFLOAT_USE_ISINF) {
+ return isinf(a.h);
+ }
+ return float64_is_infinity(a.s);
+}
+
+/* Note: @fast_test and @post can be NULL */
+static inline float32
+float32_gen2(float32 xa, float32 xb, float_status *s,
+ hard_f32_op2_fn hard, soft_f32_op2_fn soft,
+ f32_check_fn pre, f32_check_fn post,
+ f32_check_fn fast_test, soft_f32_op2_fn fast_op)
+{
+ union_float32 ua, ub, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float32_input_flush2(&ua.s, &ub.s, s);
+ if (unlikely(!pre(ua, ub))) {
+ goto soft;
+ }
+ if (fast_test && fast_test(ua, ub)) {
+ return fast_op(ua.s, ub.s, s);
+ }
+
+ ur.h = hard(ua.h, ub.h);
+ if (unlikely(f32_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+ if (post == NULL || post(ua, ub)) {
+ goto soft;
+ }
+ }
+ return ur.s;
+
+ soft:
+ return soft(ua.s, ub.s, s);
+}
+
+static inline float64
+float64_gen2(float64 xa, float64 xb, float_status *s,
+ hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+ f64_check_fn pre, f64_check_fn post,
+ f64_check_fn fast_test, soft_f64_op2_fn fast_op)
+{
+ union_float64 ua, ub, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float64_input_flush2(&ua.s, &ub.s, s);
+ if (unlikely(!pre(ua, ub))) {
+ goto soft;
+ }
+ if (fast_test && fast_test(ua, ub)) {
+ return fast_op(ua.s, ub.s, s);
+ }
+
+ ur.h = hard(ua.h, ub.h);
+ if (unlikely(f64_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
+ if (post == NULL || post(ua, ub)) {
+ goto soft;
+ }
+ }
+ return ur.s;
+
+ soft:
+ return soft(ua.s, ub.s, s);
+}
+
/*----------------------------------------------------------------------------
| Returns the fraction bits of the half-precision floating-point value `a'.
*----------------------------------------------------------------------------*/
--
2.17.1
* [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (6 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-04 18:34 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
` (7 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results (single and double precision) for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
add-single: 135.07 MFlops
add-double: 131.60 MFlops
sub-single: 130.04 MFlops
sub-double: 133.01 MFlops
- after:
add-single: 443.04 MFlops
add-double: 301.95 MFlops
sub-single: 411.36 MFlops
sub-double: 293.15 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
add-single: 44.79 MFlops
add-double: 49.20 MFlops
sub-single: 44.55 MFlops
sub-double: 49.06 MFlops
- after:
add-single: 93.28 MFlops
add-double: 88.27 MFlops
sub-single: 91.47 MFlops
sub-double: 88.27 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
add-single: 72.59 MFlops
add-double: 72.27 MFlops
sub-single: 75.33 MFlops
sub-double: 70.54 MFlops
- after:
add-single: 112.95 MFlops
add-double: 201.11 MFlops
sub-single: 116.80 MFlops
sub-double: 188.72 MFlops
Note that the IBM and ARM machines benefit from having
QEMU_HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
can suffer significantly:
- IBM Power8:
add-single: [1] 54.94 vs [0] 116.37 MFlops
add-double: [1] 58.92 vs [0] 201.44 MFlops
- Aarch64 A57:
add-single: [1] 80.72 vs [0] 93.24 MFlops
add-double: [1] 82.10 vs [0] 88.18 MFlops
On the Intel machine, setting 2F64 to 1 pays off, but doing the
same for 2F32 does not:
- Intel i7-6700K:
add-single: [1] 285.79 vs [0] 426.70 MFlops
add-double: [1] 302.15 vs [0] 278.82 MFlops
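Both USE_FP flavors answer the same question ("is this operand zero or normal?"); they differ only in the code the host compiler generates, which is why the best setting varies by host. A minimal sketch of the two flavors for binary32, assuming IEEE 754 host floats; `zon_fp` and `zon_bits` are hypothetical names, not the QEMU helpers (f32_is_zon2, float32_is_zero_or_normal).

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* USE_FP=1 flavor: let the C library classify the host float. */
static bool zon_fp(float f)
{
    return fpclassify(f) == FP_NORMAL || fpclassify(f) == FP_ZERO;
}

/* USE_FP=0 flavor: inspect the IEEE 754 binary32 bit pattern directly. */
static bool zon_bits(float f)
{
    uint32_t v;

    memcpy(&v, &f, sizeof(v));
    uint32_t exp = (v >> 23) & 0xff;
    bool is_zero = (v & 0x7fffffff) == 0;       /* +0.0 or -0.0 */
    bool is_normal = exp != 0 && exp != 0xff;   /* not denormal/inf/NaN */
    return is_zero || is_normal;
}
```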
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 117 ++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 98 insertions(+), 19 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 306a12fa8d..cc500b1618 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1050,49 +1050,128 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_add(float32 a, float32 b, float_status *status)
+float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+{
+ FloatParts pa = float16_unpack_canonical(a, status);
+ FloatParts pb = float16_unpack_canonical(b, status);
+ FloatParts pr = addsub_floats(pa, pb, true, status);
+
+ return float16_round_pack_canonical(pr, status);
+}
+
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, false, status);
+ FloatParts pr = addsub_floats(pa, pb, subtract, status);
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_add(float64 a, float64 b, float_status *status)
+static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
+{
+ return soft_f32_addsub(a, b, false, status);
+}
+
+static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
+{
+ return soft_f32_addsub(a, b, true, status);
+}
+
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, false, status);
+ FloatParts pr = addsub_floats(pa, pb, subtract, status);
return float64_round_pack_canonical(pr, status);
}
-float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
{
- FloatParts pa = float16_unpack_canonical(a, status);
- FloatParts pb = float16_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return soft_f64_addsub(a, b, false, status);
+}
- return float16_round_pack_canonical(pr, status);
+static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
+{
+ return soft_f64_addsub(a, b, true, status);
}
-float32 QEMU_FLATTEN float32_sub(float32 a, float32 b, float_status *status)
+static float hard_f32_add(float a, float b)
{
- FloatParts pa = float32_unpack_canonical(a, status);
- FloatParts pb = float32_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return a + b;
+}
- return float32_round_pack_canonical(pr, status);
+static float hard_f32_sub(float a, float b)
+{
+ return a - b;
}
-float64 QEMU_FLATTEN float64_sub(float64 a, float64 b, float_status *status)
+static double hard_f64_add(double a, double b)
{
- FloatParts pa = float64_unpack_canonical(a, status);
- FloatParts pb = float64_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return a + b;
+}
- return float64_round_pack_canonical(pr, status);
+static double hard_f64_sub(double a, double b)
+{
+ return a - b;
+}
+
+static bool f32_addsub_post(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+ }
+ return !(float32_is_zero(a.s) && float32_is_zero(b.s));
+}
+
+static bool f64_addsub_post(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+ } else {
+ return !(float64_is_zero(a.s) && float64_is_zero(b.s));
+ }
+}
+
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
+ hard_f32_op2_fn hard, soft_f32_op2_fn soft)
+{
+ return float32_gen2(a, b, s, hard, soft,
+ f32_is_zon2, f32_addsub_post, NULL, NULL);
+}
+
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
+ hard_f64_op2_fn hard, soft_f64_op2_fn soft)
+{
+ return float64_gen2(a, b, s, hard, soft,
+ f64_is_zon2, f64_addsub_post, NULL, NULL);
+}
+
+float32 QEMU_FLATTEN
+float32_add(float32 a, float32 b, float_status *s)
+{
+ return float32_addsub(a, b, s, hard_f32_add, soft_f32_add);
+}
+
+float32 QEMU_FLATTEN
+float32_sub(float32 a, float32 b, float_status *s)
+{
+ return float32_addsub(a, b, s, hard_f32_sub, soft_f32_sub);
+}
+
+float64 QEMU_FLATTEN
+float64_add(float64 a, float64 b, float_status *s)
+{
+ return float64_addsub(a, b, s, hard_f64_add, soft_f64_add);
+}
+
+float64 QEMU_FLATTEN
+float64_sub(float64 a, float64 b, float_status *s)
+{
+ return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub);
}
/*
--
2.17.1
* [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (7 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-05 10:10 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
` (6 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
mul-single: 126.91 MFlops
mul-double: 118.28 MFlops
- after:
mul-single: 258.02 MFlops
mul-double: 197.96 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
mul-single: 37.42 MFlops
mul-double: 38.77 MFlops
- after:
mul-single: 73.41 MFlops
mul-double: 76.93 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
mul-single: 58.40 MFlops
mul-double: 59.33 MFlops
- after:
mul-single: 60.25 MFlops
mul-double: 94.79 MFlops
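The multiplication patch adds a fast path for zero operands. Since the generic pre-check has already ensured that both operands are zero or normal, a zero operand makes the product an exact signed zero, so no flag needs raising. And because mul passes a NULL @post, a zero result would otherwise trip the tiny-result check and go to soft-fp. A sketch of the sign computation for binary64, with hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT_64 0x8000000000000000ull

static uint64_t bits_of(double d)
{
    uint64_t v;

    memcpy(&v, &d, sizeof(v));
    return v;
}

static double from_bits(uint64_t v)
{
    double d;

    memcpy(&d, &v, sizeof(d));
    return d;
}

/*
 * Precondition (checked by the caller): a or b is zero and the other
 * operand is zero or normal. The product is then exactly +0.0 or -0.0,
 * with the sign given by the XOR of the operand signs.
 */
static double mul_zero_fast(double a, double b)
{
    return from_bits((bits_of(a) ^ bits_of(b)) & SIGN_BIT_64);
}
```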
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index cc500b1618..58e67d9b80 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1232,7 +1232,8 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_mul(float32 a, float32 b, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1241,7 +1242,8 @@ float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_mul(float64 a, float64 b, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1250,6 +1252,54 @@ float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
return float64_round_pack_canonical(pr, status);
}
+static float hard_f32_mul(float a, float b)
+{
+ return a * b;
+}
+
+static double hard_f64_mul(double a, double b)
+{
+ return a * b;
+}
+
+static bool f32_mul_fast_test(union_float32 a, union_float32 b)
+{
+ return float32_is_zero(a.s) || float32_is_zero(b.s);
+}
+
+static bool f64_mul_fast_test(union_float64 a, union_float64 b)
+{
+ return float64_is_zero(a.s) || float64_is_zero(b.s);
+}
+
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
+{
+ bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
+
+ return float32_set_sign(float32_zero, signbit);
+}
+
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
+{
+ bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
+
+ return float64_set_sign(float64_zero, signbit);
+}
+
+float32 QEMU_FLATTEN
+float32_mul(float32 a, float32 b, float_status *s)
+{
+ return float32_gen2(a, b, s, hard_f32_mul, soft_f32_mul,
+ f32_is_zon2, NULL, f32_mul_fast_test, f32_mul_fast_op);
+}
+
+float64 QEMU_FLATTEN
+float64_mul(float64 a, float64 b, float_status *s)
+{
+ return float64_gen2(a, b, s, hard_f64_mul, soft_f64_mul,
+ f64_is_zon2, NULL, f64_mul_fast_test, f64_mul_fast_op);
+}
+
/*
* Returns the result of multiplying the floating-point values `a' and
* `b' then adding 'c', with no intermediate rounding step after the
--
2.17.1
* [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (8 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-05 10:11 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
` (5 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
div-single: 34.84 MFlops
div-double: 34.04 MFlops
- after:
div-single: 275.23 MFlops
div-double: 216.38 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
div-single: 9.33 MFlops
div-double: 9.30 MFlops
- after:
div-single: 51.55 MFlops
div-double: 15.09 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
div-single: 25.65 MFlops
div-double: 24.91 MFlops
- after:
div-single: 96.83 MFlops
div-double: 31.01 MFlops
Here setting QEMU_HARDFLOAT_2F64_USE_FP to 1 pays off for x86_64:
[1] 215.97 vs [0] 62.15 MFlops
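The division hooks encode two rules. First, the divisor must be normal, which rules out division by zero, infinities, NaNs and denormals up front. Second, a result whose magnitude is at most the minimum normal is only accepted from the host when the dividend is zero, since 0/x is exactly zero. A hedged sketch of that decision for doubles, with hypothetical names (`div_path`, `choose_div_path`); overflow flag handling is omitted for brevity.

```c
#include <float.h>
#include <math.h>

typedef enum { DIV_HARD, DIV_SOFT } div_path;

static div_path choose_div_path(double a, double b, double *out)
{
    int ca = fpclassify(a);
    int cb = fpclassify(b);

    /* pre: dividend zero or normal; divisor strictly normal (never zero) */
    if (!((ca == FP_NORMAL || ca == FP_ZERO) && cb == FP_NORMAL)) {
        return DIV_SOFT;
    }
    double r = a / b;
    /* post: a tiny result is only trusted when the dividend was zero */
    if (r <= DBL_MIN && r >= -DBL_MIN && ca != FP_ZERO) {
        return DIV_SOFT;
    }
    *out = r;
    return DIV_HARD;
}
```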
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 2 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 58e67d9b80..e35ebfaae7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1624,7 +1624,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 float32_div(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_div(float32 a, float32 b, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1633,7 +1634,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
return float32_round_pack_canonical(pr, status);
}
-float64 float64_div(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_div(float64 a, float64 b, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1642,6 +1644,64 @@ float64 float64_div(float64 a, float64 b, float_status *status)
return float64_round_pack_canonical(pr, status);
}
+static float hard_f32_div(float a, float b)
+{
+ return a / b;
+}
+
+static double hard_f64_div(double a, double b)
+{
+ return a / b;
+}
+
+static bool f32_div_pre(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ fpclassify(b.h) == FP_NORMAL;
+ }
+ return float32_is_zero_or_normal(a.s) && float32_is_normal(b.s);
+}
+
+static bool f64_div_pre(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ fpclassify(b.h) == FP_NORMAL;
+ }
+ return float64_is_zero_or_normal(a.s) && float64_is_normal(b.s);
+}
+
+static bool f32_div_post(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return fpclassify(a.h) != FP_ZERO;
+ }
+ return !float32_is_zero(a.s);
+}
+
+static bool f64_div_post(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return fpclassify(a.h) != FP_ZERO;
+ }
+ return !float64_is_zero(a.s);
+}
+
+float32 QEMU_FLATTEN
+float32_div(float32 a, float32 b, float_status *s)
+{
+ return float32_gen2(a, b, s, hard_f32_div, soft_f32_div,
+ f32_div_pre, f32_div_post, NULL, NULL);
+}
+
+float64 QEMU_FLATTEN
+float64_div(float64 a, float64 b, float_status *s)
+{
+ return float64_gen2(a, b, s, hard_f64_div, soft_f64_div,
+ f64_div_pre, f64_div_post, NULL, NULL);
+}
+
/*
* Float to Float conversions
*
--
2.17.1
* [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (9 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-05 12:25 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
` (4 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
fma-single: 74.73 MFlops
fma-double: 74.54 MFlops
- after:
fma-single: 203.37 MFlops
fma-double: 169.37 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
fma-single: 23.24 MFlops
fma-double: 23.70 MFlops
- after:
fma-single: 66.14 MFlops
fma-double: 63.10 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
fma-single: 37.26 MFlops
fma-double: 37.29 MFlops
- after:
fma-single: 48.90 MFlops
fma-double: 59.51 MFlops
Here setting QEMU_HARDFLOAT_3F64_USE_FP to 1 pays off for x86_64:
[1] 170.15 vs [0] 153.12 MFlops
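The zero-product shortcut in muladd has to get the sign of zero right, because the sign of the sum depends on it when the addend is also zero. A sketch of just that branch for doubles, under the same preconditions the patch checks; `NEGATE_C` and `NEGATE_PRODUCT` are hypothetical stand-ins for QEMU's float_muladd_* constants, and the general case (which calls fma() and then checks for overflow and tiny results) is not reproduced here.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for float_muladd_negate_c/_negate_product. */
enum {
    NEGATE_C       = 1 << 0,
    NEGATE_PRODUCT = 1 << 1,
};

/*
 * Zero-product branch only: the caller has checked that a or b is zero
 * and that all three operands are zero or normal, so a * b is an exact
 * signed zero and the addition below cannot overflow or underflow.
 */
static double muladd_zero_product(double a, double b, double c, int flags)
{
    /* signbit() returns a nonzero int, not necessarily 1, so normalize. */
    bool prod_neg = (signbit(a) != 0) != (signbit(b) != 0);

    if (flags & NEGATE_PRODUCT) {
        prod_neg = !prod_neg;
    }
    if (flags & NEGATE_C) {
        c = -c;
    }
    /* Round-to-nearest: (+0) + (-0) == +0, while (-0) + (-0) == -0. */
    return (prod_neg ? -0.0 : 0.0) + c;
}
```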
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 128 insertions(+), 4 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e35ebfaae7..e03feafb6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1514,8 +1514,9 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
- int flags, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
+ float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1525,8 +1526,9 @@ float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
- int flags, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
+ float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1536,6 +1538,128 @@ float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
return float64_round_pack_canonical(pr, status);
}
+float32 QEMU_FLATTEN
+float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
+{
+ union_float32 ua, ub, uc, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+ uc.s = xc;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+ if (unlikely(flags & float_muladd_halve_result)) {
+ goto soft;
+ }
+
+ float32_input_flush3(&ua.s, &ub.s, &uc.s, s);
+ if (unlikely(!f32_is_zon3(ua, ub, uc))) {
+ goto soft;
+ }
+ /*
+ * When (a || b) == 0, there's no need to check for under/over flow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+ if (float32_is_zero(ua.s) || float32_is_zero(ub.s)) {
+ union_float32 up;
+ bool prod_sign;
+
+ prod_sign = float32_is_neg(ua.s) ^ float32_is_neg(ub.s);
+ prod_sign ^= !!(flags & float_muladd_negate_product);
+ up.s = float32_set_sign(float32_zero, prod_sign);
+
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+ ur.h = up.h + uc.h;
+ } else {
+ if (flags & float_muladd_negate_product) {
+ ua.h = -ua.h;
+ }
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+
+ ur.h = fmaf(ua.h, ub.h, uc.h);
+
+ if (unlikely(f32_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+ goto soft;
+ }
+ }
+ if (flags & float_muladd_negate_result) {
+ return float32_chs(ur.s);
+ }
+ return ur.s;
+
+ soft:
+ return soft_f32_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
+float64 QEMU_FLATTEN
+float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
+{
+ union_float64 ua, ub, uc, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+ uc.s = xc;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+ if (unlikely(flags & float_muladd_halve_result)) {
+ goto soft;
+ }
+
+ float64_input_flush3(&ua.s, &ub.s, &uc.s, s);
+ if (unlikely(!f64_is_zon3(ua, ub, uc))) {
+ goto soft;
+ }
+ /*
+ * When (a || b) == 0, there's no need to check for under/over flow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+ if (float64_is_zero(ua.s) || float64_is_zero(ub.s)) {
+ union_float64 up;
+ bool prod_sign;
+
+ prod_sign = float64_is_neg(ua.s) ^ float64_is_neg(ub.s);
+ prod_sign ^= !!(flags & float_muladd_negate_product);
+ up.s = float64_set_sign(float64_zero, prod_sign);
+
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+ ur.h = up.h + uc.h;
+ } else {
+ if (flags & float_muladd_negate_product) {
+ ua.h = -ua.h;
+ }
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+
+ ur.h = fma(ua.h, ub.h, uc.h);
+
+ if (unlikely(f64_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
+ goto soft;
+ }
+ }
+ if (flags & float_muladd_negate_result) {
+ return float64_chs(ur.s);
+ }
+ return ur.s;
+
+ soft:
+ return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
/*
* Returns the result of dividing the floating-point value `a' by the
* corresponding value `b'. The operation is performed according to
--
2.17.1
* [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (10 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-05 12:26 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
` (3 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 42.30 MFlops
sqrt-double: 22.97 MFlops
- after:
sqrt-single: 311.42 MFlops
sqrt-double: 311.08 MFlops
Here QEMU_HARDFLOAT_1F64_USE_FP makes a huge difference for f64s,
with throughput going from ~200 MFlops to ~300 MFlops.
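The square-root fast path needs only an input test. A positive zero or normal input has a square root that is itself zero or normal, so the host cannot raise overflow, underflow or invalid, and inexact is already set. Negative inputs, including -0.0, are left to soft-fp. A sketch of that test for doubles; `sqrt_can_use_host` is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* True when sqrt(x) can be computed on the host without tracking flags. */
static bool sqrt_can_use_host(double x)
{
    int c = fpclassify(x);

    /* Zero or normal, and sign bit clear (so -0.0 also goes to soft-fp). */
    return (c == FP_NORMAL || c == FP_ZERO) && !signbit(x);
}
```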
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 58 insertions(+), 2 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e03feafb6f..4c6ecd1883 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3040,20 +3040,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_sqrt(float32 a, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pr = sqrt_float(pa, status, &float32_params);
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_sqrt(float64 a, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pr = sqrt_float(pa, status, &float64_params);
return float64_round_pack_canonical(pr, status);
}
+float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
+{
+ union_float32 ua, ur;
+
+ ua.s = xa;
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float32_input_flush1(&ua.s, s);
+ if (QEMU_HARDFLOAT_1F32_USE_FP) {
+ if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+ fpclassify(ua.h) == FP_ZERO) ||
+ signbit(ua.h))) {
+ goto soft;
+ }
+ } else if (unlikely(!float32_is_zero_or_normal(ua.s) ||
+ float32_is_neg(ua.s))) {
+ goto soft;
+ }
+ ur.h = sqrtf(ua.h);
+ return ur.s;
+
+ soft:
+ return soft_f32_sqrt(ua.s, s);
+}
+
+float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
+{
+ union_float64 ua, ur;
+
+ ua.s = xa;
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float64_input_flush1(&ua.s, s);
+ if (QEMU_HARDFLOAT_1F64_USE_FP) {
+ if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+ fpclassify(ua.h) == FP_ZERO) ||
+ signbit(ua.h))) {
+ goto soft;
+ }
+ } else if (unlikely(!float64_is_zero_or_normal(ua.s) ||
+ float64_is_neg(ua.s))) {
+ goto soft;
+ }
+ ur.h = sqrt(ua.h);
+ return ur.s;
+
+ soft:
+ return soft_f64_sqrt(ua.s, s);
+}
+
/*----------------------------------------------------------------------------
| The pattern for a default generated NaN.
*----------------------------------------------------------------------------*/
--
2.17.1
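The USE_FP switch in float32_sqrt()/float64_sqrt() only changes how the fast path decides that an input is safe to hand to the host FPU. Below is a standalone sketch of the two kinds of check for doubles. It assumes an IEEE 754 binary64 host. zon_fpclassify(), zon_bits() and sqrt_can_use_host() are hypothetical helpers written for this note, not QEMU functions. zon_bits() tests the same property as softfloat's float64_is_zero_or_normal() but is not QEMU's code.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* "USE_FP" flavour: let the host's libm classify the value. */
static bool zon_fpclassify(double d)
{
    int c = fpclassify(d);
    return c == FP_NORMAL || c == FP_ZERO;
}

/*
 * Bit-level flavour (hypothetical helper): a binary64 value is zero or
 * normal unless its biased exponent is 0x7ff (inf/NaN), or is 0 with a
 * non-zero fraction (denormal).
 */
static bool zon_bits(double d)
{
    uint64_t bits, biased_exp, frac;

    memcpy(&bits, &d, sizeof(bits));
    biased_exp = (bits >> 52) & 0x7ff;
    frac = bits & ((UINT64_C(1) << 52) - 1);
    if (biased_exp == 0x7ff) {
        return false;
    }
    return biased_exp != 0 || frac == 0;
}

/* Gate matching the sqrt fast path: zero or normal, and sign bit clear,
 * so -0.0 and negative inputs are left to softfloat. */
static bool sqrt_can_use_host(double d)
{
    return zon_fpclassify(d) && !signbit(d);
}
```

Both helpers accept exactly the same inputs. Which one compiles to faster code depends on the host, and that is why the series picks the flavour per host through the QEMU_HARDFLOAT_*_USE_FP macros.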
* [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (11 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
@ 2018-11-24 23:55 ` Emilio G. Cota
2018-12-05 12:36 ` Alex Bennée
2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
` (2 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-24 23:55 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson
Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 110.98 MFlops
cmp-double: 107.12 MFlops
- after:
cmp-single: 506.28 MFlops
cmp-double: 524.77 MFlops
Note that flattening both the eq and eq_signaling versions
would give us extra performance (695 vs. 506 and 615 vs. 524 MFlops
for single/double, respectively), but this would emit two
essentially identical functions for each eq/signaling pair,
which is a waste.
Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]
1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
qemu-aarch64 NBench score; higher is better
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
[ bar chart: score for FOURIER, NEURAL NET, LU DECOMPOSITION and gmean;
  series: "before", then one per hardfloat patch (add/sub, mul, div,
  fma, sqrt, cmp) ]
qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
error bars: 95% confidence interval
[ bar chart: speedup for each SPEC06fp benchmark (410.bwaves through
  482.sphinx3) and geomean; series: "before", then one per hardfloat
  patch (add/sub, mul, div, fma, sqrt, cmp) ]
2. Host: ARM Aarch64 A57 @ 2.4GHz
qemu-aarch64 NBench score; higher is better
Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz
[ bar chart: score for FOURIER, NEURAL NET, LU DECOMPOSITION and gmean;
  series: "before", then one per hardfloat patch (add/sub, mul, div,
  fma, sqrt, cmp) ]
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
fpu/softfloat.c | 109 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 95 insertions(+), 14 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4c6ecd1883..b29a2b6714 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2899,28 +2899,109 @@ static int compare_floats(FloatParts a, FloatParts b, bool is_quiet,
}
}
-#define COMPARE(sz) \
-int float ## sz ## _compare(float ## sz a, float ## sz b, \
- float_status *s) \
+#define COMPARE(name, attr, sz) \
+static int attr \
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s) \
{ \
FloatParts pa = float ## sz ## _unpack_canonical(a, s); \
FloatParts pb = float ## sz ## _unpack_canonical(b, s); \
- return compare_floats(pa, pb, false, s); \
-} \
-int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, \
- float_status *s) \
-{ \
- FloatParts pa = float ## sz ## _unpack_canonical(a, s); \
- FloatParts pb = float ## sz ## _unpack_canonical(b, s); \
- return compare_floats(pa, pb, true, s); \
+ return compare_floats(pa, pb, is_quiet, s); \
}
-COMPARE(16)
-COMPARE(32)
-COMPARE(64)
+COMPARE(soft_f16_compare, QEMU_FLATTEN, 16)
+COMPARE(soft_f32_compare, QEMU_SOFTFLOAT_ATTR, 32)
+COMPARE(soft_f64_compare, QEMU_SOFTFLOAT_ATTR, 64)
#undef COMPARE
+int float16_compare(float16 a, float16 b, float_status *s)
+{
+ return soft_f16_compare(a, b, false, s);
+}
+
+int float16_compare_quiet(float16 a, float16 b, float_status *s)
+{
+ return soft_f16_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
+{
+ union_float32 ua, ub;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (QEMU_NO_HARDFLOAT) {
+ goto soft;
+ }
+
+ float32_input_flush2(&ua.s, &ub.s, s);
+ if (isgreaterequal(ua.h, ub.h)) {
+ if (isgreater(ua.h, ub.h)) {
+ return float_relation_greater;
+ }
+ return float_relation_equal;
+ }
+ if (likely(isless(ua.h, ub.h))) {
+ return float_relation_less;
+ }
+ /* The only condition remaining is unordered.
+ * Fall through to set flags.
+ */
+ soft:
+ return soft_f32_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float32_compare(float32 a, float32 b, float_status *s)
+{
+ return f32_compare(a, b, false, s);
+}
+
+int float32_compare_quiet(float32 a, float32 b, float_status *s)
+{
+ return f32_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
+{
+ union_float64 ua, ub;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (QEMU_NO_HARDFLOAT) {
+ goto soft;
+ }
+
+ float64_input_flush2(&ua.s, &ub.s, s);
+ if (isgreaterequal(ua.h, ub.h)) {
+ if (isgreater(ua.h, ub.h)) {
+ return float_relation_greater;
+ }
+ return float_relation_equal;
+ }
+ if (likely(isless(ua.h, ub.h))) {
+ return float_relation_less;
+ }
+ /* The only condition remaining is unordered.
+ * Fall through to set flags.
+ */
+ soft:
+ return soft_f64_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float64_compare(float64 a, float64 b, float_status *s)
+{
+ return f64_compare(a, b, false, s);
+}
+
+int float64_compare_quiet(float64 a, float64 b, float_status *s)
+{
+ return f64_compare(a, b, true, s);
+}
+
/* Multiply A by 2 raised to the power N. */
static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
{
--
2.17.1
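The comparison fast path relies on the C99 macros isgreaterequal(), isgreater() and isless(). These evaluate to false for unordered operands, and unlike the plain relational operators they do not raise the invalid exception for quiet NaNs. Below is a standalone sketch of the same three-way classification. The names rel_t, REL_* and fast_compare() are hypothetical, and their values mirror softfloat's float_relation_* constants. In this sketch the unordered case is returned to the caller. f32_compare()/f64_compare() instead hand it to softfloat, so the flags raised depend on is_quiet and on the kind of NaN.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical relation codes; values mirror softfloat's float_relation_*. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
} rel_t;

/*
 * Three-way compare built from the quiet C99 comparison macros.
 * REL_UNORDERED means at least one operand is NaN: the case the patch
 * sends to soft_f{32,64}_compare() so that exception flags are set.
 */
static rel_t fast_compare(double a, double b)
{
    if (isgreaterequal(a, b)) {
        return isgreater(a, b) ? REL_GREATER : REL_EQUAL;
    }
    if (isless(a, b)) {
        return REL_LESS;
    }
    return REL_UNORDERED;
}
```

Testing isgreaterequal() first handles both the greater and equal cases with one comparison. After that only less-than and unordered remain, so a failing isless() implies a NaN operand.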
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
@ 2018-11-25 0:25 ` Aleksandar Markovic
2018-11-25 1:25 ` Emilio G. Cota
2018-12-04 12:28 ` Alex Bennée
1 sibling, 1 reply; 37+ messages in thread
From: Aleksandar Markovic @ 2018-11-25 0:25 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: Richard Henderson, Alex Bennée, qemu-devel
Hi, Emilio.
> Note: some architectures (at least PPC, there might be others) clear
> the status flags passed to softfloat before most FP operations. This
> precludes the use of hardfloat, so to avoid introducing a performance
> regression for those targets, we add a flag to disable hardfloat.
> In the long run though it would be good to fix the targets so that
> at least the inexact flag passed to softfloat is indeed sticky.
Can you elaborate more on this paragraph?
Thanks,
Aleksandar Markovic
On Nov 25, 2018 1:08 AM, "Emilio G. Cota" <cota@braap.org> wrote:
> The appended paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the added comment for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.
>
> This patch just adds common code. Some operations will be migrated
> to hardfloat in subsequent patches to ease bisection.
>
> Note: some architectures (at least PPC, there might be others) clear
> the status flags passed to softfloat before most FP operations. This
> precludes the use of hardfloat, so to avoid introducing a performance
> regression for those targets, we add a flag to disable hardfloat.
> In the long run though it would be good to fix the targets so that
> at least the inexact flag passed to softfloat is indeed sticky.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> fpu/softfloat.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 315 insertions(+)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index ecdc00c633..306a12fa8d 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -83,6 +83,7 @@ this code that are retained.
> * target-dependent and needs the TARGET_* macros.
> */
> #include "qemu/osdep.h"
> +#include <math.h>
> #include "qemu/bitops.h"
> #include "fpu/softfloat.h"
>
> @@ -95,6 +96,320 @@ this code that are retained.
> *----------------------------------------------------------------------------*/
> #include "fpu/softfloat-macros.h"
>
> +/*
> + * Hardfloat
> + *
> + * Fast emulation of guest FP instructions is challenging for two reasons.
> + * First, FP instruction semantics are similar but not identical, particularly
> + * when handling NaNs. Second, emulating at reasonable speed the guest FP
> + * exception flags is not trivial: reading the host's flags register with a
> + * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
> + * and trapping on every FP exception is not fast nor pleasant to work with.
> + *
> + * We address these challenges by leveraging the host FPU for a subset of the
> + * operations. To do this we expand on the idea presented in this paper:
> + *
> + * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
> + * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
> + *
> + * The idea is thus to leverage the host FPU to (1) compute FP operations
> + * and (2) identify whether FP exceptions occurred while avoiding
> + * expensive exception flag register accesses.
> + *
> + * An important optimization shown in the paper is that given that exception
> + * flags are rarely cleared by the guest, we can avoid recomputing some flags.
> + * This is particularly useful for the inexact flag, which is very frequently
> + * raised in floating-point workloads.
> + *
> + * We optimize the code further by deferring to soft-fp whenever FP exception
> + * detection might get hairy. Two examples: (1) when at least one operand is
> + * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
> + * and the result is < the minimum normal.
> + */
> +#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t) \
> + static inline void name(soft_t *a, float_status *s) \
> + { \
> + if (unlikely(soft_t ## _is_denormal(*a))) { \
> + *a = soft_t ## _set_sign(soft_t ## _zero, \
> + soft_t ## _is_neg(*a)); \
> + s->float_exception_flags |= float_flag_input_denormal; \
> + } \
> + }
> +
> +GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
> +GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
> +#undef GEN_INPUT_FLUSH__NOCHECK
> +
> +#define GEN_INPUT_FLUSH1(name, soft_t) \
> + static inline void name(soft_t *a, float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + }
> +
> +GEN_INPUT_FLUSH1(float32_input_flush1, float32)
> +GEN_INPUT_FLUSH1(float64_input_flush1, float64)
> +#undef GEN_INPUT_FLUSH1
> +
> +#define GEN_INPUT_FLUSH2(name, soft_t) \
> + static inline void name(soft_t *a, soft_t *b, float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + soft_t ## _input_flush__nocheck(b, s); \
> + }
> +
> +GEN_INPUT_FLUSH2(float32_input_flush2, float32)
> +GEN_INPUT_FLUSH2(float64_input_flush2, float64)
> +#undef GEN_INPUT_FLUSH2
> +
> +#define GEN_INPUT_FLUSH3(name, soft_t) \
> + static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
> + { \
> + if (likely(!s->flush_inputs_to_zero)) { \
> + return; \
> + } \
> + soft_t ## _input_flush__nocheck(a, s); \
> + soft_t ## _input_flush__nocheck(b, s); \
> + soft_t ## _input_flush__nocheck(c, s); \
> + }
> +
> +GEN_INPUT_FLUSH3(float32_input_flush3, float32)
> +GEN_INPUT_FLUSH3(float64_input_flush3, float64)
> +#undef GEN_INPUT_FLUSH3
> +
> +/*
> + * Choose whether to use fpclassify or float32/64_* primitives in the generated
> + * hardfloat functions. Each combination of number of inputs and float size
> + * gets its own value.
> + */
> +#if defined(__x86_64__)
> +# define QEMU_HARDFLOAT_1F32_USE_FP 0
> +# define QEMU_HARDFLOAT_1F64_USE_FP 1
> +# define QEMU_HARDFLOAT_2F32_USE_FP 0
> +# define QEMU_HARDFLOAT_2F64_USE_FP 1
> +# define QEMU_HARDFLOAT_3F32_USE_FP 0
> +# define QEMU_HARDFLOAT_3F64_USE_FP 1
> +#else
> +# define QEMU_HARDFLOAT_1F32_USE_FP 0
> +# define QEMU_HARDFLOAT_1F64_USE_FP 0
> +# define QEMU_HARDFLOAT_2F32_USE_FP 0
> +# define QEMU_HARDFLOAT_2F64_USE_FP 0
> +# define QEMU_HARDFLOAT_3F32_USE_FP 0
> +# define QEMU_HARDFLOAT_3F64_USE_FP 0
> +#endif
> +
> +/*
> + * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
> + * float{32,64}_is_infinity when !USE_FP.
> + * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
> + * On power64 however, using isinf() reduces fp-bench performance by up to 50%.
> + */
> +#if defined(__x86_64__) || defined(__aarch64__)
> +# define QEMU_HARDFLOAT_USE_ISINF 1
> +#else
> +# define QEMU_HARDFLOAT_USE_ISINF 0
> +#endif
> +
> +/*
> + * Some targets clear the FP flags before most FP operations. This prevents
> + * the use of hardfloat, since hardfloat relies on the inexact flag being
> + * already set.
> + */
> +#if defined(TARGET_PPC)
> +# define QEMU_NO_HARDFLOAT 1
> +# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
> +#else
> +# define QEMU_NO_HARDFLOAT 0
> +# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN __attribute__((noinline))
> +#endif
> +
> +static inline bool can_use_fpu(const float_status *s)
> +{
> + if (QEMU_NO_HARDFLOAT) {
> + return false;
> + }
> + return likely(s->float_exception_flags & float_flag_inexact &&
> + s->float_rounding_mode == float_round_nearest_even);
> +}
> +
> +/*
> + * Hardfloat generation functions. Each operation can have two flavors:
> + * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
> + * most condition checks, or native ones (e.g. fpclassify).
> + *
> + * The flavor is chosen by the callers. Instead of using macros, we rely on the
> + * compiler to propagate constants and inline everything into the callers.
> + *
> + * We only generate functions for operations with two inputs, since only
> + * these are common enough to justify consolidating them into common code.
> + */
> +
> +typedef union {
> + float32 s;
> + float h;
> +} union_float32;
> +
> +typedef union {
> + float64 s;
> + double h;
> +} union_float64;
> +
> +typedef bool (*f32_check_fn)(union_float32 a, union_float32 b);
> +typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);
> +
> +typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
> +typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
> +typedef float (*hard_f32_op2_fn)(float a, float b);
> +typedef double (*hard_f64_op2_fn)(double a, double b);
> +
> +/* 2-input is-zero-or-normal */
> +static inline bool f32_is_zon2(union_float32 a, union_float32 b)
> +{
> + if (QEMU_HARDFLOAT_2F32_USE_FP) {
> + /*
> + * Not using a temp variable for consecutive fpclassify calls ends up
> + * generating faster code.
> + */
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
> + }
> + return float32_is_zero_or_normal(a.s) &&
> + float32_is_zero_or_normal(b.s);
> +}
> +
> +static inline bool f64_is_zon2(union_float64 a, union_float64 b)
> +{
> + if (QEMU_HARDFLOAT_2F64_USE_FP) {
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
> + }
> + return float64_is_zero_or_normal(a.s) &&
> + float64_is_zero_or_normal(b.s);
> +}
> +
> +/* 3-input is-zero-or-normal */
> +static inline
> +bool f32_is_zon3(union_float32 a, union_float32 b, union_float32 c)
> +{
> + if (QEMU_HARDFLOAT_3F32_USE_FP) {
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
> + (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
> + }
> + return float32_is_zero_or_normal(a.s) &&
> + float32_is_zero_or_normal(b.s) &&
> + float32_is_zero_or_normal(c.s);
> +}
> +
> +static inline
> +bool f64_is_zon3(union_float64 a, union_float64 b, union_float64 c)
> +{
> + if (QEMU_HARDFLOAT_3F64_USE_FP) {
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
> + (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
> + }
> + return float64_is_zero_or_normal(a.s) &&
> + float64_is_zero_or_normal(b.s) &&
> + float64_is_zero_or_normal(c.s);
> +}
> +
> +static inline bool f32_is_inf(union_float32 a)
> +{
> + if (QEMU_HARDFLOAT_USE_ISINF) {
> + return isinff(a.h);
> + }
> + return float32_is_infinity(a.s);
> +}
> +
> +static inline bool f64_is_inf(union_float64 a)
> +{
> + if (QEMU_HARDFLOAT_USE_ISINF) {
> + return isinf(a.h);
> + }
> + return float64_is_infinity(a.s);
> +}
> +
> +/* Note: @fast_test and @post can be NULL */
> +static inline float32
> +float32_gen2(float32 xa, float32 xb, float_status *s,
> + hard_f32_op2_fn hard, soft_f32_op2_fn soft,
> + f32_check_fn pre, f32_check_fn post,
> + f32_check_fn fast_test, soft_f32_op2_fn fast_op)
> +{
> + union_float32 ua, ub, ur;
> +
> + ua.s = xa;
> + ub.s = xb;
> +
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> +
> + float32_input_flush2(&ua.s, &ub.s, s);
> + if (unlikely(!pre(ua, ub))) {
> + goto soft;
> + }
> + if (fast_test && fast_test(ua, ub)) {
> + return fast_op(ua.s, ub.s, s);
> + }
> +
> + ur.h = hard(ua.h, ub.h);
> + if (unlikely(f32_is_inf(ur))) {
> + s->float_exception_flags |= float_flag_overflow;
> + } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
> + if (post == NULL || post(ua, ub)) {
> + goto soft;
> + }
> + }
> + return ur.s;
> +
> + soft:
> + return soft(ua.s, ub.s, s);
> +}
> +
> +static inline float64
> +float64_gen2(float64 xa, float64 xb, float_status *s,
> + hard_f64_op2_fn hard, soft_f64_op2_fn soft,
> + f64_check_fn pre, f64_check_fn post,
> + f64_check_fn fast_test, soft_f64_op2_fn fast_op)
> +{
> + union_float64 ua, ub, ur;
> +
> + ua.s = xa;
> + ub.s = xb;
> +
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> +
> + float64_input_flush2(&ua.s, &ub.s, s);
> + if (unlikely(!pre(ua, ub))) {
> + goto soft;
> + }
> + if (fast_test && fast_test(ua, ub)) {
> + return fast_op(ua.s, ub.s, s);
> + }
> +
> + ur.h = hard(ua.h, ub.h);
> + if (unlikely(f64_is_inf(ur))) {
> + s->float_exception_flags |= float_flag_overflow;
> + } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
> + if (post == NULL || post(ua, ub)) {
> + goto soft;
> + }
> + }
> + return ur.s;
> +
> + soft:
> + return soft(ua.s, ub.s, s);
> +}
> +
> /*----------------------------------------------------------------------------
> | Returns the fraction bits of the half-precision floating-point value `a'.
> *----------------------------------------------------------------------------*/
> --
> 2.17.1
>
>
>
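The approach described in the quoted hardfloat comment, and implemented by float32_gen2(), can be condensed into a standalone sketch for a float addition. It assumes an IEEE 754 host FPU in round-to-nearest mode, compiled without -ffast-math. toy_status, TOY_FLAG_* and the toy_* functions are hypothetical stand-ins. In particular, toy_soft_add() only counts calls and is not a real softfloat slow path.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for float_status and its exception flags. */
enum {
    TOY_FLAG_INEXACT = 1,
    TOY_FLAG_OVERFLOW = 2,
    TOY_FLAG_INPUT_DENORMAL = 4,
};

typedef struct {
    int flags;                 /* sticky exception flags */
    bool round_nearest_even;   /* rounding mode is the default */
    bool flush_inputs_to_zero; /* treat denormal inputs as signed zeros */
    int soft_calls;            /* slow-path invocations (demo only) */
} toy_status;

/* Placeholder slow path: real softfloat would compute the result
 * bit-exactly and raise every applicable flag. */
static float toy_soft_add(float a, float b, toy_status *s)
{
    s->soft_calls++;
    return a + b;
}

/* Like GEN_INPUT_FLUSH1: replace a denormal by a zero of the same sign. */
static void toy_input_flush(float *a, toy_status *s)
{
    if (s->flush_inputs_to_zero && fpclassify(*a) == FP_SUBNORMAL) {
        *a = copysignf(0.0f, *a);
        s->flags |= TOY_FLAG_INPUT_DENORMAL;
    }
}

static bool toy_is_zon(float f)
{
    int c = fpclassify(f);
    return c == FP_NORMAL || c == FP_ZERO;
}

static float toy_f32_add(float a, float b, toy_status *s)
{
    float r;

    /* Use the host FPU only once inexact is set and rounding is default. */
    if (!(s->flags & TOY_FLAG_INEXACT) || !s->round_nearest_even) {
        return toy_soft_add(a, b, s);
    }
    toy_input_flush(&a, s);
    toy_input_flush(&b, s);
    /* Pre-check: denormal, inf and NaN inputs take the slow path. */
    if (!toy_is_zon(a) || !toy_is_zon(b)) {
        return toy_soft_add(a, b, s);
    }
    r = a + b; /* computed by the host FPU */
    if (isinf(r)) {
        /* Finite operands produced inf, so the operation overflowed. */
        s->flags |= TOY_FLAG_OVERFLOW;
    } else if (fabsf(r) <= FLT_MIN) {
        /* Possibly tiny: let the slow path work out underflow. The patch's
         * optional post() hook lets an operation narrow this check. */
        return toy_soft_add(a, b, s);
    }
    return r;
}
```

Overflow can be flagged directly, because finite inputs can only produce an infinite result by overflowing. Tiny results are sent to the slow path instead, since deciding tininess and exactness there is the kind of hairy detection the comment defers to soft-fp.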
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-11-25 0:25 ` Aleksandar Markovic
@ 2018-11-25 1:25 ` Emilio G. Cota
0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-25 1:25 UTC (permalink / raw)
To: Aleksandar Markovic; +Cc: Richard Henderson, Alex Bennée, qemu-devel
On Sun, Nov 25, 2018 at 01:25:25 +0100, Aleksandar Markovic wrote:
> > Note: some architectures (at least PPC, there might be others) clear
> > the status flags passed to softfloat before most FP operations. This
> > precludes the use of hardfloat, so to avoid introducing a performance
> > regression for those targets, we add a flag to disable hardfloat.
> > In the long run though it would be good to fix the targets so that
> > at least the inexact flag passed to softfloat is indeed sticky.
>
> Can you elaborate more on this paragraph?
Sure. We only use hardfloat when the inexact flag is already
set. If it isn't, we defer to softfloat. This is done for two
reasons:
- Computing the inexact flag requires duplicating
most of what softfloat does, so it's not worth doing. Note
that clearing and reading the host's fp flags is even
slower, so that's not an option.
- The inexact flag is raised *very* frequently. The flag
remains set (in the guest) unless guest code explicitly
clears it, which few guest workloads do.
It therefore makes sense for hardfloat to only kick in once
the inexact flag has already been set.
Most targets directly keep the guest's FP flags in the same
struct (float_status) that is passed to softfloat ops.
PPC, however, keeps the state of the guest FP flags in one
place, and passes a pristine float_status to softfloat code
every time it calls it. Thus, given that hardfloat is
entirely implemented in softfloat.c, PPC targets cannot
currently take advantage of it.
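A toy model of the two conventions described above follows. It assumes every operation is inexact. sf_status, SF_INEXACT, sf_op(), run_sticky() and run_pristine() are hypothetical names, and this is not the PPC target's actual code.

```c
#include <assert.h>
#include <stdbool.h>

enum { SF_INEXACT = 1 };

/* Hypothetical stand-in for float_status: just the sticky flag bits. */
typedef struct {
    int flags;
} sf_status;

/*
 * Hypothetical FP op that, like hardfloat, takes the fast path only if
 * inexact is already set. Every op is assumed inexact, so it always
 * leaves SF_INEXACT set. Returns whether the fast path was used.
 */
static bool sf_op(sf_status *s)
{
    bool fast = (s->flags & SF_INEXACT) != 0;

    s->flags |= SF_INEXACT;
    return fast;
}

/* Most targets: the guest's FP flags live in the status passed to each op. */
static int run_sticky(int n)
{
    sf_status st = { 0 };
    int fast = 0;

    for (int i = 0; i < n; i++) {
        fast += sf_op(&st);
    }
    return fast;
}

/* PPC-style: a pristine status for every op, merged into guest state after. */
static int run_pristine(int n, int *guest_flags)
{
    int fast = 0;

    for (int i = 0; i < n; i++) {
        sf_status st = { 0 };

        fast += sf_op(&st);
        *guest_flags |= st.flags;
    }
    return fast;
}
```

With 100 operations, run_sticky() takes the fast path 99 times (all but the first), while run_pristine() never does, even though the guest ends up with the inexact flag set either way.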
Changing this in the PPC target is not impossible, but it will
require additional work that I'm not doing in this series, hence
my note. So for now, PPC targets just have hardfloat disabled
at compile time, which avoids adding overhead for a feature
that they cannot use.
Let me know if anything is unclear. Cheers,
Emilio
* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (12 preceding siblings ...)
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
@ 2018-11-27 17:24 ` no-reply
2018-11-27 17:52 ` Emilio G. Cota
2018-11-27 17:32 ` no-reply
2018-12-05 12:41 ` Alex Bennée
15 siblings, 1 reply; 37+ messages in thread
From: no-reply @ 2018-11-27 17:24 UTC (permalink / raw)
To: cota; +Cc: famz, qemu-devel, richard.henderson, alex.bennee
Hi,
This series failed docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.
Message-id: 20181124235553.17371-1-cota@braap.org
Subject: [Qemu-devel] [PATCH v6 00/13] hardfloat
Type: series
=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=8
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fe0cee3 hardfloat: implement float32/64 comparison
ac5968b hardfloat: implement float32/64 square root
0f10937 hardfloat: implement float32/64 fused multiply-add
de38097 hardfloat: implement float32/64 division
fbeab45 hardfloat: implement float32/64 multiplication
8894a16 hardfloat: implement float32/64 addition and subtraction
834d403 fpu: introduce hardfloat
94b3f9b tests/fp: add fp-bench
fe2ef78 softfloat: add float{32, 64}_is_zero_or_normal
a343567 softfloat: rename canonicalize to sf_canonicalize
73e6c0d target/tricore: use float32_is_denormal
be09b31 softfloat: add float{32, 64}_is_{de, }normal
319042a fp-test: pick TARGET_ARM to get its specialization
=== OUTPUT BEGIN ===
BUILD fedora
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-spofu4kn/src'
GEN /var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055/qemu.tar.vroot'...
done.
Checking out files: 100% (6464/6464), done.
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
COPY RUNNER
RUN test-mingw in qemu:fedora
Packages installed:
SDL2-devel-2.0.9-1.fc28.x86_64
bc-1.07.1-5.fc28.x86_64
bison-3.0.4-9.fc28.x86_64
bluez-libs-devel-5.50-1.fc28.x86_64
brlapi-devel-0.6.7-19.fc28.x86_64
bzip2-1.0.6-26.fc28.x86_64
bzip2-devel-1.0.6-26.fc28.x86_64
ccache-3.4.2-2.fc28.x86_64
clang-6.0.1-2.fc28.x86_64
device-mapper-multipath-devel-0.7.4-3.git07e7bd5.fc28.x86_64
findutils-4.6.0-19.fc28.x86_64
flex-2.6.1-7.fc28.x86_64
gcc-8.2.1-5.fc28.x86_64
gcc-c++-8.2.1-5.fc28.x86_64
gettext-0.19.8.1-14.fc28.x86_64
git-2.17.2-1.fc28.x86_64
glib2-devel-2.56.3-2.fc28.x86_64
glusterfs-api-devel-4.1.5-1.fc28.x86_64
gnutls-devel-3.6.4-1.fc28.x86_64
gtk3-devel-3.22.30-1.fc28.x86_64
hostname-3.20-3.fc28.x86_64
libaio-devel-0.3.110-11.fc28.x86_64
libasan-8.2.1-5.fc28.x86_64
libattr-devel-2.4.48-3.fc28.x86_64
libcap-devel-2.25-9.fc28.x86_64
libcap-ng-devel-0.7.9-4.fc28.x86_64
libcurl-devel-7.59.0-8.fc28.x86_64
libfdt-devel-1.4.7-1.fc28.x86_64
libpng-devel-1.6.34-6.fc28.x86_64
librbd-devel-12.2.8-1.fc28.x86_64
libssh2-devel-1.8.0-7.fc28.x86_64
libubsan-8.2.1-5.fc28.x86_64
libusbx-devel-1.0.22-1.fc28.x86_64
libxml2-devel-2.9.8-4.fc28.x86_64
llvm-6.0.1-8.fc28.x86_64
lzo-devel-2.08-12.fc28.x86_64
make-4.2.1-6.fc28.x86_64
mingw32-SDL2-2.0.9-1.fc28.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.57.0-1.fc28.noarch
mingw32-glib2-2.56.1-1.fc28.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.6.3-1.fc28.noarch
mingw32-gtk3-3.22.30-1.fc28.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc28.noarch
mingw32-nettle-3.4-1.fc28.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL2-2.0.9-1.fc28.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.57.0-1.fc28.noarch
mingw64-glib2-2.56.1-1.fc28.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.6.3-1.fc28.noarch
mingw64-gtk3-3.22.30-1.fc28.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc28.noarch
mingw64-nettle-3.4-1.fc28.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
ncurses-devel-6.1-5.20180224.fc28.x86_64
nettle-devel-3.4-2.fc28.x86_64
nss-devel-3.39.0-1.0.fc28.x86_64
numactl-devel-2.0.11-8.fc28.x86_64
package PyYAML is not installed
package libjpeg-devel is not installed
perl-5.26.2-414.fc28.x86_64
pixman-devel-0.34.0-8.fc28.x86_64
python3-3.6.6-1.fc28.x86_64
snappy-devel-1.1.7-5.fc28.x86_64
sparse-0.5.2-1.fc28.x86_64
spice-server-devel-0.14.0-4.fc28.x86_64
systemtap-sdt-devel-4.0-1.fc28.x86_64
tar-1.30-3.fc28.x86_64
usbredir-devel-0.8.0-1.fc28.x86_64
virglrenderer-devel-0.6.0-4.20170210git76b3da97b.fc28.x86_64
vte3-devel-0.36.5-6.fc28.x86_64
which-2.21-8.fc28.x86_64
xen-devel-4.10.2-2.fc28.x86_64
zlib-devel-1.2.11-8.fc28.x86_64
Environment variables:
TARGET_LIST=
PACKAGES=bc bison bluez-libs-devel brlapi-devel bzip2 bzip2-devel ccache clang device-mapper-multipath-devel findutils flex gcc gcc-c++ gettext git glib2-devel glusterfs-api-devel gnutls-devel gtk3-devel hostname libaio-devel libasan libattr-devel libcap-devel libcap-ng-devel libcurl-devel libfdt-devel libjpeg-devel libpng-devel librbd-devel libssh2-devel libubsan libusbx-devel libxml2-devel llvm lzo-devel make mingw32-bzip2 mingw32-curl mingw32-glib2 mingw32-gmp mingw32-gnutls mingw32-gtk3 mingw32-libjpeg-turbo mingw32-libpng mingw32-libssh2 mingw32-libtasn1 mingw32-nettle mingw32-pixman mingw32-pkg-config mingw32-SDL2 mingw64-bzip2 mingw64-curl mingw64-glib2 mingw64-gmp mingw64-gnutls mingw64-gtk3 mingw64-libjpeg-turbo mingw64-libpng mingw64-libssh2 mingw64-libtasn1 mingw64-nettle mingw64-pixman mingw64-pkg-config mingw64-SDL2 ncurses-devel nettle-devel nss-devel numactl-devel perl pixman-devel python3 PyYAML SDL2-devel snappy-devel sparse spice-server-devel systemtap-sdt-devel tar usbredir-devel virglrenderer-devel vte3-devel which xen-devel zlib-devel
J=8
V=
HOSTNAME=2ccc3ec89689
DEBUG=
SHOW_ENV=1
PWD=/
HOME=/
CCACHE_DIR=/var/tmp/ccache
FBR=f28
DISTTAG=f28container
QEMU_CONFIGURE_OPTS=--python=/usr/bin/python3
FGC=f28
TEST_DIR=/tmp/qemu-test
SHLVL=1
FEATURES=mingw clang pyyaml asan dtc
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAKEFLAGS= -j8
EXTRA_CONFIGURE_OPTS=
_=/usr/bin/env
Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install --python=/usr/bin/python3 --cross-prefix=x86_64-w64-mingw32- --enable-trace-backends=simple --enable-gnutls --enable-nettle --enable-curl --enable-vnc --enable-bzip2 --enable-guest-agent --with-sdlabi=2.0
Install prefix /tmp/qemu-test/install
BIOS directory /tmp/qemu-test/install
firmware path /tmp/qemu-test/install/share/qemu-firmware
binary directory /tmp/qemu-test/install
library directory /tmp/qemu-test/install/lib
module directory /tmp/qemu-test/install/lib
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory /tmp/qemu-test/install
local state directory queried at runtime
Windows SDK no
Source path /tmp/qemu-test/src
GIT binary git
GIT submodules
C compiler x86_64-w64-mingw32-gcc
Host C compiler cc
C++ compiler x86_64-w64-mingw32-g++
Objective-C compiler clang
ARFLAGS rv
CFLAGS -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/pixman-1 -I$(SRC_PATH)/dtc/libfdt -Werror -DHAS_LIBSSH2_SFTP_FSYNC -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -mms-bitfields -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0 -I/usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -m64 -mcx16 -mthreads -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/p11-kit-1 -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/libpng16
LDFLAGS -Wl,--nxcompat -Wl,--no-seh -Wl,--dynamicbase -Wl,--warn-common -m64 -g
QEMU_LDFLAGS -L$(BUILD_DIR)/dtc/libfdt
make make
install install
python /usr/bin/python3 -B
smbd /usr/sbin/smbd
module support no
host CPU x86_64
host big endian no
target list x86_64-softmmu aarch64-softmmu
gprof enabled no
sparse enabled no
strip binaries yes
profiler no
static build no
SDL support yes (2.0.9)
GTK support yes (3.22.30)
GTK GL support no
VTE support no
TLS priority NORMAL
GNUTLS support yes
libgcrypt no
nettle yes (3.4)
libtasn1 yes
curses support no
virgl support no
curl support yes
mingw32 support yes
Audio drivers dsound
Block whitelist (rw)
Block whitelist (ro)
VirtFS support no
Multipath support no
VNC support yes
VNC SASL support no
VNC JPEG support yes
VNC PNG support yes
xen support no
brlapi support no
bluez support no
Documentation no
PIE no
vde support no
netmap support no
Linux AIO support no
ATTR/XATTR support no
Install blobs yes
KVM support no
HAX support yes
HVF support no
WHPX support no
TCG support yes
TCG debug enabled no
TCG interpreter no
malloc trim support no
RDMA support no
PVRDMA support no
fdt support git
membarrier no
preadv support no
fdatasync no
madvise no
posix_madvise no
posix_memalign no
libcap-ng support no
vhost-net support no
vhost-crypto support no
vhost-scsi support no
vhost-vsock support no
vhost-user support no
Trace backends simple
Trace output file trace-<pid>
spice support no
rbd support no
xfsctl support no
smartcard support no
libusb no
usb net redir no
OpenGL support no
OpenGL dmabufs no
libiscsi support no
libnfs support no
build guest agent yes
QGA VSS support no
QGA w32 disk info yes
QGA MSI support no
seccomp support no
coroutine backend win32
coroutine pool yes
debug stack usage no
mutex debugging no
crypto afalg no
GlusterFS support no
gcov gcov
gcov enabled no
TPM support yes
libssh2 support yes
TPM passthrough no
TPM emulator no
QOM debugging yes
Live block migration yes
lzo support no
snappy support no
bzip2 support yes
NUMA host support no
libxml2 no
tcmalloc support no
jemalloc support no
avx2 optimization yes
replication support yes
VxHS block device no
bochs support yes
cloop support yes
dmg support yes
qcow v1 support yes
vdi support yes
vvfat support yes
qed support yes
parallels support yes
sheepdog support yes
capstone no
docker no
libpmem support no
libudev no
NOTE: cross-compilers enabled: 'x86_64-w64-mingw32-gcc'
GEN config-host.h
GEN x86_64-softmmu/config-devices.mak.tmp
GEN aarch64-softmmu/config-devices.mak.tmp
GEN qemu-options.def
GEN qapi-gen
GEN trace/generated-tcg-tracers.h
GEN trace/generated-helpers-wrappers.h
GEN trace/generated-helpers.h
GEN trace/generated-helpers.c
GEN aarch64-softmmu/config-devices.mak
GEN x86_64-softmmu/config-devices.mak
GEN module_block.h
GEN ui/input-keymap-atset1-to-qcode.c
GEN ui/input-keymap-linux-to-qcode.c
GEN ui/input-keymap-qcode-to-atset1.c
GEN ui/input-keymap-qcode-to-atset2.c
GEN ui/input-keymap-qcode-to-atset3.c
GEN ui/input-keymap-qcode-to-linux.c
GEN ui/input-keymap-qcode-to-qnum.c
GEN ui/input-keymap-qcode-to-sun.c
GEN ui/input-keymap-qnum-to-qcode.c
GEN ui/input-keymap-usb-to-qcode.c
GEN ui/input-keymap-win32-to-qcode.c
GEN ui/input-keymap-x11-to-qcode.c
GEN ui/input-keymap-xorgevdev-to-qcode.c
GEN ui/input-keymap-xorgkbd-to-qcode.c
GEN ui/input-keymap-xorgxquartz-to-qcode.c
GEN ui/input-keymap-xorgxwin-to-qcode.c
GEN ui/input-keymap-osx-to-qcode.c
GEN trace-root.h
GEN tests/test-qapi-gen
GEN accel/kvm/trace.h
GEN accel/tcg/trace.h
GEN audio/trace.h
GEN block/trace.h
GEN chardev/trace.h
GEN crypto/trace.h
GEN hw/9pfs/trace.h
GEN hw/acpi/trace.h
GEN hw/alpha/trace.h
GEN hw/arm/trace.h
GEN hw/audio/trace.h
GEN hw/block/trace.h
GEN hw/block/dataplane/trace.h
GEN hw/char/trace.h
GEN hw/display/trace.h
GEN hw/dma/trace.h
GEN hw/hppa/trace.h
GEN hw/i2c/trace.h
GEN hw/i386/trace.h
GEN hw/i386/xen/trace.h
GEN hw/ide/trace.h
GEN hw/input/trace.h
GEN hw/intc/trace.h
GEN hw/isa/trace.h
GEN hw/mem/trace.h
GEN hw/misc/trace.h
GEN hw/misc/macio/trace.h
GEN hw/net/trace.h
GEN hw/nvram/trace.h
GEN hw/pci/trace.h
GEN hw/pci-host/trace.h
GEN hw/ppc/trace.h
GEN hw/rdma/trace.h
GEN hw/rdma/vmw/trace.h
GEN hw/s390x/trace.h
GEN hw/scsi/trace.h
GEN hw/sd/trace.h
GEN hw/sparc/trace.h
GEN hw/sparc64/trace.h
GEN hw/timer/trace.h
GEN hw/tpm/trace.h
GEN hw/usb/trace.h
GEN hw/vfio/trace.h
GEN hw/virtio/trace.h
GEN hw/watchdog/trace.h
GEN hw/xen/trace.h
GEN io/trace.h
GEN linux-user/trace.h
GEN migration/trace.h
GEN nbd/trace.h
GEN net/trace.h
GEN qapi/trace.h
GEN qom/trace.h
GEN scsi/trace.h
GEN target/arm/trace.h
GEN target/i386/trace.h
GEN target/mips/trace.h
GEN target/ppc/trace.h
GEN target/s390x/trace.h
GEN target/sparc/trace.h
GEN ui/trace.h
GEN util/trace.h
GEN trace-root.c
GEN accel/kvm/trace.c
GEN accel/tcg/trace.c
GEN audio/trace.c
GEN block/trace.c
GEN chardev/trace.c
GEN crypto/trace.c
GEN hw/9pfs/trace.c
GEN hw/acpi/trace.c
GEN hw/alpha/trace.c
GEN hw/arm/trace.c
GEN hw/audio/trace.c
GEN hw/block/trace.c
GEN hw/block/dataplane/trace.c
GEN hw/char/trace.c
GEN hw/display/trace.c
GEN hw/dma/trace.c
GEN hw/hppa/trace.c
GEN hw/i2c/trace.c
GEN hw/i386/trace.c
GEN hw/i386/xen/trace.c
GEN hw/ide/trace.c
GEN hw/input/trace.c
GEN hw/intc/trace.c
GEN hw/isa/trace.c
GEN hw/mem/trace.c
GEN hw/misc/trace.c
GEN hw/misc/macio/trace.c
GEN hw/net/trace.c
GEN hw/nvram/trace.c
GEN hw/pci/trace.c
GEN hw/pci-host/trace.c
GEN hw/ppc/trace.c
GEN hw/rdma/trace.c
GEN hw/rdma/vmw/trace.c
GEN hw/s390x/trace.c
GEN hw/scsi/trace.c
GEN hw/sd/trace.c
GEN hw/sparc/trace.c
GEN hw/sparc64/trace.c
GEN hw/timer/trace.c
GEN hw/tpm/trace.c
GEN hw/usb/trace.c
GEN hw/vfio/trace.c
GEN hw/virtio/trace.c
GEN hw/watchdog/trace.c
GEN hw/xen/trace.c
GEN io/trace.c
GEN linux-user/trace.c
GEN migration/trace.c
GEN nbd/trace.c
GEN net/trace.c
GEN qapi/trace.c
GEN qom/trace.c
GEN scsi/trace.c
GEN target/arm/trace.c
GEN target/i386/trace.c
GEN target/mips/trace.c
GEN target/ppc/trace.c
GEN target/s390x/trace.c
GEN target/sparc/trace.c
GEN ui/trace.c
GEN util/trace.c
GEN config-all-devices.mak
DEP /tmp/qemu-test/src/dtc/tests/dumptrees.c
DEP /tmp/qemu-test/src/dtc/tests/trees.S
DEP /tmp/qemu-test/src/dtc/tests/testutils.c
DEP /tmp/qemu-test/src/dtc/tests/value-labels.c
DEP /tmp/qemu-test/src/dtc/tests/asm_tree_dump.c
DEP /tmp/qemu-test/src/dtc/tests/truncated_string.c
DEP /tmp/qemu-test/src/dtc/tests/truncated_memrsv.c
DEP /tmp/qemu-test/src/dtc/tests/truncated_property.c
DEP /tmp/qemu-test/src/dtc/tests/check_full.c
DEP /tmp/qemu-test/src/dtc/tests/check_header.c
DEP /tmp/qemu-test/src/dtc/tests/check_path.c
DEP /tmp/qemu-test/src/dtc/tests/overlay_bad_fixup.c
DEP /tmp/qemu-test/src/dtc/tests/overlay.c
DEP /tmp/qemu-test/src/dtc/tests/subnode_iterate.c
DEP /tmp/qemu-test/src/dtc/tests/property_iterate.c
DEP /tmp/qemu-test/src/dtc/tests/integer-expressions.c
DEP /tmp/qemu-test/src/dtc/tests/utilfdt_test.c
DEP /tmp/qemu-test/src/dtc/tests/path_offset_aliases.c
DEP /tmp/qemu-test/src/dtc/tests/add_subnode_with_nops.c
DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_unordered.c
DEP /tmp/qemu-test/src/dtc/tests/dtb_reverse.c
DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_ordered.c
DEP /tmp/qemu-test/src/dtc/tests/extra-terminating-null.c
DEP /tmp/qemu-test/src/dtc/tests/incbin.c
DEP /tmp/qemu-test/src/dtc/tests/boot-cpuid.c
DEP /tmp/qemu-test/src/dtc/tests/phandle_format.c
DEP /tmp/qemu-test/src/dtc/tests/path-references.c
DEP /tmp/qemu-test/src/dtc/tests/references.c
DEP /tmp/qemu-test/src/dtc/tests/string_escapes.c
DEP /tmp/qemu-test/src/dtc/tests/propname_escapes.c
DEP /tmp/qemu-test/src/dtc/tests/appendprop2.c
DEP /tmp/qemu-test/src/dtc/tests/appendprop1.c
DEP /tmp/qemu-test/src/dtc/tests/del_node.c
DEP /tmp/qemu-test/src/dtc/tests/del_property.c
DEP /tmp/qemu-test/src/dtc/tests/setprop.c
DEP /tmp/qemu-test/src/dtc/tests/set_name.c
DEP /tmp/qemu-test/src/dtc/tests/rw_tree1.c
DEP /tmp/qemu-test/src/dtc/tests/open_pack.c
DEP /tmp/qemu-test/src/dtc/tests/nopulate.c
DEP /tmp/qemu-test/src/dtc/tests/mangle-layout.c
DEP /tmp/qemu-test/src/dtc/tests/move_and_save.c
DEP /tmp/qemu-test/src/dtc/tests/sw_states.c
DEP /tmp/qemu-test/src/dtc/tests/sw_tree1.c
DEP /tmp/qemu-test/src/dtc/tests/nop_node.c
DEP /tmp/qemu-test/src/dtc/tests/nop_property.c
DEP /tmp/qemu-test/src/dtc/tests/setprop_inplace.c
DEP /tmp/qemu-test/src/dtc/tests/stringlist.c
DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells2.c
DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells.c
DEP /tmp/qemu-test/src/dtc/tests/notfound.c
DEP /tmp/qemu-test/src/dtc/tests/sized_cells.c
DEP /tmp/qemu-test/src/dtc/tests/char_literal.c
DEP /tmp/qemu-test/src/dtc/tests/get_alias.c
DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_compatible.c
DEP /tmp/qemu-test/src/dtc/tests/node_check_compatible.c
DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_phandle.c
DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_prop_value.c
DEP /tmp/qemu-test/src/dtc/tests/parent_offset.c
DEP /tmp/qemu-test/src/dtc/tests/get_path.c
DEP /tmp/qemu-test/src/dtc/tests/supernode_atdepth_offset.c
DEP /tmp/qemu-test/src/dtc/tests/get_phandle.c
DEP /tmp/qemu-test/src/dtc/tests/getprop.c
DEP /tmp/qemu-test/src/dtc/tests/get_name.c
DEP /tmp/qemu-test/src/dtc/tests/path_offset.c
DEP /tmp/qemu-test/src/dtc/tests/subnode_offset.c
DEP /tmp/qemu-test/src/dtc/tests/find_property.c
DEP /tmp/qemu-test/src/dtc/tests/root_node.c
DEP /tmp/qemu-test/src/dtc/tests/get_mem_rsv.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_overlay.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_addresses.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_empty_tree.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_strerror.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_rw.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_sw.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_wip.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt_ro.c
DEP /tmp/qemu-test/src/dtc/libfdt/fdt.c
DEP /tmp/qemu-test/src/dtc/util.c
DEP /tmp/qemu-test/src/dtc/fdtoverlay.c
DEP /tmp/qemu-test/src/dtc/fdtput.c
DEP /tmp/qemu-test/src/dtc/fdtget.c
DEP /tmp/qemu-test/src/dtc/fdtdump.c
LEX convert-dtsv0-lexer.lex.c
DEP /tmp/qemu-test/src/dtc/srcpos.c
BISON dtc-parser.tab.c
LEX dtc-lexer.lex.c
DEP /tmp/qemu-test/src/dtc/treesource.c
DEP /tmp/qemu-test/src/dtc/livetree.c
DEP /tmp/qemu-test/src/dtc/fstree.c
DEP /tmp/qemu-test/src/dtc/flattree.c
DEP /tmp/qemu-test/src/dtc/dtc.c
DEP /tmp/qemu-test/src/dtc/data.c
DEP /tmp/qemu-test/src/dtc/checks.c
DEP convert-dtsv0-lexer.lex.c
DEP dtc-parser.tab.c
DEP dtc-lexer.lex.c
CHK version_gen.h
UPD version_gen.h
DEP /tmp/qemu-test/src/dtc/util.c
CC libfdt/fdt.o
CC libfdt/fdt_ro.o
CC libfdt/fdt_wip.o
CC libfdt/fdt_sw.o
CC libfdt/fdt_rw.o
CC libfdt/fdt_strerror.o
CC libfdt/fdt_empty_tree.o
CC libfdt/fdt_addresses.o
CC libfdt/fdt_overlay.o
AR libfdt/libfdt.a
x86_64-w64-mingw32-ar: creating libfdt/libfdt.a
a - libfdt/fdt.o
a - libfdt/fdt_ro.o
a - libfdt/fdt_wip.o
a - libfdt/fdt_sw.o
a - libfdt/fdt_rw.o
a - libfdt/fdt_strerror.o
a - libfdt/fdt_empty_tree.o
a - libfdt/fdt_addresses.o
a - libfdt/fdt_overlay.o
RC version.o
GEN qga/qapi-generated/qapi-gen
CC qapi/qapi-builtin-types.o
CC qapi/qapi-types.o
CC qapi/qapi-types-block-core.o
CC qapi/qapi-types-block.o
CC qapi/qapi-types-char.o
CC qapi/qapi-types-common.o
CC qapi/qapi-types-crypto.o
CC qapi/qapi-types-introspect.o
CC qapi/qapi-types-job.o
CC qapi/qapi-types-migration.o
CC qapi/qapi-types-misc.o
CC qapi/qapi-types-net.o
CC qapi/qapi-types-rocker.o
CC qapi/qapi-types-run-state.o
CC qapi/qapi-types-sockets.o
CC qapi/qapi-types-tpm.o
CC qapi/qapi-types-trace.o
CC qapi/qapi-types-transaction.o
CC qapi/qapi-types-ui.o
CC qapi/qapi-builtin-visit.o
CC qapi/qapi-visit.o
CC qapi/qapi-visit-block-core.o
CC qapi/qapi-visit-block.o
CC qapi/qapi-visit-char.o
CC qapi/qapi-visit-common.o
CC qapi/qapi-visit-crypto.o
CC qapi/qapi-visit-introspect.o
CC qapi/qapi-visit-job.o
CC qapi/qapi-visit-migration.o
CC qapi/qapi-visit-misc.o
CC qapi/qapi-visit-net.o
CC qapi/qapi-visit-rocker.o
CC qapi/qapi-visit-run-state.o
CC qapi/qapi-visit-sockets.o
CC qapi/qapi-visit-tpm.o
CC qapi/qapi-visit-trace.o
CC qapi/qapi-visit-transaction.o
CC qapi/qapi-visit-ui.o
CC qapi/qapi-events.o
CC qapi/qapi-events-block-core.o
CC qapi/qapi-events-char.o
CC qapi/qapi-events-block.o
CC qapi/qapi-events-common.o
CC qapi/qapi-events-crypto.o
CC qapi/qapi-events-introspect.o
CC qapi/qapi-events-job.o
CC qapi/qapi-events-migration.o
CC qapi/qapi-events-misc.o
CC qapi/qapi-events-net.o
CC qapi/qapi-events-rocker.o
CC qapi/qapi-events-run-state.o
CC qapi/qapi-events-sockets.o
CC qapi/qapi-events-tpm.o
CC qapi/qapi-events-trace.o
CC qapi/qapi-events-transaction.o
CC qapi/qapi-events-ui.o
CC qapi/qapi-introspect.o
CC qapi/qapi-visit-core.o
CC qapi/qapi-dealloc-visitor.o
CC qapi/qobject-input-visitor.o
CC qapi/qobject-output-visitor.o
CC qapi/qmp-registry.o
CC qapi/qmp-dispatch.o
CC qapi/string-input-visitor.o
CC qapi/string-output-visitor.o
CC qapi/opts-visitor.o
CC qapi/qapi-clone-visitor.o
CC qapi/qmp-event.o
CC qobject/qnull.o
CC qapi/qapi-util.o
CC qobject/qstring.o
CC qobject/qnum.o
CC qobject/qdict.o
CC qobject/qlist.o
CC qobject/qbool.o
CC qobject/qlit.o
CC qobject/qobject.o
CC qobject/qjson.o
CC qobject/json-lexer.o
CC qobject/json-streamer.o
CC qobject/json-parser.o
CC qobject/block-qdict.o
CC trace/simple.o
CC trace/control.o
CC trace/qmp.o
CC util/osdep.o
CC util/cutils.o
CC util/unicode.o
CC util/qemu-timer-common.o
CC util/bufferiszero.o
CC util/lockcnt.o
CC util/aiocb.o
CC util/async.o
CC util/aio-wait.o
CC util/thread-pool.o
CC util/qemu-timer.o
CC util/main-loop.o
CC util/iohandler.o
CC util/aio-win32.o
CC util/event_notifier-win32.o
CC util/oslib-win32.o
CC util/qemu-thread-win32.o
CC util/envlist.o
CC util/path.o
CC util/module.o
CC util/host-utils.o
CC util/bitmap.o
CC util/bitops.o
CC util/hbitmap.o
CC util/fifo8.o
CC util/acl.o
CC util/cacheinfo.o
CC util/error.o
CC util/qemu-error.o
CC util/id.o
CC util/iov.o
CC util/qemu-config.o
CC util/qemu-sockets.o
CC util/uri.o
CC util/notify.o
CC util/qemu-option.o
CC util/qemu-progress.o
CC util/keyval.o
CC util/hexdump.o
CC util/crc32c.o
CC util/uuid.o
CC util/throttle.o
CC util/getauxval.o
CC util/readline.o
CC util/rcu.o
CC util/qemu-coroutine.o
CC util/qemu-coroutine-lock.o
CC util/qemu-coroutine-io.o
CC util/qemu-coroutine-sleep.o
CC util/coroutine-win32.o
CC util/buffer.o
CC util/timed-average.o
CC util/base64.o
CC util/log.o
CC util/pagesize.o
CC util/qdist.o
CC util/qht.o
CC util/qsp.o
CC util/range.o
CC util/stats64.o
CC util/systemd.o
CC util/iova-tree.o
CC trace-root.o
CC accel/kvm/trace.o
CC accel/tcg/trace.o
CC audio/trace.o
CC block/trace.o
CC chardev/trace.o
CC crypto/trace.o
CC hw/9pfs/trace.o
CC hw/acpi/trace.o
CC hw/alpha/trace.o
CC hw/arm/trace.o
CC hw/audio/trace.o
CC hw/block/trace.o
CC hw/display/trace.o
CC hw/block/dataplane/trace.o
CC hw/char/trace.o
CC hw/dma/trace.o
CC hw/hppa/trace.o
CC hw/i2c/trace.o
CC hw/i386/trace.o
CC hw/i386/xen/trace.o
CC hw/ide/trace.o
CC hw/input/trace.o
CC hw/intc/trace.o
CC hw/isa/trace.o
CC hw/mem/trace.o
CC hw/misc/trace.o
CC hw/misc/macio/trace.o
CC hw/net/trace.o
CC hw/nvram/trace.o
CC hw/pci/trace.o
CC hw/pci-host/trace.o
CC hw/rdma/trace.o
CC hw/ppc/trace.o
CC hw/rdma/vmw/trace.o
CC hw/s390x/trace.o
CC hw/scsi/trace.o
CC hw/sd/trace.o
CC hw/sparc/trace.o
CC hw/sparc64/trace.o
CC hw/timer/trace.o
CC hw/tpm/trace.o
CC hw/usb/trace.o
CC hw/vfio/trace.o
CC hw/virtio/trace.o
CC hw/watchdog/trace.o
CC hw/xen/trace.o
CC io/trace.o
CC linux-user/trace.o
CC migration/trace.o
CC nbd/trace.o
CC net/trace.o
CC qapi/trace.o
CC qom/trace.o
CC scsi/trace.o
CC target/arm/trace.o
CC target/i386/trace.o
CC target/mips/trace.o
CC target/ppc/trace.o
CC target/s390x/trace.o
CC target/sparc/trace.o
CC ui/trace.o
CC util/trace.o
CC crypto/pbkdf-stub.o
CC stubs/arch-query-cpu-def.o
CC stubs/arch-query-cpu-model-expansion.o
CC stubs/arch-query-cpu-model-comparison.o
CC stubs/bdrv-next-monitor-owned.o
CC stubs/arch-query-cpu-model-baseline.o
CC stubs/blk-commit-all.o
CC stubs/blockdev-close-all-bdrv-states.o
CC stubs/clock-warp.o
CC stubs/cpu-get-clock.o
CC stubs/cpu-get-icount.o
CC stubs/dump.o
CC stubs/gdbstub.o
CC stubs/error-printf.o
CC stubs/fdset.o
CC stubs/get-vm-name.o
CC stubs/iothread.o
CC stubs/iothread-lock.o
CC stubs/is-daemonized.o
CC stubs/migr-blocker.o
CC stubs/machine-init-done.o
CC stubs/change-state-handler.o
CC stubs/monitor.o
CC stubs/qtest.o
CC stubs/notify-event.o
CC stubs/replay.o
CC stubs/runstate-check.o
CC stubs/set-fd-handler.o
CC stubs/slirp.o
CC stubs/sysbus.o
CC stubs/tpm.o
CC stubs/trace-control.o
CC stubs/uuid.o
CC stubs/vm-stop.o
CC stubs/vmstate.o
CC stubs/fd-register.o
CC stubs/qmp_memory_device.o
CC stubs/target-monitor-defs.o
CC stubs/target-get-monitor-def.o
CC stubs/pc_madt_cpu_entry.o
CC stubs/vmgenid.o
CC stubs/xen-common.o
CC stubs/xen-hvm.o
CC stubs/pci-host-piix.o
CC stubs/ram-block.o
CC stubs/ramfb.o
GEN qemu-img-cmds.h
CC block.o
CC blockjob.o
CC job.o
CC qemu-io-cmds.o
CC replication.o
CC block/raw-format.o
CC block/vmdk.o
CC block/vpc.o
CC block/qcow.o
CC block/vdi.o
CC block/cloop.o
CC block/bochs.o
CC block/vvfat.o
CC block/dmg.o
CC block/qcow2.o
CC block/qcow2-snapshot.o
CC block/qcow2-refcount.o
CC block/qcow2-cluster.o
CC block/qcow2-cache.o
CC block/qcow2-bitmap.o
CC block/qed.o
CC block/qed-l2-cache.o
CC block/qed-table.o
CC block/qed-cluster.o
CC block/qed-check.o
CC block/vhdx.o
CC block/vhdx-endian.o
CC block/vhdx-log.o
CC block/quorum.o
CC block/blkdebug.o
CC block/blkverify.o
CC block/blkreplay.o
CC block/parallels.o
CC block/blklogwrites.o
CC block/block-backend.o
CC block/snapshot.o
CC block/qapi.o
CC block/file-win32.o
CC block/win32-aio.o
CC block/null.o
CC block/mirror.o
CC block/commit.o
CC block/io.o
CC block/create.o
CC block/throttle-groups.o
CC block/nbd.o
CC block/sheepdog.o
CC block/nbd-client.o
CC block/accounting.o
CC block/dirty-bitmap.o
CC block/write-threshold.o
CC block/backup.o
CC block/replication.o
CC block/throttle.o
CC block/copy-on-read.o
CC block/crypto.o
CC nbd/server.o
CC nbd/client.o
CC nbd/common.o
CC scsi/utils.o
CC scsi/pr-manager-stub.o
CC block/curl.o
CC block/ssh.o
CC block/dmg-bz2.o
CC crypto/init.o
CC crypto/hash.o
CC crypto/hash-nettle.o
CC crypto/hmac.o
CC crypto/hmac-nettle.o
CC crypto/aes.o
CC crypto/desrfb.o
CC crypto/cipher.o
CC crypto/tlscreds.o
CC crypto/tlscredsanon.o
CC crypto/tlscredspsk.o
CC crypto/tlscredsx509.o
CC crypto/tlssession.o
CC crypto/secret.o
CC crypto/random-gnutls.o
CC crypto/pbkdf.o
CC crypto/pbkdf-nettle.o
CC crypto/ivgen.o
CC crypto/ivgen-essiv.o
CC crypto/ivgen-plain.o
CC crypto/ivgen-plain64.o
CC crypto/afsplit.o
CC crypto/xts.o
CC crypto/block.o
CC crypto/block-qcow.o
CC crypto/block-luks.o
CC io/channel.o
CC io/channel-buffer.o
CC io/channel-command.o
CC io/channel-file.o
CC io/channel-socket.o
CC io/channel-tls.o
CC io/channel-watch.o
CC io/channel-websock.o
CC io/channel-util.o
CC io/dns-resolver.o
CC io/net-listener.o
CC io/task.o
CC qom/object.o
CC qom/container.o
CC qom/qom-qobject.o
CC qom/object_interfaces.o
CC qemu-io.o
CC qemu-edid.o
CC hw/display/edid-generate.o
CC blockdev.o
CC blockdev-nbd.o
CC bootdevice.o
CC iothread.o
CC job-qmp.o
CC qdev-monitor.o
CC device-hotplug.o
CC os-win32.o
CC bt-host.o
CC bt-vhci.o
CC dma-helpers.o
CC vl.o
CC device_tree.o
CC tpm.o
CC qapi/qapi-commands.o
CC qapi/qapi-commands-block.o
CC qapi/qapi-commands-block-core.o
CC qapi/qapi-commands-char.o
CC qapi/qapi-commands-common.o
CC qapi/qapi-commands-crypto.o
CC qapi/qapi-commands-introspect.o
CC qapi/qapi-commands-migration.o
CC qapi/qapi-commands-job.o
CC qapi/qapi-commands-net.o
CC qapi/qapi-commands-misc.o
CC qapi/qapi-commands-rocker.o
CC qapi/qapi-commands-run-state.o
CC qapi/qapi-commands-sockets.o
CC qapi/qapi-commands-tpm.o
CC qapi/qapi-commands-trace.o
CC qapi/qapi-commands-transaction.o
CC qapi/qapi-commands-ui.o
CC qmp.o
CC hmp.o
CC cpus-common.o
CC audio/audio.o
CC audio/noaudio.o
CC audio/wavaudio.o
CC audio/mixeng.o
CC audio/dsoundaudio.o
CC audio/audio_win_int.o
CC audio/wavcapture.o
CC backends/rng.o
CC backends/rng-egd.o
CC backends/hostmem.o
CC backends/tpm.o
CC backends/hostmem-ram.o
CC backends/cryptodev.o
CC backends/cryptodev-builtin.o
CC backends/cryptodev-vhost.o
CC block/stream.o
CC chardev/msmouse.o
CC chardev/wctablet.o
CC chardev/testdev.o
CC disas/arm.o
CC disas/i386.o
CXX disas/arm-a64.o
CXX disas/libvixl/vixl/utils.o
CXX disas/libvixl/vixl/a64/instructions-a64.o
CXX disas/libvixl/vixl/compiler-intrinsics.o
CXX disas/libvixl/vixl/a64/decoder-a64.o
CXX disas/libvixl/vixl/a64/disasm-a64.o
CC hw/acpi/core.o
CC hw/acpi/piix4.o
CC hw/acpi/pcihp.o
CC hw/acpi/ich9.o
CC hw/acpi/tco.o
CC hw/acpi/cpu_hotplug.o
CC hw/acpi/memory_hotplug.o
CC hw/acpi/cpu.o
CC hw/acpi/nvdimm.o
CC hw/acpi/vmgenid.o
CC hw/acpi/acpi_interface.o
CC hw/acpi/bios-linker-loader.o
CC hw/acpi/aml-build.o
CC hw/acpi/ipmi.o
CC hw/acpi/acpi-stub.o
CC hw/acpi/ipmi-stub.o
CC hw/audio/sb16.o
CC hw/audio/es1370.o
CC hw/audio/ac97.o
CC hw/audio/fmopl.o
CC hw/audio/adlib.o
CC hw/audio/gus.o
CC hw/audio/gusemu_hal.o
CC hw/audio/gusemu_mixer.o
CC hw/audio/cs4231a.o
CC hw/audio/intel-hda.o
CC hw/audio/hda-codec.o
CC hw/audio/pcspk.o
CC hw/audio/wm8750.o
CC hw/audio/pl041.o
CC hw/audio/lm4549.o
CC hw/audio/marvell_88w8618.o
CC hw/audio/soundhw.o
CC hw/block/block.o
CC hw/block/cdrom.o
CC hw/block/hd-geometry.o
CC hw/block/fdc.o
CC hw/block/m25p80.o
CC hw/block/nand.o
CC hw/block/pflash_cfi01.o
CC hw/block/pflash_cfi02.o
CC hw/block/onenand.o
CC hw/block/ecc.o
CC hw/block/nvme.o
CC hw/bt/core.o
CC hw/bt/l2cap.o
CC hw/bt/sdp.o
CC hw/bt/hci.o
CC hw/bt/hid.o
CC hw/bt/hci-csr.o
CC hw/char/ipoctal232.o
CC hw/char/nrf51_uart.o
CC hw/char/parallel.o
CC hw/char/parallel-isa.o
CC hw/char/pl011.o
CC hw/char/serial.o
CC hw/char/serial-isa.o
CC hw/char/serial-pci.o
CC hw/char/virtio-console.o
CC hw/char/cadence_uart.o
CC hw/char/cmsdk-apb-uart.o
CC hw/char/debugcon.o
CC hw/char/imx_serial.o
CC hw/core/qdev.o
CC hw/core/qdev-properties.o
CC hw/core/bus.o
CC hw/core/reset.o
CC hw/core/qdev-fw.o
CC hw/core/fw-path-provider.o
CC hw/core/irq.o
CC hw/core/hotplug.o
CC hw/core/nmi.o
CC hw/core/stream.o
CC hw/core/ptimer.o
CC hw/core/sysbus.o
CC hw/core/machine.o
CC hw/core/loader.o
CC hw/core/qdev-properties-system.o
CC hw/core/register.o
CC hw/core/or-irq.o
CC hw/core/split-irq.o
CC hw/core/platform-bus.o
CC hw/core/generic-loader.o
CC hw/core/null-machine.o
CC hw/cpu/core.o
CC hw/display/ramfb.o
CC hw/display/ramfb-standalone.o
CC hw/display/ads7846.o
CC hw/display/cirrus_vga.o
CC hw/display/cirrus_vga_isa.o
CC hw/display/pl110.o
CC hw/display/sii9022.o
CC hw/display/ssd0303.o
CC hw/display/ssd0323.o
CC hw/display/vga-pci.o
CC hw/display/edid-region.o
CC hw/display/vga-isa.o
CC hw/display/vmware_vga.o
CC hw/display/bochs-display.o
CC hw/display/blizzard.o
CC hw/display/exynos4210_fimd.o
CC hw/display/framebuffer.o
CC hw/display/tc6393xb.o
CC hw/dma/pl080.o
CC hw/dma/pl330.o
CC hw/dma/i8257.o
CC hw/dma/xilinx_axidma.o
CC hw/dma/xlnx-zynq-devcfg.o
CC hw/dma/xlnx-zdma.o
CC hw/gpio/max7310.o
CC hw/gpio/pl061.o
CC hw/gpio/zaurus.o
CC hw/gpio/gpio_key.o
CC hw/i2c/core.o
CC hw/i2c/smbus.o
CC hw/i2c/smbus_eeprom.o
CC hw/i2c/i2c-ddc.o
CC hw/i2c/smbus_ich9.o
CC hw/i2c/versatile_i2c.o
CC hw/i2c/pm_smbus.o
CC hw/i2c/bitbang_i2c.o
CC hw/i2c/exynos4210_i2c.o
CC hw/i2c/imx_i2c.o
CC hw/i2c/aspeed_i2c.o
CC hw/ide/core.o
CC hw/ide/atapi.o
CC hw/ide/qdev.o
CC hw/ide/pci.o
CC hw/ide/isa.o
CC hw/ide/piix.o
CC hw/ide/microdrive.o
CC hw/ide/ahci.o
CC hw/ide/ich.o
CC hw/ide/ahci-allwinner.o
CC hw/input/hid.o
CC hw/input/lm832x.o
CC hw/input/pckbd.o
CC hw/input/pl050.o
CC hw/input/ps2.o
CC hw/input/stellaris_input.o
CC hw/input/tsc2005.o
CC hw/input/virtio-input.o
CC hw/input/virtio-input-hid.o
CC hw/intc/i8259_common.o
CC hw/intc/i8259.o
CC hw/intc/pl190.o
CC hw/intc/xlnx-pmu-iomod-intc.o
CC hw/intc/xlnx-zynqmp-ipi.o
CC hw/intc/imx_avic.o
CC hw/intc/imx_gpcv2.o
CC hw/intc/realview_gic.o
CC hw/intc/ioapic_common.o
CC hw/intc/arm_gic_common.o
CC hw/intc/arm_gic.o
CC hw/intc/arm_gicv2m.o
CC hw/intc/arm_gicv3_common.o
CC hw/intc/arm_gicv3.o
CC hw/intc/arm_gicv3_dist.o
CC hw/intc/arm_gicv3_redist.o
CC hw/intc/arm_gicv3_its_common.o
CC hw/intc/intc.o
CC hw/ipack/ipack.o
CC hw/ipack/tpci200.o
CC hw/ipmi/ipmi.o
CC hw/ipmi/ipmi_bmc_sim.o
CC hw/ipmi/ipmi_bmc_extern.o
CC hw/ipmi/isa_ipmi_kcs.o
CC hw/ipmi/isa_ipmi_bt.o
CC hw/isa/isa-bus.o
CC hw/isa/isa-superio.o
CC hw/isa/apm.o
CC hw/mem/pc-dimm.o
CC hw/mem/memory-device.o
CC hw/mem/nvdimm.o
CC hw/misc/applesmc.o
CC hw/misc/max111x.o
CC hw/misc/tmp105.o
CC hw/misc/tmp421.o
CC hw/misc/debugexit.o
CC hw/misc/pc-testdev.o
CC hw/misc/sga.o
CC hw/misc/pci-testdev.o
CC hw/misc/edu.o
CC hw/misc/pca9552.o
CC hw/misc/unimp.o
CC hw/misc/vmcoreinfo.o
CC hw/misc/arm_l2x0.o
CC hw/misc/arm_integrator_debug.o
CC hw/misc/a9scu.o
CC hw/misc/arm11scu.o
CC hw/net/ne2000.o
CC hw/net/eepro100.o
CC hw/net/pcnet-pci.o
CC hw/net/pcnet.o
CC hw/net/e1000.o
CC hw/net/e1000x_common.o
CC hw/net/net_tx_pkt.o
CC hw/net/net_rx_pkt.o
CC hw/net/e1000e.o
CC hw/net/e1000e_core.o
CC hw/net/rtl8139.o
CC hw/net/vmxnet3.o
CC hw/net/smc91c111.o
CC hw/net/lan9118.o
CC hw/net/ne2000-isa.o
CC hw/net/xgmac.o
CC hw/net/xilinx_axienet.o
CC hw/net/allwinner_emac.o
CC hw/net/imx_fec.o
CC hw/net/cadence_gem.o
CC hw/net/stellaris_enet.o
CC hw/net/ftgmac100.o
CC hw/net/rocker/rocker.o
CC hw/net/rocker/rocker_fp.o
CC hw/net/rocker/rocker_desc.o
CC hw/net/rocker/rocker_world.o
CC hw/net/rocker/rocker_of_dpa.o
CC hw/net/can/can_sja1000.o
CC hw/net/can/can_kvaser_pci.o
CC hw/net/can/can_pcm3680_pci.o
CC hw/net/can/can_mioe3680_pci.o
CC hw/nvram/eeprom93xx.o
CC hw/nvram/fw_cfg.o
CC hw/nvram/chrp_nvram.o
CC hw/pci-bridge/pci_bridge_dev.o
CC hw/pci-bridge/pcie_root_port.o
CC hw/pci-bridge/gen_pcie_root_port.o
CC hw/pci-bridge/pcie_pci_bridge.o
CC hw/pci-bridge/pci_expander_bridge.o
CC hw/pci-bridge/xio3130_upstream.o
CC hw/pci-bridge/xio3130_downstream.o
CC hw/pci-bridge/ioh3420.o
CC hw/pci-bridge/i82801b11.o
CC hw/pci-host/pam.o
CC hw/pci-host/versatile.o
CC hw/pci-host/piix.o
CC hw/pci-host/q35.o
CC hw/pci-host/gpex.o
CC hw/pci-host/designware.o
CC hw/pci/pci.o
CC hw/pci/pci_bridge.o
CC hw/pci/msix.o
CC hw/pci/msi.o
CC hw/pci/shpc.o
CC hw/pci/slotid_cap.o
CC hw/pci/pci_host.o
CC hw/pci/pcie_host.o
CC hw/pci/pcie.o
CC hw/pci/pcie_aer.o
CC hw/pci/pcie_port.o
CC hw/pci/pci-stub.o
CC hw/pcmcia/pcmcia.o
CC hw/scsi/scsi-disk.o
CC hw/scsi/emulation.o
CC hw/scsi/scsi-generic.o
CC hw/scsi/scsi-bus.o
CC hw/scsi/lsi53c895a.o
CC hw/scsi/mptsas.o
CC hw/scsi/mptconfig.o
CC hw/scsi/mptendian.o
CC hw/scsi/megasas.o
CC hw/scsi/vmw_pvscsi.o
CC hw/scsi/esp.o
CC hw/scsi/esp-pci.o
CC hw/sd/pl181.o
CC hw/sd/ssi-sd.o
CC hw/sd/sd.o
CC hw/sd/core.o
CC hw/sd/sdmmc-internal.o
CC hw/smbios/smbios.o
CC hw/sd/sdhci.o
CC hw/smbios/smbios_type_38.o
CC hw/smbios/smbios-stub.o
CC hw/smbios/smbios_type_38-stub.o
CC hw/ssi/pl022.o
CC hw/ssi/ssi.o
CC hw/ssi/xilinx_spips.o
CC hw/ssi/aspeed_smc.o
CC hw/ssi/stm32f2xx_spi.o
CC hw/ssi/mss-spi.o
CC hw/timer/arm_timer.o
CC hw/timer/arm_mptimer.o
CC hw/timer/armv7m_systick.o
CC hw/timer/a9gtimer.o
CC hw/timer/cadence_ttc.o
CC hw/timer/ds1338.o
CC hw/timer/i8254_common.o
CC hw/timer/hpet.o
CC hw/timer/i8254.o
CC hw/timer/pl031.o
CC hw/timer/twl92230.o
CC hw/timer/imx_epit.o
CC hw/timer/imx_gpt.o
CC hw/timer/stm32f2xx_timer.o
CC hw/timer/xlnx-zynqmp-rtc.o
CC hw/timer/aspeed_timer.o
CC hw/timer/cmsdk-apb-timer.o
CC hw/timer/cmsdk-apb-dualtimer.o
CC hw/timer/mss-timer.o
CC hw/tpm/tpm_util.o
CC hw/tpm/tpm_tis.o
CC hw/tpm/tpm_crb.o
CC hw/usb/core.o
CC hw/usb/combined-packet.o
CC hw/usb/bus.o
CC hw/usb/libhw.o
CC hw/usb/desc.o
CC hw/usb/desc-msos.o
CC hw/usb/hcd-uhci.o
CC hw/usb/hcd-ohci.o
CC hw/usb/hcd-ehci.o
CC hw/usb/hcd-ehci-sysbus.o
CC hw/usb/hcd-ehci-pci.o
CC hw/usb/hcd-xhci.o
CC hw/usb/hcd-xhci-nec.o
CC hw/usb/hcd-musb.o
CC hw/usb/dev-hub.o
CC hw/usb/dev-hid.o
CC hw/usb/dev-wacom.o
CC hw/usb/dev-storage.o
CC hw/usb/dev-uas.o
CC hw/usb/dev-audio.o
CC hw/usb/dev-serial.o
CC hw/usb/dev-network.o
CC hw/usb/dev-bluetooth.o
CC hw/usb/dev-smartcard-reader.o
CC hw/usb/host-stub.o
CC hw/virtio/virtio-bus.o
CC hw/virtio/virtio-rng.o
CC hw/virtio/virtio-pci.o
CC hw/virtio/virtio-mmio.o
CC hw/virtio/vhost-stub.o
CC hw/watchdog/watchdog.o
CC hw/watchdog/cmsdk-apb-watchdog.o
CC hw/watchdog/wdt_i6300esb.o
CC hw/watchdog/wdt_ib700.o
CC hw/watchdog/wdt_aspeed.o
CC migration/migration.o
CC migration/socket.o
CC migration/fd.o
CC migration/exec.o
CC migration/tls.o
CC migration/channel.o
CC migration/savevm.o
CC migration/colo.o
CC migration/colo-failover.o
CC migration/vmstate.o
CC migration/vmstate-types.o
CC migration/page_cache.o
CC migration/qemu-file.o
CC migration/global_state.o
CC migration/qemu-file-channel.o
CC migration/xbzrle.o
CC migration/postcopy-ram.o
CC migration/qjson.o
CC migration/block-dirty-bitmap.o
CC migration/block.o
CC net/net.o
CC net/queue.o
CC net/checksum.o
CC net/util.o
CC net/hub.o
CC net/socket.o
CC net/dump.o
CC net/eth.o
CC net/slirp.o
CC net/filter.o
CC net/filter-buffer.o
CC net/filter-mirror.o
CC net/colo-compare.o
CC net/colo.o
CC net/filter-rewriter.o
CC net/filter-replay.o
CC net/tap-win32.o
CC net/can/can_core.o
CC net/can/can_host.o
CC qom/cpu.o
CC replay/replay.o
CC replay/replay-internal.o
CC replay/replay-events.o
CC replay/replay-time.o
CC replay/replay-input.o
CC replay/replay-char.o
CC replay/replay-snapshot.o
CC replay/replay-net.o
CC replay/replay-audio.o
CC slirp/cksum.o
CC slirp/if.o
CC slirp/ip_icmp.o
CC slirp/ip6_icmp.o
CC slirp/ip6_input.o
CC slirp/ip6_output.o
CC slirp/ip_input.o
CC slirp/ip_output.o
CC slirp/dnssearch.o
CC slirp/dhcpv6.o
CC slirp/slirp.o
CC slirp/mbuf.o
CC slirp/misc.o
CC slirp/sbuf.o
CC slirp/socket.o
CC slirp/tcp_input.o
CC slirp/tcp_output.o
CC slirp/tcp_subr.o
CC slirp/tcp_timer.o
CC slirp/udp.o
CC slirp/udp6.o
CC slirp/bootp.o
CC slirp/tftp.o
CC slirp/arp_table.o
CC slirp/ndp_table.o
CC slirp/ncsi.o
CC ui/keymaps.o
CC ui/console.o
CC ui/cursor.o
CC ui/qemu-pixman.o
CC ui/input.o
CC ui/input-keymap.o
CC ui/input-legacy.o
CC ui/vnc.o
CC ui/vnc-enc-zlib.o
CC ui/vnc-enc-hextile.o
CC ui/vnc-enc-tight.o
CC ui/vnc-palette.o
CC ui/vnc-enc-zrle.o
CC ui/vnc-auth-vencrypt.o
CC ui/vnc-ws.o
CC ui/vnc-jobs.o
CC ui/sdl2.o
CC ui/sdl2-input.o
CC ui/sdl2-2d.o
CC ui/gtk.o
CC chardev/char.o
CC chardev/char-console.o
CC chardev/char-fe.o
CC chardev/char-file.o
CC chardev/char-io.o
CC chardev/char-mux.o
CC chardev/char-null.o
CC chardev/char-pipe.o
CC chardev/char-ringbuf.o
CC chardev/char-serial.o
CC chardev/char-socket.o
CC chardev/char-stdio.o
CC chardev/char-udp.o
CC chardev/char-win.o
CC chardev/char-win-stdio.o
CC qga/commands.o
CC qga/guest-agent-command-state.o
CC qga/main.o
CC qga/commands-win32.o
CC qga/channel-win32.o
AS optionrom/multiboot.o
AS optionrom/linuxboot.o
CC qga/service-win32.o
CC optionrom/linuxboot_dma.o
AS optionrom/kvmvapic.o
BUILD optionrom/multiboot.img
BUILD optionrom/linuxboot.img
CC qga/vss-win32.o
BUILD optionrom/multiboot.raw
CC qga/qapi-generated/qga-qapi-types.o
CC qga/qapi-generated/qga-qapi-visit.o
CC qga/qapi-generated/qga-qapi-commands.o
AR libqemuutil.a
CC qemu-img.o
BUILD optionrom/linuxboot.raw
BUILD optionrom/linuxboot_dma.img
BUILD optionrom/kvmvapic.img
SIGN optionrom/multiboot.bin
SIGN optionrom/linuxboot.bin
BUILD optionrom/linuxboot_dma.raw
BUILD optionrom/kvmvapic.raw
SIGN optionrom/linuxboot_dma.bin
SIGN optionrom/kvmvapic.bin
LINK qemu-io.exe
LINK qemu-edid.exe
LINK qemu-img.exe
LINK qemu-ga.exe
GEN x86_64-softmmu/config-target.h
GEN x86_64-softmmu/hmp-commands.h
GEN x86_64-softmmu/hmp-commands-info.h
CC x86_64-softmmu/exec.o
CC x86_64-softmmu/tcg/tcg-op-gvec.o
CC x86_64-softmmu/tcg/tcg.o
CC x86_64-softmmu/tcg/tcg-op.o
CC x86_64-softmmu/tcg/tcg-op-vec.o
CC x86_64-softmmu/tcg/tcg-common.o
CC x86_64-softmmu/tcg/optimize.o
CC x86_64-softmmu/fpu/softfloat.o
CC x86_64-softmmu/disas.o
GEN x86_64-softmmu/gdbstub-xml.c
GEN aarch64-softmmu/hmp-commands.h
GEN aarch64-softmmu/hmp-commands-info.h
GEN aarch64-softmmu/config-target.h
CC x86_64-softmmu/arch_init.o
CC x86_64-softmmu/cpus.o
CC aarch64-softmmu/exec.o
CC x86_64-softmmu/monitor.o
CC x86_64-softmmu/gdbstub.o
CC x86_64-softmmu/balloon.o
CC x86_64-softmmu/ioport.o
CC aarch64-softmmu/tcg/tcg.o
CC aarch64-softmmu/tcg/tcg-op.o
CC aarch64-softmmu/tcg/tcg-op-vec.o
CC x86_64-softmmu/numa.o
CC aarch64-softmmu/tcg/tcg-op-gvec.o
CC x86_64-softmmu/qtest.o
CC x86_64-softmmu/memory.o
CC x86_64-softmmu/memory_mapping.o
CC x86_64-softmmu/dump.o
CC x86_64-softmmu/win_dump.o
CC x86_64-softmmu/migration/ram.o
CC x86_64-softmmu/accel/accel.o
CC aarch64-softmmu/tcg/tcg-common.o
CC x86_64-softmmu/accel/stubs/hvf-stub.o
CC aarch64-softmmu/tcg/optimize.o
CC x86_64-softmmu/accel/stubs/whpx-stub.o
CC aarch64-softmmu/fpu/softfloat.o
CC x86_64-softmmu/accel/stubs/kvm-stub.o
CC aarch64-softmmu/disas.o
CC x86_64-softmmu/accel/tcg/tcg-all.o
GEN aarch64-softmmu/gdbstub-xml.c
CC x86_64-softmmu/accel/tcg/cputlb.o
CC x86_64-softmmu/accel/tcg/tcg-runtime.o
CC aarch64-softmmu/arch_init.o
CC x86_64-softmmu/accel/tcg/tcg-runtime-gvec.o
CC aarch64-softmmu/cpus.o
CC x86_64-softmmu/accel/tcg/cpu-exec.o
CC aarch64-softmmu/monitor.o
CC x86_64-softmmu/accel/tcg/cpu-exec-common.o
CC aarch64-softmmu/gdbstub.o
CC x86_64-softmmu/accel/tcg/translate-all.o
CC aarch64-softmmu/balloon.o
CC x86_64-softmmu/accel/tcg/translator.o
CC aarch64-softmmu/ioport.o
CC x86_64-softmmu/hw/block/virtio-blk.o
CC aarch64-softmmu/numa.o
CC x86_64-softmmu/hw/block/dataplane/virtio-blk.o
CC aarch64-softmmu/qtest.o
CC x86_64-softmmu/hw/char/virtio-serial-bus.o
CC x86_64-softmmu/hw/display/vga.o
CC aarch64-softmmu/memory.o
CC x86_64-softmmu/hw/display/virtio-gpu.o
CC aarch64-softmmu/memory_mapping.o
CC x86_64-softmmu/hw/display/virtio-gpu-3d.o
CC aarch64-softmmu/dump.o
CC aarch64-softmmu/migration/ram.o
CC aarch64-softmmu/accel/accel.o
CC aarch64-softmmu/accel/stubs/hax-stub.o
CC aarch64-softmmu/accel/stubs/hvf-stub.o
CC aarch64-softmmu/accel/stubs/whpx-stub.o
CC aarch64-softmmu/accel/stubs/kvm-stub.o
CC aarch64-softmmu/accel/tcg/tcg-all.o
CC aarch64-softmmu/accel/tcg/cputlb.o
CC x86_64-softmmu/hw/display/virtio-gpu-pci.o
CC aarch64-softmmu/accel/tcg/tcg-runtime.o
CC x86_64-softmmu/hw/display/virtio-vga.o
CC aarch64-softmmu/accel/tcg/tcg-runtime-gvec.o
CC aarch64-softmmu/accel/tcg/cpu-exec.o
CC x86_64-softmmu/hw/intc/apic.o
CC aarch64-softmmu/accel/tcg/cpu-exec-common.o
CC aarch64-softmmu/accel/tcg/translate-all.o
CC x86_64-softmmu/hw/intc/apic_common.o
CC aarch64-softmmu/accel/tcg/translator.o
CC aarch64-softmmu/hw/adc/stm32f2xx_adc.o
CC x86_64-softmmu/hw/intc/ioapic.o
CC aarch64-softmmu/hw/block/virtio-blk.o
CC x86_64-softmmu/hw/isa/lpc_ich9.o
CC x86_64-softmmu/hw/misc/pvpanic.o
CC aarch64-softmmu/hw/block/dataplane/virtio-blk.o
CC aarch64-softmmu/hw/char/exynos4210_uart.o
CC x86_64-softmmu/hw/net/virtio-net.o
CC aarch64-softmmu/hw/char/omap_uart.o
CC x86_64-softmmu/hw/net/vhost_net.o
CC aarch64-softmmu/hw/char/digic-uart.o
CC x86_64-softmmu/hw/scsi/virtio-scsi.o
CC aarch64-softmmu/hw/char/stm32f2xx_usart.o
CC x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
CC x86_64-softmmu/hw/timer/mc146818rtc.o
CC aarch64-softmmu/hw/char/bcm2835_aux.o
CC x86_64-softmmu/hw/virtio/virtio.o
CC aarch64-softmmu/hw/char/virtio-serial-bus.o
CC aarch64-softmmu/hw/cpu/arm11mpcore.o
CC x86_64-softmmu/hw/virtio/virtio-balloon.o
CC x86_64-softmmu/hw/virtio/virtio-crypto.o
CC x86_64-softmmu/hw/virtio/virtio-crypto-pci.o
CC aarch64-softmmu/hw/cpu/realview_mpcore.o
CC aarch64-softmmu/hw/cpu/a9mpcore.o
CC x86_64-softmmu/hw/i386/multiboot.o
CC aarch64-softmmu/hw/cpu/a15mpcore.o
CC x86_64-softmmu/hw/i386/pc.o
CC aarch64-softmmu/hw/display/omap_dss.o
CC x86_64-softmmu/hw/i386/pc_piix.o
CC x86_64-softmmu/hw/i386/pc_q35.o
CC x86_64-softmmu/hw/i386/pc_sysfw.o
CC x86_64-softmmu/hw/i386/x86-iommu.o
CC aarch64-softmmu/hw/display/pxa2xx_lcd.o
CC aarch64-softmmu/hw/display/omap_lcdc.o
CC x86_64-softmmu/hw/i386/intel_iommu.o
CC x86_64-softmmu/hw/i386/amd_iommu.o
CC x86_64-softmmu/hw/i386/vmport.o
CC aarch64-softmmu/hw/display/bcm2835_fb.o
CC x86_64-softmmu/hw/i386/vmmouse.o
CC aarch64-softmmu/hw/display/vga.o
CC x86_64-softmmu/hw/i386/kvmvapic.o
CC aarch64-softmmu/hw/display/virtio-gpu.o
CC x86_64-softmmu/hw/i386/acpi-build.o
CC x86_64-softmmu/target/i386/helper.o
CC x86_64-softmmu/target/i386/cpu.o
CC aarch64-softmmu/hw/display/virtio-gpu-3d.o
CC aarch64-softmmu/hw/display/virtio-gpu-pci.o
CC x86_64-softmmu/target/i386/gdbstub.o
CC aarch64-softmmu/hw/display/dpcd.o
CC x86_64-softmmu/target/i386/xsave_helper.o
CC aarch64-softmmu/hw/display/xlnx_dp.o
CC x86_64-softmmu/target/i386/translate.o
CC x86_64-softmmu/target/i386/bpt_helper.o
CC aarch64-softmmu/hw/dma/xlnx_dpdma.o
CC aarch64-softmmu/hw/dma/omap_dma.o
CC x86_64-softmmu/target/i386/cc_helper.o
CC x86_64-softmmu/target/i386/excp_helper.o
CC aarch64-softmmu/hw/dma/soc_dma.o
CC x86_64-softmmu/target/i386/fpu_helper.o
CC x86_64-softmmu/target/i386/int_helper.o
CC aarch64-softmmu/hw/dma/pxa2xx_dma.o
CC x86_64-softmmu/target/i386/mem_helper.o
CC aarch64-softmmu/hw/dma/bcm2835_dma.o
CC aarch64-softmmu/hw/gpio/omap_gpio.o
CC x86_64-softmmu/target/i386/misc_helper.o
CC aarch64-softmmu/hw/gpio/imx_gpio.o
CC aarch64-softmmu/hw/gpio/bcm2835_gpio.o
CC x86_64-softmmu/target/i386/mpx_helper.o
CC aarch64-softmmu/hw/i2c/omap_i2c.o
CC x86_64-softmmu/target/i386/seg_helper.o
CC aarch64-softmmu/hw/input/pxa2xx_keypad.o
CC x86_64-softmmu/target/i386/smm_helper.o
CC aarch64-softmmu/hw/input/tsc210x.o
CC x86_64-softmmu/target/i386/svm_helper.o
CC x86_64-softmmu/target/i386/machine.o
CC x86_64-softmmu/target/i386/arch_memory_mapping.o
CC aarch64-softmmu/hw/intc/armv7m_nvic.o
CC aarch64-softmmu/hw/intc/exynos4210_gic.o
CC aarch64-softmmu/hw/intc/exynos4210_combiner.o
CC x86_64-softmmu/target/i386/arch_dump.o
CC x86_64-softmmu/target/i386/monitor.o
CC aarch64-softmmu/hw/intc/omap_intc.o
CC aarch64-softmmu/hw/intc/bcm2835_ic.o
CC x86_64-softmmu/target/i386/kvm-stub.o
CC aarch64-softmmu/hw/intc/bcm2836_control.o
CC x86_64-softmmu/target/i386/hyperv-stub.o
CC aarch64-softmmu/hw/intc/allwinner-a10-pic.o
CC x86_64-softmmu/target/i386/hax-all.o
CC aarch64-softmmu/hw/intc/aspeed_vic.o
CC aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
CC aarch64-softmmu/hw/misc/arm_sysctl.o
CC aarch64-softmmu/hw/misc/cbus.o
CC aarch64-softmmu/hw/misc/exynos4210_pmu.o
CC x86_64-softmmu/target/i386/hax-mem.o
CC aarch64-softmmu/hw/misc/exynos4210_clk.o
CC aarch64-softmmu/hw/misc/exynos4210_rng.o
CC x86_64-softmmu/target/i386/hax-windows.o
CC aarch64-softmmu/hw/misc/imx_ccm.o
CC x86_64-softmmu/target/i386/sev-stub.o
CC aarch64-softmmu/hw/misc/imx31_ccm.o
CC aarch64-softmmu/hw/misc/imx25_ccm.o
CC aarch64-softmmu/hw/misc/imx6_ccm.o
CC aarch64-softmmu/hw/misc/imx6ul_ccm.o
CC aarch64-softmmu/hw/misc/imx6_src.o
GEN trace/generated-helpers.c
CC aarch64-softmmu/hw/misc/imx7_ccm.o
CC aarch64-softmmu/hw/misc/imx2_wdt.o
CC x86_64-softmmu/trace/control-target.o
CC aarch64-softmmu/hw/misc/imx7_snvs.o
CC aarch64-softmmu/hw/misc/imx7_gpr.o
CC aarch64-softmmu/hw/misc/mst_fpga.o
CC aarch64-softmmu/hw/misc/omap_clk.o
CC aarch64-softmmu/hw/misc/omap_gpmc.o
CC aarch64-softmmu/hw/misc/omap_l4.o
CC x86_64-softmmu/gdbstub-xml.o
CC aarch64-softmmu/hw/misc/omap_sdrc.o
CC aarch64-softmmu/hw/misc/omap_tap.o
CC aarch64-softmmu/hw/misc/bcm2835_mbox.o
CC aarch64-softmmu/hw/misc/bcm2835_property.o
CC aarch64-softmmu/hw/misc/bcm2835_rng.o
CC aarch64-softmmu/hw/misc/zynq_slcr.o
CC x86_64-softmmu/trace/generated-helpers.o
CC aarch64-softmmu/hw/misc/zynq-xadc.o
CC aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
CC aarch64-softmmu/hw/misc/mps2-fpgaio.o
CC aarch64-softmmu/hw/misc/mps2-scc.o
CC aarch64-softmmu/hw/misc/tz-mpc.o
CC aarch64-softmmu/hw/misc/tz-msc.o
CC aarch64-softmmu/hw/misc/tz-ppc.o
CC aarch64-softmmu/hw/misc/iotkit-secctl.o
CC aarch64-softmmu/hw/misc/iotkit-sysctl.o
CC aarch64-softmmu/hw/misc/iotkit-sysinfo.o
CC aarch64-softmmu/hw/misc/auxbus.o
CC aarch64-softmmu/hw/misc/aspeed_scu.o
CC aarch64-softmmu/hw/misc/aspeed_sdmc.o
CC aarch64-softmmu/hw/misc/msf2-sysreg.o
CC aarch64-softmmu/hw/net/virtio-net.o
CC aarch64-softmmu/hw/net/vhost_net.o
CC aarch64-softmmu/hw/pcmcia/pxa2xx.o
CC aarch64-softmmu/hw/scsi/virtio-scsi.o
CC aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
CC aarch64-softmmu/hw/sd/omap_mmc.o
CC aarch64-softmmu/hw/sd/pxa2xx_mmci.o
CC aarch64-softmmu/hw/sd/bcm2835_sdhost.o
CC aarch64-softmmu/hw/ssi/omap_spi.o
CC aarch64-softmmu/hw/ssi/imx_spi.o
CC aarch64-softmmu/hw/timer/exynos4210_mct.o
CC aarch64-softmmu/hw/timer/exynos4210_pwm.o
CC aarch64-softmmu/hw/timer/exynos4210_rtc.o
CC aarch64-softmmu/hw/timer/omap_gptimer.o
CC aarch64-softmmu/hw/timer/omap_synctimer.o
CC aarch64-softmmu/hw/timer/pxa2xx_timer.o
CC aarch64-softmmu/hw/timer/digic-timer.o
CC aarch64-softmmu/hw/timer/allwinner-a10-pit.o
CC aarch64-softmmu/hw/usb/tusb6010.o
CC aarch64-softmmu/hw/usb/chipidea.o
CC aarch64-softmmu/hw/virtio/virtio.o
CC aarch64-softmmu/hw/virtio/virtio-balloon.o
CC aarch64-softmmu/hw/virtio/virtio-crypto.o
CC aarch64-softmmu/hw/virtio/virtio-crypto-pci.o
CC aarch64-softmmu/hw/arm/boot.o
CC aarch64-softmmu/hw/arm/virt.o
CC aarch64-softmmu/hw/arm/sysbus-fdt.o
CC aarch64-softmmu/hw/arm/virt-acpi-build.o
CC aarch64-softmmu/hw/arm/digic_boards.o
CC aarch64-softmmu/hw/arm/exynos4_boards.o
CC aarch64-softmmu/hw/arm/highbank.o
CC aarch64-softmmu/hw/arm/integratorcp.o
CC aarch64-softmmu/hw/arm/mainstone.o
CC aarch64-softmmu/hw/arm/musicpal.o
CC aarch64-softmmu/hw/arm/netduino2.o
CC aarch64-softmmu/hw/arm/nseries.o
CC aarch64-softmmu/hw/arm/omap_sx1.o
CC aarch64-softmmu/hw/arm/palm.o
CC aarch64-softmmu/hw/arm/gumstix.o
CC aarch64-softmmu/hw/arm/spitz.o
CC aarch64-softmmu/hw/arm/tosa.o
CC aarch64-softmmu/hw/arm/z2.o
CC aarch64-softmmu/hw/arm/realview.o
CC aarch64-softmmu/hw/arm/stellaris.o
CC aarch64-softmmu/hw/arm/collie.o
CC aarch64-softmmu/hw/arm/vexpress.o
CC aarch64-softmmu/hw/arm/versatilepb.o
CC aarch64-softmmu/hw/arm/xilinx_zynq.o
CC aarch64-softmmu/hw/arm/armv7m.o
CC aarch64-softmmu/hw/arm/exynos4210.o
CC aarch64-softmmu/hw/arm/pxa2xx.o
CC aarch64-softmmu/hw/arm/pxa2xx_gpio.o
CC aarch64-softmmu/hw/arm/pxa2xx_pic.o
CC aarch64-softmmu/hw/arm/digic.o
CC aarch64-softmmu/hw/arm/omap1.o
CC aarch64-softmmu/hw/arm/omap2.o
CC aarch64-softmmu/hw/arm/strongarm.o
CC aarch64-softmmu/hw/arm/allwinner-a10.o
CC aarch64-softmmu/hw/arm/cubieboard.o
CC aarch64-softmmu/hw/arm/bcm2835_peripherals.o
CC aarch64-softmmu/hw/arm/bcm2836.o
CC aarch64-softmmu/hw/arm/raspi.o
CC aarch64-softmmu/hw/arm/stm32f205_soc.o
CC aarch64-softmmu/hw/arm/xlnx-zynqmp.o
CC aarch64-softmmu/hw/arm/xlnx-zcu102.o
CC aarch64-softmmu/hw/arm/xlnx-versal.o
CC aarch64-softmmu/hw/arm/xlnx-versal-virt.o
CC aarch64-softmmu/hw/arm/fsl-imx25.o
CC aarch64-softmmu/hw/arm/imx25_pdk.o
CC aarch64-softmmu/hw/arm/fsl-imx31.o
CC aarch64-softmmu/hw/arm/kzm.o
CC aarch64-softmmu/hw/arm/fsl-imx6.o
CC aarch64-softmmu/hw/arm/sabrelite.o
CC aarch64-softmmu/hw/arm/aspeed_soc.o
CC aarch64-softmmu/hw/arm/aspeed.o
CC aarch64-softmmu/hw/arm/mps2.o
CC aarch64-softmmu/hw/arm/mps2-tz.o
CC aarch64-softmmu/hw/arm/msf2-soc.o
CC aarch64-softmmu/hw/arm/msf2-som.o
CC aarch64-softmmu/hw/arm/iotkit.o
CC aarch64-softmmu/hw/arm/fsl-imx7.o
CC aarch64-softmmu/hw/arm/mcimx7d-sabre.o
CC aarch64-softmmu/hw/arm/smmu-common.o
CC aarch64-softmmu/hw/arm/smmuv3.o
CC aarch64-softmmu/hw/arm/fsl-imx6ul.o
CC aarch64-softmmu/hw/arm/mcimx6ul-evk.o
CC aarch64-softmmu/hw/arm/nrf51_soc.o
CC aarch64-softmmu/hw/arm/microbit.o
CC aarch64-softmmu/target/arm/arm-semi.o
CC aarch64-softmmu/target/arm/machine.o
CC aarch64-softmmu/target/arm/psci.o
CC aarch64-softmmu/target/arm/arch_dump.o
CC aarch64-softmmu/target/arm/monitor.o
CC aarch64-softmmu/target/arm/kvm-stub.o
CC aarch64-softmmu/target/arm/translate.o
CC aarch64-softmmu/target/arm/op_helper.o
CC aarch64-softmmu/target/arm/helper.o
CC aarch64-softmmu/target/arm/cpu.o
CC aarch64-softmmu/target/arm/neon_helper.o
CC aarch64-softmmu/target/arm/iwmmxt_helper.o
CC aarch64-softmmu/target/arm/vec_helper.o
CC aarch64-softmmu/target/arm/gdbstub.o
CC aarch64-softmmu/target/arm/cpu64.o
CC aarch64-softmmu/target/arm/translate-a64.o
CC aarch64-softmmu/target/arm/helper-a64.o
/tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
return isinff(a.h);
^~~~~~
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
cc1: all warnings being treated as errors
/tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
return isinff(a.h);
^~~~~~
/tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:483: subdir-x86_64-softmmu] Error 2
make: *** Waiting for unfinished jobs....
make: *** [Makefile:483: subdir-aarch64-softmmu] Error 2
Traceback (most recent call last):
File "./tests/docker/docker.py", line 563, in <module>
sys.exit(main())
File "./tests/docker/docker.py", line 560, in main
return args.cmdobj.run(args, argv)
File "./tests/docker/docker.py", line 306, in run
return Docker().run(argv, args.keep, quiet=args.quiet)
File "./tests/docker/docker.py", line 274, in run
quiet=quiet)
File "./tests/docker/docker.py", line 181, in _do_check
return subprocess.check_call(self._command + cmd, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=0f08c190f26911e8a02468b59973b7d0', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-spofu4kn/src/docker-src.2018-11-27-12.22.26.24055:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-spofu4kn/src'
make: *** [docker-run-test-mingw@fedora] Error 2
real 1m54.170s
user 0m17.471s
sys 0m3.686s
=== OUTPUT END ===
Test command exited with code: 2
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (13 preceding siblings ...)
2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
@ 2018-11-27 17:32 ` no-reply
2018-12-05 12:41 ` Alex Bennée
15 siblings, 0 replies; 37+ messages in thread
From: no-reply @ 2018-11-27 17:32 UTC (permalink / raw)
To: cota; +Cc: famz, qemu-devel, richard.henderson, alex.bennee
Hi,
This series seems to have some coding style problems. See output below for
more information:
Message-id: 20181124235553.17371-1-cota@braap.org
Subject: [Qemu-devel] [PATCH v6 00/13] hardfloat
Type: series
=== TEST SCRIPT BEGIN ===
#!/bin/bash
BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done
exit $failed
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fe0cee3 hardfloat: implement float32/64 comparison
ac5968b hardfloat: implement float32/64 square root
0f10937 hardfloat: implement float32/64 fused multiply-add
de38097 hardfloat: implement float32/64 division
fbeab45 hardfloat: implement float32/64 multiplication
8894a16 hardfloat: implement float32/64 addition and subtraction
834d403 fpu: introduce hardfloat
94b3f9b tests/fp: add fp-bench
fe2ef78 softfloat: add float{32, 64}_is_zero_or_normal
a343567 softfloat: rename canonicalize to sf_canonicalize
73e6c0d target/tricore: use float32_is_denormal
be09b31 softfloat: add float{32, 64}_is_{de, }normal
319042a fp-test: pick TARGET_ARM to get its specialization
=== OUTPUT BEGIN ===
Checking PATCH 1/13: fp-test: pick TARGET_ARM to get its specialization...
Checking PATCH 2/13: softfloat: add float{32, 64}_is_{de, }normal...
Checking PATCH 3/13: target/tricore: use float32_is_denormal...
Checking PATCH 4/13: softfloat: rename canonicalize to sf_canonicalize...
Checking PATCH 5/13: softfloat: add float{32, 64}_is_zero_or_normal...
Checking PATCH 6/13: tests/fp: add fp-bench...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#49:
new file mode 100644
total: 0 errors, 1 warnings, 653 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 7/13: fpu: introduce hardfloat...
ERROR: spaces required around that '*' (ctx:WxV)
#82: FILE: fpu/softfloat.c:130:
+ static inline void name(soft_t *a, float_status *s) \
^
ERROR: spaces required around that '*' (ctx:WxV)
#96: FILE: fpu/softfloat.c:144:
+ static inline void name(soft_t *a, float_status *s) \
^
ERROR: spaces required around that '*' (ctx:WxV)
#109: FILE: fpu/softfloat.c:157:
+ static inline void name(soft_t *a, soft_t *b, float_status *s) \
^
ERROR: spaces required around that '*' (ctx:WxV)
#123: FILE: fpu/softfloat.c:171:
+ static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
^
WARNING: architecture specific defines should be avoided
#142: FILE: fpu/softfloat.c:190:
+#if defined(__x86_64__)
WARNING: architecture specific defines should be avoided
#164: FILE: fpu/softfloat.c:212:
+#if defined(__x86_64__) || defined(__aarch64__)
ERROR: spaces required around that '*' (ctx:WxV)
#183: FILE: fpu/softfloat.c:231:
+static inline bool can_use_fpu(const float_status *s)
^
total: 5 errors, 2 warnings, 327 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 8/13: hardfloat: implement float32/64 addition and subtraction...
ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: fpu/softfloat.c:1063:
+soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#109: FILE: fpu/softfloat.c:1072:
+static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#114: FILE: fpu/softfloat.c:1077:
+static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#120: FILE: fpu/softfloat.c:1083:
+soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#130: FILE: fpu/softfloat.c:1092:
+static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#135: FILE: fpu/softfloat.c:1097:
+static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#177: FILE: fpu/softfloat.c:1139:
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
^
ERROR: spaces required around that '*' (ctx:WxV)
#184: FILE: fpu/softfloat.c:1146:
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
^
ERROR: spaces required around that '*' (ctx:WxV)
#192: FILE: fpu/softfloat.c:1154:
+float32_add(float32 a, float32 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#198: FILE: fpu/softfloat.c:1160:
+float32_sub(float32 a, float32 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#204: FILE: fpu/softfloat.c:1166:
+float64_add(float64 a, float64 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#210: FILE: fpu/softfloat.c:1172:
+float64_sub(float64 a, float64 b, float_status *s)
^
total: 12 errors, 0 warnings, 149 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 9/13: hardfloat: implement float32/64 multiplication...
ERROR: spaces required around that '*' (ctx:WxV)
#45: FILE: fpu/softfloat.c:1236:
+soft_f32_mul(float32 a, float32 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#55: FILE: fpu/softfloat.c:1246:
+soft_f64_mul(float64 a, float64 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#83: FILE: fpu/softfloat.c:1275:
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#90: FILE: fpu/softfloat.c:1282:
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: fpu/softfloat.c:1290:
+float32_mul(float32 a, float32 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#105: FILE: fpu/softfloat.c:1297:
+float64_mul(float64 a, float64 b, float_status *s)
^
total: 6 errors, 0 warnings, 72 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 10/13: hardfloat: implement float32/64 division...
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: fpu/softfloat.c:1628:
+soft_f32_div(float32 a, float32 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#58: FILE: fpu/softfloat.c:1638:
+soft_f64_div(float64 a, float64 b, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#111: FILE: fpu/softfloat.c:1692:
+float32_div(float32 a, float32 b, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#118: FILE: fpu/softfloat.c:1699:
+float64_div(float64 a, float64 b, float_status *s)
^
total: 4 errors, 0 warnings, 82 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 11/13: hardfloat: implement float32/64 fused multiply-add...
ERROR: spaces required around that '*' (ctx:WxV)
#50: FILE: fpu/softfloat.c:1519:
+ float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#62: FILE: fpu/softfloat.c:1531:
+ float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#71: FILE: fpu/softfloat.c:1542:
+float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#132: FILE: fpu/softfloat.c:1603:
+float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
^
total: 4 errors, 0 warnings, 150 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 12/13: hardfloat: implement float32/64 square root...
ERROR: spaces required around that '*' (ctx:WxV)
#32: FILE: fpu/softfloat.c:3044:
+soft_f32_sqrt(float32 a, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#41: FILE: fpu/softfloat.c:3052:
+soft_f64_sqrt(float64 a, float_status *status)
^
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: fpu/softfloat.c:3059:
+float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
^
ERROR: spaces required around that '*' (ctx:WxV)
#75: FILE: fpu/softfloat.c:3086:
+float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
^
total: 4 errors, 0 warnings, 78 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 13/13: hardfloat: implement float32/64 comparison...
ERROR: spaces required around that '*' (ctx:WxV)
#87: FILE: fpu/softfloat.c:2904:
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s) \
^
ERROR: spaces required around that '*' (ctx:WxV)
#111: FILE: fpu/softfloat.c:2917:
+int float16_compare(float16 a, float16 b, float_status *s)
^
total: 2 errors, 0 warnings, 123 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===
Test command exited with code: 1
* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
@ 2018-11-27 17:52 ` Emilio G. Cota
0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-11-27 17:52 UTC (permalink / raw)
To: qemu-devel; +Cc: famz, richard.henderson, alex.bennee
On Tue, Nov 27, 2018 at 09:24:21 -0800, no-reply@patchew.org wrote:
> /tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
> return isinff(a.h);
> ^~~~~~
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
> cc1: all warnings being treated as errors
> /tmp/qemu-test/src/fpu/softfloat.c: In function 'f32_is_inf':
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: implicit declaration of function 'isinff' [-Werror=implicit-function-declaration]
> return isinff(a.h);
> ^~~~~~
> /tmp/qemu-test/src/fpu/softfloat.c:325:16: error: incompatible implicit declaration of built-in function 'isinff' [-Werror]
> cc1: all warnings being treated as errors
> make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
> make[1]: *** [/tmp/qemu-test/src/rules.mak:69: fpu/softfloat.o] Error 1
This is the offender:
+static inline bool f32_is_inf(union_float32 a)
+{
+ if (QEMU_HARDFLOAT_USE_ISINF) {
+ return isinff(a.h);
+ }
+ return float32_is_infinity(a.s);
+}
I've fixed up the branch on github to use isinf here instead
of isinff.
Emilio
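For context on the fix: C99's `<math.h>` defines `isinf` as a type-generic classification macro, so it accepts a `float` argument directly. `isinff` is a non-standard extension, and the mingw headers used by the failing test-mingw build evidently do not declare it, which is what triggers the implicit-declaration error. The sketch below shows the helper with that fix applied. It is an illustration, not QEMU's code: `union_float32_sketch` stands in for QEMU's `union_float32`, and the runtime `use_host_isinf` parameter replaces the compile-time `QEMU_HARDFLOAT_USE_ISINF` switch. Both substitutions are assumptions.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's union_float32: the same 32 bits
 * viewed either as raw softfloat bits (s) or as a host float (h). */
typedef union {
    uint32_t s;
    float h;
} union_float32_sketch;

/* Bit-level infinity test in the spirit of float32_is_infinity():
 * exponent all ones and fraction zero, sign ignored. */
static bool sketch_float32_is_infinity(uint32_t bits)
{
    return (bits & 0x7fffffffu) == 0x7f800000u;
}

/* use_host_isinf models the QEMU_HARDFLOAT_USE_ISINF choice at run time. */
static bool sketch_f32_is_inf(union_float32_sketch a, bool use_host_isinf)
{
    if (use_host_isinf) {
        /* isinf() is type-generic in C99, so it handles a float without
         * needing the non-standard isinff(). */
        return isinf(a.h);
    }
    return sketch_float32_is_infinity(a.s);
}
```

Both paths should agree. The host-FPU path relies only on a standard macro, so it builds on any C99 libc, including mingw.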
* Re: [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
@ 2018-12-03 12:13 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 12:13 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> This gets rid of the muladd errors due to not raising the invalid flag.
>
> - Before:
> Errors found in f64_mulAdd, rounding near_even, tininess before rounding:
> +000.0000000000000 +7FF.0000000000000 +7FF.FFFFFFFFFFFFF
> => +7FF.FFFFFFFFFFFFF ..... expected -7FF.FFFFFFFFFFFFF v....
> [...]
>
> - After:
> In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding.
> [...]
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> tests/fp/Makefile | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tests/fp/Makefile b/tests/fp/Makefile
> index d649a5a1db..49cdcd1bd2 100644
> --- a/tests/fp/Makefile
> +++ b/tests/fp/Makefile
> @@ -29,6 +29,9 @@ QEMU_INCLUDES += -I$(TF_SOURCE_DIR)
>
> # work around TARGET_* poisoning
> QEMU_CFLAGS += -DHW_POISON_H
> +# define a target to match testfloat's implementation-defined choices, such as
> +# whether to raise the invalid flag when dealing with NaNs in muladd.
> +QEMU_CFLAGS += -DTARGET_ARM
>
> # capstone has a platform.h file that clashes with softfloat's
> QEMU_CFLAGS := $(filter-out %capstone, $(QEMU_CFLAGS))
--
Alex Bennée
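The failing vector in the commit message uses testfloat's float64 notation: a sign, then the 11-bit biased exponent in hex, then the 52-bit fraction in hex. So the operands are +0, +inf and a quiet NaN. IEEE 754-2008 treats 0 * inf in a fused multiply-add as an invalid operation. However, when the addend is a quiet NaN, the standard leaves it implementation-defined whether the invalid flag is raised. That is the kind of choice the Makefile comment refers to, and the reason fp-test needs a target whose behaviour matches testfloat's. The sketch below decodes the three operands from that notation and checks their classes. The helper names `tf_f64`, `f64_is_quiet_nan` and `mul_is_zero_times_inf` are hypothetical, and the quiet-bit convention (top fraction bit set means quiet) is assumed, as used on x86 and ARM hosts.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild a double from testfloat-style fields
 * (sign, 11-bit biased exponent, 52-bit fraction). */
static double tf_f64(int negative, uint64_t exp, uint64_t frac)
{
    uint64_t bits = ((uint64_t)(negative != 0) << 63)
                  | ((exp & 0x7ffu) << 52)
                  | (frac & 0xfffffffffffffull);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Quiet NaN: exponent all ones and the top fraction bit (bit 51) set. */
static int f64_is_quiet_nan(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return ((bits >> 52) & 0x7ff) == 0x7ff && ((bits >> 51) & 1) != 0;
}

/* True when a * b is the IEEE 754 invalid case 0 * inf (either order). */
static int mul_is_zero_times_inf(double a, double b)
{
    return (a == 0.0 && isinf(b)) || (isinf(a) && b == 0.0);
}
```

For the vector above, `tf_f64(0, 0x000, 0)`, `tf_f64(0, 0x7ff, 0)` and `tf_f64(0, 0x7ff, 0xFFFFFFFFFFFFF)` give +0, +inf and a quiet NaN. The product is therefore an invalid 0 * inf, even though the addend is already a NaN.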
* Re: [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-12-03 14:16 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:16 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> glibc >= 2.25 defines canonicalize in commit eaf5ad0
> (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
>
> Given that we'll be including <math.h> soon, prepare
> for this by prefixing our canonicalize() with sf_ to avoid
> clashing with the libc's canonicalize().
>
> Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> fpu/softfloat.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e1eef954e6..ecdc00c633 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
> #include "softfloat-specialize.h"
>
> /* Canonicalize EXP and FRAC, setting CLS. */
> -static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
> - float_status *status)
> +static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
> + float_status *status)
> {
> if (part.exp == parm->exp_max && !parm->arm_althp) {
> if (part.frac == 0) {
> @@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
> static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
> const FloatFmt *params)
> {
> - return canonicalize(float16_unpack_raw(f), params, s);
> + return sf_canonicalize(float16_unpack_raw(f), params, s);
> }
>
> static FloatParts float16_unpack_canonical(float16 f, float_status *s)
> @@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
>
> static FloatParts float32_unpack_canonical(float32 f, float_status *s)
> {
> - return canonicalize(float32_unpack_raw(f), &float32_params, s);
> + return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
> }
>
> static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
> @@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
>
> static FloatParts float64_unpack_canonical(float64 f, float_status *s)
> {
> - return canonicalize(float64_unpack_raw(f), &float64_params, s);
> + return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
> }
>
> static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
@ 2018-12-03 14:16 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:16 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> These will gain some users very soon.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> include/fpu/softfloat.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 9eeccd88a5..38a5e99cf3 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -474,6 +474,11 @@ static inline bool float32_is_denormal(float32 a)
> return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
> }
>
> +static inline bool float32_is_zero_or_normal(float32 a)
> +{
> + return float32_is_normal(a) || float32_is_zero(a);
> +}
> +
> static inline float32 float32_set_sign(float32 a, int sign)
> {
> return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
> @@ -625,6 +630,11 @@ static inline bool float64_is_denormal(float64 a)
> return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
> }
>
> +static inline bool float64_is_zero_or_normal(float64 a)
> +{
> + return float64_is_normal(a) || float64_is_zero(a);
> +}
> +
> static inline float64 float64_set_sign(float64 a, int sign)
> {
> return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
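The new helpers accept a value when it is a true zero or when its exponent field is neither all-zeros (denormal) nor all-ones (inf/NaN). A minimal stand-alone sketch on raw IEEE 754 bit patterns, assuming the binary32/binary64 layouts; the `f32_bits_*`/`f64_bits_*` names are hypothetical and not part of softfloat.h, which builds the helpers from float32_is_normal()/float32_is_zero() instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* binary32: 1 sign bit, 8 exponent bits, 23 fraction bits (hypothetical helpers) */
static bool f32_bits_is_zero(uint32_t a)
{
    return (a & 0x7fffffffu) == 0;          /* +0.0 or -0.0 */
}

static bool f32_bits_is_normal(uint32_t a)
{
    uint32_t exp = (a >> 23) & 0xff;
    return exp != 0 && exp != 0xff;         /* not zero/denormal, not inf/NaN */
}

static bool f32_bits_is_zero_or_normal(uint32_t a)
{
    return f32_bits_is_normal(a) || f32_bits_is_zero(a);
}

/* binary64: 1 sign bit, 11 exponent bits, 52 fraction bits */
static bool f64_bits_is_zero_or_normal(uint64_t a)
{
    uint64_t exp = (a >> 52) & 0x7ff;
    return (a & 0x7fffffffffffffffull) == 0 || (exp != 0 && exp != 0x7ff);
}
```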
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
@ 2018-12-03 14:29 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-03 14:29 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> These microbenchmarks will allow us to measure the performance impact of
> FP emulation optimizations. Note that we can measure either the direct impact
> on the softfloat functions (with "-t soft") or the impact on an
> emulated workload (call with "-t host" and run under qemu user-mode).
It would be nice to be able to cross-build this later so we can easily
build it for non-x86 hosts. However, there's no reason to hold things up for that:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> +
> +/*
> + * The main benchmark function. Instead of (ab)using macros, we rely
> + * on the compiler to unfold this at compile-time.
> + */
\o/
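The "let the compiler unfold it" approach can be sketched as a single generic loop that is always called with compile-time-constant arguments, so an optimizing compiler will usually inline it and drop the dispatch, producing one specialized loop per operation. This is a minimal illustration, not fp-bench's code; `bench_op`, `bench_loop` and the wrappers are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation selector; fp-bench's real enums may differ. */
enum bench_op { BENCH_ADD, BENCH_MUL };

/*
 * Written once. Each call site below passes a constant 'op', so after
 * inlining the switch is resolved at compile time.
 */
static inline double bench_loop(enum bench_op op, const double *in, size_t n)
{
    double acc = (op == BENCH_MUL) ? 1.0 : 0.0;

    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case BENCH_ADD:
            acc += in[i];
            break;
        case BENCH_MUL:
            acc *= in[i];
            break;
        }
    }
    return acc;
}

/* Thin wrappers: each is a specialization of bench_loop(). */
static double bench_add(const double *in, size_t n)
{
    return bench_loop(BENCH_ADD, in, n);
}

static double bench_mul(const double *in, size_t n)
{
    return bench_loop(BENCH_MUL, in, n);
}
```

Compared with macro-generated loops, this keeps a single readable body, at the cost of relying on the optimizer actually inlining it.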
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
2018-11-25 0:25 ` Aleksandar Markovic
@ 2018-12-04 12:28 ` Alex Bennée
2018-12-04 13:33 ` Richard Henderson
1 sibling, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 12:28 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> The appended paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the added comment for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.
We don't currently enforce this, although maybe that would be too
much hand-holding for compiler ricers hell-bent on not understanding the
flags they use.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
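The gating condition described in the commit log (inexact already set, round-to-nearest-even) can be sketched as a single predicate. This is a minimal illustration assuming a trimmed-down guest status struct; the struct, enum and function names are hypothetical (the patch's own helper is can_use_fpu(), whose full body is not quoted here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down guest FP state; QEMU's float_status has more. */
enum { GUEST_FLAG_INEXACT = 1 << 0 };
enum guest_rounding { GUEST_ROUND_NEAREST_EVEN, GUEST_ROUND_TO_ZERO };

struct guest_fp_status {
    int exception_flags;
    enum guest_rounding rounding_mode;
};

/*
 * Use the host FPU only if the guest's inexact flag is already set (so the
 * host's inexact status never needs to be read back) and the guest rounds
 * to nearest-even, which matches the host's default mode.
 */
static bool can_use_host_fpu(const struct guest_fp_status *s)
{
    return (s->exception_flags & GUEST_FLAG_INEXACT) &&
           s->rounding_mode == GUEST_ROUND_NEAREST_EVEN;
}
```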
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-12-04 12:28 ` Alex Bennée
@ 2018-12-04 13:33 ` Richard Henderson
2018-12-04 13:52 ` Alex Bennée
0 siblings, 1 reply; 37+ messages in thread
From: Richard Henderson @ 2018-12-04 13:33 UTC (permalink / raw)
To: Alex Bennée, Emilio G. Cota; +Cc: qemu-devel
On 12/4/18 6:28 AM, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
>> This assumes that QEMU is running on an IEEE754-compliant FPU and
>> that the rounding is set to the default (to nearest). The
>> implementation-dependent specifics of the FPU should not matter; things
>> like tininess detection and snan representation are still dealt with in
>> soft-fp. However, this approach will break on most hosts if we compile
>> QEMU with flags such as -ffast-math. We control the flags so this should
>> be easy to enforce though.
>
> We don't currently enforce this, although maybe that would be too
> much hand-holding for compiler ricers hell-bent on not understanding the
> flags they use.
We could always
#ifdef __FAST_MATH__
#error "Silliness like this will get you nowhere"
#endif
r~
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-12-04 13:33 ` Richard Henderson
@ 2018-12-04 13:52 ` Alex Bennée
2018-12-04 17:31 ` Emilio G. Cota
0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 13:52 UTC (permalink / raw)
To: Richard Henderson; +Cc: Emilio G. Cota, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> On 12/4/18 6:28 AM, Alex Bennée wrote:
>> Emilio G. Cota <cota@braap.org> writes:
>>> This assumes that QEMU is running on an IEEE754-compliant FPU and
>>> that the rounding is set to the default (to nearest). The
>>> implementation-dependent specifics of the FPU should not matter; things
>>> like tininess detection and snan representation are still dealt with in
>>> soft-fp. However, this approach will break on most hosts if we compile
>>> QEMU with flags such as -ffast-math. We control the flags so this should
>>> be easy to enforce though.
>>
>> We don't currently enforce this, although maybe that would be too
>> much hand-holding for compiler ricers hell-bent on not understanding the
>> flags they use.
>
> We could always
>
> #ifdef __FAST_MATH__
> #error "Silliness like this will get you nowhere"
> #endif
Emilio, are you happy to add that guard with a suitable pithy comment?
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-12-04 13:52 ` Alex Bennée
@ 2018-12-04 17:31 ` Emilio G. Cota
2018-12-04 19:08 ` Alex Bennée
0 siblings, 1 reply; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-04 17:31 UTC (permalink / raw)
To: Alex Bennée; +Cc: Richard Henderson, qemu-devel
On Tue, Dec 04, 2018 at 13:52:16 +0000, Alex Bennée wrote:
> > We could always
> >
> > #ifdef __FAST_MATH__
> > #error "Silliness like this will get you nowhere"
> > #endif
>
> Emilio, are you happy to add that guard with a suitable pithy comment?
Isn't it better to just disable hardfloat then?
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -220,7 +220,7 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
* the use of hardfloat, since hardfloat relies on the inexact flag being
* already set.
*/
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(__FAST_MATH__)
# define QEMU_NO_HARDFLOAT 1
# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
#else
Or perhaps disable it, as well as issue a #warning?
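A minimal sketch of the "disable it and issue a #warning" option, assuming GCC or Clang, which predefine __FAST_MATH__ when -ffast-math is in effect (#warning is a GNU extension, standardized in C23). SKETCH_NO_HARDFLOAT and hardfloat_enabled() are hypothetical stand-ins for QEMU_NO_HARDFLOAT, and the TARGET_PPC condition from the diff above is left out:

```c
#include <assert.h>

/* Fall back to pure softfloat when the IEEE 754 assumptions may not hold. */
#if defined(__FAST_MATH__)
# warning "-ffast-math breaks hardfloat's IEEE 754 assumptions; disabling hardfloat"
# define SKETCH_NO_HARDFLOAT 1
#else
# define SKETCH_NO_HARDFLOAT 0
#endif

static int hardfloat_enabled(void)
{
    return !SKETCH_NO_HARDFLOAT;
}
```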
E.
* Re: [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
@ 2018-12-04 18:34 ` Alex Bennée
2018-12-04 20:07 ` Emilio G. Cota
0 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 18:34 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results (single and double precision) for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> add-single: 135.07 MFlops
> add-double: 131.60 MFlops
> sub-single: 130.04 MFlops
> sub-double: 133.01 MFlops
> - after:
> add-single: 443.04 MFlops
> add-double: 301.95 MFlops
> sub-single: 411.36 MFlops
> sub-double: 293.15 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> add-single: 44.79 MFlops
> add-double: 49.20 MFlops
> sub-single: 44.55 MFlops
> sub-double: 49.06 MFlops
> - after:
> add-single: 93.28 MFlops
> add-double: 88.27 MFlops
> sub-single: 91.47 MFlops
> sub-double: 88.27 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> add-single: 72.59 MFlops
> add-double: 72.27 MFlops
> sub-single: 75.33 MFlops
> sub-double: 70.54 MFlops
> - after:
> add-single: 112.95 MFlops
> add-double: 201.11 MFlops
> sub-single: 116.80 MFlops
> sub-double: 188.72 MFlops
>
> Note that the IBM and ARM machines benefit from having
> HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
> can suffer significantly:
Is this just the latency of pushing the number into a SIMD register and
checking the flags compared to a bitmask check?
> - IBM Power8:
> add-single: [1] 54.94 vs [0] 116.37 MFlops
> add-double: [1] 58.92 vs [0] 201.44 MFlops
> - Aarch64 A57:
> add-single: [1] 80.72 vs [0] 93.24 MFlops
> add-double: [1] 82.10 vs [0] 88.18 MFlops
>
> On the Intel machine, having 2F64 set to 1 pays off, but it
> doesn't for 2F32:
> - Intel i7-6700K:
> add-single: [1] 285.79 vs [0] 426.70 MFlops
> add-double: [1] 302.15 vs [0] 278.82 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> fpu/softfloat.c | 117 ++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 98 insertions(+), 19 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 306a12fa8d..cc500b1618 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1050,49 +1050,128 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
> return float16_round_pack_canonical(pr, status);
> }
>
> -float32 QEMU_FLATTEN float32_add(float32 a, float32 b, float_status *status)
> +float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
> +{
> + FloatParts pa = float16_unpack_canonical(a, status);
> + FloatParts pb = float16_unpack_canonical(b, status);
> + FloatParts pr = addsub_floats(pa, pb, true, status);
> +
> + return float16_round_pack_canonical(pr, status);
> +}
Hmm the diff is confusing but the changes look fine in the final code:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
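The add/sub fast path follows a pre-check, host operation, post-check pattern. Below is a minimal single-precision sketch of that flow, assuming a simplified status struct; all names are hypothetical, the rounding-mode check is omitted for brevity, and the soft path is only a placeholder for a real softfloat implementation:

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified status; not QEMU's float_status. */
enum { SK_INEXACT = 1, SK_OVERFLOW = 2 };

struct sk_status {
    int flags;
    bool took_soft_path;   /* records which path ran, for illustration */
};

static bool sk_is_zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Placeholder: real softfloat would also compute underflow flags exactly. */
static float sk_soft_add(float a, float b, struct sk_status *s)
{
    s->took_soft_path = true;
    return a + b;
}

static float sk_add(float a, float b, struct sk_status *s)
{
    s->took_soft_path = false;

    /* Pre-checks: inexact already set, both inputs zero or normal. */
    if (!(s->flags & SK_INEXACT) ||
        !sk_is_zero_or_normal(a) || !sk_is_zero_or_normal(b)) {
        return sk_soft_add(a, b, s);
    }

    float r = a + b;

    if (isinf(r)) {
        /* Finite inputs, infinite result: overflow (inexact is already set). */
        s->flags |= SK_OVERFLOW;
    } else if (fabsf(r) <= FLT_MIN && !(a == 0.0f && b == 0.0f)) {
        /* Tiny result from non-zero inputs: it may have underflowed. */
        return sk_soft_add(a, b, s);
    }
    return r;
}
```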
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat
2018-12-04 17:31 ` Emilio G. Cota
@ 2018-12-04 19:08 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-04 19:08 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: Richard Henderson, qemu-devel
Emilio G. Cota <cota@braap.org> writes:
> On Tue, Dec 04, 2018 at 13:52:16 +0000, Alex Bennée wrote:
>> > We could always
>> >
>> > #ifdef __FAST_MATH__
>> > #error "Silliness like this will get you nowhere"
>> > #endif
>>
>> Emilio, are you happy to add that guard with a suitable pithy comment?
>
> Isn't it better to just disable hardfloat then?
>
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -220,7 +220,7 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
> * the use of hardfloat, since hardfloat relies on the inexact flag being
> * already set.
> */
> -#if defined(TARGET_PPC)
> +#if defined(TARGET_PPC) || defined(__FAST_MATH__)
> # define QEMU_NO_HARDFLOAT 1
> # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
> #else
>
> Or perhaps disable it, as well as issue a #warning?
Issuing the warning only serves to tell the user they are being stupid, but
yeah, certainly disable it. Maybe we'll be around when someone comes asking
why the maths didn't get faster ;-)
>
> E.
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction
2018-12-04 18:34 ` Alex Bennée
@ 2018-12-04 20:07 ` Emilio G. Cota
0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-04 20:07 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On Tue, Dec 04, 2018 at 18:34:18 +0000, Alex Bennée wrote:
>
> Emilio G. Cota <cota@braap.org> writes:
(snip)
> > Note that the IBM and ARM machines benefit from having
> > HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
> > can suffer significantly:
>
> Is this just the latency of pushing the number into a SIMD register and
> checking the flags compared to a bitmask check?
That's the case in the generated x86 assembly, so I presume
the same is happening in the other ISAs (I didn't check
the assembly there).
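The trade-off behind a USE_FP-style knob can be illustrated with two equivalent ways of classifying an operand that arrives as raw bits. A minimal sketch, assuming guest float32 values are held as uint32_t; SKETCH_USE_FP and the function names are hypothetical. Both paths compute the same predicate, so the knob only picks whichever is cheaper on a given host (per the numbers above, the integer path wins on the POWER8 and A57 machines):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical knob in the spirit of QEMU_HARDFLOAT_2F32_USE_FP. */
#ifndef SKETCH_USE_FP
#define SKETCH_USE_FP 0
#endif

/* Strategy 1: reinterpret the bits as a host float and classify it. If the
 * value sits in an integer register, it must first be moved to an FP/SIMD
 * register. */
static bool zon_via_fp(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    int c = fpclassify(f);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Strategy 2: test the exponent field using integer operations only. */
static bool zon_via_bits(uint32_t bits)
{
    uint32_t exp = (bits >> 23) & 0xff;
    return (bits & 0x7fffffffu) == 0 || (exp != 0 && exp != 0xff);
}

static bool zon(uint32_t bits)
{
    return SKETCH_USE_FP ? zon_via_fp(bits) : zon_via_bits(bits);
}
```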
(snip)
>
> Hmm the diff is confusing but the changes look fine in the final code:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Thanks!
E.
* Re: [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
@ 2018-12-05 10:10 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 10:10 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> mul-single: 126.91 MFlops
> mul-double: 118.28 MFlops
> - after:
> mul-single: 258.02 MFlops
> mul-double: 197.96 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> mul-single: 37.42 MFlops
> mul-double: 38.77 MFlops
> - after:
> mul-single: 73.41 MFlops
> mul-double: 76.93 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> mul-single: 58.40 MFlops
> mul-double: 59.33 MFlops
> - after:
> mul-single: 60.25 MFlops
> mul-double: 94.79 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index cc500b1618..58e67d9b80 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1232,7 +1232,8 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
> return float16_round_pack_canonical(pr, status);
> }
>
> -float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_mul(float32 a, float32 b, float_status *status)
> {
> FloatParts pa = float32_unpack_canonical(a, status);
> FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1241,7 +1242,8 @@ float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
> return float32_round_pack_canonical(pr, status);
> }
>
> -float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_mul(float64 a, float64 b, float_status *status)
> {
> FloatParts pa = float64_unpack_canonical(a, status);
> FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1250,6 +1252,54 @@ float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
> return float64_round_pack_canonical(pr, status);
> }
>
> +static float hard_f32_mul(float a, float b)
> +{
> + return a * b;
> +}
> +
> +static double hard_f64_mul(double a, double b)
> +{
> + return a * b;
> +}
> +
> +static bool f32_mul_fast_test(union_float32 a, union_float32 b)
> +{
> + return float32_is_zero(a.s) || float32_is_zero(b.s);
> +}
> +
> +static bool f64_mul_fast_test(union_float64 a, union_float64 b)
> +{
> + return float64_is_zero(a.s) || float64_is_zero(b.s);
> +}
> +
> +static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
> +{
> + bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
> +
> + return float32_set_sign(float32_zero, signbit);
> +}
> +
> +static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
> +{
> + bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
> +
> + return float64_set_sign(float64_zero, signbit);
> +}
> +
> +float32 QEMU_FLATTEN
> +float32_mul(float32 a, float32 b, float_status *s)
> +{
> + return float32_gen2(a, b, s, hard_f32_mul, soft_f32_mul,
> + f32_is_zon2, NULL, f32_mul_fast_test, f32_mul_fast_op);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_mul(float64 a, float64 b, float_status *s)
> +{
> + return float64_gen2(a, b, s, hard_f64_mul, soft_f64_mul,
> + f64_is_zon2, NULL, f64_mul_fast_test, f64_mul_fast_op);
> +}
> +
> /*
> * Returns the result of multiplying the floating-point values `a' and
> * `b' then adding 'c', with no intermediate rounding step after the
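The zero fast path works because, once both operands are known to be zero or normal, a product involving a zero is exactly a signed zero whose sign is the XOR of the input signs; no rounding happens and no flag can be raised. A minimal bit-level sketch, assuming IEEE 754 binary32; `mul_by_zero_fast` is a hypothetical name, not the patch's function, and it must only be called after that zero-or-normal pre-check (0 * inf is a NaN, not a zero):

```c
#include <assert.h>
#include <math.h>     /* signbit(), used when checking results */
#include <stdint.h>
#include <string.h>

/* Precondition: both inputs zero or normal, at least one of them zero. */
static float mul_by_zero_fast(float a, float b)
{
    uint32_t ua, ub, ur;
    float r;

    memcpy(&ua, &a, sizeof(ua));
    memcpy(&ub, &b, sizeof(ub));
    ur = (ua ^ ub) & 0x80000000u;   /* keep only the XOR of the sign bits */
    memcpy(&r, &ur, sizeof(r));
    return r;                       /* +0.0f or -0.0f */
}
```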
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
@ 2018-12-05 10:11 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 10:11 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> div-single: 34.84 MFlops
> div-double: 34.04 MFlops
> - after:
> div-single: 275.23 MFlops
> div-double: 216.38 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> div-single: 9.33 MFlops
> div-double: 9.30 MFlops
> - after:
> div-single: 51.55 MFlops
> div-double: 15.09 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> div-single: 25.65 MFlops
> div-double: 24.91 MFlops
> - after:
> div-single: 96.83 MFlops
> div-double: 31.01 MFlops
>
> Here setting 2F64_USE_FP to 1 pays off for x86_64:
> [1] 215.97 vs [0] 62.15 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> fpu/softfloat.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 58e67d9b80..e35ebfaae7 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1624,7 +1624,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
> return float16_round_pack_canonical(pr, status);
> }
>
> -float32 float32_div(float32 a, float32 b, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_div(float32 a, float32 b, float_status *status)
> {
> FloatParts pa = float32_unpack_canonical(a, status);
> FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1633,7 +1634,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
> return float32_round_pack_canonical(pr, status);
> }
>
> -float64 float64_div(float64 a, float64 b, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_div(float64 a, float64 b, float_status *status)
> {
> FloatParts pa = float64_unpack_canonical(a, status);
> FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1642,6 +1644,64 @@ float64 float64_div(float64 a, float64 b, float_status *status)
> return float64_round_pack_canonical(pr, status);
> }
>
> +static float hard_f32_div(float a, float b)
> +{
> + return a / b;
> +}
> +
> +static double hard_f64_div(double a, double b)
> +{
> + return a / b;
> +}
> +
> +static bool f32_div_pre(union_float32 a, union_float32 b)
> +{
> + if (QEMU_HARDFLOAT_2F32_USE_FP) {
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + fpclassify(b.h) == FP_NORMAL;
> + }
> + return float32_is_zero_or_normal(a.s) && float32_is_normal(b.s);
> +}
> +
> +static bool f64_div_pre(union_float64 a, union_float64 b)
> +{
> + if (QEMU_HARDFLOAT_2F64_USE_FP) {
> + return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
> + fpclassify(b.h) == FP_NORMAL;
> + }
> + return float64_is_zero_or_normal(a.s) && float64_is_normal(b.s);
> +}
> +
> +static bool f32_div_post(union_float32 a, union_float32 b)
> +{
> + if (QEMU_HARDFLOAT_2F32_USE_FP) {
> + return fpclassify(a.h) != FP_ZERO;
> + }
> + return !float32_is_zero(a.s);
> +}
> +
> +static bool f64_div_post(union_float64 a, union_float64 b)
> +{
> + if (QEMU_HARDFLOAT_2F64_USE_FP) {
> + return fpclassify(a.h) != FP_ZERO;
> + }
> + return !float64_is_zero(a.s);
> +}
> +
> +float32 QEMU_FLATTEN
> +float32_div(float32 a, float32 b, float_status *s)
> +{
> + return float32_gen2(a, b, s, hard_f32_div, soft_f32_div,
> + f32_div_pre, f32_div_post, NULL, NULL);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_div(float64 a, float64 b, float_status *s)
> +{
> + return float64_gen2(a, b, s, hard_f64_div, soft_f64_div,
> + f64_div_pre, f64_div_post, NULL, NULL);
> +}
> +
> /*
> * Float to Float conversions
> *
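The division pre- and post-checks above can be restated in isolation. A minimal single-precision sketch; the names are hypothetical. The divisor must be normal (ruling out x/0, infinities and NaNs), and a tiny quotient is only trusted when the dividend is zero, since 0/b is an exact zero. An infinite quotient from finite inputs is returned as a host result, on the assumption that the caller flags it as overflow:

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

static bool div_pre_ok(float a, float b)
{
    int ca = fpclassify(a), cb = fpclassify(b);
    return (ca == FP_NORMAL || ca == FP_ZERO) && cb == FP_NORMAL;
}

/* True if a tiny result must be recomputed in softfloat (possible underflow). */
static bool div_tiny_needs_soft(float a, float r)
{
    return fabsf(r) <= FLT_MIN && fpclassify(a) != FP_ZERO;
}

/* Sets *used_hard to say whether the returned host result may be used;
 * when it is false, the caller falls back to softfloat. */
static float div_try_hard(float a, float b, bool *used_hard)
{
    *used_hard = false;
    if (!div_pre_ok(a, b)) {
        return 0.0f;
    }
    float r = a / b;
    if (!isinf(r) && div_tiny_needs_soft(a, r)) {
        return 0.0f;
    }
    *used_hard = true;
    return r;
}
```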
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
@ 2018-12-05 12:25 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:25 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results for fp-bench:
>
> 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> fma-single: 74.73 MFlops
> fma-double: 74.54 MFlops
> - after:
> fma-single: 203.37 MFlops
> fma-double: 169.37 MFlops
>
> 2. ARM Aarch64 A57 @ 2.4GHz
> - before:
> fma-single: 23.24 MFlops
> fma-double: 23.70 MFlops
> - after:
> fma-single: 66.14 MFlops
> fma-double: 63.10 MFlops
>
> 3. IBM POWER8E @ 2.1 GHz
> - before:
> fma-single: 37.26 MFlops
> fma-double: 37.29 MFlops
> - after:
> fma-single: 48.90 MFlops
> fma-double: 59.51 MFlops
>
> Here having 3F64_USE_FP set to 1 pays off for x86_64:
> [1] 170.15 vs [0] 153.12 MFlops
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> fpu/softfloat.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 128 insertions(+), 4 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e35ebfaae7..e03feafb6f 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1514,8 +1514,9 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
> return float16_round_pack_canonical(pr, status);
> }
>
> -float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
> - int flags, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
> + float_status *status)
> {
> FloatParts pa = float32_unpack_canonical(a, status);
> FloatParts pb = float32_unpack_canonical(b, status);
> @@ -1525,8 +1526,9 @@ float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
> return float32_round_pack_canonical(pr, status);
> }
>
> -float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
> - int flags, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
> + float_status *status)
> {
> FloatParts pa = float64_unpack_canonical(a, status);
> FloatParts pb = float64_unpack_canonical(b, status);
> @@ -1536,6 +1538,128 @@ float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
> return float64_round_pack_canonical(pr, status);
> }
>
> +float32 QEMU_FLATTEN
> +float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
> +{
> + union_float32 ua, ub, uc, ur;
> +
> + ua.s = xa;
> + ub.s = xb;
> + uc.s = xc;
> +
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> + if (unlikely(flags & float_muladd_halve_result)) {
> + goto soft;
> + }
> +
> + float32_input_flush3(&ua.s, &ub.s, &uc.s, s);
> + if (unlikely(!f32_is_zon3(ua, ub, uc))) {
> + goto soft;
> + }
> + /*
> + * When (a || b) == 0, there's no need to check for under/over flow,
> + * since we know the addend is (normal || 0) and the product is 0.
> + */
> + if (float32_is_zero(ua.s) || float32_is_zero(ub.s)) {
> + union_float32 up;
> + bool prod_sign;
> +
> + prod_sign = float32_is_neg(ua.s) ^ float32_is_neg(ub.s);
> + prod_sign ^= !!(flags & float_muladd_negate_product);
> + up.s = float32_set_sign(float32_zero, prod_sign);
> +
> + if (flags & float_muladd_negate_c) {
> + uc.h = -uc.h;
> + }
> + ur.h = up.h + uc.h;
> + } else {
> + if (flags & float_muladd_negate_product) {
> + ua.h = -ua.h;
> + }
> + if (flags & float_muladd_negate_c) {
> + uc.h = -uc.h;
> + }
> +
> + ur.h = fmaf(ua.h, ub.h, uc.h);
> +
> + if (unlikely(f32_is_inf(ur))) {
> + s->float_exception_flags |= float_flag_overflow;
> + } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
> + goto soft;
> + }
> + }
> + if (flags & float_muladd_negate_result) {
> + return float32_chs(ur.s);
> + }
> + return ur.s;
> +
> + soft:
> + return soft_f32_muladd(ua.s, ub.s, uc.s, flags, s);
> +}
> +
> +float64 QEMU_FLATTEN
> +float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
> +{
> + union_float64 ua, ub, uc, ur;
> +
> + ua.s = xa;
> + ub.s = xb;
> + uc.s = xc;
> +
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> + if (unlikely(flags & float_muladd_halve_result)) {
> + goto soft;
> + }
> +
> + float64_input_flush3(&ua.s, &ub.s, &uc.s, s);
> + if (unlikely(!f64_is_zon3(ua, ub, uc))) {
> + goto soft;
> + }
> + /*
> + * When (a || b) == 0, there's no need to check for under/over flow,
> + * since we know the addend is (normal || 0) and the product is 0.
> + */
> + if (float64_is_zero(ua.s) || float64_is_zero(ub.s)) {
> + union_float64 up;
> + bool prod_sign;
> +
> + prod_sign = float64_is_neg(ua.s) ^ float64_is_neg(ub.s);
> + prod_sign ^= !!(flags & float_muladd_negate_product);
> + up.s = float64_set_sign(float64_zero, prod_sign);
> +
> + if (flags & float_muladd_negate_c) {
> + uc.h = -uc.h;
> + }
> + ur.h = up.h + uc.h;
> + } else {
> + if (flags & float_muladd_negate_product) {
> + ua.h = -ua.h;
> + }
> + if (flags & float_muladd_negate_c) {
> + uc.h = -uc.h;
> + }
> +
> + ur.h = fma(ua.h, ub.h, uc.h);
> +
> + if (unlikely(f64_is_inf(ur))) {
> + s->float_exception_flags |= float_flag_overflow;
> + } else if (unlikely(fabs(ur.h) <= FLT_MIN)) {
> + goto soft;
> + }
> + }
> + if (flags & float_muladd_negate_result) {
> + return float64_chs(ur.s);
> + }
> + return ur.s;
> +
> + soft:
> + return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
> +}
> +
> /*
> * Returns the result of dividing the floating-point value `a' by the
> * corresponding value `b'. The operation is performed according to
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
@ 2018-12-05 12:26 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:26 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results for fp-bench:
>
> Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> sqrt-single: 42.30 MFlops
> sqrt-double: 22.97 MFlops
> - after:
> sqrt-single: 311.42 MFlops
> sqrt-double: 311.08 MFlops
>
> Here USE_FP makes a huge difference for f64s, with throughput
> going from ~200 MFlops to ~300 MFlops.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> fpu/softfloat.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e03feafb6f..4c6ecd1883 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -3040,20 +3040,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
> return float16_round_pack_canonical(pr, status);
> }
>
> -float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status)
> +static float32 QEMU_SOFTFLOAT_ATTR
> +soft_f32_sqrt(float32 a, float_status *status)
> {
> FloatParts pa = float32_unpack_canonical(a, status);
> FloatParts pr = sqrt_float(pa, status, &float32_params);
> return float32_round_pack_canonical(pr, status);
> }
>
> -float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status)
> +static float64 QEMU_SOFTFLOAT_ATTR
> +soft_f64_sqrt(float64 a, float_status *status)
> {
> FloatParts pa = float64_unpack_canonical(a, status);
> FloatParts pr = sqrt_float(pa, status, &float64_params);
> return float64_round_pack_canonical(pr, status);
> }
>
> +float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
> +{
> + union_float32 ua, ur;
> +
> + ua.s = xa;
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> +
> + float32_input_flush1(&ua.s, s);
> + if (QEMU_HARDFLOAT_1F32_USE_FP) {
> + if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
> + fpclassify(ua.h) == FP_ZERO) ||
> + signbit(ua.h))) {
> + goto soft;
> + }
> + } else if (unlikely(!float32_is_zero_or_normal(ua.s) ||
> + float32_is_neg(ua.s))) {
> + goto soft;
> + }
> + ur.h = sqrtf(ua.h);
> + return ur.s;
> +
> + soft:
> + return soft_f32_sqrt(ua.s, s);
> +}
> +
> +float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
> +{
> + union_float64 ua, ur;
> +
> + ua.s = xa;
> + if (unlikely(!can_use_fpu(s))) {
> + goto soft;
> + }
> +
> + float64_input_flush1(&ua.s, s);
> + if (QEMU_HARDFLOAT_1F64_USE_FP) {
> + if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
> + fpclassify(ua.h) == FP_ZERO) ||
> + signbit(ua.h))) {
> + goto soft;
> + }
> + } else if (unlikely(!float64_is_zero_or_normal(ua.s) ||
> + float64_is_neg(ua.s))) {
> + goto soft;
> + }
> + ur.h = sqrt(ua.h);
> + return ur.s;
> +
> + soft:
> + return soft_f64_sqrt(ua.s, s);
> +}
> +
> /*----------------------------------------------------------------------------
> | The pattern for a default generated NaN.
> *----------------------------------------------------------------------------*/
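The quoted diff only takes the host-FPU path for operands that are zero or normal and have a clear sign bit. Everything else falls through to soft_f32_sqrt()/soft_f64_sqrt(): NaNs, infinities, denormals, negative numbers and also -0, since signbit() is set for it. QEMU_HARDFLOAT_1F32_USE_FP chooses whether that test uses fpclassify()/signbit() on the host float or softfloat's bit-pattern helpers. The standalone sketch below reproduces both styles for binary32 so they can be compared side by side. The function names are hypothetical, not QEMU's, and it assumes the host float is IEEE 754 binary32.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant 1: classify the host float with <math.h>,
 * mirroring the QEMU_HARDFLOAT_1F32_USE_FP branch of the diff. */
static bool sketch_f32_sqrt_fast_ok_fp(float a)
{
    int cls = fpclassify(a);

    return (cls == FP_NORMAL || cls == FP_ZERO) && !signbit(a);
}

/* Hypothetical variant 2: classify from the binary32 bit pattern, in the
 * spirit of float32_is_zero_or_normal() combined with float32_is_neg(). */
static bool sketch_f32_sqrt_fast_ok_bits(float a)
{
    uint32_t u;

    memcpy(&u, &a, sizeof(u));             /* reinterpret the bits without UB */
    uint32_t exp = (u >> 23) & 0xffu;      /* 8-bit biased exponent */
    bool is_zero = (u & 0x7fffffffu) == 0; /* +0 or -0 */
    bool is_normal = exp != 0 && exp != 0xffu;
    bool is_neg = (u >> 31) != 0;          /* sign bit, set for -0 too */

    return (is_zero || is_normal) && !is_neg;
}
```

Both predicates should agree on every input. For example, 4.0f and +0.0f qualify for the fast path, while -0.0f, -1.0f, the subnormal 1e-40f, INFINITY and NAN all go to the software path.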
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
@ 2018-12-05 12:36 ` Alex Bennée
0 siblings, 0 replies; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:36 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> Performance results for fp-bench:
>
> Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> - before:
> cmp-single: 110.98 MFlops
> cmp-double: 107.12 MFlops
> - after:
> cmp-single: 506.28 MFlops
> cmp-double: 524.77 MFlops
>
> Note that flattening both eq and eq_signaling versions
> would give us extra performance (695 vs. 506 and 615 vs. 524
> MFlops for single and double, respectively), but this would
> emit two essentially identical functions for each eq/signaling
> pair, which is a waste.
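Patch 13's code is not quoted in this reply, so the following is only an assumed shape that illustrates the trade-off: a single comparison helper parameterized by an is_quiet flag, with the quiet and signaling entry points as thin wrappers around it. Flattening each wrapper would inline a private copy of the helper into both, which is the duplicated code the note wants to avoid. All names are hypothetical. The "slow path" is a stub that only counts calls and reports unordered; it does not do softfloat's real flag handling.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical relation codes, loosely modelled on softfloat's
 * less/equal/greater/unordered results. */
enum sketch_relation {
    SKETCH_LESS = -1,
    SKETCH_EQUAL = 0,
    SKETCH_GREATER = 1,
    SKETCH_UNORDERED = 2,
};

static int sketch_slow_path_calls;

/* Stub for the software path. It is reached only when at least one
 * operand is a NaN. A real implementation would raise 'invalid' here
 * for signaling NaNs, or for any NaN when !is_quiet. */
static enum sketch_relation
sketch_unordered_slow_path(float a, float b, bool is_quiet)
{
    (void)a;
    (void)b;
    (void)is_quiet;
    sketch_slow_path_calls++;
    return SKETCH_UNORDERED;
}

/* Shared helper: the quiet/signaling distinction only matters once a NaN
 * is involved, so the ordered cases never look at is_quiet. The C99
 * comparison macros do not raise FE_INVALID for quiet NaNs, unlike the
 * plain relational operators. */
static enum sketch_relation sketch_compare(float a, float b, bool is_quiet)
{
    if (isgreaterequal(a, b)) {
        return isgreater(a, b) ? SKETCH_GREATER : SKETCH_EQUAL;
    }
    if (isless(a, b)) {
        return SKETCH_LESS;
    }
    /* Only the unordered case is left. */
    return sketch_unordered_slow_path(a, b, is_quiet);
}

/* Thin wrappers: the candidates for flattening. */
static enum sketch_relation sketch_compare_quiet(float a, float b)
{
    return sketch_compare(a, b, true);
}

static enum sketch_relation sketch_compare_signaling(float a, float b)
{
    return sketch_compare(a, b, false);
}
```

Ordered inputs, including +0 vs. -0 (which compare equal), never touch the slow path. Only a NaN operand reaches it.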
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
` (14 preceding siblings ...)
2018-11-27 17:32 ` no-reply
@ 2018-12-05 12:41 ` Alex Bennée
2018-12-05 16:47 ` Emilio G. Cota
15 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2018-12-05 12:41 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html
>
> Changes since v5:
>
> - Rebase on rth/tcg-next-for-4.0
<snip>
Awesome work. The series is looking really good now, and I think we are
ready for a merge once the tree re-opens. I think there were a few
wording changes and the #ifdef fix to apply, so if you are happy to do
that, I'll slurp up v7 and prepare a PR once it's ready.
Going forward, I think we want to wire up the fp-test code so we can run
it in CI along with the rest of make check (or check-tcg?), but there is
no need to hold up the merge for that.
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v6 00/13] hardfloat
2018-12-05 12:41 ` Alex Bennée
@ 2018-12-05 16:47 ` Emilio G. Cota
0 siblings, 0 replies; 37+ messages in thread
From: Emilio G. Cota @ 2018-12-05 16:47 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On Wed, Dec 05, 2018 at 12:41:15 +0000, Alex Bennée wrote:
>
> Emilio G. Cota <cota@braap.org> writes:
>
> > v5: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02793.html
> >
> > Changes since v5:
> >
> > - Rebase on rth/tcg-next-for-4.0
> <snip>
>
> Awesome work. The series is looking really good now, and I think we are
> ready for a merge once the tree re-opens. I think there were a few
> wording changes and the #ifdef fix to apply, so if you are happy to do
> that, I'll slurp up v7 and prepare a PR once it's ready.
Great, thanks for reviewing!
I just pushed v7. The changes are tiny (v6->v7 diff shown below),
so unless you want me to, I won't send it to the list.
https://github.com/cota/qemu/tree/hardfloat-v7
I added your R-b's and the __FAST_MATH__ check:
$ git diff hardfloat-v6..hardfloat-v7
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 494422faab..59eac97d10 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -220,7 +220,11 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
* the use of hardfloat, since hardfloat relies on the inexact flag being
* already set.
*/
-#if defined(TARGET_PPC)
+#if defined(TARGET_PPC) || defined(__FAST_MATH__)
+# if defined(__FAST_MATH__)
+# warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
+ IEEE implementation
+# endif
# define QEMU_NO_HARDFLOAT 1
# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
#else
I also updated patch 7's commit message:
[..]
However, this approach will break on most hosts if we compile
QEMU with flags that break IEEE compatibility. There is no way to detect
all of these flags at compilation time, but at least we check for
-ffast-math (which defines __FAST_MATH__) and disable hardfloat
(plus emit a #warning) when it is set.
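Here is a minimal standalone version of that compile-time guard, written in the same style as the v6->v7 diff. SKETCH_NO_HARDFLOAT and sketch_hardfloat_compiled_in() are hypothetical names. As the commit message says, the check only catches -ffast-math itself, which is the documented trigger for __FAST_MATH__ in GCC and Clang. Other IEEE-relaxing options enabled individually pass through unnoticed.

```c
#include <stdbool.h>

/* GCC and Clang predefine __FAST_MATH__ when -ffast-math is in effect.
 * #warning is a GCC/Clang extension (standardized in C23). */
#if defined(__FAST_MATH__)
# warning "hardfloat sketch disabled: -ffast-math breaks IEEE semantics"
# define SKETCH_NO_HARDFLOAT 1
#else
# define SKETCH_NO_HARDFLOAT 0
#endif

/* Reports whether the fast path would be compiled in at all. */
static bool sketch_hardfloat_compiled_in(void)
{
    return !SKETCH_NO_HARDFLOAT;
}
```

Building the same file with and without -ffast-math flips the result of sketch_hardfloat_compiled_in(). Only the -ffast-math build prints the warning.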
> Going forward, I think we want to wire up the fp-test code so we can run
> it in CI along with the rest of make check (or check-tcg?), but there is
> no need to hold up the merge for that.
Yes, I think starting with `make check' would make sense. We should
test with and without `-f x', to make sure that both the softfloat and
hardfloat paths are tested. There are still some known f80/f128 errors,
though, so we might want to disable testing of those for now.
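Running fp-test with and without `-f x' matters because of the gate in front of every fast path: the diff's comment says hardfloat relies on the inexact flag already being set, and the cover letter notes that only round-to-nearest-even operations are affected. The sketch below models that gate. The struct, flag value and function are hypothetical simplifications, not QEMU's float_status or can_use_fpu().

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the emulated FP status. */
enum sketch_rounding { SKETCH_RNE, SKETCH_RTZ, SKETCH_UP, SKETCH_DOWN };

#define SKETCH_FLAG_INEXACT 0x1u

struct sketch_status {
    enum sketch_rounding rounding;
    unsigned flags; /* sticky exception flags */
};

/* The host FPU may stand in for softfloat only if rounding is
 * round-to-nearest-even and inexact is already set, so skipping the
 * inexact computation cannot lose information. */
static bool sketch_can_use_fpu(const struct sketch_status *s)
{
    return (s->flags & SKETCH_FLAG_INEXACT) != 0 && s->rounding == SKETCH_RNE;
}
```

With the inexact flag clear, every operation takes the software path. Pre-setting it, which is what `-f x' arranges before each operation, opens the fast path, so a CI run needs both modes to cover both implementations.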
Thanks,
Emilio
Thread overview: 37+ messages
2018-11-24 23:55 [Qemu-devel] [PATCH v6 00/13] hardfloat Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 01/13] fp-test: pick TARGET_ARM to get its specialization Emilio G. Cota
2018-12-03 12:13 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 02/13] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 03/13] target/tricore: use float32_is_denormal Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 04/13] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
2018-12-03 14:16 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 05/13] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
2018-12-03 14:16 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 06/13] tests/fp: add fp-bench Emilio G. Cota
2018-12-03 14:29 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 07/13] fpu: introduce hardfloat Emilio G. Cota
2018-11-25 0:25 ` Aleksandar Markovic
2018-11-25 1:25 ` Emilio G. Cota
2018-12-04 12:28 ` Alex Bennée
2018-12-04 13:33 ` Richard Henderson
2018-12-04 13:52 ` Alex Bennée
2018-12-04 17:31 ` Emilio G. Cota
2018-12-04 19:08 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 08/13] hardfloat: implement float32/64 addition and subtraction Emilio G. Cota
2018-12-04 18:34 ` Alex Bennée
2018-12-04 20:07 ` Emilio G. Cota
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 09/13] hardfloat: implement float32/64 multiplication Emilio G. Cota
2018-12-05 10:10 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 10/13] hardfloat: implement float32/64 division Emilio G. Cota
2018-12-05 10:11 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 11/13] hardfloat: implement float32/64 fused multiply-add Emilio G. Cota
2018-12-05 12:25 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root Emilio G. Cota
2018-12-05 12:26 ` Alex Bennée
2018-11-24 23:55 ` [Qemu-devel] [PATCH v6 13/13] hardfloat: implement float32/64 comparison Emilio G. Cota
2018-12-05 12:36 ` Alex Bennée
2018-11-27 17:24 ` [Qemu-devel] [PATCH v6 00/13] hardfloat no-reply
2018-11-27 17:52 ` Emilio G. Cota
2018-11-27 17:32 ` no-reply
2018-12-05 12:41 ` Alex Bennée
2018-12-05 16:47 ` Emilio G. Cota