* [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat
@ 2018-04-04 23:11 Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite Emilio G. Cota
                   ` (15 more replies)
  0 siblings, 16 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland,
	Bastian Koppelmann

v2: https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06805.html

Changes since v2:

- Add R-b tags

- Add a patch to rename our canonicalize to sf_canonicalize,
  to avoid clashing with glibc's.

- Add a patch to define float{32,64}_is_zero_or_normal

- Simplify the float{32,64}_input_flushX macros -- now the
  macros are more verbose but the full function names are greppable.

- Move tests/fp-test to tests/fp, since now both fp-bench and fp-test
  are under tests/fp.
  + Use tests/fp/fp-test.h for helpers common to both fp-bench and fp-test.

- Complete rewrite of fp-bench:
  + We can now directly call the softfloat functions, thereby
    making the benchmark more sensitive to changes to those functions.
  + We can still use the native ops with "-t host".
  + The rewrite also has less macro trickery; we rely instead on
    constant propagation by the compiler.
  + Alex: dropped your R-b since this changed a lot. I think you'll
    like this version better though!

- Define a generic function to generate the hardfloat implementation
  for ops with 2 inputs; add, sub, mul and div depend on it.
  Instead of using macros, rely on the constant propagation done
  by the compiler. [Alex: I dropped your R-b for the addsub
  patch because it changed a lot]
  + I kept macros for other ops, because I think the subsequent
    code duplication savings are worth the pain.

- Add #define's to select whether to use fpclassify etc. or
  float32_is_zero etc.
  + Benchmark perf differences on x86_64, aarch64 and IBM Power8 hosts.
  + For 32-bit we don't use fpclassify etc. for any architectures,
    so I was tempted to get rid of this option to save some code.
    It's possible however that on some hosts I have not tested this option
    might pay off, so I decided to keep it there.

- Add a #define to select whether to use isinf() or floatX_is_infinity().
  Turns out this makes a big difference for power64.

- Remove float32_to_float64 support in hardfloat, since nbench or
  SPEC actually showed a small yet measurable slowdown with it,
  despite fp-bench showing a significant speedup for this operation.

- Do not flatten soft-fp functions; these are now slow paths.
  This shrinks the size of the softfloat object below its original
  size (see last patch's log).

- Add a #define to disable hardfloat for some targets. I noticed that
  some targets (at least PPC; there might be others) clear the FP
  flags before calling softfloat. This precludes hardfloat, since it
  relies on the inexact flag already being set. In the long run we
  should fix these targets though.
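
To make the generic two-input helper and the inexact-flag precondition
from the list above more concrete, here is a minimal sketch. It is not
the code from this series: every name in it (sketch_status,
SKETCH_FLAG_INEXACT, sketch_gen2, host_add, slow_add) is hypothetical,
and the slow path is a placeholder for the real soft-fp routine. The
idea it tries to show is that the host FPU is only used when the inexact
flag is already set and both inputs are zero or normal, and that passing
the op as a constant function pointer lets the compiler specialize the
helper instead of relying on macros.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's float_status and float_flag_inexact. */
#define SKETCH_FLAG_INEXACT 1u

struct sketch_status {
    unsigned flags;
};

typedef float (*sketch_op2)(float a, float b);

/* Fast path: let the host FPU do the work. */
static float host_add(float a, float b)
{
    return a + b;
}

/*
 * Placeholder slow path. A real one would be the soft-fp routine,
 * which also computes and updates the exception flags.
 */
static float slow_add(float a, float b)
{
    return a + b;
}

static inline bool zero_or_normal(float f)
{
    int c = fpclassify(f);

    return c == FP_ZERO || c == FP_NORMAL;
}

/*
 * Generic helper for ops with two inputs. When called with constant
 * function pointers, an optimizing compiler can inline and specialize it.
 * used_hard exists only so the sketch can be inspected.
 */
static inline float sketch_gen2(float a, float b, struct sketch_status *s,
                                sketch_op2 hard, sketch_op2 soft,
                                bool *used_hard)
{
    *used_hard = false;
    if ((s->flags & SKETCH_FLAG_INEXACT) &&
        zero_or_normal(a) && zero_or_normal(b)) {
        float r = hard(a, b);

        /* Anything that might need overflow/underflow flags goes slow. */
        if (isnormal(r)) {
            *used_hard = true;
            return r;
        }
    }
    return soft(a, b);
}

static float sketch_add(float a, float b, struct sketch_status *s,
                        bool *used_hard)
{
    return sketch_gen2(a, b, s, host_add, slow_add, used_hard);
}
```

With the inexact flag set, sketch_add(1.0f, 2.0f, ...) takes the host
path; with the flags cleared (as the PPC target does), every call falls
back to the slow path, which is why such targets gain nothing from
hardfloat.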

Note: fp-bench can run _very_ slowly (~0.5 IPC) for -o fma on some x86_64
hosts. I have not pinned down what's going on, but from the few hosts
I have access to, it seems that machines that have been patched for
Spectre/Meltdown are susceptible to this slowdown.
Fortunately though:
1) when fma is run in QEMU (and not under a microbenchmark such as
   fp-bench), fma performance is still very good (much better than with
   soft-fp).
2) Compiling with -march=native gets rid of the problem.
I've reproduced this with both gcc 5.4.0 and gcc 7.1.0. The *very* same
fp-bench binary that performs very well for FMA on two machines (one
AMD, one Intel, neither patched against Meltdown/Spectre) performs
below soft-fp on another three machines (all Intel, all patched).

Note: there are some checkpatch errors, but they are false positives.

Perf numbers for fp-bench are in each commit log; numbers for several
benchmarks are in the last patch's commit log.

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v3

Thanks,

		Emilio

---
 configure                   |    2 +
 fpu/softfloat.c             |  945 ++++++++++++++++++++++++++++++--
 include/fpu/softfloat.h     |   30 +
 target/tricore/fpu_helper.c |    9 +-
 tests/Makefile.include      |    3 +
 tests/fp/.gitignore         |    4 +
 tests/fp/Makefile           |   36 ++
 tests/fp/fp-bench.c         |  528 ++++++++++++++++++
 tests/fp/fp-test.c          | 1183 ++++++++++++++++++++++++++++++++++++++++
 tests/fp/muladd.fptest      |   51 ++
 10 files changed, 2737 insertions(+), 54 deletions(-)
 create mode 100644 tests/fp/.gitignore
 create mode 100644 tests/fp/Makefile
 create mode 100644 tests/fp/fp-bench.c
 create mode 100644 tests/fp/fp-test.c
 create mode 100644 tests/fp/muladd.fptest


* [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-11  1:20   ` Alex Bennée
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 02/15] softfloat: fix {min, max}nummag for same-abs-value inputs Emilio G. Cota
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

This will allow us to run correctness tests against our
FP implementation. The test can be run in two modes (called
"testers"): host and soft. With the former we check the results
and FP flags on the host machine against the model.
With the latter we check QEMU's fpu primitives against the
model. Note that in soft mode we are not instantiating any
particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
for that we need to run the test in host mode under QEMU.

The input files are taken from IBM's FPGen test suite:
https://www.research.ibm.com/haifa/projects/verification/fpgen/

I see no license file in there so I am just downloading them
with wget. We might want to keep a copy on a qemu server though,
in case IBM takes those files down in the future.

The "IBM" syntax of those files (for now the only syntax supported
in fp-test) is documented here:
https://www.research.ibm.com/haifa/projects/verification/fpgen/papers/ieee-test-suite-v2.pdf

Note that the syntax document has some inaccuracies; the appended
parsing code works around some of those.
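
As an illustration of the operand encoding that the parser handles, the
sketch below decodes a single-precision operand of the form
sign, leading digit, '.', hex significand, 'P', decimal exponent. It is
a simplified, slightly stricter variant of the float branch of
ibm_fp_hex() in this patch, not a replacement for it. The function name
decode_b32 and the sample strings in the note below are made up for
illustration; they follow the shape the parser expects rather than being
copied from the IBM files.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode e.g. "+1.000000P0" into an IEEE binary32 bit pattern.
 * Returns 0 on success, 1 on malformed input.
 */
static int decode_b32(const char *p, uint32_t *out)
{
    size_t len = strlen(p);
    unsigned long significand;
    long exponent;
    char *pos;

    if (len <= 4 || (p[0] != '+' && p[0] != '-') || p[2] != '.') {
        return 1;
    }
    /* The fraction bits are given in hex, terminated by 'P'. */
    significand = strtoul(&p[3], &pos, 16);
    if (*pos != 'P') {
        return 1;
    }
    /* The exponent is unbiased and decimal; add the binary32 bias. */
    exponent = strtol(pos + 1, &pos, 10) + 127;
    if (pos != p + len) {
        return 1;
    }
    /*
     * A leading '0' digit marks a denormal. The files use an unbiased
     * exponent of -126 for those, so the biased field must become 0.
     */
    if (p[1] == '0') {
        if (exponent != 1) {
            return 1;
        }
        exponent = 0;
    }
    if (exponent < 0 || exponent > 254 || significand > 0x7fffff) {
        return 1;
    }
    *out = (p[0] == '-' ? UINT32_C(1) << 31 : 0) |
           ((uint32_t)exponent << 23) | (uint32_t)significand;
    return 0;
}
```

Under these assumptions "+1.000000P0" decodes to 0x3f800000 (1.0f) and
"-1.400000P1" to 0xc0400000 (-3.0f).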

The exception flag (-e) is important: many of the optimizations
included in the following commits assume that the inexact flag
is set, so "-e x" is necessary in order to test those code paths.
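
The interaction between "-e" and the expected flags can be sketched as
follows. This mirrors the logic of ibm_get_exceptions() and the flag
comparison in tester_check() from this patch, but the names
(parse_exceptions, exceptions_ok, SK_*) and the bit values are
hypothetical; QEMU's float_flag_* constants have different values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
enum {
    SK_INEXACT   = 1,
    SK_UNDERFLOW = 2,
    SK_OVERFLOW  = 4,
    SK_DIVBYZERO = 8,
    SK_INVALID   = 16,
};

/* Parse a string such as "x" or "xu" into a flag mask; 0 on success. */
static int parse_exceptions(const char *p, uint8_t *excp)
{
    for (; *p; p++) {
        switch (*p) {
        case 'x': *excp |= SK_INEXACT;   break;
        case 'u': *excp |= SK_UNDERFLOW; break;
        case 'o': *excp |= SK_OVERFLOW;  break;
        case 'z': *excp |= SK_DIVBYZERO; break;
        case 'i': *excp |= SK_INVALID;   break;
        default:
            return 1;
        }
    }
    return 0;
}

/*
 * The flags given with -e are set before the operation runs, so they
 * must also show up in the result. An expected mask of 0 means
 * "do not check", not "no exceptions".
 */
static bool exceptions_ok(uint8_t expected, uint8_t returned, uint8_t preset)
{
    if (expected == 0) {
        return true;
    }
    return returned == (expected | preset);
}
```

For example, with "-e x" a test line that expects only overflow passes
when the operation reports overflow plus inexact, and fails if the
preset inexact flag has been lost along the way.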

The whitelist flag (-w) points to a file with test cases to be ignored.
I have put some whitelist files online, but we should have them
on a QEMU-related server.

Thus, a typical invocation of fp-test is as follows:

  $ cd qemu/build/tests/fp
  $ make -j && \
	./fp-test -t soft ibm/*.fptest \
	-w whitelist.txt \
	-e x

If we want to test after-rounding tininess detection, then we need to
pass "-a -w whitelist-tininess-after.txt" in addition to the above.
(NB. we can pass "-w" as many times as we want.)

The patch immediately after this one fixes a mismatch against the model
in softfloat, but after that is applied the above should finish with a 0
return code, and print something like:
  All tests OK.
  Tests passed: 76572. Not handled: 51237, whitelisted: 2662

The tests pass in "host" mode on x86_64 and aarch64 machines, although
note that on x86_64 you need to pass -w whitelist-tininess-after.txt.

Running in host mode under QEMU reports flag mismatches (e.g. for
x86_64-linux-user), but that isn't too surprising given how little
love the i386 frontend gets. Host mode under aarch64-linux-user
passes OK.

Flush-to-zero and flush-inputs-to-zero modes can be tested with the
-z and -Z flags. Note however that the IBM input files are only
IEEE-compliant, so for now I've tested these modes by diff'ing
the reported errors against the model files. We should look into
generating files for these non-standard modes to make testing
them less painful.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 configure              |    2 +
 tests/fp/fp-test.c     | 1159 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/Makefile.include |    3 +
 tests/fp/.gitignore    |    3 +
 tests/fp/Makefile      |   34 ++
 5 files changed, 1201 insertions(+)
 create mode 100644 tests/fp/fp-test.c
 create mode 100644 tests/fp/.gitignore
 create mode 100644 tests/fp/Makefile

diff --git a/configure b/configure
index f156805..07dc5da 100755
--- a/configure
+++ b/configure
@@ -7106,12 +7106,14 @@ fi
 
 # build tree in object directory in case the source is not in the current directory
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests tests/vm"
+DIRS="$DIRS tests/fp"
 DIRS="$DIRS docs docs/interop fsdev scsi"
 DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas pc-bios/s390-ccw"
 DIRS="$DIRS roms/seabios roms/vgabios"
 FILES="Makefile tests/tcg/Makefile qdict-test-data.txt"
 FILES="$FILES tests/tcg/cris/Makefile tests/tcg/cris/.gdbinit"
 FILES="$FILES tests/tcg/lm32/Makefile tests/tcg/xtensa/Makefile po/Makefile"
+FILES="$FILES tests/fp/Makefile"
 FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps"
 FILES="$FILES pc-bios/spapr-rtas/Makefile"
 FILES="$FILES pc-bios/s390-ccw/Makefile"
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
new file mode 100644
index 0000000..27637c4
--- /dev/null
+++ b/tests/fp/fp-test.c
@@ -0,0 +1,1159 @@
+/*
+ * fp-test.c - Floating point test suite.
+ *
+ * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include "fpu/softfloat.h"
+
+#include <fenv.h>
+#include <math.h>
+
+enum error {
+    ERROR_NONE,
+    ERROR_NOT_HANDLED,
+    ERROR_WHITELISTED,
+    ERROR_COMMENT,
+    ERROR_INPUT,
+    ERROR_RESULT,
+    ERROR_EXCEPTIONS,
+    ERROR_MAX,
+};
+
+enum input_fmt {
+    INPUT_FMT_IBM,
+};
+
+struct input {
+    const char * const name;
+    enum error (*test_line)(const char *line);
+};
+
+enum precision {
+    PREC_FLOAT,
+    PREC_DOUBLE,
+    PREC_QUAD,
+    PREC_FLOAT_TO_DOUBLE,
+};
+
+struct op_desc {
+    const char * const name;
+    int n_operands;
+};
+
+enum op {
+    OP_ADD,
+    OP_SUB,
+    OP_MUL,
+    OP_MULADD,
+    OP_DIV,
+    OP_SQRT,
+    OP_MINNUM,
+    OP_MAXNUM,
+    OP_MAXNUMMAG,
+    OP_ABS,
+    OP_IS_NAN,
+    OP_IS_INF,
+    OP_FLOAT_TO_DOUBLE,
+};
+
+static const struct op_desc ops[] = {
+    [OP_ADD] =       { "+", 2 },
+    [OP_SUB] =       { "-", 2 },
+    [OP_MUL] =       { "*", 2 },
+    [OP_MULADD] =    { "*+", 3 },
+    [OP_DIV] =       { "/", 2 },
+    [OP_SQRT] =      { "V", 1 },
+    [OP_MINNUM] =    { "<C", 2 },
+    [OP_MAXNUM] =    { ">C", 2 },
+    [OP_MAXNUMMAG] = { ">A", 2 },
+    [OP_ABS] =       { "A", 1 },
+    [OP_IS_NAN] =    { "?N", 1 },
+    [OP_IS_INF] =    { "?i", 1 },
+    [OP_FLOAT_TO_DOUBLE] = { "cff", 1 },
+};
+
+/*
+ * We could enumerate all the types here. But really we only care about
+ * QNaN and SNaN since only those can vary across ISAs.
+ */
+enum op_type {
+    OP_TYPE_NUMBER,
+    OP_TYPE_QNAN,
+    OP_TYPE_SNAN,
+};
+
+struct operand {
+    uint64_t val;
+    enum op_type type;
+};
+
+struct test_op {
+    struct operand operands[3];
+    struct operand expected_result;
+    enum precision prec;
+    enum op op;
+    signed char round;
+    uint8_t trapped_exceptions;
+    uint8_t exceptions;
+    bool expected_result_is_valid;
+};
+
+typedef enum error (*tester_func_t)(struct test_op *);
+
+struct tester {
+    tester_func_t func;
+    const char *name;
+};
+
+struct whitelist {
+    char **lines;
+    size_t n;
+    GHashTable *ht;
+};
+
+static uint64_t test_stats[ERROR_MAX];
+static struct whitelist whitelist;
+static uint8_t default_exceptions;
+static bool die_on_error = true;
+static struct float_status soft_status = {
+    .float_detect_tininess = float_tininess_before_rounding,
+};
+
+static inline float u64_to_float(uint64_t v)
+{
+    uint32_t v32 = v;
+    uint32_t *v32p = &v32;
+
+    return *(float *)v32p;
+}
+
+static inline double u64_to_double(uint64_t v)
+{
+    uint64_t *vp = &v;
+
+    return *(double *)vp;
+}
+
+static inline uint64_t float_to_u64(float f)
+{
+    float *fp = &f;
+
+    return *(uint32_t *)fp;
+}
+
+static inline uint64_t double_to_u64(double d)
+{
+    double *dp = &d;
+
+    return *(uint64_t *)dp;
+}
+
+static inline bool is_err(enum error err)
+{
+    return err != ERROR_NONE &&
+        err != ERROR_NOT_HANDLED &&
+        err != ERROR_WHITELISTED &&
+        err != ERROR_COMMENT;
+}
+
+static int host_exceptions_translate(int host_flags)
+{
+    int flags = 0;
+
+    if (host_flags & FE_INEXACT) {
+        flags |= float_flag_inexact;
+    }
+    if (host_flags & FE_UNDERFLOW) {
+        flags |= float_flag_underflow;
+    }
+    if (host_flags & FE_OVERFLOW) {
+        flags |= float_flag_overflow;
+    }
+    if (host_flags & FE_DIVBYZERO) {
+        flags |= float_flag_divbyzero;
+    }
+    if (host_flags & FE_INVALID) {
+        flags |= float_flag_invalid;
+    }
+    return flags;
+}
+
+static inline uint8_t host_get_exceptions(void)
+{
+    return host_exceptions_translate(fetestexcept(FE_ALL_EXCEPT));
+}
+
+static void host_set_exceptions(uint8_t flags)
+{
+    int host_flags = 0;
+
+    if (flags & float_flag_inexact) {
+        host_flags |= FE_INEXACT;
+    }
+    if (flags & float_flag_underflow) {
+        host_flags |= FE_UNDERFLOW;
+    }
+    if (flags & float_flag_overflow) {
+        host_flags |= FE_OVERFLOW;
+    }
+    if (flags & float_flag_divbyzero) {
+        host_flags |= FE_DIVBYZERO;
+    }
+    if (flags & float_flag_invalid) {
+        host_flags |= FE_INVALID;
+    }
+    feraiseexcept(host_flags);
+}
+
+#define STANDARD_EXCEPTIONS \
+    (float_flag_inexact | float_flag_underflow | \
+     float_flag_overflow | float_flag_divbyzero | float_flag_invalid)
+#define FMT_EXCEPTIONS "%s%s%s%s%s%s"
+#define PR_EXCEPTIONS(x)                                \
+        ((x) & STANDARD_EXCEPTIONS ? "" : "none"),      \
+        (((x) & float_flag_inexact)   ? "x" : ""),      \
+        (((x) & float_flag_underflow) ? "u" : ""),      \
+        (((x) & float_flag_overflow)  ? "o" : ""),      \
+        (((x) & float_flag_divbyzero) ? "z" : ""),      \
+        (((x) & float_flag_invalid)   ? "i" : "")
+
+static enum error tester_check(const struct test_op *t, uint64_t res64,
+                               bool res_is_nan, uint8_t flags)
+{
+    enum error err = ERROR_NONE;
+
+    if (t->expected_result_is_valid) {
+        if (t->expected_result.type == OP_TYPE_QNAN ||
+            t->expected_result.type == OP_TYPE_SNAN) {
+            if (!res_is_nan) {
+                err = ERROR_RESULT;
+                goto out;
+            }
+        } else if (res64 != t->expected_result.val) {
+            err = ERROR_RESULT;
+            goto out;
+        }
+    }
+    if (t->exceptions && flags != (t->exceptions | default_exceptions)) {
+        err = ERROR_EXCEPTIONS;
+        goto out;
+    }
+
+ out:
+    if (is_err(err)) {
+        int i;
+
+        fprintf(stderr, "%s ", ops[t->op].name);
+        for (i = 0; i < ops[t->op].n_operands; i++) {
+            fprintf(stderr, "0x%" PRIx64 "%s", t->operands[i].val,
+                    i < ops[t->op].n_operands - 1 ? " " : "");
+        }
+        fprintf(stderr, ", expected: 0x%" PRIx64 ", returned: 0x%" PRIx64,
+                t->expected_result.val, res64);
+        if (err == ERROR_EXCEPTIONS) {
+            fprintf(stderr, ", expected exceptions: " FMT_EXCEPTIONS
+                    ", returned: " FMT_EXCEPTIONS,
+                    PR_EXCEPTIONS(t->exceptions), PR_EXCEPTIONS(flags));
+        }
+        fprintf(stderr, "\n");
+    }
+    return err;
+}
+
+static enum error host_tester(struct test_op *t)
+{
+    uint64_t res64;
+    bool result_is_nan;
+    uint8_t flags = 0;
+
+    feclearexcept(FE_ALL_EXCEPT);
+    if (default_exceptions) {
+        host_set_exceptions(default_exceptions);
+    }
+
+    if (t->prec == PREC_FLOAT) {
+        float a, b, c;
+        float *in[] = { &a, &b, &c };
+        float res;
+        int i;
+
+        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        for (i = 0; i < ops[t->op].n_operands; i++) {
+            /* use the host's QNaN/SNaN patterns */
+            if (t->operands[i].type == OP_TYPE_QNAN) {
+                *in[i] = __builtin_nanf("");
+            } else if (t->operands[i].type == OP_TYPE_SNAN) {
+                *in[i] = __builtin_nansf("");
+            } else {
+                *in[i] = u64_to_float(t->operands[i].val);
+            }
+        }
+
+        if (t->expected_result.type == OP_TYPE_QNAN) {
+            t->expected_result.val = float_to_u64(__builtin_nanf(""));
+        } else if (t->expected_result.type == OP_TYPE_SNAN) {
+            t->expected_result.val = float_to_u64(__builtin_nansf(""));
+        }
+
+        switch (t->op) {
+        case OP_ADD:
+            res = a + b;
+            break;
+        case OP_SUB:
+            res = a - b;
+            break;
+        case OP_MUL:
+            res = a * b;
+            break;
+        case OP_MULADD:
+            res = fmaf(a, b, c);
+            break;
+        case OP_DIV:
+            res = a / b;
+            break;
+        case OP_SQRT:
+            res = sqrtf(a);
+            break;
+        case OP_ABS:
+            res = fabsf(a);
+            break;
+        case OP_IS_NAN:
+            res = !!isnan(a);
+            break;
+        case OP_IS_INF:
+            res = !!isinf(a);
+            break;
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        flags = host_get_exceptions();
+        res64 = float_to_u64(res);
+        result_is_nan = isnan(res);
+    } else if (t->prec == PREC_DOUBLE) {
+        double a, b, c;
+        double *in[] = { &a, &b, &c };
+        double res;
+        int i;
+
+        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        for (i = 0; i < ops[t->op].n_operands; i++) {
+            /* use the host's QNaN/SNaN patterns */
+            if (t->operands[i].type == OP_TYPE_QNAN) {
+                *in[i] = __builtin_nan("");
+            } else if (t->operands[i].type == OP_TYPE_SNAN) {
+                *in[i] = __builtin_nans("");
+            } else {
+                *in[i] = u64_to_double(t->operands[i].val);
+            }
+        }
+
+        if (t->expected_result.type == OP_TYPE_QNAN) {
+            t->expected_result.val = double_to_u64(__builtin_nan(""));
+        } else if (t->expected_result.type == OP_TYPE_SNAN) {
+            t->expected_result.val = double_to_u64(__builtin_nans(""));
+        }
+
+        switch (t->op) {
+        case OP_ADD:
+            res = a + b;
+            break;
+        case OP_SUB:
+            res = a - b;
+            break;
+        case OP_MUL:
+            res = a * b;
+            break;
+        case OP_MULADD:
+            res = fma(a, b, c);
+            break;
+        case OP_DIV:
+            res = a / b;
+            break;
+        case OP_SQRT:
+            res = sqrt(a);
+            break;
+        case OP_ABS:
+            res = fabs(a);
+            break;
+        case OP_IS_NAN:
+            res = !!isnan(a);
+            break;
+        case OP_IS_INF:
+            res = !!isinf(a);
+            break;
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        flags = host_get_exceptions();
+        res64 = double_to_u64(res);
+        result_is_nan = isnan(res);
+    } else if (t->prec == PREC_FLOAT_TO_DOUBLE) {
+        float a;
+        double res;
+
+        if (t->operands[0].type == OP_TYPE_QNAN) {
+            a = __builtin_nanf("");
+        } else if (t->operands[0].type == OP_TYPE_SNAN) {
+            a = __builtin_nansf("");
+        } else {
+            a = u64_to_float(t->operands[0].val);
+        }
+
+        if (t->expected_result.type == OP_TYPE_QNAN) {
+            t->expected_result.val = double_to_u64(__builtin_nan(""));
+        } else if (t->expected_result.type == OP_TYPE_SNAN) {
+            t->expected_result.val = double_to_u64(__builtin_nans(""));
+        }
+
+        switch (t->op) {
+        case OP_FLOAT_TO_DOUBLE:
+            res = a;
+            break;
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        flags = host_get_exceptions();
+        res64 = double_to_u64(res);
+        result_is_nan = isnan(res);
+    } else {
+        return ERROR_NOT_HANDLED; /* XXX */
+    }
+    return tester_check(t, res64, result_is_nan, flags);
+}
+
+static enum error soft_tester(struct test_op *t)
+{
+    float_status *s = &soft_status;
+    uint64_t res64;
+    enum error err = ERROR_NONE;
+    bool result_is_nan;
+
+    s->float_rounding_mode = t->round;
+    s->float_exception_flags = default_exceptions;
+
+    if (t->prec == PREC_FLOAT) {
+        float32 a, b, c;
+        float32 *in[] = { &a, &b, &c };
+        float32 res;
+        int i;
+
+        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        for (i = 0; i < ops[t->op].n_operands; i++) {
+            *in[i] = t->operands[i].val;
+        }
+
+        switch (t->op) {
+        case OP_ADD:
+            res = float32_add(a, b, s);
+            break;
+        case OP_SUB:
+            res = float32_sub(a, b, s);
+            break;
+        case OP_MUL:
+            res = float32_mul(a, b, s);
+            break;
+        case OP_MULADD:
+            res = float32_muladd(a, b, c, 0, s);
+            break;
+        case OP_DIV:
+            res = float32_div(a, b, s);
+            break;
+        case OP_SQRT:
+            res = float32_sqrt(a, s);
+            break;
+        case OP_MINNUM:
+            res = float32_minnum(a, b, s);
+            break;
+        case OP_MAXNUM:
+            res = float32_maxnum(a, b, s);
+            break;
+        case OP_MAXNUMMAG:
+            res = float32_maxnummag(a, b, s);
+            break;
+        case OP_IS_NAN:
+        {
+            float f = !!float32_is_any_nan(a);
+
+            res = float_to_u64(f);
+            break;
+        }
+        case OP_IS_INF:
+        {
+            float f = !!float32_is_infinity(a);
+
+            res = float_to_u64(f);
+            break;
+        }
+        case OP_ABS:
+            /* Fall-through: float32_abs does not handle NaN's */
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        res64 = res;
+        result_is_nan = isnan(*(float *)&res);
+    } else if (t->prec == PREC_DOUBLE) {
+        float64 a, b, c;
+        float64 *in[] = { &a, &b, &c };
+        int i;
+
+        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        for (i = 0; i < ops[t->op].n_operands; i++) {
+            *in[i] = t->operands[i].val;
+        }
+
+        switch (t->op) {
+        case OP_ADD:
+            res64 = float64_add(a, b, s);
+            break;
+        case OP_SUB:
+            res64 = float64_sub(a, b, s);
+            break;
+        case OP_MUL:
+            res64 = float64_mul(a, b, s);
+            break;
+        case OP_MULADD:
+            res64 = float64_muladd(a, b, c, 0, s);
+            break;
+        case OP_DIV:
+            res64 = float64_div(a, b, s);
+            break;
+        case OP_SQRT:
+            res64 = float64_sqrt(a, s);
+            break;
+        case OP_MINNUM:
+            res64 = float64_minnum(a, b, s);
+            break;
+        case OP_MAXNUM:
+            res64 = float64_maxnum(a, b, s);
+            break;
+        case OP_MAXNUMMAG:
+            res64 = float64_maxnummag(a, b, s);
+            break;
+        case OP_IS_NAN:
+        {
+            double d = !!float64_is_any_nan(a);
+
+            res64 = double_to_u64(d);
+            break;
+        }
+        case OP_IS_INF:
+        {
+            double d = !!float64_is_infinity(a);
+
+            res64 = double_to_u64(d);
+            break;
+        }
+        case OP_ABS:
+            /* Fall-through: float64_abs does not handle NaN's */
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        result_is_nan = isnan(*(double *)&res64);
+    } else if (t->prec == PREC_FLOAT_TO_DOUBLE) {
+        float32 a = t->operands[0].val;
+
+        switch (t->op) {
+        case OP_FLOAT_TO_DOUBLE:
+            res64 = float32_to_float64(a, s);
+            break;
+        default:
+            return ERROR_NOT_HANDLED;
+        }
+        result_is_nan = isnan(*(double *)&res64);
+    } else {
+        return ERROR_NOT_HANDLED; /* XXX */
+    }
+    return tester_check(t, res64, result_is_nan, s->float_exception_flags);
+    return err;
+}
+
+static const struct tester valid_testers[] = {
+    [0] = {
+        .name = "soft",
+        .func = soft_tester,
+    },
+    [1] = {
+        .name = "host",
+        .func = host_tester,
+    },
+};
+static const struct tester *tester = &valid_testers[0];
+
+static int ibm_get_exceptions(const char *p, uint8_t *excp)
+{
+    while (*p) {
+        switch (*p) {
+        case 'x':
+            *excp |= float_flag_inexact;
+            break;
+        case 'u':
+            *excp |= float_flag_underflow;
+            break;
+        case 'o':
+            *excp |= float_flag_overflow;
+            break;
+        case 'z':
+            *excp |= float_flag_divbyzero;
+            break;
+        case 'i':
+            *excp |= float_flag_invalid;
+            break;
+        default:
+            return 1;
+        }
+        p++;
+    }
+    return 0;
+}
+
+static uint64_t fp_choose(enum precision prec, uint64_t f, uint64_t d)
+{
+    switch (prec) {
+    case PREC_FLOAT:
+        return f;
+    case PREC_DOUBLE:
+        return d;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int
+ibm_fp_hex(const char *p, enum precision prec, struct operand *ret)
+{
+    int len;
+
+    ret->type = OP_TYPE_NUMBER;
+
+    /* QNaN */
+    if (unlikely(!strcmp("Q", p))) {
+        ret->val = fp_choose(prec, 0xffc00000, 0xfff8000000000000);
+        ret->type = OP_TYPE_QNAN;
+        return 0;
+    }
+    /* SNaN */
+    if (unlikely(!strcmp("S", p))) {
+        ret->val = fp_choose(prec, 0xffb00000, 0xfff7000000000000);
+        ret->type = OP_TYPE_SNAN;
+        return 0;
+    }
+    if (unlikely(!strcmp("+Zero", p))) {
+        ret->val = fp_choose(prec, 0x00000000, 0x0000000000000000);
+        return 0;
+    }
+    if (unlikely(!strcmp("-Zero", p))) {
+        ret->val = fp_choose(prec, 0x80000000, 0x8000000000000000);
+        return 0;
+    }
+    if (unlikely(!strcmp("+inf", p) || !strcmp("+Inf", p))) {
+        ret->val = fp_choose(prec, 0x7f800000, 0x7ff0000000000000);
+        return 0;
+    }
+    if (unlikely(!strcmp("-inf", p) || !strcmp("-Inf", p))) {
+        ret->val = fp_choose(prec, 0xff800000, 0xfff0000000000000);
+        return 0;
+    }
+
+    len = strlen(p);
+
+    if (strchr(p, 'P')) {
+        bool negative = p[0] == '-';
+        char *pos;
+        bool denormal;
+
+        if (len <= 4) {
+            return 1;
+        }
+        denormal = p[1] == '0';
+        if (prec == PREC_FLOAT) {
+            uint32_t exponent;
+            uint32_t significand;
+            uint32_t h;
+
+            significand = strtoul(&p[3], &pos, 16);
+            if (*pos != 'P') {
+                return 1;
+            }
+            pos++;
+            exponent = strtol(pos, &pos, 10) + 127;
+            if (pos != p + len) {
+                return 1;
+            }
+            /*
+             * When there's a leading zero, we have a denormal number. We'd
+             * expect the input (unbiased) exponent to be -127, but for some
+             * reason -126 is used. Correct that here.
+             */
+            if (denormal) {
+                if (exponent != 1) {
+                    return 1;
+                }
+                exponent = 0;
+            }
+            h = negative ? (1 << 31) : 0;
+            h |= exponent << 23;
+            h |= significand;
+            ret->val = h;
+            return 0;
+        } else if (prec == PREC_DOUBLE) {
+            uint64_t exponent;
+            uint64_t significand;
+            uint64_t h;
+
+            significand = strtoul(&p[3], &pos, 16);
+            if (*pos != 'P') {
+                return 1;
+            }
+            pos++;
+            exponent = strtol(pos, &pos, 10) + 1023;
+            if (pos != p + len) {
+                return 1;
+            }
+            if (denormal) {
+                return 1; /* XXX */
+            }
+            h = negative ? (1ULL << 63) : 0;
+            h |= exponent << 52;
+            h |= significand;
+            ret->val = h;
+            return 0;
+        } else { /* XXX */
+            return 1;
+        }
+    } else if (strchr(p, 'e')) {
+        char *pos;
+
+        if (prec == PREC_FLOAT) {
+            float f = strtof(p, &pos);
+
+            if (*pos) {
+                return 1;
+            }
+            ret->val = float_to_u64(f);
+            return 0;
+        }
+        if (prec == PREC_DOUBLE) {
+            double d = strtod(p, &pos);
+
+            if (*pos) {
+                return 1;
+            }
+            ret->val = double_to_u64(d);
+            return 0;
+        }
+        return 0;
+    } else if (!strcmp(p, "0x0")) {
+        if (prec == PREC_FLOAT) {
+            ret->val = float_to_u64(0.0);
+        } else if (prec == PREC_DOUBLE) {
+            ret->val = double_to_u64(0.0);
+        } else {
+            g_assert_not_reached();
+        }
+        return 0;
+    } else if (!strcmp(p, "0x1")) {
+        if (prec == PREC_FLOAT) {
+            ret->val = float_to_u64(1.0);
+        } else if (prec == PREC_DOUBLE) {
+            ret->val = double_to_u64(1.0);
+        } else {
+            g_assert_not_reached();
+        }
+        return 0;
+    }
+    return 1;
+}
+
+static int find_op(const char *name, enum op *op)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(ops); i++) {
+        if (strcmp(ops[i].name, name) == 0) {
+            *op = i;
+            return 0;
+        }
+    }
+    return 1;
+}
+
+/* Syntax of IBM FP test cases:
+ * https://www.research.ibm.com/haifa/projects/verification/fpgen/syntax.txt
+ */
+static enum error ibm_test_line(const char *line)
+{
+    struct test_op t;
+    /* at most nine fields; this should be more than enough for each field */
+    char s[9][64];
+    char *p;
+    int n, field;
+    int i;
+
+    /* data lines start with either b32 or d(64|128) */
+    if (unlikely(line[0] != 'b' && line[0] != 'd')) {
+        return ERROR_COMMENT;
+    }
+    n = sscanf(line, "%63s %63s %63s %63s %63s %63s %63s %63s %63s",
+               s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8]);
+    if (unlikely(n < 5 || n > 9)) {
+        return ERROR_INPUT;
+    }
+
+    field = 0;
+    p = s[field];
+    if (unlikely(strlen(p) < 4)) {
+        return ERROR_INPUT;
+    }
+    if (strcmp("b32b64cff", p) == 0) {
+        t.prec = PREC_FLOAT_TO_DOUBLE;
+        if (find_op(&p[6], &t.op)) {
+            return ERROR_NOT_HANDLED;
+        }
+    } else {
+        if (strncmp("b32", p, 3) == 0) {
+            t.prec = PREC_FLOAT;
+        } else if (strncmp("d64", p, 3) == 0) {
+            t.prec = PREC_DOUBLE;
+        } else if (strncmp("d128", p, 4) == 0) {
+            return ERROR_NOT_HANDLED; /* XXX */
+        } else {
+            return ERROR_INPUT;
+        }
+        if (find_op(&p[3], &t.op)) {
+            return ERROR_NOT_HANDLED;
+        }
+    }
+
+    field = 1;
+    p = s[field];
+    if (!strncmp("=0", p, 2)) {
+        t.round = float_round_nearest_even;
+    } else {
+        return ERROR_NOT_HANDLED; /* XXX */
+    }
+
+    /* The trapped exceptions field is optional */
+    t.trapped_exceptions = 0;
+    field = 2;
+    p = s[field];
+    if (ibm_get_exceptions(p, &t.trapped_exceptions)) {
+        if (unlikely(n == 9)) {
+            return ERROR_INPUT;
+        }
+    } else {
+        field++;
+    }
+
+    for (i = 0; i < ops[t.op].n_operands; i++) {
+        enum precision prec = t.prec == PREC_FLOAT_TO_DOUBLE ?
+            PREC_FLOAT : t.prec;
+
+        p = s[field++];
+        if (ibm_fp_hex(p, prec, &t.operands[i])) {
+            return ERROR_INPUT;
+        }
+    }
+
+    p = s[field++];
+    if (strcmp("->", p)) {
+        return ERROR_INPUT;
+    }
+
+    p = s[field++];
+    if (unlikely(strcmp("#", p) == 0)) {
+        t.expected_result_is_valid = false;
+    } else {
+        enum precision prec = t.prec == PREC_FLOAT_TO_DOUBLE ?
+            PREC_DOUBLE : t.prec;
+
+        if (ibm_fp_hex(p, prec, &t.expected_result)) {
+            return ERROR_INPUT;
+        }
+        t.expected_result_is_valid = true;
+    }
+
+    /*
+     * A 0 here means "do not check the exceptions", i.e. it does NOT mean
+     * "there should be no exceptions raised".
+     */
+    t.exceptions = 0;
+    /* the expected exceptions field is optional */
+    if (field == n - 1) {
+        p = s[field++];
+        if (ibm_get_exceptions(p, &t.exceptions)) {
+            return ERROR_INPUT;
+        }
+    }
+
+    /*
+     * Test cases that specify "trapped exceptions" are skipped, because we
+     * are not testing the trapping mechanism of the host CPU.
+     * We do test, though, that the exception flags are correctly set.
+     */
+    if (t.trapped_exceptions) {
+        return ERROR_NOT_HANDLED;
+    }
+    return tester->func(&t);
+}
+
+static const struct input valid_input_types[] = {
+    [INPUT_FMT_IBM] = {
+        .name = "ibm",
+        .test_line = ibm_test_line,
+    },
+};
+
+static const struct input *input_type = &valid_input_types[INPUT_FMT_IBM];
+
+static bool line_is_whitelisted(const char *line)
+{
+    if (whitelist.ht == NULL) {
+        return false;
+    }
+    return !!g_hash_table_lookup(whitelist.ht, line);
+}
+
+static void test_file(const char *filename)
+{
+    static char line[256];
+    unsigned int i;
+    FILE *fp;
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        fprintf(stderr, "cannot open file '%s': %s\n",
+                filename, strerror(errno));
+        exit(EXIT_FAILURE);
+    }
+    i = 0;
+    while (fgets(line, sizeof(line), fp)) {
+        enum error err;
+
+        i++;
+        if (unlikely(line_is_whitelisted(line))) {
+            test_stats[ERROR_WHITELISTED]++;
+            continue;
+        }
+        err = input_type->test_line(line);
+        if (unlikely(is_err(err))) {
+            switch (err) {
+            case ERROR_INPUT:
+                fprintf(stderr, "error: malformed input @ %s:%u:\n",
+                        filename, i);
+                break;
+            case ERROR_RESULT:
+                fprintf(stderr, "error: result mismatch for input @ %s:%u:\n",
+                        filename, i);
+                break;
+            case ERROR_EXCEPTIONS:
+                fprintf(stderr, "error: flags mismatch for input @ %s:%u:\n",
+                        filename, i);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+            fprintf(stderr, "%s", line);
+            if (die_on_error) {
+                exit(EXIT_FAILURE);
+            }
+        }
+        test_stats[err]++;
+    }
+    if (fclose(fp)) {
+        fprintf(stderr, "warning: cannot close file '%s': %s\n",
+                filename, strerror(errno));
+    }
+}
+
+static void set_input_fmt(const char *optarg)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(valid_input_types); i++) {
+        const struct input *type = &valid_input_types[i];
+
+        if (strcmp(optarg, type->name) == 0) {
+            input_type = type;
+            return;
+        }
+    }
+    fprintf(stderr, "Unknown input format '%s'\n", optarg);
+    exit(EXIT_FAILURE);
+}
+
+static void set_tester(const char *optarg)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(valid_testers); i++) {
+        const struct tester *t = &valid_testers[i];
+
+        if (strcmp(optarg, t->name) == 0) {
+            tester = t;
+            return;
+        }
+    }
+    fprintf(stderr, "Unknown tester '%s'\n", optarg);
+    exit(EXIT_FAILURE);
+}
+
+static void whitelist_add_line(const char *orig_line)
+{
+    char *line;
+    bool inserted;
+
+    if (whitelist.ht == NULL) {
+        whitelist.ht = g_hash_table_new(g_str_hash, g_str_equal);
+    }
+    line = g_hash_table_lookup(whitelist.ht, orig_line);
+    if (unlikely(line != NULL)) {
+        return;
+    }
+    whitelist.n++;
+    whitelist.lines = g_realloc_n(whitelist.lines, whitelist.n, sizeof(line));
+    line = strdup(orig_line);
+    whitelist.lines[whitelist.n - 1] = line;
+    /* if we pass key == val GLib will not reserve space for the value */
+    inserted = g_hash_table_insert(whitelist.ht, line, line);
+    g_assert(inserted);
+}
+
+static void set_whitelist(const char *filename)
+{
+    FILE *fp;
+    static char line[256];
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        fprintf(stderr, "warning: cannot open white list file '%s': %s\n",
+                filename, strerror(errno));
+        return;
+    }
+    while (fgets(line, sizeof(line), fp)) {
+        if (isspace((unsigned char)line[0]) || line[0] == '#') {
+            continue;
+        }
+        whitelist_add_line(line);
+    }
+    if (fclose(fp)) {
+        fprintf(stderr, "warning: cannot close file '%s': %s\n",
+                filename, strerror(errno));
+    }
+}
+
+static void set_default_exceptions(const char *str)
+{
+    if (ibm_get_exceptions(str, &default_exceptions)) {
+        fprintf(stderr, "Invalid exception '%s'\n", str);
+        exit(EXIT_FAILURE);
+    }
+}
+
+static void usage_complete(int argc, char *argv[])
+{
+    fprintf(stderr, "Usage: %s [options] file1 [file2 ...]\n", argv[0]);
+    fprintf(stderr, "options:\n");
+    fprintf(stderr, "  -a = perform tininess detection after rounding "
+            "(soft tester only). Default: before\n");
+    fprintf(stderr, "  -n = do not die on error. Default: dies on error\n");
+    fprintf(stderr, "  -e = default exception flags (xiozu). Default: none\n");
+    fprintf(stderr, "  -f = format of the input file(s). Default: %s\n",
+            valid_input_types[0].name);
+    fprintf(stderr, "  -t = tester. Default: %s\n", valid_testers[0].name);
+    fprintf(stderr, "  -w = path to file with test cases to be whitelisted\n");
+    fprintf(stderr, "  -z = flush inputs to zero (soft tester only). "
+            "Default: disabled\n");
+    fprintf(stderr, "  -Z = flush output to zero (soft tester only). "
+            "Default: disabled\n");
+}
+
+static void parse_opts(int argc, char *argv[])
+{
+    int c;
+
+    for (;;) {
+        c = getopt(argc, argv, "ae:f:hnt:w:zZ");
+        if (c < 0) {
+            return;
+        }
+        switch (c) {
+        case 'a':
+            soft_status.float_detect_tininess = float_tininess_after_rounding;
+            break;
+        case 'e':
+            set_default_exceptions(optarg);
+            break;
+        case 'f':
+            set_input_fmt(optarg);
+            break;
+        case 'h':
+            usage_complete(argc, argv);
+            exit(EXIT_SUCCESS);
+        case 'n':
+            die_on_error = false;
+            break;
+        case 't':
+            set_tester(optarg);
+            break;
+        case 'w':
+            set_whitelist(optarg);
+            break;
+        case 'z':
+            soft_status.flush_inputs_to_zero = 1;
+            break;
+        case 'Z':
+            soft_status.flush_to_zero = 1;
+            break;
+        }
+    }
+    g_assert_not_reached();
+}
+
+static uint64_t count_errors(void)
+{
+    uint64_t ret = 0;
+    int i;
+
+    for (i = ERROR_INPUT; i < ERROR_MAX; i++) {
+        ret += test_stats[i];
+    }
+    return ret;
+}
+
+int main(int argc, char *argv[])
+{
+    uint64_t n_errors;
+    int i;
+
+    if (argc == 1) {
+        usage_complete(argc, argv);
+        exit(EXIT_FAILURE);
+    }
+    parse_opts(argc, argv);
+    for (i = optind; i < argc; i++) {
+        test_file(argv[i]);
+    }
+
+    n_errors = count_errors();
+    if (n_errors) {
+        printf("Tests failed: %"PRIu64". Parsing: %"PRIu64
+               ", result:%"PRIu64", flags:%"PRIu64"\n",
+               n_errors, test_stats[ERROR_INPUT], test_stats[ERROR_RESULT],
+               test_stats[ERROR_EXCEPTIONS]);
+    } else {
+        printf("All tests OK.\n");
+    }
+    printf("Tests passed: %" PRIu64 ". Not handled: %" PRIu64
+           ", whitelisted: %"PRIu64 "\n",
+           test_stats[ERROR_NONE], test_stats[ERROR_NOT_HANDLED],
+           test_stats[ERROR_WHITELISTED]);
+    return !!n_errors;
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 0b27703..77d7353 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -642,6 +642,9 @@ tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
 tests/test-bufferiszero$(EXESUF): tests/test-bufferiszero.o $(test-util-obj-y)
 tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o $(test-util-obj-y)
 
+tests/fp/%:
+	$(MAKE) -C $(dir $@) $(notdir $@)
+
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
 	hw/core/bus.o \
diff --git a/tests/fp/.gitignore b/tests/fp/.gitignore
new file mode 100644
index 0000000..0a9fef4
--- /dev/null
+++ b/tests/fp/.gitignore
@@ -0,0 +1,3 @@
+ibm
+*.txt
+fp-test
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
new file mode 100644
index 0000000..a208f4c
--- /dev/null
+++ b/tests/fp/Makefile
@@ -0,0 +1,34 @@
+BUILD_DIR=$(CURDIR)/../..
+
+include ../../config-host.mak
+include $(SRC_PATH)/rules.mak
+
+$(call set-vpath, $(SRC_PATH)/tests/fp $(SRC_PATH)/fpu)
+
+QEMU_INCLUDES += -I../..
+QEMU_INCLUDES += -I$(SRC_PATH)/fpu
+# work around TARGET_* poisoning
+QEMU_CFLAGS += -DHW_POISON_H
+
+IBMFP := ibm-fptests.zip
+
+OBJS := fp-test$(EXESUF)
+
+WHITELIST_FILES := whitelist.txt whitelist-tininess-after.txt
+
+all: $(OBJS) ibm $(WHITELIST_FILES)
+
+ibm:
+	wget -nv -O $(IBMFP) http://www.haifa.il.ibm.com/projects/verification/fpgen/download/test_suite.zip
+	mkdir -p $@
+	unzip $(IBMFP) -d $@
+	rm -rf $(IBMFP)
+
+# XXX: upload this to a qemu server, or just commit it.
+$(WHITELIST_FILES):
+	wget -nv -O $@ http://www.cs.columbia.edu/~cota/qemu/fpbench-$@
+
+fp-test$(EXESUF): fp-test.o softfloat.o
+
+clean:
+	rm -f *.o *.d $(OBJS)
-- 
2.7.4

* [Qemu-devel] [PATCH v3 02/15] softfloat: fix {min, max}nummag for same-abs-value inputs
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 03/15] fp-test: add muladd variants Emilio G. Cota
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Before 8936006 ("fpu/softfloat: re-factor minmax", 2018-02-21),
we used to return +Zero for maxnummag(-Zero,+Zero); after that
commit, we return -Zero.

Fix it by making {min,max}nummag consistent with {min,max}num,
deferring to the latter when the absolute value of the operands
is the same.

With this fix we now pass fp-test.
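
For illustration only, the rule above can be modelled on host floats.
This is a sketch, not the FloatParts code in the patch: sketch_minmax is
a hypothetical name, and NaN handling is left out (inputs are assumed to
be non-NaN).

```c
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical host-float model of {min,max}num and {min,max}nummag.
 * Assumption: neither input is a NaN.
 */
static float sketch_minmax(float a, float b, bool ismin, bool ismag)
{
    bool a_sign = signbit(a) != 0;
    bool b_sign = signbit(b) != 0;

    /* magnitude variant: only decide here if the magnitudes differ */
    if (ismag && fabsf(a) != fabsf(b)) {
        bool a_less = fabsf(a) < fabsf(b);
        return a_less ^ ismin ? b : a;
    }

    /* same magnitude (or not a magnitude op): signed comparison */
    if (a_sign == b_sign) {
        bool a_less = fabsf(a) < fabsf(b);
        return a_sign ^ a_less ^ ismin ? b : a;
    }
    return a_sign ^ ismin ? b : a;
}
```

With ismag set, (-Zero, +Zero) falls through to the signed comparison,
so the max variant yields +Zero and the min variant yields -Zero.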

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6e16284..6803279 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1704,7 +1704,6 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
         return pick_nan(a, b, s);
     } else {
         int a_exp, b_exp;
-        bool a_sign, b_sign;
 
         switch (a.cls) {
         case float_class_normal:
@@ -1735,20 +1734,22 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
             break;
         }
 
-        a_sign = a.sign;
-        b_sign = b.sign;
-        if (ismag) {
-            a_sign = b_sign = 0;
+        if (ismag && (a_exp != b_exp || a.frac != b.frac)) {
+            bool a_less = a_exp < b_exp;
+            if (a_exp == b_exp) {
+                a_less = a.frac < b.frac;
+            }
+            return a_less ^ ismin ? b : a;
         }
 
-        if (a_sign == b_sign) {
+        if (a.sign == b.sign) {
             bool a_less = a_exp < b_exp;
             if (a_exp == b_exp) {
                 a_less = a.frac < b.frac;
             }
-            return a_sign ^ a_less ^ ismin ? b : a;
+            return a.sign ^ a_less ^ ismin ? b : a;
         } else {
-            return a_sign ^ ismin ? b : a;
+            return a.sign ^ ismin ? b : a;
         }
     }
 }
-- 
2.7.4

* [Qemu-devel] [PATCH v3 03/15] fp-test: add muladd variants
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 02/15] softfloat: fix {min, max}nummag for same-abs-value inputs Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

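These are a few muladd-related operations that the original IBM syntax
does not specify: muladd with the addend negated (*+nc), with the product
negated (*+np), and with the result negated (*+nr). Test cases for these
are in muladd.fptest.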
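
As a rough model of where each negation applies, here is a sketch on host
floats. The names (sketch_muladd, enum sketch_neg) are hypothetical. The
sketch assumes the three flags negate the addend, the product, and the
final result respectively; it does not model softfloat's NaN handling.
Double arithmetic stands in for the fused operation: the product of two
floats is exact in double, but the result is rounded twice, so this is
not bit-exact for every input.

```c
#include <float.h>  /* FLT_MIN, FLT_MAX when exercising the sketch */
#include <math.h>   /* INFINITY, isnan when exercising the sketch */

enum sketch_neg {
    SKETCH_NEG_NONE,
    SKETCH_NEG_ADDEND,   /* "*+nc" */
    SKETCH_NEG_PRODUCT,  /* "*+np" */
    SKETCH_NEG_RESULT,   /* "*+nr" */
};

static float sketch_muladd(float a, float b, float c, enum sketch_neg neg)
{
    /* exact: two 24-bit significands fit in double's 53 bits */
    double p = (double)a * (double)b;
    double r;

    switch (neg) {
    case SKETCH_NEG_ADDEND:
        r = p + (-(double)c);
        break;
    case SKETCH_NEG_PRODUCT:
        r = -p + (double)c;
        break;
    case SKETCH_NEG_RESULT:
        r = -(p + (double)c);
        break;
    default:
        r = p + (double)c;
        break;
    }
    return (float)r;
}
```

For example, the muladd.fptest line
"b32*+nc =0 -Zero -1.000000P-126 +1.7FFFFFP127 -> -1.7FFFFFP127"
corresponds to sketch_muladd(-0.0f, -FLT_MIN, FLT_MAX, SKETCH_NEG_ADDEND).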

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/fp/fp-test.c     | 24 ++++++++++++++++++++++++
 tests/fp/muladd.fptest | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 tests/fp/muladd.fptest

diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 27637c4..2200d40 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -53,6 +53,9 @@ enum op {
     OP_SUB,
     OP_MUL,
     OP_MULADD,
+    OP_MULADD_NEG_ADDEND,
+    OP_MULADD_NEG_PRODUCT,
+    OP_MULADD_NEG_RESULT,
     OP_DIV,
     OP_SQRT,
     OP_MINNUM,
@@ -69,6 +72,9 @@ static const struct op_desc ops[] = {
     [OP_SUB] =       { "-", 2 },
     [OP_MUL] =       { "*", 2 },
     [OP_MULADD] =    { "*+", 3 },
+    [OP_MULADD_NEG_ADDEND] =  { "*+nc", 3 },
+    [OP_MULADD_NEG_PRODUCT] = { "*+np", 3 },
+    [OP_MULADD_NEG_RESULT] =  { "*+nr", 3 },
     [OP_DIV] =       { "/", 2 },
     [OP_SQRT] =      { "V", 1 },
     [OP_MINNUM] =    { "<C", 2 },
@@ -463,6 +469,15 @@ static enum error soft_tester(struct test_op *t)
         case OP_MULADD:
             res = float32_muladd(a, b, c, 0, s);
             break;
+        case OP_MULADD_NEG_ADDEND:
+            res = float32_muladd(a, b, c, float_muladd_negate_c, s);
+            break;
+        case OP_MULADD_NEG_PRODUCT:
+            res = float32_muladd(a, b, c, float_muladd_negate_product, s);
+            break;
+        case OP_MULADD_NEG_RESULT:
+            res = float32_muladd(a, b, c, float_muladd_negate_result, s);
+            break;
         case OP_DIV:
             res = float32_div(a, b, s);
             break;
@@ -522,6 +537,15 @@ static enum error soft_tester(struct test_op *t)
         case OP_MULADD:
             res64 = float64_muladd(a, b, c, 0, s);
             break;
+        case OP_MULADD_NEG_ADDEND:
+            res64 = float64_muladd(a, b, c, float_muladd_negate_c, s);
+            break;
+        case OP_MULADD_NEG_PRODUCT:
+            res64 = float64_muladd(a, b, c, float_muladd_negate_product, s);
+            break;
+        case OP_MULADD_NEG_RESULT:
+            res64 = float64_muladd(a, b, c, float_muladd_negate_result, s);
+            break;
         case OP_DIV:
             res64 = float64_div(a, b, s);
             break;
diff --git a/tests/fp/muladd.fptest b/tests/fp/muladd.fptest
new file mode 100644
index 0000000..6cd48ff
--- /dev/null
+++ b/tests/fp/muladd.fptest
@@ -0,0 +1,51 @@
+# nc == negate addend
+b32*+nc =0 -Inf -Inf +Inf -> Q i
+b32*+nc =0 -1.7FFFFFP127 -Inf +Inf -> Q i
+b32*+nc =0 -1.6C9AE7P113 -Inf +Inf -> Q i
+b32*+nc =0 -1.000000P-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.7FFFFFP-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.1B977AP-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.000001P-126 -Inf +Inf -> Q i
+b32*+nc =0 -1.000000P0 -Inf +Inf -> Q i
+b32*+nc =0 -Zero -Inf +Inf -> Q i
+b32*+nc =0 +Zero -Inf +Inf -> Q i
+b32*+nc =0 -Zero -1.000000P-126 +1.7FFFFFP127 -> -1.7FFFFFP127
+b32*+nc =0 +Zero -1.000000P-126 +1.7FFFFFP127 -> -1.7FFFFFP127
+b32*+nc =0 -1.000000P-126 -1.7FFFFFP127 -1.4B9156P109 -> +1.4B9156P109 x
+b32*+nc =0 -0.7FFFFFP-126 -1.7FFFFFP127 -1.51BA59P-113 -> +1.7FFFFDP1 x
+b32*+nc =0 -0.3D6B57P-126 -1.7FFFFFP127 -1.265398P-67 -> +1.75AD5BP0 x
+b32*+nc =0 -0.000001P-126 -1.7FFFFFP127 -1.677330P-113 -> +1.7FFFFFP-22 x
+
+# np == negate product
+b32*+np =0 +Inf -Inf -Inf -> Q i
+b32*+np =0 +1.7FFFFFP127 -Inf -Inf -> Q i
+b32*+np =0 +1.6C9AE7P113 -Inf -Inf -> Q i
+b32*+np =0 +1.000000P-126 -Inf -Inf -> Q i
+b32*+np =0 +0.7FFFFFP-126 -Inf -Inf -> Q i
+b32*+np =0 +0.1B977AP-126 -Inf -Inf -> Q i
+b32*+np =0 +0.000001P-126 -Inf -Inf -> Q i
+b32*+np =0 +1.000000P0 -Inf -Inf -> Q i
+b32*+np =0 +Zero -Inf -Inf -> Q i
+b32*+np =0 +Zero -Inf -Inf -> Q i
+b32*+np =0 -Zero -1.000000P-126 -1.7FFFFFP127 -> -1.7FFFFFP127
+b32*+np =0 +Zero -1.000000P-126 -1.7FFFFFP127 -> -1.7FFFFFP127
+b32*+np =0 -1.3A6A89P-18 +1.24E7AEP9 -0.7FFFFFP-126 -> +1.7029E9P-9 x
+
+# nr == negate result
+b32*+nr =0 -Inf -Inf -Inf -> Q i
+b32*+nr =0 -1.7FFFFFP127 -Inf -Inf -> Q i
+b32*+nr =0 -1.6C9AE7P113 -Inf -Inf -> Q i
+b32*+nr =0 -1.000000P-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.7FFFFFP-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.1B977AP-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.000001P-126 -Inf -Inf -> Q i
+b32*+nr =0 -1.000000P0 -Inf -Inf -> Q i
+b32*+nr =0 -Zero -Inf -Inf -> Q i
+b32*+nr =0 -Zero -Inf -Inf -> Q i
+b32*+nr =0 +Zero -1.000000P-126 -1.7FFFFFP127 -> +1.7FFFFFP127
+b32*+nr =0 -Zero -1.000000P-126 -1.7FFFFFP127 -> +1.7FFFFFP127
+b32*+nr =0 -1.000000P-126 -1.7FFFFFP127 -1.4B9156P109 -> +1.4B9156P109 x
+b32*+nr =0 -0.7FFFFFP-126 -1.7FFFFFP127 -1.51BA59P-113 -> -1.7FFFFDP1 x
+b32*+nr =0 -0.3D6B57P-126 -1.7FFFFFP127 -1.265398P-67 -> -1.75AD5BP0 x
+b32*+nr =0 -0.000001P-126 -1.7FFFFFP127 -1.677330P-113 -> -1.7FFFFFP-22 x
+b32*+nr =0 +1.72E53AP-33 -1.7FFFFFP127 -1.5AA684P-2 -> +1.72E539P95 x
-- 
2.7.4

* [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (2 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 03/15] fp-test: add muladd variants Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-06 12:01   ` Bastian Koppelmann
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal Emilio G. Cota
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland,
	Bastian Koppelmann

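Add float{32,64}_is_normal and float{32,64}_is_denormal.
This paves the way for upcoming work: later patches in this series use
them in target/tricore and in fp-bench.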
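
The float32_is_normal check in the diff below relies on a small bit
trick; the following standalone sketch restates it on raw uint32_t bit
patterns (the sketch_* names are hypothetical, the constants are the
ones used in the patch). Adding 1 << 23 bumps the exponent field by one,
so an all-ones exponent (Inf/NaN) carries out into the sign bit, which
is then masked off, and a zero exponent (zero/denormal) becomes 1. Only
normal numbers end up with an exponent field of 2 or more.

```c
#include <stdbool.h>
#include <stdint.h>

/* true iff the IEEE 754 binary32 bit pattern encodes a normal number */
static bool sketch_f32_is_normal(uint32_t bits)
{
    return ((bits + UINT32_C(0x00800000)) & UINT32_C(0x7fffffff))
        >= UINT32_C(0x01000000);
}

/* true iff the exponent field is zero but the fraction is not */
static bool sketch_f32_is_denormal(uint32_t bits)
{
    bool exp_is_zero = (bits & UINT32_C(0x7f800000)) == 0;
    bool is_zero = (bits & UINT32_C(0x7fffffff)) == 0;

    return exp_is_zero && !is_zero;
}
```

For instance, 0x3f800000 (1.0) and 0x00800000 (the smallest normal) are
reported as normal, whereas 0x007fffff (the largest denormal),
0x7f800000 (+Inf) and 0x7fc00000 (a quiet NaN) are not.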

Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/fpu/softfloat.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 36626a5..a8512fb 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -412,6 +412,16 @@ static inline int float32_is_zero_or_denormal(float32 a)
     return (float32_val(a) & 0x7f800000) == 0;
 }
 
+static inline bool float32_is_normal(float32 a)
+{
+    return ((float32_val(a) + 0x00800000) & 0x7fffffff) >= 0x01000000;
+}
+
+static inline bool float32_is_denormal(float32 a)
+{
+    return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
     return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -541,6 +551,16 @@ static inline int float64_is_zero_or_denormal(float64 a)
     return (float64_val(a) & 0x7ff0000000000000LL) == 0;
 }
 
+static inline bool float64_is_normal(float64 a)
+{
+    return ((float64_val(a) + (1ULL << 52)) & -1ULL >> 1) >= 1ULL << 53;
+}
+
+static inline bool float64_is_denormal(float64 a)
+{
+    return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
     return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.7.4

* [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (3 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-06 12:01   ` Bastian Koppelmann
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 06/15] tests/fp: add fp-bench, a collection of simple floating point microbenchmarks Emilio G. Cota
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland,
	Bastian Koppelmann

Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/tricore/fpu_helper.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index df16290..31df462 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -44,11 +44,6 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
               | float_flag_inexact);
 }
 
-static inline bool f_is_denormal(float32 arg)
-{
-    return float32_is_zero_or_denormal(arg) && !float32_is_zero(arg);
-}
-
 static inline float32 f_maddsub_nan_result(float32 arg1, float32 arg2,
                                            float32 arg3, float32 result,
                                            uint32_t muladd_negate_c)
@@ -260,8 +255,8 @@ uint32_t helper_fcmp(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
     set_flush_inputs_to_zero(0, &env->fp_status);
 
     result = 1 << (float32_compare_quiet(arg1, arg2, &env->fp_status) + 1);
-    result |= f_is_denormal(arg1) << 4;
-    result |= f_is_denormal(arg2) << 5;
+    result |= float32_is_denormal(arg1) << 4;
+    result |= float32_is_denormal(arg2) << 5;
 
     flags = f_get_excp_flags(env);
     if (flags) {
-- 
2.7.4

* [Qemu-devel] [PATCH v3 06/15] tests/fp: add fp-bench, a collection of simple floating point microbenchmarks
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (4 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

This will allow us to measure the performance impact of FP emulation
optimizations. We can measure either the direct impact on the softfloat
functions (with "-t soft") or the impact on an emulated workload (call
with "-t host" and run under qemu user-mode).
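
The measurement pattern can be sketched as follows. This is a simplified
stand-in, not the code in fp-bench.c: sketch_bench_add and sketch_sink
are hypothetical names, standard clock() replaces get_clock_realtime(),
and only a host float addition is timed. The idea it illustrates is that
operand setup happens once per batch, outside the timed region, and the
result goes to a volatile sink so the compiler cannot drop the operation.

```c
#include <stdint.h>
#include <time.h>

#define SKETCH_OPS_PER_ITER 50000

/* volatile sink: keeps the timed operation from being optimized away */
static volatile float sketch_sink;

/* Runs n_batches timed batches; returns the number of completed ops. */
static uint64_t sketch_bench_add(float a, float b, unsigned int n_batches,
                                 int64_t *ns_elapsed)
{
    uint64_t n_ops = 0;
    unsigned int k;
    int i;

    *ns_elapsed = 0;
    for (k = 0; k < n_batches; k++) {
        /* operand generation would go here, outside the timed region */
        clock_t t0 = clock();

        for (i = 0; i < SKETCH_OPS_PER_ITER; i++) {
            sketch_sink = a + b;
        }
        *ns_elapsed += (int64_t)(clock() - t0)
            * (1000000000LL / CLOCKS_PER_SEC);
        n_ops += SKETCH_OPS_PER_ITER;
    }
    return n_ops;
}
```

Throughput is then the returned op count divided by the elapsed time in
seconds (ns_elapsed * 1e-9).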

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/fp/fp-bench.c | 528 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/fp/.gitignore |   1 +
 tests/fp/Makefile   |   4 +-
 3 files changed, 532 insertions(+), 1 deletion(-)
 create mode 100644 tests/fp/fp-bench.c

diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
new file mode 100644
index 0000000..a012b78
--- /dev/null
+++ b/tests/fp/fp-bench.c
@@ -0,0 +1,528 @@
+/*
+ * fp-bench.c - A collection of simple floating point microbenchmarks.
+ *
+ * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include "qemu/timer.h"
+
+#include "fpu/softfloat.h"
+
+#include <math.h>
+
+/* amortize the computation of random inputs */
+#define OPS_PER_ITER     50000
+
+#define MAX_OPERANDS 3
+
+#define SEED_A 0xdeadfacedeadface
+#define SEED_B 0xbadc0feebadc0fee
+#define SEED_C 0xbeefdeadbeefdead
+
+enum op {
+    OP_ADD,
+    OP_SUB,
+    OP_MUL,
+    OP_DIV,
+    OP_FMA,
+    OP_SQRT,
+    OP_CMP,
+    OP_MAX_NR,
+};
+
+static const char * const op_names[] = {
+    [OP_ADD] = "add",
+    [OP_SUB] = "sub",
+    [OP_MUL] = "mul",
+    [OP_DIV] = "div",
+    [OP_FMA] = "fma",
+    [OP_SQRT] = "sqrt",
+    [OP_CMP] = "cmp",
+    [OP_MAX_NR] = NULL,
+};
+
+enum precision {
+    PREC_SINGLE,
+    PREC_DOUBLE,
+    PREC_FLOAT32,
+    PREC_FLOAT64,
+    PREC_MAX_NR,
+};
+
+enum tester {
+    TESTER_SOFT,
+    TESTER_HOST,
+    TESTER_MAX_NR,
+};
+
+static const char * const tester_names[] = {
+    [TESTER_SOFT] = "soft",
+    [TESTER_HOST] = "host",
+    [TESTER_MAX_NR] = NULL,
+};
+
+union fp {
+    float f;
+    double d;
+    float32 f32;
+    float64 f64;
+    uint64_t u64;
+};
+
+struct op_state;
+
+typedef float (*float_func_t)(const struct op_state *s);
+typedef double (*double_func_t)(const struct op_state *s);
+
+union fp_func {
+    float_func_t float_func;
+    double_func_t double_func;
+};
+
+typedef void (*bench_func_t)(void);
+
+struct op_desc {
+    const char * const name;
+};
+
+#define DEFAULT_DURATION_SECS 1
+
+static uint64_t random_ops[MAX_OPERANDS] = {
+    SEED_A, SEED_B, SEED_C,
+};
+static float_status soft_status;
+static enum precision precision;
+static enum op operation;
+static enum tester tester;
+static uint64_t n_completed_ops;
+static unsigned int duration = DEFAULT_DURATION_SECS;
+static int64_t ns_elapsed;
+/* disable optimizations with volatile */
+static volatile union fp res;
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= 32767).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+    x ^= x >> 12; /* a */
+    x ^= x << 25; /* b */
+    x ^= x >> 27; /* c */
+    return x * UINT64_C(2685821657736338717);
+}
+
+static void update_random_ops(int n_ops, enum precision prec)
+{
+    int i;
+
+    for (i = 0; i < n_ops; i++) {
+        uint64_t r = random_ops[i];
+
+        if (prec == PREC_SINGLE || prec == PREC_FLOAT32) {
+            do {
+                r = xorshift64star(r);
+            } while (!float32_is_normal(r));
+        } else if (prec == PREC_DOUBLE || prec == PREC_FLOAT64) {
+            do {
+                r = xorshift64star(r);
+            } while (!float64_is_normal(r));
+        } else {
+            g_assert_not_reached();
+        }
+        random_ops[i] = r;
+    }
+}
+
+static void fill_random(union fp *ops, int n_ops, enum precision prec,
+                        bool no_neg)
+{
+    int i;
+
+    for (i = 0; i < n_ops; i++) {
+        switch (prec) {
+        case PREC_SINGLE:
+        case PREC_FLOAT32:
+            ops[i].f32 = make_float32(random_ops[i]);
+            if (no_neg && float32_is_neg(ops[i].f32)) {
+                ops[i].f32 = float32_chs(ops[i].f32);
+            }
+            /* raise the exponent to limit the frequency of denormal results */
+            ops[i].f32 |= 0x40000000;
+            break;
+        case PREC_DOUBLE:
+        case PREC_FLOAT64:
+            ops[i].f64 = make_float64(random_ops[i]);
+            if (no_neg && float64_is_neg(ops[i].f64)) {
+                ops[i].f64 = float64_chs(ops[i].f64);
+            }
+            /* raise the exponent to limit the frequency of denormal results */
+            ops[i].f64 |= LIT64(0x4000000000000000);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+}
+
+/*
+ * The main benchmark function. Instead of (ab)using macros, we rely
+ * on the compiler to unfold this at compile-time.
+ */
+static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
+{
+    int64_t tf = get_clock_realtime() + duration * 1000000000LL;
+
+    while (get_clock_realtime() < tf) {
+        union fp ops[MAX_OPERANDS];
+        int64_t t0;
+        int i;
+
+        update_random_ops(n_ops, prec);
+        switch (prec) {
+        case PREC_SINGLE:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock_realtime();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float a = ops[0].f;
+                float b = ops[1].f;
+                float c = ops[2].f;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f = a + b;
+                    break;
+                case OP_SUB:
+                    res.f = a - b;
+                    break;
+                case OP_MUL:
+                    res.f = a * b;
+                    break;
+                case OP_DIV:
+                    res.f = a / b;
+                    break;
+                case OP_FMA:
+                    res.f = fmaf(a, b, c);
+                    break;
+                case OP_SQRT:
+                    res.f = sqrtf(a);
+                    break;
+                case OP_CMP:
+                    res.u64 = isgreater(a, b);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_DOUBLE:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock_realtime();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                double a = ops[0].d;
+                double b = ops[1].d;
+                double c = ops[2].d;
+
+                switch (op) {
+                case OP_ADD:
+                    res.d = a + b;
+                    break;
+                case OP_SUB:
+                    res.d = a - b;
+                    break;
+                case OP_MUL:
+                    res.d = a * b;
+                    break;
+                case OP_DIV:
+                    res.d = a / b;
+                    break;
+                case OP_FMA:
+                    res.d = fma(a, b, c);
+                    break;
+                case OP_SQRT:
+                    res.d = sqrt(a);
+                    break;
+                case OP_CMP:
+                    res.u64 = isgreater(a, b);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_FLOAT32:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock_realtime();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float32 a = ops[0].f32;
+                float32 b = ops[1].f32;
+                float32 c = ops[2].f32;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f32 = float32_add(a, b, &soft_status);
+                    break;
+                case OP_SUB:
+                    res.f32 = float32_sub(a, b, &soft_status);
+                    break;
+                case OP_MUL:
+                    res.f32 = float32_mul(a, b, &soft_status);
+                    break;
+                case OP_DIV:
+                    res.f32 = float32_div(a, b, &soft_status);
+                    break;
+                case OP_FMA:
+                    res.f32 = float32_muladd(a, b, c, 0, &soft_status);
+                    break;
+                case OP_SQRT:
+                    res.f32 = float32_sqrt(a, &soft_status);
+                    break;
+                case OP_CMP:
+                    res.u64 = float32_compare_quiet(a, b, &soft_status);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        case PREC_FLOAT64:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock_realtime();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float64 a = ops[0].f64;
+                float64 b = ops[1].f64;
+                float64 c = ops[2].f64;
+
+                switch (op) {
+                case OP_ADD:
+                    res.f64 = float64_add(a, b, &soft_status);
+                    break;
+                case OP_SUB:
+                    res.f64 = float64_sub(a, b, &soft_status);
+                    break;
+                case OP_MUL:
+                    res.f64 = float64_mul(a, b, &soft_status);
+                    break;
+                case OP_DIV:
+                    res.f64 = float64_div(a, b, &soft_status);
+                    break;
+                case OP_FMA:
+                    res.f64 = float64_muladd(a, b, c, 0, &soft_status);
+                    break;
+                case OP_SQRT:
+                    res.f64 = float64_sqrt(a, &soft_status);
+                    break;
+                case OP_CMP:
+                    res.u64 = float64_compare_quiet(a, b, &soft_status);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ns_elapsed += get_clock_realtime() - t0;
+        n_completed_ops += OPS_PER_ITER;
+    }
+}
+
+#define GEN_BENCH(name, type, prec, op, n_ops)          \
+    static void __attribute__((flatten)) name(void)     \
+    {                                                   \
+        bench(prec, op, n_ops, false);                  \
+    }
+
+#define GEN_BENCH_NO_NEG(name, type, prec, op, n_ops)   \
+    static void __attribute__((flatten)) name(void)     \
+    {                                                   \
+        bench(prec, op, n_ops, true);                   \
+    }
+
+#define GEN_BENCH_ALL_TYPES(opname, op, n_ops)                          \
+    GEN_BENCH(bench_ ## opname ## _float, float, PREC_SINGLE, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _double, double, PREC_DOUBLE, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _float32, float32, PREC_FLOAT32, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops)
+
+GEN_BENCH_ALL_TYPES(add, OP_ADD, 2)
+GEN_BENCH_ALL_TYPES(sub, OP_SUB, 2)
+GEN_BENCH_ALL_TYPES(mul, OP_MUL, 2)
+GEN_BENCH_ALL_TYPES(div, OP_DIV, 2)
+GEN_BENCH_ALL_TYPES(fma, OP_FMA, 3)
+GEN_BENCH_ALL_TYPES(cmp, OP_CMP, 2)
+#undef GEN_BENCH_ALL_TYPES
+
+#define GEN_BENCH_ALL_TYPES_NO_NEG(name, op, n)                         \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float, float, PREC_SINGLE, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _double, double, PREC_DOUBLE, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float32, float32, PREC_FLOAT32, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n)
+
+GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
+#undef GEN_BENCH_ALL_TYPES_NO_NEG
+
+#undef GEN_BENCH_NO_NEG
+#undef GEN_BENCH
+
+#define GEN_BENCH_FUNCS(opname, op)                             \
+    [op] = {                                                    \
+        [PREC_SINGLE]    = bench_ ## opname ## _float,          \
+        [PREC_DOUBLE]    = bench_ ## opname ## _double,         \
+        [PREC_FLOAT32]   = bench_ ## opname ## _float32,        \
+        [PREC_FLOAT64]   = bench_ ## opname ## _float64,        \
+    }
+
+static const bench_func_t bench_funcs[OP_MAX_NR][PREC_MAX_NR] = {
+    GEN_BENCH_FUNCS(add, OP_ADD),
+    GEN_BENCH_FUNCS(sub, OP_SUB),
+    GEN_BENCH_FUNCS(mul, OP_MUL),
+    GEN_BENCH_FUNCS(div, OP_DIV),
+    GEN_BENCH_FUNCS(fma, OP_FMA),
+    GEN_BENCH_FUNCS(sqrt, OP_SQRT),
+    GEN_BENCH_FUNCS(cmp, OP_CMP),
+};
+
+#undef GEN_BENCH_FUNCS
+
+static void run_bench(void)
+{
+    bench_func_t f;
+
+    f = bench_funcs[operation][precision];
+    g_assert(f);
+    f();
+}
+
+/* @arr must be NULL-terminated */
+static int find_name(const char * const *arr, const char *name)
+{
+    int i;
+
+    for (i = 0; arr[i] != NULL; i++) {
+        if (strcmp(name, arr[i]) == 0) {
+            return i;
+        }
+    }
+    return -1;
+}
+
+static void usage_complete(int argc, char *argv[])
+{
+    gchar *op_list = g_strjoinv(", ", (gchar **)op_names);
+    gchar *tester_list = g_strjoinv(", ", (gchar **)tester_names);
+
+    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+    fprintf(stderr, "options:\n");
+    fprintf(stderr, "  -d = duration, in seconds. Default: %d\n",
+            DEFAULT_DURATION_SECS);
+    fprintf(stderr, "  -h = show this help message.\n");
+    fprintf(stderr, "  -o = floating point operation (%s). Default: %s\n",
+            op_list, op_names[0]);
+    fprintf(stderr, "  -p = floating point precision (single, double). "
+            "Default: single\n");
+    fprintf(stderr, "  -t = tester (%s). Default: %s\n",
+            tester_list, tester_names[0]);
+    fprintf(stderr, "  -z = flush inputs to zero (soft tester only). "
+            "Default: disabled\n");
+    fprintf(stderr, "  -Z = flush output to zero (soft tester only). "
+            "Default: disabled\n");
+
+    g_free(tester_list);
+    g_free(op_list);
+}
+
+static void parse_args(int argc, char *argv[])
+{
+    int c;
+    int val;
+
+    for (;;) {
+        c = getopt(argc, argv, "d:ho:p:t:zZ");
+        if (c < 0) {
+            break;
+        }
+        switch (c) {
+        case 'd':
+            duration = atoi(optarg);
+            break;
+        case 'h':
+            usage_complete(argc, argv);
+            exit(EXIT_SUCCESS);
+        case 'o':
+            val = find_name(op_names, optarg);
+            if (val < 0) {
+                fprintf(stderr, "Unsupported op '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            operation = val;
+            break;
+        case 'p':
+            if (!strcmp(optarg, "single")) {
+                precision = PREC_SINGLE;
+            } else if (!strcmp(optarg, "double")) {
+                precision = PREC_DOUBLE;
+            } else {
+                fprintf(stderr, "Unsupported precision '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            break;
+        case 't':
+            val = find_name(tester_names, optarg);
+            if (val < 0) {
+                fprintf(stderr, "Unsupported tester '%s'\n", optarg);
+                exit(EXIT_FAILURE);
+            }
+            tester = val;
+            break;
+        case 'z':
+            soft_status.flush_inputs_to_zero = 1;
+            break;
+        case 'Z':
+            soft_status.flush_to_zero = 1;
+            break;
+        }
+    }
+
+    /* set precision based on the tester */
+    switch (tester) {
+    case TESTER_HOST:
+        break;
+    case TESTER_SOFT:
+        switch (precision) {
+        case PREC_SINGLE:
+            precision = PREC_FLOAT32;
+            break;
+        case PREC_DOUBLE:
+            precision = PREC_FLOAT64;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void pr_stats(void)
+{
+    printf("%.2f MFlops\n", (double)n_completed_ops / ns_elapsed * 1e3);
+}
+
+int main(int argc, char *argv[])
+{
+    parse_args(argc, argv);
+    run_bench();
+    pr_stats();
+    return 0;
+}
diff --git a/tests/fp/.gitignore b/tests/fp/.gitignore
index 0a9fef4..a4e59d7 100644
--- a/tests/fp/.gitignore
+++ b/tests/fp/.gitignore
@@ -1,3 +1,4 @@
 ibm
 *.txt
 fp-test
+fp-bench
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index a208f4c..7c88ab0 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -12,7 +12,7 @@ QEMU_CFLAGS += -DHW_POISON_H
 
 IBMFP := ibm-fptests.zip
 
-OBJS := fp-test$(EXESUF)
+OBJS := fp-test$(EXESUF) fp-bench$(EXESUF)
 
 WHITELIST_FILES := whitelist.txt whitelist-tininess-after.txt
 
@@ -30,5 +30,7 @@ $(WHITELIST_FILES):
 
 fp-test$(EXESUF): fp-test.o softfloat.o
 
+fp-bench$(EXESUF): fp-bench.o softfloat.o
+
 clean:
 	rm -f *.o *.d $(OBJS)
-- 
2.7.4
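
As a sanity check on the unit conversion in pr_stats(): operations per
nanosecond are giga-operations per second, so scaling by 1e3 yields
MFlops. A minimal sketch of that arithmetic; mflops_from_counts is a
hypothetical helper name and is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic done in pr_stats(). */
double mflops_from_counts(uint64_t n_ops, uint64_t ns)
{
    assert(ns > 0);     /* a zero elapsed time would divide by zero */
    /* ops/ns == Gops/s; scale by 1e3 to express the rate in Mops/s. */
    return (double)n_ops / (double)ns * 1e3;
}
```

For instance, 1000 operations completed in 1000 ns is one operation per
nanosecond, i.e. 1000 MFlops.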

* [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (5 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 06/15] tests/fp: add fp-bench, a collection of simple floating point microbenchmarks Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-06 12:02   ` Bastian Koppelmann
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 08/15] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland,
	Bastian Koppelmann

glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).

Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().

Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6803279..c3b9d07 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -323,8 +323,8 @@ static inline float64 float64_pack_raw(FloatParts p)
 }
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
-static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
-                               float_status *status)
+static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+                                  float_status *status)
 {
     if (part.exp == parm->exp_max) {
         if (part.frac == 0) {
@@ -494,7 +494,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
 
 static FloatParts float16_unpack_canonical(float16 f, float_status *s)
 {
-    return canonicalize(float16_unpack_raw(f), &float16_params, s);
+    return sf_canonicalize(float16_unpack_raw(f), &float16_params, s);
 }
 
 static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
@@ -512,7 +512,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float32_unpack_canonical(float32 f, float_status *s)
 {
-    return canonicalize(float32_unpack_raw(f), &float32_params, s);
+    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
 }
 
 static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
@@ -530,7 +530,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float64_unpack_canonical(float64 f, float_status *s)
 {
-    return canonicalize(float64_unpack_raw(f), &float64_params, s);
+    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
 }
 
 static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
-- 
2.7.4

* [Qemu-devel] [PATCH v3 08/15] softfloat: add float{32, 64}_is_zero_or_normal
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (6 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 09/15] fpu: introduce hardfloat Emilio G. Cota
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

These will gain some users very soon.
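
To illustrate what "zero or normal" means at the bit level, here is a
minimal standalone sketch that works on a raw IEEE 754 binary32 pattern.
The name f32_bits_is_zero_or_normal is hypothetical; the patch itself
builds on float32_is_normal() and float32_is_zero().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for float32_is_zero_or_normal(), on raw bits. */
bool f32_bits_is_zero_or_normal(uint32_t bits)
{
    uint32_t exp = (bits >> 23) & 0xff;     /* 8-bit biased exponent */
    uint32_t frac = bits & 0x007fffff;      /* 23-bit fraction */

    if (exp == 0xff) {
        return false;                       /* infinity or NaN */
    }
    if (exp == 0) {
        return frac == 0;                   /* +/-0 passes, denormals do not */
    }
    return true;                            /* normal number */
}
```

Under this definition 0x3f800000 (1.0) and 0x80000000 (-0.0) qualify,
while 0x00000001 (smallest denormal) and 0x7f800000 (+inf) do not.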

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/fpu/softfloat.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index a8512fb..66985e1 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -422,6 +422,11 @@ static inline bool float32_is_denormal(float32 a)
     return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
 }
 
+static inline bool float32_is_zero_or_normal(float32 a)
+{
+    return float32_is_normal(a) || float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
     return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -561,6 +566,11 @@ static inline bool float64_is_denormal(float64 a)
     return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
 }
 
+static inline bool float64_is_zero_or_normal(float64 a)
+{
+    return float64_is_normal(a) || float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
     return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.7.4

* [Qemu-devel] [PATCH v3 09/15] fpu: introduce hardfloat
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (7 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 08/15] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 10/15] hardfloat: support float32/64 addition and subtraction Emilio G. Cota
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

This patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.

The approach followed here avoids checking the FP exception flags register.
See the added comment for details.
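
The overall flow can be sketched as follows. This is a simplified,
standalone illustration for a single-precision add, not the code in the
patch: every sk_* name is hypothetical, the status struct only mimics
float_status, and the "soft" routine is a stub that merely counts calls.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; they only mimic the shape of the patch. */
enum { SK_FLAG_INEXACT = 1, SK_FLAG_OVERFLOW = 2 };

struct sk_status {
    unsigned flags;         /* accumulated (sticky) exception flags */
    bool round_nearest;     /* true when rounding is to-nearest-even */
    unsigned soft_calls;    /* how often the slow path was taken */
};

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

static bool sk_is_zero_or_normal(float f)
{
    uint32_t bits, exp;

    memcpy(&bits, &f, sizeof(bits));    /* well-defined type punning */
    exp = (bits >> 23) & 0xff;
    return exp != 0xff && (exp != 0 || (bits & 0x007fffff) == 0);
}

/* Stand-in for the soft-fp routine; a real one would also compute flags. */
static float sk_soft_add(float a, float b, struct sk_status *s)
{
    s->soft_calls++;
    return a + b;
}

float sk_add(float a, float b, struct sk_status *s)
{
    float r, mag;

    /* Preconditions: inexact already raised, default rounding, tame inputs. */
    if (!(s->flags & SK_FLAG_INEXACT) || !s->round_nearest ||
        !sk_is_zero_or_normal(a) || !sk_is_zero_or_normal(b)) {
        return sk_soft_add(a, b, s);
    }
    r = a + b;                          /* computed by the host FPU */
    mag = r < 0 ? -r : r;
    if (mag > FLT_MAX) {
        s->flags |= SK_FLAG_OVERFLOW;   /* finite inputs gave an infinity */
    } else if (mag <= FLT_MIN && !(a == 0 && b == 0)) {
        return sk_soft_add(a, b, s);    /* possibly tiny: let soft-fp decide */
    }
    return r;
}
```

The point of the sketch is that no host flag register is read: overflow
is inferred from the result, and anything that might underflow is handed
back to the soft implementation.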

This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags such as -ffast-math. We control the flags so this should
be easy to enforce though.
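
One possible way to enforce this at build time is sketched below. It is
an assumption on my part rather than something this patch adds: GCC and
Clang define __FAST_MATH__ when -ffast-math is in effect, so a
preprocessor guard can reject such builds. host_fp_looks_ieee is a
hypothetical runtime spot check.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC and Clang define __FAST_MATH__ when -ffast-math is in effect. */
#ifdef __FAST_MATH__
#error "host-FPU fast paths assume IEEE 754 semantics; do not use -ffast-math"
#endif

/* Runtime spot check (hypothetical helper): a NaN must not equal itself. */
bool host_fp_looks_ieee(void)
{
    volatile float zero = 0.0f;     /* volatile defeats constant folding */
    float nan_val = zero / zero;    /* 0/0 yields NaN under IEEE 754 */

    return nan_val != nan_val;
}
```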

This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.

Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.
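
The effect of clearing the flags can be illustrated with a toy model.
The sketch below counts how many of a run of inexact operations would be
eligible for the host FPU, depending on whether the target clears the
flags before each one. All demo_* names are hypothetical, and the model
assumes every operation is inexact.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value and status struct, for illustration only. */
enum { DEMO_FLAG_INEXACT = 1 };

struct demo_status {
    unsigned flags;
};

static bool demo_can_use_fpu(const struct demo_status *s)
{
    return (s->flags & DEMO_FLAG_INEXACT) != 0;
}

/*
 * Count how many of n_ops inexact operations would be eligible for the
 * host FPU. The first inexact op always runs in soft-fp and raises the flag.
 */
unsigned demo_count_fast_ops(unsigned n_ops, bool clear_before_each_op)
{
    struct demo_status s = { 0 };
    unsigned fast = 0;
    unsigned i;

    for (i = 0; i < n_ops; i++) {
        if (clear_before_each_op) {
            s.flags = 0;                    /* target resets the flags */
        }
        if (demo_can_use_fpu(&s)) {
            fast++;
        } else {
            s.flags |= DEMO_FLAG_INEXACT;   /* soft-fp computes and sets it */
        }
    }
    return fast;
}
```

With sticky flags all but the first operation take the fast path; with
per-operation clearing none do, which is why hardfloat is disabled for
such targets.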

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 342 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 342 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c3b9d07..956b938 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -82,6 +82,8 @@ this code that are retained.
 /* softfloat (and in particular the code in softfloat-specialize.h) is
  * target-dependent and needs the TARGET_* macros.
  */
+#include <math.h>
+
 #include "qemu/osdep.h"
 #include "qemu/bitops.h"
 #include "fpu/softfloat.h"
@@ -105,6 +107,346 @@ this code that are retained.
 *----------------------------------------------------------------------------*/
 #include "softfloat-specialize.h"
 
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is neither fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we follow the main idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the result is < the minimum normal.
+ */
+#define GEN_TYPE_CONV(name, to_t, from_t)       \
+    static inline to_t name(from_t a)           \
+    {                                           \
+        to_t r = *(to_t *)&a;                   \
+        return r;                               \
+    }
+
+GEN_TYPE_CONV(float32_to_float, float, float32)
+GEN_TYPE_CONV(float64_to_double, double, float64)
+GEN_TYPE_CONV(float_to_float32, float32, float)
+GEN_TYPE_CONV(double_to_float64, float64, double)
+#undef GEN_TYPE_CONV
+
+#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t)                          \
+    static inline void name(soft_t *a, float_status *s)                 \
+    {                                                                   \
+        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
+            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
+                                     soft_t ## _is_neg(*a));            \
+            s->float_exception_flags |= float_flag_input_denormal;      \
+        }                                                               \
+    }
+
+GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
+GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
+#undef GEN_INPUT_FLUSH__NOCHECK
+
+#define GEN_INPUT_FLUSH1(name, soft_t)                  \
+    static inline void name(soft_t *a, float_status *s) \
+    {                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {         \
+            return;                                     \
+        }                                               \
+        soft_t ## _input_flush__nocheck(a, s);          \
+    }
+
+GEN_INPUT_FLUSH1(float32_input_flush1, float32)
+GEN_INPUT_FLUSH1(float64_input_flush1, float64)
+#undef GEN_INPUT_FLUSH1
+
+#define GEN_INPUT_FLUSH2(name, soft_t)                                  \
+    static inline void name(soft_t *a, soft_t *b, float_status *s)      \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+    }
+
+GEN_INPUT_FLUSH2(float32_input_flush2, float32)
+GEN_INPUT_FLUSH2(float64_input_flush2, float64)
+#undef GEN_INPUT_FLUSH2
+
+#define GEN_INPUT_FLUSH3(name, soft_t)                                  \
+    static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
+    {                                                                   \
+        if (likely(!s->flush_inputs_to_zero)) {                         \
+            return;                                                     \
+        }                                                               \
+        soft_t ## _input_flush__nocheck(a, s);                          \
+        soft_t ## _input_flush__nocheck(b, s);                          \
+        soft_t ## _input_flush__nocheck(c, s);                          \
+    }
+
+GEN_INPUT_FLUSH3(float32_input_flush3, float32)
+GEN_INPUT_FLUSH3(float64_input_flush3, float64)
+#undef GEN_INPUT_FLUSH3
+
+static inline bool can_use_fpu(const float_status *s)
+{
+    return likely(s->float_exception_flags & float_flag_inexact &&
+                  s->float_rounding_mode == float_round_nearest_even);
+}
+
+/*
+ * Choose whether to use fpclassify or float32/64_* primitives in the generated
+ * hardfloat functions. Each combination of number of inputs and float size
+ * gets its own value.
+ */
+#if defined(__x86_64__)
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 0
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 1
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 1
+#else
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 0
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 0
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 0
+#endif
+
+/*
+ * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
+ * float{32,64}_is_infinity when !USE_FP.
+ * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
+ * On power64 however, using isinf() reduces fp-bench performance by up to 50%.
+ */
+#if defined(__x86_64__) || defined(__aarch64__)
+# define QEMU_HARDFLOAT_USE_ISINF   1
+#else
+# define QEMU_HARDFLOAT_USE_ISINF   0
+#endif
+
+/*
+ * Some targets clear the FP flags before most FP operations. This prevents
+ * the use of hardfloat, since hardfloat relies on the inexact flag being
+ * already set.
+ */
+#if defined(TARGET_PPC)
+# define QEMU_NO_HARDFLOAT 1
+# define QEMU_SOFTFLOAT_ATTR __attribute__((flatten))
+#else
+# define QEMU_NO_HARDFLOAT 0
+# define QEMU_SOFTFLOAT_ATTR __attribute__((noinline))
+#endif
+
+/*
+ * Hardfloat generation functions. Each operation can have two flavors:
+ * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
+ * most condition checks, or native ones (e.g. fpclassify).
+ *
+ * The flavor is chosen by the callers. Instead of using macros, we rely on the
+ * compiler to propagate constants and inline everything into the callers.
+ *
+ * We only generate functions for operations with two inputs, since only
+ * these are common enough to justify consolidating them into common code.
+ */
+typedef bool (*f32_check_func_t)(float32 a, float32 b, const float_status *s);
+typedef bool (*f64_check_func_t)(float64 a, float64 b, const float_status *s);
+typedef bool (*float_check_func_t)(float a, float b, const float_status *s);
+typedef bool (*double_check_func_t)(double a, double b, const float_status *s);
+
+typedef float32 (*f32_op2_func_t)(float32 a, float32 b, float_status *s);
+typedef float64 (*f64_op2_func_t)(float64 a, float64 b, float_status *s);
+typedef float (*float_op2_func_t)(float a, float b);
+typedef double (*double_op2_func_t)(double a, double b);
+
+/* 2-input is-zero-or-normal */
+static inline bool
+f32_is_zon2(float32 a, float32 b, const struct float_status *s)
+{
+    return likely(float32_is_zero_or_normal(a) &&
+                  float32_is_zero_or_normal(b) &&
+                  can_use_fpu(s));
+}
+
+static inline bool
+float_is_zon2(float a, float b, const struct float_status *s)
+{
+    return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+                  (fpclassify(b) == FP_NORMAL || fpclassify(b) == FP_ZERO) &&
+                  can_use_fpu(s));
+}
+
+static inline bool
+f64_is_zon2(float64 a, float64 b, const struct float_status *s)
+{
+    return likely(float64_is_zero_or_normal(a) &&
+                  float64_is_zero_or_normal(b) &&
+                  can_use_fpu(s));
+}
+
+static inline bool
+double_is_zon2(double a, double b, const struct float_status *s)
+{
+    return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+                  (fpclassify(b) == FP_NORMAL || fpclassify(b) == FP_ZERO) &&
+                  can_use_fpu(s));
+}
+
+/*
+ * Note: @fast and @post can be NULL.
+ * Note: @fast and @fast_op always use softfloat types.
+ */
+static inline float32
+f32_gen2(float32 a, float32 b, float_status *s, float_op2_func_t hard,
+         f32_op2_func_t soft, f32_check_func_t pre, f32_check_func_t post,
+         f32_check_func_t fast, f32_op2_func_t fast_op)
+{
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+    float32_input_flush2(&a, &b, s);
+    if (likely(pre(a, b, s))) {
+        if (fast != NULL && fast(a, b, s)) {
+            return fast_op(a, b, s);
+        } else {
+            float ha = float32_to_float(a);
+            float hb = float32_to_float(b);
+            float hr = hard(ha, hb);
+            float32 r = float_to_float32(hr);
+
+            if (unlikely(QEMU_HARDFLOAT_USE_ISINF ?
+                         isinf(hr) : float32_is_infinity(r))) {
+                s->float_exception_flags |= float_flag_overflow;
+            } else if (unlikely(fabsf(hr) <= FLT_MIN &&
+                                (post == NULL || post(a, b, s)))) {
+                goto soft;
+            }
+            return r;
+        }
+    }
+ soft:
+    return soft(a, b, s);
+}
+
+static inline float32
+float_gen2(float32 a, float32 b, float_status *s, float_op2_func_t hard,
+           f32_op2_func_t soft, float_check_func_t pre, float_check_func_t post,
+           f32_check_func_t fast, f32_op2_func_t fast_op)
+{
+    float ha, hb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+    float32_input_flush2(&a, &b, s);
+    ha = float32_to_float(a);
+    hb = float32_to_float(b);
+    if (likely(pre(ha, hb, s))) {
+        if (fast != NULL && fast(a, b, s)) {
+            return fast_op(a, b, s);
+        } else {
+            float hr = hard(ha, hb);
+            float32 r = float_to_float32(hr);
+
+            if (unlikely(isinf(hr))) {
+                s->float_exception_flags |= float_flag_overflow;
+            } else if (unlikely(fabsf(hr) <= FLT_MIN &&
+                                (post == NULL || post(ha, hb, s)))) {
+                goto soft;
+            }
+            return r;
+        }
+    }
+ soft:
+    return soft(a, b, s);
+}
+
+static inline float64
+f64_gen2(float64 a, float64 b, float_status *s, double_op2_func_t hard,
+         f64_op2_func_t soft, f64_check_func_t pre, f64_check_func_t post,
+         f64_check_func_t fast, f64_op2_func_t fast_op)
+{
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+    float64_input_flush2(&a, &b, s);
+    if (likely(pre(a, b, s))) {
+        if (fast != NULL && fast(a, b, s)) {
+            return fast_op(a, b, s);
+        } else {
+            double ha = float64_to_double(a);
+            double hb = float64_to_double(b);
+            double hr = hard(ha, hb);
+            float64 r = double_to_float64(hr);
+
+            if (unlikely(QEMU_HARDFLOAT_USE_ISINF ?
+                         isinf(hr) : float64_is_infinity(r))) {
+                s->float_exception_flags |= float_flag_overflow;
+            } else if (unlikely(fabs(hr) <= DBL_MIN &&
+                                (post == NULL || post(a, b, s)))) {
+                goto soft;
+            }
+            return r;
+        }
+    }
+ soft:
+    return soft(a, b, s);
+}
+
+static inline float64
+double_gen2(float64 a, float64 b, float_status *s, double_op2_func_t hard,
+            f64_op2_func_t soft, double_check_func_t pre,
+            double_check_func_t post, f64_check_func_t fast,
+            f64_op2_func_t fast_op)
+{
+    double ha, hb;
+
+    if (QEMU_NO_HARDFLOAT) {
+        goto soft;
+    }
+    float64_input_flush2(&a, &b, s);
+    ha = float64_to_double(a);
+    hb = float64_to_double(b);
+    if (likely(pre(ha, hb, s))) {
+        if (fast != NULL && fast(a, b, s)) {
+            return fast_op(a, b, s);
+        } else {
+            double hr = hard(ha, hb);
+            float64 r = double_to_float64(hr);
+
+            if (unlikely(isinf(hr))) {
+                s->float_exception_flags |= float_flag_overflow;
+            } else if (unlikely(fabs(hr) <= DBL_MIN &&
+                                (post == NULL || post(ha, hb, s)))) {
+                goto soft;
+            }
+            return r;
+        }
+    }
+ soft:
+    return soft(a, b, s);
+}
+
 /*----------------------------------------------------------------------------
 | Returns the fraction bits of the half-precision floating-point value `a'.
 *----------------------------------------------------------------------------*/
-- 
2.7.4

* [Qemu-devel] [PATCH v3 10/15] hardfloat: support float32/64 addition and subtraction
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (8 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 09/15] fpu: introduce hardfloat Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 11/15] hardfloat: support float32/64 multiplication Emilio G. Cota
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results (single and double precision) for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
add-single: 135.07 MFlops
add-double: 131.60 MFlops
sub-single: 130.04 MFlops
sub-double: 133.01 MFlops
- after:
add-single: 443.04 MFlops
add-double: 301.95 MFlops
sub-single: 411.36 MFlops
sub-double: 293.15 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
add-single: 44.79 MFlops
add-double: 49.20 MFlops
sub-single: 44.55 MFlops
sub-double: 49.06 MFlops
- after:
add-single: 93.28 MFlops
add-double: 88.27 MFlops
sub-single: 91.47 MFlops
sub-double: 88.27 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
add-single: 72.59 MFlops
add-double: 72.27 MFlops
sub-single: 75.33 MFlops
sub-double: 70.54 MFlops
- after:
add-single: 112.95 MFlops
add-double: 201.11 MFlops
sub-single: 116.80 MFlops
sub-double: 188.72 MFlops

Note that the IBM and ARM machines benefit from having
QEMU_HARDFLOAT_2F{32,64}_USE_FP set to 0. With it set to 1, their
performance can suffer significantly ([1] and [0] below denote the
value of USE_FP):
- IBM Power8:
add-single: [1] 54.94 vs [0] 116.37 MFlops
add-double: [1] 58.92 vs [0] 201.44 MFlops
- Aarch64 A57:
add-single: [1] 80.72 vs [0] 93.24 MFlops
add-double: [1] 82.10 vs [0] 88.18 MFlops

On the Intel machine, having 2F64 set to 1 pays off, but it
doesn't for 2F32:
- Intel i7-6700K:
add-single: [1] 285.79 vs [0] 426.70 MFlops
add-double: [1] 302.15 vs [0] 278.82 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
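For reference, a standalone sketch of what the USE_FP switch selects
between: classifying an operand with integer tests on its bit pattern
(USE_FP == 0) versus asking the host via fpclassify() (USE_FP == 1).
It assumes float32 is a plain uint32_t holding IEEE 754 binary32 bits;
f32_bits, bits_to_host, f32_bits_is_zon and host_float_is_zon are
hypothetical names for illustration, not the helpers used in this series.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's float32: raw IEEE 754 binary32 bits. */
typedef uint32_t f32_bits;

/* Reinterpret the bits as a host float without breaking aliasing rules. */
static float bits_to_host(f32_bits v)
{
    float f;

    memcpy(&f, &v, sizeof(f));
    return f;
}

/* USE_FP == 0 style: integer tests on the exponent and fraction fields. */
static bool f32_bits_is_zon(f32_bits v)
{
    uint32_t exp = (v >> 23) & 0xff;

    if (exp == 0) {
        /* zero if the fraction is also 0, denormal otherwise */
        return (v & 0x007fffff) == 0;
    }
    /* exp == 0xff encodes infinities and NaNs */
    return exp != 0xff;
}

/* USE_FP == 1 style: let the host classify the value. */
static bool host_float_is_zon(float f)
{
    int c = fpclassify(f);

    return c == FP_NORMAL || c == FP_ZERO;
}
```

Both predicates should agree on every input (assuming the host does not
flush denormals to zero); which one is cheaper depends on the host, as
the numbers above suggest.
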
 fpu/softfloat.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 956b938..ca0b8ab 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1080,8 +1080,8 @@ float16  __attribute__((flatten)) float16_add(float16 a, float16 b,
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_add(float32 a, float32 b,
-                                             float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_add(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1090,8 +1090,8 @@ float32 __attribute__((flatten)) float32_add(float32 a, float32 b,
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_add(float64 a, float64 b,
-                                             float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_add(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1110,8 +1110,8 @@ float16 __attribute__((flatten)) float16_sub(float16 a, float16 b,
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_sub(float32 a, float32 b,
-                                             float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_sub(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1120,8 +1120,8 @@ float32 __attribute__((flatten)) float32_sub(float32 a, float32 b,
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_sub(float64 a, float64 b,
-                                             float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_sub(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1130,6 +1130,96 @@ float64 __attribute__((flatten)) float64_sub(float64 a, float64 b,
     return float64_round_pack_canonical(pr, status);
 }
 
+static float float_add(float a, float b)
+{
+    return a + b;
+}
+
+static float float_sub(float a, float b)
+{
+    return a - b;
+}
+
+static double double_add(double a, double b)
+{
+    return a + b;
+}
+
+static double double_sub(double a, double b)
+{
+    return a - b;
+}
+
+static bool f32_addsub_post(float32 a, float32 b, const struct float_status *s)
+{
+    return !(float32_is_zero(a) && float32_is_zero(b));
+}
+
+static bool
+float_addsub_post(float a, float b, const struct float_status *s)
+{
+    return !(fpclassify(a) == FP_ZERO && fpclassify(b) == FP_ZERO);
+}
+
+static bool f64_addsub_post(float64 a, float64 b, const struct float_status *s)
+{
+    return !(float64_is_zero(a) && float64_is_zero(b));
+}
+
+static bool
+double_addsub_post(double a, double b, const struct float_status *s)
+{
+    return !(fpclassify(a) == FP_ZERO && fpclassify(b) == FP_ZERO);
+}
+
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
+                              float_op2_func_t hard, f32_op2_func_t soft)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return float_gen2(a, b, s, hard, soft, float_is_zon2, float_addsub_post,
+                          NULL, NULL);
+    } else {
+        return f32_gen2(a, b, s, hard, soft, f32_is_zon2, f32_addsub_post,
+                        NULL, NULL);
+    }
+}
+
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
+                              double_op2_func_t hard, f64_op2_func_t soft)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return double_gen2(a, b, s, hard, soft, double_is_zon2,
+                           double_addsub_post, NULL, NULL);
+    } else {
+        return f64_gen2(a, b, s, hard, soft, f64_is_zon2, f64_addsub_post,
+                        NULL, NULL);
+    }
+}
+
+float32 __attribute__((flatten))
+float32_add(float32 a, float32 b, float_status *s)
+{
+    return float32_addsub(a, b, s, float_add, soft_float32_add);
+}
+
+float32 __attribute__((flatten))
+float32_sub(float32 a, float32 b, float_status *s)
+{
+    return float32_addsub(a, b, s, float_sub, soft_float32_sub);
+}
+
+float64 __attribute__((flatten))
+float64_add(float64 a, float64 b, float_status *s)
+{
+    return float64_addsub(a, b, s, double_add, soft_float64_add);
+}
+
+float64 __attribute__((flatten))
+float64_sub(float64 a, float64 b, float_status *s)
+{
+    return float64_addsub(a, b, s, double_sub, soft_float64_sub);
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b'. The operation is performed according to the IEC/IEEE Standard
-- 
2.7.4


* [Qemu-devel] [PATCH v3 11/15] hardfloat: support float32/64 multiplication
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (9 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 10/15] hardfloat: support float32/64 addition and subtraction Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 12/15] hardfloat: support float32/64 division Emilio G. Cota
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
mul-single: 126.91 MFlops
mul-double: 118.28 MFlops
- after:
mul-single: 258.02 MFlops
mul-double: 197.96 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
mul-single: 37.42 MFlops
mul-double: 38.77 MFlops
- after:
mul-single: 73.41 MFlops
mul-double: 76.93 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
mul-single: 58.40 MFlops
mul-double: 59.33 MFlops
- after:
mul-single: 60.25 MFlops
mul-double: 94.79 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
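A standalone sketch of the fast path this patch adds: once the pre-check
has established that both operands are zero or normal, a zero operand
means the product is a zero whose sign is the XOR of the operand signs,
so the FPU need not be involved. The sketch assumes float32 is a plain
uint32_t with IEEE 754 binary32 layout; all names in it are hypothetical
and only mirror the shape of f32_mul_fast/f32_mul_fast_op.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float32: raw IEEE 754 binary32 bits. */
typedef uint32_t f32_bits;

static bool f32_bits_is_zero(f32_bits v)
{
    /* everything but the sign bit must be clear */
    return (v & 0x7fffffff) == 0;
}

static bool f32_bits_is_neg(f32_bits v)
{
    return v >> 31;
}

/* True when the product can be produced without touching the FPU. */
static bool mul_fast_applies(f32_bits a, f32_bits b)
{
    return f32_bits_is_zero(a) || f32_bits_is_zero(b);
}

/*
 * Signed zero result. Only valid when mul_fast_applies() holds and the
 * other operand is zero or normal (no infinities or NaNs).
 */
static f32_bits mul_fast_result(f32_bits a, f32_bits b)
{
    bool sign = f32_bits_is_neg(a) ^ f32_bits_is_neg(b);

    return (uint32_t)sign << 31;
}
```

For example, -0 * 1.0 (0x80000000 and 0x3f800000) yields 0x80000000,
i.e. -0, matching what a host multiply would return.
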
 fpu/softfloat.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ca0b8ab..2c68b9d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1281,8 +1281,8 @@ float16 __attribute__((flatten)) float16_mul(float16 a, float16 b,
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_mul(float32 a, float32 b,
-                                             float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_mul(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1291,8 +1291,8 @@ float32 __attribute__((flatten)) float32_mul(float32 a, float32 b,
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_mul(float64 a, float64 b,
-                                             float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_mul(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1301,6 +1301,64 @@ float64 __attribute__((flatten)) float64_mul(float64 a, float64 b,
     return float64_round_pack_canonical(pr, status);
 }
 
+static float float_mul(float a, float b)
+{
+    return a * b;
+}
+
+static double double_mul(double a, double b)
+{
+    return a * b;
+}
+
+static bool f32_mul_fast(float32 a, float32 b, const struct float_status *s)
+{
+    return float32_is_zero(a) || float32_is_zero(b);
+}
+
+static bool f64_mul_fast(float64 a, float64 b, const struct float_status *s)
+{
+    return float64_is_zero(a) || float64_is_zero(b);
+}
+
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
+{
+    bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
+
+    return float32_set_sign(float32_zero, signbit);
+}
+
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
+{
+    bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
+
+    return float64_set_sign(float64_zero, signbit);
+}
+
+float32 __attribute__((flatten))
+float32_mul(float32 a, float32 b, float_status *s)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return float_gen2(a, b, s, float_mul, soft_float32_mul, float_is_zon2,
+                          NULL, f32_mul_fast, f32_mul_fast_op);
+    } else {
+        return f32_gen2(a, b, s, float_mul, soft_float32_mul, f32_is_zon2, NULL,
+                        f32_mul_fast, f32_mul_fast_op);
+    }
+}
+
+float64 __attribute__((flatten))
+float64_mul(float64 a, float64 b, float_status *s)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return double_gen2(a, b, s, double_mul, soft_float64_mul,
+                           double_is_zon2, NULL, f64_mul_fast, f64_mul_fast_op);
+    } else {
+        return f64_gen2(a, b, s, double_mul, soft_float64_mul, f64_is_zon2,
+                        NULL, f64_mul_fast, f64_mul_fast_op);
+    }
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b' then adding 'c', with no intermediate rounding step after the
-- 
2.7.4


* [Qemu-devel] [PATCH v3 12/15] hardfloat: support float32/64 division
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (10 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 11/15] hardfloat: support float32/64 multiplication Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add Emilio G. Cota
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
div-single: 34.84 MFlops
div-double: 34.04 MFlops
- after:
div-single: 275.23 MFlops
div-double: 216.38 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
div-single: 9.33 MFlops
div-double: 9.30 MFlops
- after:
div-single: 51.55 MFlops
div-double: 15.09 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
div-single: 25.65 MFlops
div-double: 24.91 MFlops
- after:
div-single: 96.83 MFlops
div-double: 31.01 MFlops

Here setting QEMU_HARDFLOAT_2F64_USE_FP to 1 pays off for div-double on x86_64:
[1] 215.97 vs [0] 62.15 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
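A standalone sketch of how the division pre/post checks decide between
the host FPU and the softfloat fallback, modelled on the double
variant. The enum and function names are hypothetical; the can_use_fpu()
test is left out because its definition is not part of this patch, and
the sketch only reports which path would be taken instead of calling a
soft implementation.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical: which path would produce the final answer. */
enum div_path {
    DIV_PATH_HARD,          /* host result is used as-is */
    DIV_PATH_HARD_OVERFLOW, /* host result is inf; overflow flag is raised */
    DIV_PATH_SOFT,          /* fall back to the softfloat implementation */
};

static bool is_zero_or_normal(double x)
{
    int c = fpclassify(x);

    return c == FP_NORMAL || c == FP_ZERO;
}

static enum div_path classify_div(double a, double b, double *result)
{
    double r;

    /* pre-check: zero-or-normal dividend, normal (hence nonzero) divisor */
    if (!is_zero_or_normal(a) || fpclassify(b) != FP_NORMAL) {
        return DIV_PATH_SOFT;
    }
    r = a / b;
    if (isinf(r)) {
        *result = r;
        return DIV_PATH_HARD_OVERFLOW;
    }
    /*
     * post-check: a tiny result may have underflowed, so let softfloat
     * redo it, unless the dividend is zero and the quotient is exact.
     */
    if (fabs(r) <= DBL_MIN && a != 0.0) {
        return DIV_PATH_SOFT;
    }
    *result = r;
    return DIV_PATH_HARD;
}
```

With these checks 1.0 / 2.0 and 0.0 / 3.0 stay on the host, 1.0 / 0.0
never reaches it, and DBL_MIN / 2.0 is redone in software because the
quotient is denormal.
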
 fpu/softfloat.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2c68b9d..4323dc2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1666,7 +1666,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 float32_div(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_div(float32 a, float32 b, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1675,7 +1676,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 float64_div(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_div(float64 a, float64 b, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1684,6 +1686,88 @@ float64 float64_div(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+static float float_div(float a, float b)
+{
+    return a / b;
+}
+
+static double double_div(double a, double b)
+{
+    return a / b;
+}
+
+static bool f32_div_pre(float32 a, float32 b, const struct float_status *s)
+{
+    return likely(float32_is_zero_or_normal(a) &&
+                  float32_is_normal(b) &&
+                  can_use_fpu(s));
+}
+
+static bool f64_div_pre(float64 a, float64 b, const struct float_status *s)
+{
+    return likely(float64_is_zero_or_normal(a) &&
+                  float64_is_normal(b) &&
+                  can_use_fpu(s));
+}
+
+static bool float_div_pre(float a, float b, const struct float_status *s)
+{
+    return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+                  fpclassify(b) == FP_NORMAL &&
+                  can_use_fpu(s));
+}
+
+static bool double_div_pre(double a, double b, const struct float_status *s)
+{
+    return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+                  fpclassify(b) == FP_NORMAL &&
+                  can_use_fpu(s));
+}
+
+static bool f32_div_post(float32 a, float32 b, const struct float_status *s)
+{
+    return !float32_is_zero(a);
+}
+
+static bool f64_div_post(float64 a, float64 b, const struct float_status *s)
+{
+    return !float64_is_zero(a);
+}
+
+static bool float_div_post(float a, float b, const struct float_status *s)
+{
+    return fpclassify(a) != FP_ZERO;
+}
+
+static bool double_div_post(double a, double b, const struct float_status *s)
+{
+    return fpclassify(a) != FP_ZERO;
+}
+
+float32 __attribute__((flatten))
+float32_div(float32 a, float32 b, float_status *s)
+{
+    if (QEMU_HARDFLOAT_2F32_USE_FP) {
+        return float_gen2(a, b, s, float_div, soft_float32_div, float_div_pre,
+                          float_div_post, NULL, NULL);
+    } else {
+        return f32_gen2(a, b, s, float_div, soft_float32_div, f32_div_pre,
+                        f32_div_post, NULL, NULL);
+    }
+}
+
+float64 __attribute__((flatten))
+float64_div(float64 a, float64 b, float_status *s)
+{
+    if (QEMU_HARDFLOAT_2F64_USE_FP) {
+        return double_gen2(a, b, s, double_div, soft_float64_div,
+                           double_div_pre, double_div_post, NULL, NULL);
+    } else {
+        return f64_gen2(a, b, s, double_div, soft_float64_div, f64_div_pre,
+                        f64_div_post, NULL, NULL);
+    }
+}
+
 /*
  * Rounds the floating-point value `a' to an integer, and returns the
  * result as a floating-point value. The operation is performed
-- 
2.7.4


* [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (11 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 12/15] hardfloat: support float32/64 division Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:16   ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root Emilio G. Cota
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
fma-single: 74.73 MFlops
fma-double: 74.54 MFlops
- after:
fma-single: 203.37 MFlops
fma-double: 169.37 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
fma-single: 23.24 MFlops
fma-double: 23.70 MFlops
- after:
fma-single: 66.14 MFlops
fma-double: 63.10 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
fma-single: 37.26 MFlops
fma-double: 37.29 MFlops
- after:
fma-single: 48.90 MFlops
fma-double: 59.51 MFlops

Here having QEMU_HARDFLOAT_3F64_USE_FP set to 1 pays off for fma-double on x86_64:
[1] 170.15 vs [0] 153.12 MFlops

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
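A standalone sketch of the zero-product special case handled by the
generators below: when a or b is zero, the product is a signed zero, so
the result is just that zero plus the addend, computed with a host
addition so that the sign of a zero sum follows IEEE 754 rules. The
sketch assumes float32 is a plain uint32_t, the default round-to-nearest
mode, and an addend that is zero or normal; the SKETCH_* flag values and
function names are hypothetical and do not match QEMU's float_muladd_*
constants.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's float32: raw IEEE 754 binary32 bits. */
typedef uint32_t f32_bits;

/* Hypothetical flag values, for illustration only. */
enum {
    SKETCH_NEGATE_C       = 1,
    SKETCH_NEGATE_PRODUCT = 2,
    SKETCH_NEGATE_RESULT  = 4,
};

static float bits_to_host(f32_bits v)
{
    float f;

    memcpy(&f, &v, sizeof(f));
    return f;
}

static f32_bits host_to_bits(float f)
{
    f32_bits v;

    memcpy(&v, &f, sizeof(v));
    return v;
}

/* a * b + c, where a or b is zero and c is zero or normal. */
static f32_bits fma_zero_product(f32_bits a, f32_bits b, f32_bits c,
                                 int flags)
{
    bool prod_sign = (a >> 31) ^ (b >> 31);
    f32_bits p, r;

    prod_sign ^= !!(flags & SKETCH_NEGATE_PRODUCT);
    p = (uint32_t)prod_sign << 31;      /* signed zero product */
    if (flags & SKETCH_NEGATE_C) {
        c ^= 0x80000000u;               /* flip the addend's sign */
    }
    /* the host add picks the right sign when both terms are zero */
    r = host_to_bits(bits_to_host(p) + bits_to_host(c));
    if (flags & SKETCH_NEGATE_RESULT) {
        r ^= 0x80000000u;
    }
    return r;
}
```

The host addition matters for cases such as (+0) + (-0), which is +0
under round-to-nearest, while (-0) + (-0) stays -0; returning c
unchanged would get the first case wrong.
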
 fpu/softfloat.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 165 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4323dc2..ce14c87 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1574,8 +1574,9 @@ float16 __attribute__((flatten)) float16_muladd(float16 a, float16 b, float16 c,
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_muladd(float32 a, float32 b, float32 c,
-                                                int flags, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_muladd(float32 a, float32 b, float32 c, int flags,
+                    float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pb = float32_unpack_canonical(b, status);
@@ -1585,8 +1586,9 @@ float32 __attribute__((flatten)) float32_muladd(float32 a, float32 b, float32 c,
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_muladd(float64 a, float64 b, float64 c,
-                                                int flags, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_muladd(float64 a, float64 b, float64 c, int flags,
+                    float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pb = float64_unpack_canonical(b, status);
@@ -1597,6 +1599,165 @@ float64 __attribute__((flatten)) float64_muladd(float64 a, float64 b, float64 c,
 }
 
 /*
+ * FMA generator for softfloat-based condition checks.
+ *
+ * When a or b is zero, there's no need to check for underflow/overflow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+#define GEN_FMA_SF(name, soft_t, host_t, host_fma_f, host_abs_f, min_normal) \
+    static soft_t                                                       \
+    name(soft_t a, soft_t b, soft_t c, int flags, float_status *s)      \
+    {                                                                   \
+        if (QEMU_NO_HARDFLOAT) {                                        \
+            goto soft;                                                  \
+        }                                                               \
+        soft_t ## _input_flush3(&a, &b, &c, s);                         \
+        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
+                   (soft_t ## _is_normal(b) || soft_t ## _is_zero(b)) && \
+                   (soft_t ## _is_normal(c) || soft_t ## _is_zero(c)) && \
+                   !(flags & float_muladd_halve_result) &&              \
+                   can_use_fpu(s))) {                                   \
+            if (soft_t ## _is_zero(a) || soft_t ## _is_zero(b)) {       \
+                soft_t p, r;                                            \
+                host_t hp, hc, hr;                                      \
+                bool prod_sign;                                         \
+                                                                        \
+                prod_sign = soft_t ## _is_neg(a) ^ soft_t ## _is_neg(b); \
+                prod_sign ^= !!(flags & float_muladd_negate_product);   \
+                p = soft_t ## _set_sign(soft_t ## _zero, prod_sign);    \
+                                                                        \
+                if (flags & float_muladd_negate_c) {                    \
+                    c = soft_t ## _chs(c);                              \
+                }                                                       \
+                                                                        \
+                hp = soft_t ## _to_ ## host_t(p);                       \
+                hc = soft_t ## _to_ ## host_t(c);                       \
+                hr = hp + hc;                                           \
+                r = host_t ## _to_ ## soft_t(hr);                       \
+                return flags & float_muladd_negate_result ?             \
+                    soft_t ## _chs(r) : r;                              \
+            } else {                                                    \
+                host_t ha, hb, hc, hr;                                  \
+                soft_t r;                                               \
+                soft_t sa = flags & float_muladd_negate_product ?       \
+                    soft_t ## _chs(a) : a;                              \
+                soft_t sc = flags & float_muladd_negate_c ?             \
+                    soft_t ## _chs(c) : c;                              \
+                                                                        \
+                ha = soft_t ## _to_ ## host_t(sa);                      \
+                hb = soft_t ## _to_ ## host_t(b);                       \
+                hc = soft_t ## _to_ ## host_t(sc);                      \
+                hr = host_fma_f(ha, hb, hc);                            \
+                r = host_t ## _to_ ## soft_t(hr);                       \
+                                                                        \
+                if (unlikely(isinf(hr))) {                              \
+                    s->float_exception_flags |= float_flag_overflow;    \
+                } else if (unlikely(host_abs_f(hr) <= min_normal)) {    \
+                    goto soft;                                          \
+                }                                                       \
+                return flags & float_muladd_negate_result ?             \
+                    soft_t ## _chs(r) : r;                              \
+            }                                                           \
+        }                                                               \
+    soft:                                                               \
+        return soft_ ## soft_t ## _muladd(a, b, c, flags, s);           \
+    }
+
+/* FMA generator for native floating point condition checks */
+#define GEN_FMA_FP(name, soft_t, host_t, host_fma_f, host_abs_f, min_normal) \
+    static soft_t \
+    name(soft_t a, soft_t b, soft_t c, int flags, float_status *s)      \
+    {                                                                   \
+        host_t ha, hb, hc;                                              \
+                                                                        \
+        if (QEMU_NO_HARDFLOAT) {                                        \
+            goto soft;                                                  \
+        }                                                               \
+        soft_t ## _input_flush3(&a, &b, &c, s);                         \
+        ha = soft_t ## _to_ ## host_t(a);                               \
+        hb = soft_t ## _to_ ## host_t(b);                               \
+        hc = soft_t ## _to_ ## host_t(c);                               \
+        if (likely((fpclassify(ha) == FP_NORMAL ||                      \
+                    fpclassify(ha) == FP_ZERO) &&                       \
+                   (fpclassify(hb) == FP_NORMAL ||                      \
+                    fpclassify(hb) == FP_ZERO) &&                       \
+                   (fpclassify(hc) == FP_NORMAL ||                      \
+                    fpclassify(hc) == FP_ZERO) &&                       \
+                   !(flags & float_muladd_halve_result) &&              \
+                   can_use_fpu(s))) {                                   \
+            if (soft_t ## _is_zero(a) || soft_t ## _is_zero(b)) {       \
+                soft_t p, r;                                            \
+                host_t hp, hc, hr;                                      \
+                bool prod_sign;                                         \
+                                                                        \
+                prod_sign = soft_t ## _is_neg(a) ^ soft_t ## _is_neg(b); \
+                prod_sign ^= !!(flags & float_muladd_negate_product);   \
+                p = soft_t ## _set_sign(soft_t ## _zero, prod_sign);    \
+                                                                        \
+                if (flags & float_muladd_negate_c) {                    \
+                    c = soft_t ## _chs(c);                              \
+                }                                                       \
+                                                                        \
+                hp = soft_t ## _to_ ## host_t(p);                       \
+                hc = soft_t ## _to_ ## host_t(c);                       \
+                hr = hp + hc;                                           \
+                r = host_t ## _to_ ## soft_t(hr);                       \
+                return flags & float_muladd_negate_result ?             \
+                    soft_t ## _chs(r) : r;                              \
+            } else {                                                    \
+                host_t hr;                                              \
+                                                                        \
+                if (flags & float_muladd_negate_product) {              \
+                    ha = -ha;                                           \
+                }                                                       \
+                if (flags & float_muladd_negate_c) {                    \
+                    hc = -hc;                                           \
+                }                                                       \
+                hr = host_fma_f(ha, hb, hc);                            \
+                if (unlikely(isinf(hr))) {                              \
+                    s->float_exception_flags |= float_flag_overflow;    \
+                } else if (unlikely(host_abs_f(hr) <= min_normal)) {    \
+                    goto soft;                                          \
+                }                                                       \
+                if (flags & float_muladd_negate_result) {               \
+                    hr = -hr;                                           \
+                }                                                       \
+                return host_t ## _to_ ## soft_t(hr);                    \
+            }                                                           \
+        }                                                               \
+    soft:                                                               \
+        return soft_ ## soft_t ## _muladd(a, b, c, flags, s);           \
+    }
+
+GEN_FMA_SF(f32_muladd, float32, float, fmaf, fabsf, FLT_MIN)
+GEN_FMA_SF(f64_muladd, float64, double, fma, fabs, DBL_MIN)
+#undef GEN_FMA_SF
+
+GEN_FMA_FP(float_muladd, float32, float, fmaf, fabsf, FLT_MIN)
+GEN_FMA_FP(double_muladd, float64, double, fma, fabs, DBL_MIN)
+#undef GEN_FMA_FP
+
+float32 __attribute__((flatten))
+float32_muladd(float32 a, float32 b, float32 c, int flags, float_status *s)
+{
+    if (QEMU_HARDFLOAT_3F32_USE_FP) {
+        return float_muladd(a, b, c, flags, s);
+    } else {
+        return f32_muladd(a, b, c, flags, s);
+    }
+}
+
+float64 __attribute__((flatten))
+float64_muladd(float64 a, float64 b, float64 c, int flags, float_status *s)
+{
+    if (QEMU_HARDFLOAT_3F64_USE_FP) {
+        return double_muladd(a, b, c, flags, s);
+    } else {
+        return f64_muladd(a, b, c, flags, s);
+    }
+}
+
+/*
  * Returns the result of dividing the floating-point value `a' by the
  * corresponding value `b'. The operation is performed according to
  * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-- 
2.7.4


* [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (12 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:17   ` Emilio G. Cota
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 15/15] hardfloat: support float32/64 comparison Emilio G. Cota
  2018-04-04 23:31 ` [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat no-reply
  15 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 43.27 MFlops
sqrt-double: 24.81 MFlops
- after:
sqrt-single: 297.94 MFlops
sqrt-double: 210.46 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
sqrt-single: 12.41 MFlops
sqrt-double: 6.22 MFlops
- after:
sqrt-single: 55.58 MFlops
sqrt-double: 40.63 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
sqrt-single: 17.01 MFlops
sqrt-double: 9.61 MFlops
- after:
sqrt-single: 104.17 MFlops
sqrt-double: 133.32 MFlops

Here none of the machines got faster from enabling USE_FP. For
instance, with it enabled, sqrt on x86_64 is 23% slower for single
precision and 17% slower for double precision.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
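A standalone sketch of the pre-check used by the softfloat-based sqrt
generator below: the host sqrt is only used for operands that are zero
or normal and have a clear sign bit. No post-check is needed in that
range, since the square root of a non-negative normal number is itself
normal and cannot overflow or underflow. The sketch assumes float32 is a
plain uint32_t; the names are hypothetical, and the can_use_fpu() test
is omitted because it is defined elsewhere in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float32: raw IEEE 754 binary32 bits. */
typedef uint32_t f32_bits;

/* True when the operand may be handed to the host's sqrt. */
static bool sqrt_can_use_host(f32_bits v)
{
    uint32_t exp = (v >> 23) & 0xff;
    uint32_t frac = v & 0x007fffff;

    if (v >> 31) {
        /* sign bit set: negative values, and also -0, go to softfloat */
        return false;
    }
    if (exp == 0xff) {
        /* infinity or NaN */
        return false;
    }
    if (exp == 0) {
        /* +0 is fine, denormals are not */
        return frac == 0;
    }
    return true;
}
```

Note that -0 is routed to the soft path even though its square root is
well defined; this follows from testing the sign bit alone.
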
 fpu/softfloat.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ce14c87..5434d29 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2717,20 +2717,89 @@ float16 __attribute__((flatten)) float16_sqrt(float16 a, float_status *status)
     return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_sqrt(float32 a, float_status *status)
 {
     FloatParts pa = float32_unpack_canonical(a, status);
     FloatParts pr = sqrt_float(pa, status, &float32_params);
     return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_sqrt(float64 a, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_sqrt(float64 a, float_status *status)
 {
     FloatParts pa = float64_unpack_canonical(a, status);
     FloatParts pr = sqrt_float(pa, status, &float64_params);
     return float64_round_pack_canonical(pr, status);
 }
 
+#define GEN_SQRT_SF(name, soft_t, host_t, host_sqrt_func)               \
+    static soft_t name(soft_t a, float_status *s)                       \
+    {                                                                   \
+        if (QEMU_NO_HARDFLOAT) {                                        \
+            goto soft;                                                  \
+        }                                                               \
+        soft_t ## _input_flush1(&a, s);                                 \
+        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
+                   !soft_t ## _is_neg(a) &&                             \
+                   can_use_fpu(s))) {                                   \
+            host_t ha = soft_t ## _to_ ## host_t(a);                    \
+            host_t hr = host_sqrt_func(ha);                             \
+                                                                        \
+            return host_t ## _to_ ## soft_t(hr);                        \
+        }                                                               \
+    soft:                                                               \
+        return soft_ ## soft_t ## _sqrt(a, s);                          \
+    }
+
+#define GEN_SQRT_FP(name, soft_t, host_t, host_sqrt_func)               \
+    static soft_t name(soft_t a, float_status *s)                       \
+    {                                                                   \
+        host_t ha;                                                      \
+                                                                        \
+        if (QEMU_NO_HARDFLOAT) {                                        \
+            goto soft;                                                  \
+        }                                                               \
+        soft_t ## _input_flush1(&a, s);                                 \
+        ha = soft_t ## _to_ ## host_t(a);                               \
+        if (likely((fpclassify(ha) == FP_NORMAL ||                      \
+                    fpclassify(ha) == FP_ZERO) &&                       \
+                   !signbit(ha) &&                                      \
+                   can_use_fpu(s))) {                                   \
+            host_t hr = host_sqrt_func(ha);                             \
+                                                                        \
+            return host_t ## _to_ ## soft_t(hr);                        \
+        }                                                               \
+    soft:                                                               \
+        return soft_ ## soft_t ## _sqrt(a, s);                          \
+    }
+
+GEN_SQRT_SF(f32_sqrt, float32, float, sqrtf)
+GEN_SQRT_SF(f64_sqrt, float64, double, sqrt)
+#undef GEN_SQRT_SF
+
+GEN_SQRT_FP(float_sqrt, float32, float, sqrtf)
+GEN_SQRT_FP(double_sqrt, float64, double, sqrt)
+#undef GEN_SQRT_FP
+
+float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *s)
+{
+    if (QEMU_HARDFLOAT_1F32_USE_FP) {
+        return float_sqrt(a, s);
+    } else {
+        return f32_sqrt(a, s);
+    }
+}
+
+float64 __attribute__((flatten)) float64_sqrt(float64 a, float_status *s)
+{
+    if (QEMU_HARDFLOAT_1F64_USE_FP) {
+        return double_sqrt(a, s);
+    } else {
+        return f64_sqrt(a, s);
+    }
+}
+
 
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
-- 
2.7.4

* [Qemu-devel] [PATCH v3 15/15] hardfloat: support float32/64 comparison
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (13 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root Emilio G. Cota
@ 2018-04-04 23:11 ` Emilio G. Cota
  2018-04-04 23:31 ` [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat no-reply
  15 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 113.01 MFlops
cmp-double: 115.54 MFlops
- after:
cmp-single: 527.83 MFlops
cmp-double: 457.21 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
cmp-single: 39.32 MFlops
cmp-double: 39.80 MFlops
- after:
cmp-single: 162.74 MFlops
cmp-double: 167.08 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
cmp-single: 60.81 MFlops
cmp-double: 62.76 MFlops
- after:
cmp-single: 235.39 MFlops
cmp-double: 283.44 MFlops

Here using float{32,64}_is_any_nan is faster than using isnan
for all machines. On x86_64 the perf difference is just a few
percentage points, but on aarch64 we go from 117/119 to
164/169 MFlops for single/double precision, respectively.
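
As an illustration of the approach described above, here is a minimal
standalone sketch: NaNs are detected on the bit pattern and everything else
is compared with the host's isgreater/isless. It is not the code from this
patch. The names (REL_*, f32_is_any_nan, fast_compare, slow_compare) are
hypothetical, float32 is modeled as a raw uint32_t, the soft path is a stub
that only reports "unordered", and input flushing and exception flags are
left out.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum { REL_LESS = -1, REL_EQUAL = 0, REL_GREATER = 1, REL_UNORDERED = 2 };

/* NaN test on the raw bits: exponent all ones and a nonzero fraction. */
static bool f32_is_any_nan(uint32_t a)
{
    return (a & 0x7fffffffu) > 0x7f800000u;
}

/* Stub for the soft path; a real one would also raise flags as needed. */
static int slow_compare(uint32_t a, uint32_t b)
{
    (void)a;
    (void)b;
    return REL_UNORDERED;
}

static int fast_compare(uint32_t a, uint32_t b)
{
    float ha, hb;

    /* Any NaN operand takes the slow path. */
    if (f32_is_any_nan(a) || f32_is_any_nan(b)) {
        return slow_compare(a, b);
    }
    memcpy(&ha, &a, sizeof(ha));
    memcpy(&hb, &b, sizeof(hb));
    if (isgreater(ha, hb)) {
        return REL_GREATER;
    }
    if (isless(ha, hb)) {
        return REL_LESS;
    }
    /* Neither operand is a NaN, so the only case left is equality. */
    return REL_EQUAL;
}
```

For example, fast_compare(0x80000000, 0x00000000) yields REL_EQUAL, since
-0 and +0 compare equal as host floats.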

Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]

1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

                   qemu-aarch64 NBench score; higher is better
                 Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

[ ASCII bar chart not recoverable from the archive. x-axis groups: FOURIER,
  NEURAL NET, LU DECOMPOSITION, gmean; y-axis: NBench score, 0 to 16;
  legend (partly legible): before, add, mul, div, fma, sqrt, cmp.
  See the png link above. ]

                              qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
                                      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
                                            error bars: 95% confidence interval

[ ASCII bar chart not recoverable from the archive. x-axis: the SPEC06fp
  benchmarks (410 through 482) and their geomean; y-axis: speedup, 0 to 4.5;
  legend (partly legible): addsub, mul, div, fma, sqrt, cmp.
  See the png link above. ]

2. Host: ARM Aarch64 A57 @ 2.4GHz

                    qemu-aarch64 NBench score; higher is better
                 Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz

[ ASCII bar chart not recoverable from the archive. x-axis groups: FOURIER,
  NEURAL NET, LU DECOMPOSITION, gmean; y-axis: NBench score, 0 to 5;
  legend (partly legible): before, add, mul, div, fma, sqrt, cmp.
  See the png link above. ]

Note that by not inlining the soft-fp primitives we end up
with a smaller softfloat.o--in particular, see the difference
for the softfloat.o built for fp-bench:

- before this series:
   text    data     bss     dec     hex filename
 103235       0       0  103235   19343 softfloat.o
- after:
   text    data     bss     dec     hex filename
  93369       0       0   93369   16cb9 softfloat.o

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 fpu/softfloat.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5434d29..459dd87 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2581,28 +2581,74 @@ static int compare_floats(FloatParts a, FloatParts b, bool is_quiet,
     }
 }
 
-#define COMPARE(sz)                                                     \
-int float ## sz ## _compare(float ## sz a, float ## sz b,               \
-                            float_status *s)                            \
+#define COMPARE(name, attr, sz)                                         \
+static int attr                                                         \
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
 {                                                                       \
     FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
     FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, false, s);                            \
-}                                                                       \
-int float ## sz ## _compare_quiet(float ## sz a, float ## sz b,         \
-                                  float_status *s)                      \
-{                                                                       \
-    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    return compare_floats(pa, pb, true, s);                             \
+    return compare_floats(pa, pb, is_quiet, s);                         \
 }
 
-COMPARE(16)
-COMPARE(32)
-COMPARE(64)
+COMPARE(soft_float16_compare, , 16)
+COMPARE(soft_float32_compare, QEMU_SOFTFLOAT_ATTR, 32)
+COMPARE(soft_float64_compare, QEMU_SOFTFLOAT_ATTR, 64)
 
 #undef COMPARE
 
+int __attribute__((flatten))
+float16_compare(float16 a, float16 b, float_status *s)
+{
+    return soft_float16_compare(a, b, false, s);
+}
+
+int __attribute__((flatten))
+float16_compare_quiet(float16 a, float16 b, float_status *s)
+{
+    return soft_float16_compare(a, b, true, s);
+}
+
+#define GEN_FPU_COMPARE(name, quiet_name, soft_t, host_t)               \
+    static int                                                          \
+    fpu_ ## name(soft_t a, soft_t b, bool is_quiet, float_status *s)    \
+    {                                                                   \
+        host_t ha, hb;                                                  \
+                                                                        \
+        if (QEMU_NO_HARDFLOAT) {                                        \
+            return soft_ ## name(a, b, is_quiet, s);                    \
+        }                                                               \
+        soft_t ## _input_flush2(&a, &b, s);                             \
+        ha = soft_t ## _to_ ## host_t(a);                               \
+        hb = soft_t ## _to_ ## host_t(b);                               \
+        if (unlikely(soft_t ## _is_any_nan(a) ||                        \
+                     soft_t ## _is_any_nan(b))) {                       \
+            return soft_ ## name(a, b, is_quiet, s);                    \
+        }                                                               \
+        if (isgreater(ha, hb)) {                                        \
+            return float_relation_greater;                              \
+        }                                                               \
+        if (isless(ha, hb)) {                                           \
+            return float_relation_less;                                 \
+        }                                                               \
+        return float_relation_equal;                                    \
+    }                                                                   \
+                                                                        \
+    int __attribute__((flatten))                                        \
+    name(soft_t a, soft_t b, float_status *s)                           \
+    {                                                                   \
+        return fpu_ ## name(a, b, false, s);                            \
+    }                                                                   \
+                                                                        \
+    int __attribute__((flatten))                                        \
+    quiet_name(soft_t a, soft_t b, float_status *s)                     \
+    {                                                                   \
+        return fpu_ ## name(a, b, true, s);                             \
+    }
+
+GEN_FPU_COMPARE(float32_compare, float32_compare_quiet, float32, float)
+GEN_FPU_COMPARE(float64_compare, float64_compare_quiet, float64, double)
+#undef GEN_FPU_COMPARE
+
 /* Multiply A by 2 raised to the power N.  */
 static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
 {
-- 
2.7.4

* Re: [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add Emilio G. Cota
@ 2018-04-04 23:16   ` Emilio G. Cota
  0 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

On Wed, Apr 04, 2018 at 19:11:13 -0400, Emilio G. Cota wrote:
(snip)
> +        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
> +                   (soft_t ## _is_normal(b) || soft_t ## _is_zero(b)) && \
> +                   (soft_t ## _is_normal(c) || soft_t ## _is_zero(c)) && \

This is outdated with respect to the v3 tree on github. Changed there to:


-        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
-                   (soft_t ## _is_normal(b) || soft_t ## _is_zero(b)) && \
-                   (soft_t ## _is_normal(c) || soft_t ## _is_zero(c)) && \
+        if (likely(soft_t ## _is_zero_or_normal(a) &&                   \
+                   soft_t ## _is_zero_or_normal(b) &&                   \
+                   soft_t ## _is_zero_or_normal(c) &&                   \

		E.
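
For illustration, a minimal standalone sketch of why the two guards quoted
above are interchangeable. This is not the QEMU implementation: float32 is
modeled as a raw uint32_t and all names (f32_is_zero, f32_is_normal,
f32_is_zero_or_normal, guard_old, guard_new) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool f32_is_zero(uint32_t a)
{
    return (a & 0x7fffffffu) == 0;
}

static bool f32_is_normal(uint32_t a)
{
    uint32_t exp = (a >> 23) & 0xff;

    return exp != 0 && exp != 0xff;
}

/* Combined predicate: one call per operand instead of two. */
static bool f32_is_zero_or_normal(uint32_t a)
{
    uint32_t abs = a & 0x7fffffffu;
    uint32_t exp = abs >> 23;

    return abs == 0 || (exp != 0 && exp != 0xff);
}

/* Three-operand guard written the old way. */
static bool guard_old(uint32_t a, uint32_t b, uint32_t c)
{
    return (f32_is_normal(a) || f32_is_zero(a)) &&
           (f32_is_normal(b) || f32_is_zero(b)) &&
           (f32_is_normal(c) || f32_is_zero(c));
}

/* The same guard using the combined predicate. */
static bool guard_new(uint32_t a, uint32_t b, uint32_t c)
{
    return f32_is_zero_or_normal(a) &&
           f32_is_zero_or_normal(b) &&
           f32_is_zero_or_normal(c);
}

/* Check that both guards agree on every triple drawn from the samples. */
static void check_guards_equivalent(void)
{
    static const uint32_t s[] = {
        0x00000000u, /* +0 */
        0x80000000u, /* -0 */
        0x3f800000u, /* 1.0 */
        0x00000001u, /* subnormal */
        0x7f800000u, /* +inf */
        0x7fc00000u, /* quiet NaN */
    };
    const size_t n = sizeof(s) / sizeof(s[0]);
    size_t i, j, k;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            for (k = 0; k < n; k++) {
                assert(guard_old(s[i], s[j], s[k]) ==
                       guard_new(s[i], s[j], s[k]));
            }
        }
    }
}
```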

* Re: [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root Emilio G. Cota
@ 2018-04-04 23:17   ` Emilio G. Cota
  0 siblings, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-04 23:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

On Wed, Apr 04, 2018 at 19:11:14 -0400, Emilio G. Cota wrote:
(snip)
> +        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \

Updated on the github tree to:
-        if (likely((soft_t ## _is_normal(a) || soft_t ## _is_zero(a)) && \
+        if (likely(soft_t ## _is_zero_or_normal(a) &&                   \

All the other patches in this series are identical to the v3 on github.
Apologies for the last-minute mismatch on my end.

		Emilio

* Re: [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat
  2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
                   ` (14 preceding siblings ...)
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 15/15] hardfloat: support float32/64 comparison Emilio G. Cota
@ 2018-04-04 23:31 ` no-reply
  15 siblings, 0 replies; 25+ messages in thread
From: no-reply @ 2018-04-04 23:31 UTC (permalink / raw)
  To: cota
  Cc: famz, qemu-devel, peter.maydell, mark.cave-ayland,
	richard.henderson, laurent, kbastian, pbonzini, alex.bennee,
	aurelien

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1522883475-27858-1-git-send-email-cota@braap.org
Subject: [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/1522869306-17292-1-git-send-email-thuth@redhat.com -> patchew/1522869306-17292-1-git-send-email-thuth@redhat.com
 t [tag update]            patchew/1522873850-28733-1-git-send-email-eric.auger@redhat.com -> patchew/1522873850-28733-1-git-send-email-eric.auger@redhat.com
 * [new tag]               patchew/1522883475-27858-1-git-send-email-cota@braap.org -> patchew/1522883475-27858-1-git-send-email-cota@braap.org
Switched to a new branch 'test'
0160699351 hardfloat: support float32/64 comparison
6d049c226a hardfloat: support float32/64 square root
f3fe3550b9 hardfloat: support float32/64 fused multiply-add
46eb7c82ae hardfloat: support float32/64 division
ef7ad86525 hardfloat: support float32/64 multiplication
49d8c51e8d hardfloat: support float32/64 addition and subtraction
4c75ec0397 fpu: introduce hardfloat
a01d10e08c softfloat: add float{32, 64}_is_zero_or_normal
89e7dbd313 softfloat: rename canonicalize to sf_canonicalize
9ac9b3dbc8 tests/fp: add fp-bench, a collection of simple floating point microbenchmarks
d7ced9b1de target/tricore: use float32_is_denormal
17d8cfd693 softfloat: add float{32, 64}_is_{de, }normal
606b2bd962 fp-test: add muladd variants
0a1379a091 softfloat: fix {min, max}nummag for same-abs-value inputs
104af15d75 tests: add fp-test, a floating point test suite

=== OUTPUT BEGIN ===
Checking PATCH 1/15: tests: add fp-test, a floating point test suite...
ERROR: Macros with complex values should be enclosed in parenthesis
#380: FILE: tests/fp/fp-test.c:220:
+#define PR_EXCEPTIONS(x)                                \
+        ((x) & STANDARD_EXCEPTIONS ? "" : "none"),      \
+        (((x) & float_flag_inexact)   ? "x" : ""),      \
+        (((x) & float_flag_underflow) ? "u" : ""),      \
+        (((x) & float_flag_overflow)  ? "o" : ""),      \
+        (((x) & float_flag_divbyzero) ? "z" : ""),      \
+        (((x) & float_flag_invalid)   ? "i" : "")

ERROR: consider using qemu_strtoul in preference to strtoul
#841: FILE: tests/fp/fp-test.c:681:
+            significand = strtoul(&p[3], &pos, 16);

ERROR: consider using qemu_strtol in preference to strtol
#846: FILE: tests/fp/fp-test.c:686:
+            exponent = strtol(pos, &pos, 10) + 127;

ERROR: consider using qemu_strtoul in preference to strtoul
#871: FILE: tests/fp/fp-test.c:711:
+            significand = strtoul(&p[3], &pos, 16);

ERROR: consider using qemu_strtol in preference to strtol
#876: FILE: tests/fp/fp-test.c:716:
+            exponent = strtol(pos, &pos, 10) + 1023;

ERROR: consider using qemu_strtof in preference to strtof
#895: FILE: tests/fp/fp-test.c:735:
+            float f = strtof(p, &pos);

total: 6 errors, 0 warnings, 1219 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/15: softfloat: fix {min, max}nummag for same-abs-value inputs...
Checking PATCH 3/15: fp-test: add muladd variants...
Checking PATCH 4/15: softfloat: add float{32, 64}_is_{de, }normal...
Checking PATCH 5/15: target/tricore: use float32_is_denormal...
Checking PATCH 6/15: tests/fp: add fp-bench, a collection of simple floating point microbenchmarks...
ERROR: braces {} are necessary for all arms of this statement
#183: FILE: tests/fp/fp-bench.c:133:
+            } while (!float32_is_normal(r));
[...]

ERROR: braces {} are necessary for all arms of this statement
#187: FILE: tests/fp/fp-bench.c:137:
+            } while (!float64_is_normal(r));
[...]

total: 2 errors, 0 warnings, 547 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 7/15: softfloat: rename canonicalize to sf_canonicalize...
Checking PATCH 8/15: softfloat: add float{32, 64}_is_zero_or_normal...
Checking PATCH 9/15: fpu: introduce hardfloat...
ERROR: spaces required around that '*' (ctx:WxV)
#96: FILE: fpu/softfloat.c:154:
+    static inline void name(soft_t *a, float_status *s)                 \
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#110: FILE: fpu/softfloat.c:168:
+    static inline void name(soft_t *a, float_status *s) \
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#123: FILE: fpu/softfloat.c:181:
+    static inline void name(soft_t *a, soft_t *b, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#137: FILE: fpu/softfloat.c:195:
+    static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
                                                                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#151: FILE: fpu/softfloat.c:209:
+static inline bool can_use_fpu(const float_status *s)
                                                   ^

WARNING: architecture specific defines should be avoided
#162: FILE: fpu/softfloat.c:220:
+#if defined(__x86_64__)

WARNING: architecture specific defines should be avoided
#184: FILE: fpu/softfloat.c:242:
+#if defined(__x86_64__) || defined(__aarch64__)

total: 5 errors, 2 warnings, 354 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 10/15: hardfloat: support float32/64 addition and subtraction...
ERROR: spaces required around that '*' (ctx:WxV)
#74: FILE: fpu/softfloat.c:1084:
+soft_float32_add(float32 a, float32 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#85: FILE: fpu/softfloat.c:1094:
+soft_float64_add(float64 a, float64 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#96: FILE: fpu/softfloat.c:1114:
+soft_float32_sub(float32 a, float32 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#107: FILE: fpu/softfloat.c:1124:
+soft_float64_sub(float64 a, float64 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#157: FILE: fpu/softfloat.c:1175:
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
                                                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#169: FILE: fpu/softfloat.c:1187:
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
                                                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#182: FILE: fpu/softfloat.c:1200:
+float32_add(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#188: FILE: fpu/softfloat.c:1206:
+float32_sub(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#194: FILE: fpu/softfloat.c:1212:
+float64_add(float64 a, float64 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#200: FILE: fpu/softfloat.c:1218:
+float64_sub(float64 a, float64 b, float_status *s)
                                                ^

total: 10 errors, 0 warnings, 136 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 11/15: hardfloat: support float32/64 multiplication...
ERROR: spaces required around that '*' (ctx:WxV)
#46: FILE: fpu/softfloat.c:1285:
+soft_float32_mul(float32 a, float32 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#57: FILE: fpu/softfloat.c:1295:
+soft_float64_mul(float64 a, float64 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#85: FILE: fpu/softfloat.c:1324:
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#92: FILE: fpu/softfloat.c:1331:
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
                                                                   ^

ERROR: spaces required around that '*' (ctx:WxV)
#100: FILE: fpu/softfloat.c:1339:
+float32_mul(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#112: FILE: fpu/softfloat.c:1351:
+float64_mul(float64 a, float64 b, float_status *s)
                                                ^

total: 6 errors, 0 warnings, 84 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 12/15: hardfloat: support float32/64 division...
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: fpu/softfloat.c:1670:
+soft_float32_div(float32 a, float32 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#58: FILE: fpu/softfloat.c:1680:
+soft_float64_div(float64 a, float64 b, float_status *status)
                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#125: FILE: fpu/softfloat.c:1748:
+float32_div(float32 a, float32 b, float_status *s)
                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#137: FILE: fpu/softfloat.c:1760:
+float64_div(float64 a, float64 b, float_status *s)
                                                ^

total: 4 errors, 0 warnings, 106 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 13/15: hardfloat: support float32/64 fused multiply-add...
ERROR: spaces required around that '*' (ctx:WxV)
#50: FILE: fpu/softfloat.c:1579:
+                    float_status *status)
                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#62: FILE: fpu/softfloat.c:1591:
+                    float_status *status)
                                  ^

ERROR: spaces required around that '*' (ctx:WxV)
#77: FILE: fpu/softfloat.c:1609:
+    name(soft_t a, soft_t b, soft_t c, int flags, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#137: FILE: fpu/softfloat.c:1669:
+    name(soft_t a, soft_t b, soft_t c, int flags, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#209: FILE: fpu/softfloat.c:1741:
+float32_muladd(float32 a, float32 b, float32 c, int flags, float_status *s)
                                                                         ^

ERROR: spaces required around that '*' (ctx:WxV)
#219: FILE: fpu/softfloat.c:1751:
+float64_muladd(float64 a, float64 b, float64 c, int flags, float_status *s)
                                                                         ^

total: 6 errors, 0 warnings, 187 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 14/15: hardfloat: support float32/64 square root...
ERROR: spaces required around that '*' (ctx:WxV)
#49: FILE: fpu/softfloat.c:2721:
+soft_float32_sqrt(float32 a, float_status *status)
                                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#58: FILE: fpu/softfloat.c:2729:
+soft_float64_sqrt(float64 a, float_status *status)
                                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#66: FILE: fpu/softfloat.c:2737:
+    static soft_t name(soft_t a, float_status *s)                       \
                                               ^

ERROR: spaces required around that '*' (ctx:WxV)
#85: FILE: fpu/softfloat.c:2756:
+    static soft_t name(soft_t a, float_status *s)                       \
                                               ^

ERROR: spaces required around that '*' (ctx:WxV)
#114: FILE: fpu/softfloat.c:2785:
+float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *s)
                                                                       ^

ERROR: spaces required around that '*' (ctx:WxV)
#123: FILE: fpu/softfloat.c:2794:
+float64 __attribute__((flatten)) float64_sqrt(float64 a, float_status *s)
                                                                       ^

total: 6 errors, 0 warnings, 91 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 15/15: hardfloat: support float32/64 comparison...
ERROR: spaces required around that '*' (ctx:WxV)
#113: FILE: fpu/softfloat.c:2586:
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
                                                                ^

ERROR: spaces required around that '*' (ctx:WxV)
#138: FILE: fpu/softfloat.c:2600:
+float16_compare(float16 a, float16 b, float_status *s)
                                                    ^

total: 2 errors, 0 warnings, 88 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
@ 2018-04-06 12:01   ` Bastian Koppelmann
  0 siblings, 0 replies; 25+ messages in thread
From: Bastian Koppelmann @ 2018-04-06 12:01 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Maydell, Mark Cave-Ayland, Richard Henderson,
	Laurent Vivier, Paolo Bonzini, Alex Bennée, Aurelien Jarno

On 04/05/2018 01:11 AM, Emilio G. Cota wrote:
> This paves the way for upcoming work.
> 
> Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  include/fpu/softfloat.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
Bastian


* Re: [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal Emilio G. Cota
@ 2018-04-06 12:01   ` Bastian Koppelmann
  0 siblings, 0 replies; 25+ messages in thread
From: Bastian Koppelmann @ 2018-04-06 12:01 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Maydell, Mark Cave-Ayland, Richard Henderson,
	Laurent Vivier, Paolo Bonzini, Alex Bennée, Aurelien Jarno

On 04/05/2018 01:11 AM, Emilio G. Cota wrote:
> Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  target/tricore/fpu_helper.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
Bastian


* Re: [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
@ 2018-04-06 12:02   ` Bastian Koppelmann
  0 siblings, 0 replies; 25+ messages in thread
From: Bastian Koppelmann @ 2018-04-06 12:02 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Maydell, Mark Cave-Ayland, Richard Henderson,
	Laurent Vivier, Paolo Bonzini, Alex Bennée, Aurelien Jarno

On 04/05/2018 01:11 AM, Emilio G. Cota wrote:
> glibc >= 2.25 defines canonicalize in commit eaf5ad0
> (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
> 
> Given that we'll be including <math.h> soon, prepare
> for this by prefixing our canonicalize() with sf_ to avoid
> clashing with the libc's canonicalize().
> 
> Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---

This fixes the problem.
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
Bastian


* Re: [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite
  2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite Emilio G. Cota
@ 2018-04-11  1:20   ` Alex Bennée
  2018-04-11  1:39     ` Alex Bennée
  2018-04-11 21:36     ` Emilio G. Cota
  0 siblings, 2 replies; 25+ messages in thread
From: Alex Bennée @ 2018-04-11  1:20 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Aurelien Jarno, Peter Maydell, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland


Emilio G. Cota <cota@braap.org> writes:

> This will allow us to run correctness tests against our
> FP implementation. The test can be run in two modes (called
> "testers"): host and soft. With the former we check the results
> and FP flags on the host machine against the model.
> With the latter we check QEMU's fpu primitives against the
> model. Note that in soft mode we are not instantiating any
> particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
> for that we need to run the test in host mode under QEMU.
<snip>

So with the attached patch and my proposed cross build we can now get:

02:15:54 [alex@zen:~/l/q/qemu.git] softfloat-fixes-for-2.12-v1 ± find . -iname "fp-test" | xargs file
./ppc64-linux-user/tests/fp-test:      ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped
./armeb-linux-user/tests/fp-test:      ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped
./aarch64-linux-user/tests/fp-test:    ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, for GNU/Linux 3.7.0, not stripped
./i386-linux-user/tests/fp-test:       ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, not stripped
./arm-linux-user/tests/fp-test:        ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped
./s390x-linux-user/tests/fp-test:      ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), statically linked, for GNU/Linux 3.2.0, not stripped
./aarch64_be-linux-user/tests/fp-test: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, for GNU/Linux 3.7.0, not stripped
./tests/fp-test/fp-test:               ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

But it did mean having to hack about a little, mainly to get rid of
glib.

--8<---------------cut here---------------start------------->8---
>From 04ed0d9f58f34aa51b9a8284514aab6e36a702b4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex.bennee@linaro.org>
Date: Wed, 11 Apr 2018 01:35:52 +0100
Subject: [PATCH] tests/tcg: add fp-test to per-guest tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The fp-test code was originally designed to be able to include
softfloat. However cross-compiling QEMU based code is harder than it
needs to be so hide softfloat stuff behind USE_SOFTFLOAT. We also need
to tweak:

  - manually include what we need
  - g_assert -> assert()
  - use libc hsearch instead of g_hash_table

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/fp/fp-test.c | 148 ++++++++++++++++++++++++++++++++++++++++++-----------
 tests/tcg/Makefile |   3 ++
 2 files changed, 121 insertions(+), 30 deletions(-)

diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 27637c4617..4cee2a918c 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -6,12 +6,72 @@
  * License: GNU GPL, version 2 or later.
  *   See the COPYING file in the top-level directory.
  */
-#ifndef HW_POISON_H
-#error Must define HW_POISON_H to work around TARGET_* poisoning
-#endif
+
+/* If HW_POISON_H isn't defined then we aren't building against qemu's
+ * softfloat */
+#ifdef HW_POISON_H

 #include "qemu/osdep.h"
 #include "fpu/softfloat.h"
+#define USE_SOFTFLOAT 1
+
+#else /* else define what QEMU would have given us */
+
+#define _GNU_SOURCE
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <string.h>
+#include <ctype.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+
+#include <search.h>
+
+/* See include/fpu/softfloat-types.h */
+enum {
+    float_tininess_after_rounding  = 0,
+    float_tininess_before_rounding = 1
+};
+
+enum {
+    float_round_nearest_even = 0,
+    float_round_down         = 1,
+    float_round_up           = 2,
+    float_round_to_zero      = 3,
+    float_round_ties_away    = 4,
+    float_round_to_odd       = 5,
+};
+
+enum {
+    float_flag_invalid   =  1,
+    float_flag_divbyzero =  4,
+    float_flag_overflow  =  8,
+    float_flag_underflow = 16,
+    float_flag_inexact   = 32,
+    float_flag_input_denormal = 64,
+    float_flag_output_denormal = 128
+};
+
+/* See include/compiler.h */
+#ifndef likely
+#if __GNUC__ < 3
+#define __builtin_expect(x, n) (x)
+#endif
+
+#define likely(x)   __builtin_expect(!!(x), 1)
+#define unlikely(x)   __builtin_expect(!!(x), 0)
+#endif
+
+#endif
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif

 #include <fenv.h>
 #include <math.h>
@@ -116,16 +176,18 @@ struct tester {
 struct whitelist {
     char **lines;
     size_t n;
-    GHashTable *ht;
+    struct hsearch_data ht;
 };

 static uint64_t test_stats[ERROR_MAX];
 static struct whitelist whitelist;
 static uint8_t default_exceptions;
 static bool die_on_error = true;
+#ifdef USE_SOFTFLOAT
 static struct float_status soft_status = {
     .float_detect_tininess = float_tininess_before_rounding,
 };
+#endif

 static inline float u64_to_float(uint64_t v)
 {
@@ -285,7 +347,7 @@ static enum error host_tester(struct test_op *t)
         float res;
         int i;

-        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
         for (i = 0; i < ops[t->op].n_operands; i++) {
             /* use the host's QNaN/SNaN patterns */
             if (t->operands[i].type == OP_TYPE_QNAN) {
@@ -343,7 +405,7 @@ static enum error host_tester(struct test_op *t)
         double res;
         int i;

-        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
         for (i = 0; i < ops[t->op].n_operands; i++) {
             /* use the host's QNaN/SNaN patterns */
             if (t->operands[i].type == OP_TYPE_QNAN) {
@@ -429,6 +491,8 @@ static enum error host_tester(struct test_op *t)
     return tester_check(t, res64, result_is_nan, flags);
 }

+#ifdef USE_SOFTFLOAT
+
 static enum error soft_tester(struct test_op *t)
 {
     float_status *s = &soft_status;
@@ -445,7 +509,7 @@ static enum error soft_tester(struct test_op *t)
         float32 res;
         int i;

-        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
         for (i = 0; i < ops[t->op].n_operands; i++) {
             *in[i] = t->operands[i].val;
         }
@@ -504,7 +568,7 @@ static enum error soft_tester(struct test_op *t)
         float64 *in[] = { &a, &b, &c };
         int i;

-        g_assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
+        assert(ops[t->op].n_operands <= ARRAY_SIZE(in));
         for (i = 0; i < ops[t->op].n_operands; i++) {
             *in[i] = t->operands[i].val;
         }
@@ -585,6 +649,14 @@ static const struct tester valid_testers[] = {
         .func = host_tester,
     },
 };
+#else
+static const struct tester valid_testers[] = {
+    [0] = {
+        .name = "host",
+        .func = host_tester,
+    },
+};
+#endif
 static const struct tester *tester = &valid_testers[0];

 static int ibm_get_exceptions(const char *p, uint8_t *excp)
@@ -622,7 +694,7 @@ static uint64_t fp_choose(enum precision prec, uint64_t f, uint64_t d)
     case PREC_DOUBLE:
         return d;
     default:
-        g_assert_not_reached();
+        assert(false);
     }
 }

@@ -756,7 +828,7 @@ ibm_fp_hex(const char *p, enum precision prec, struct operand *ret)
         } else if (prec == PREC_DOUBLE) {
             ret->val = double_to_u64(0.0);
         } else {
-            g_assert_not_reached();
+            assert(false);
         }
         return 0;
     } else if (!strcmp(p, "0x1")) {
@@ -765,7 +837,7 @@ ibm_fp_hex(const char *p, enum precision prec, struct operand *ret)
         } else if (prec == PREC_DOUBLE) {
             ret->val = double_to_u64(1.0);
         } else {
-            g_assert_not_reached();
+            assert(false);
         }
         return 0;
     }
@@ -915,10 +987,13 @@ static const struct input *input_type = &valid_input_types[INPUT_FMT_IBM];

 static bool line_is_whitelisted(const char *line)
 {
-    if (whitelist.ht == NULL) {
+    ENTRY e, *ep;
+
+    if (whitelist.ht.size == 0) {
         return false;
     }
-    return !!g_hash_table_lookup(whitelist.ht, line);
+    e.key = (char *)line;
+    return hsearch_r(e, FIND, &ep, &whitelist.ht) != 0;
 }

 static void test_file(const char *filename)
@@ -958,7 +1033,7 @@ static void test_file(const char *filename)
                         filename, i);
                 break;
             default:
-                g_assert_not_reached();
+                assert(false);
             }
             fprintf(stderr, "%s", line);
             if (die_on_error) {
@@ -1007,23 +1082,32 @@ static void set_tester(const char *optarg)

 static void whitelist_add_line(const char *orig_line)
 {
-    char *line;
+    char *line = strdup(orig_line);
     bool inserted;
+    ENTRY e, *ep;
+    int r;

-    if (whitelist.ht == NULL) {
-        whitelist.ht = g_hash_table_new(g_str_hash, g_str_equal);
+    if (whitelist.ht.size == 0) {
+        if (!hcreate_r(4096, &whitelist.ht)) {
+            fprintf(stderr, "%s: error creating hash table\n", __func__);
+        }
     }
-    line = g_hash_table_lookup(whitelist.ht, orig_line);
-    if (unlikely(line != NULL)) {
+
+    int hsearch_r(ENTRY item, ACTION action, ENTRY **retval,
+              struct hsearch_data *htab);
+
+    e.key = line;
+    r = hsearch_r(e, FIND, &ep, &whitelist.ht);
+    if (unlikely(r)) {
+        free(line);
         return;
     }
     whitelist.n++;
-    whitelist.lines = g_realloc_n(whitelist.lines, whitelist.n, sizeof(line));
-    line = strdup(orig_line);
+    whitelist.lines = realloc(whitelist.lines, (whitelist.n * sizeof(line)));
     whitelist.lines[whitelist.n - 1] = line;
-    /* if we pass key == val GLib will not reserve space for the value */
-    inserted = g_hash_table_insert(whitelist.ht, line, line);
-    g_assert(inserted);
+    e.data = line;
+    inserted = hsearch_r(e, ENTER, &ep, &whitelist.ht);
+    assert(inserted);
 }

 static void set_whitelist(const char *filename)
@@ -1061,18 +1145,20 @@ static void usage_complete(int argc, char *argv[])
 {
     fprintf(stderr, "Usage: %s [options] file1 [file2 ...]\n", argv[0]);
     fprintf(stderr, "options:\n");
-    fprintf(stderr, "  -a = Perform tininess detection after rounding "
-            "(soft tester only). Default: before\n");
     fprintf(stderr, "  -n = do not die on error. Default: dies on error\n");
     fprintf(stderr, "  -e = default exception flags (xiozu). Default: none\n");
     fprintf(stderr, "  -f = format of the input file(s). Default: %s\n",
             valid_input_types[0].name);
     fprintf(stderr, "  -t = tester. Default: %s\n", valid_testers[0].name);
     fprintf(stderr, "  -w = path to file with test cases to be whitelisted\n");
+#ifdef USE_SOFTFLOAT
+    fprintf(stderr, "  -a = Perform tininess detection after rounding "
+            "(soft tester only). Default: before\n");
     fprintf(stderr, "  -z = flush inputs to zero (soft tester only). "
             "Default: disabled\n");
     fprintf(stderr, "  -Z = flush output to zero (soft tester only). "
             "Default: disabled\n");
+#endif
 }

 static void parse_opts(int argc, char *argv[])
@@ -1085,9 +1171,6 @@ static void parse_opts(int argc, char *argv[])
             return;
         }
         switch (c) {
-        case 'a':
-            soft_status.float_detect_tininess = float_tininess_after_rounding;
-            break;
         case 'e':
             set_default_exceptions(optarg);
             break;
@@ -1106,15 +1189,20 @@ static void parse_opts(int argc, char *argv[])
         case 'w':
             set_whitelist(optarg);
             break;
+#ifdef USE_SOFTFLOAT
+        case 'a':
+            soft_status.float_detect_tininess = float_tininess_after_rounding;
+            break;
         case 'z':
             soft_status.flush_inputs_to_zero = 1;
             break;
         case 'Z':
             soft_status.flush_to_zero = 1;
             break;
+#endif
         }
     }
-    g_assert_not_reached();
+    assert(false);
 }

 static uint64_t count_errors(void)
diff --git a/tests/tcg/Makefile b/tests/tcg/Makefile
index 2bba0d2a32..9c8011063e 100644
--- a/tests/tcg/Makefile
+++ b/tests/tcg/Makefile
@@ -24,6 +24,9 @@
 VPATH = $(SRC_PATH)/tests/tcg/multiarch
 TEST_SRCS = $(wildcard $(SRC_PATH)/tests/tcg/multiarch/*.c)

+VPATH     += $(SRC_PATH)/tests/fp
+TEST_SRCS += $(wildcard $(SRC_PATH)/tests/fp/*.c)
+
 VPATH     += $(SRC_PATH)/tests/tcg/$(ARCH)
 TEST_SRCS += $(wildcard $(SRC_PATH)/tests/tcg/$(ARCH)/*.c)

--
2.16.2
--8<---------------cut here---------------end--------------->8---
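
For anyone unfamiliar with the GNU reentrant hash table, here is a minimal standalone sketch of the lookup/insert pattern the patch swaps in for g_hash_table. It is not part of the patch: the str_set_* names and struct are hypothetical and chosen only for illustration. The one thing worth remembering is that hsearch_r() returns nonzero on success and 0 on failure, so a membership test has to compare against 0 the right way round.

```c
#define _GNU_SOURCE /* hcreate_r/hsearch_r/hdestroy_r are GNU extensions */
#include <assert.h>
#include <search.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper type, not taken from fp-test.c. */
struct str_set {
    struct hsearch_data ht; /* must be zeroed before hcreate_r() */
    bool ready;
};

static bool str_set_init(struct str_set *s, size_t max_entries)
{
    memset(s, 0, sizeof(*s));
    s->ready = hcreate_r(max_entries, &s->ht) != 0;
    return s->ready;
}

static bool str_set_contains(struct str_set *s, const char *key)
{
    /* ENTRY.key is a plain char *, hence the cast; FIND never writes to it */
    ENTRY e = { .key = (char *)key, .data = NULL };
    ENTRY *found = NULL;

    if (!s->ready) {
        return false;
    }
    /* nonzero: key present; 0: key absent (errno is set to ESRCH) */
    return hsearch_r(e, FIND, &found, &s->ht) != 0;
}

static bool str_set_add(struct str_set *s, const char *key)
{
    ENTRY e;
    ENTRY *ep = NULL;

    if (!s->ready) {
        return false;
    }
    if (str_set_contains(s, key)) {
        return true; /* already present, nothing to do */
    }
    /* the table stores the pointer, so it needs its own copy of the key */
    e.key = strdup(key);
    if (e.key == NULL) {
        return false;
    }
    e.data = e.key;
    if (hsearch_r(e, ENTER, &ep, &s->ht) == 0) {
        free(e.key); /* table full or out of memory */
        return false;
    }
    assert(ep != NULL && strcmp(ep->key, key) == 0);
    return true;
}

static void str_set_destroy(struct str_set *s)
{
    if (s->ready) {
        /* frees the table only; the strdup'ed keys are not tracked here */
        hdestroy_r(&s->ht);
        s->ready = false;
    }
}
```

Unlike g_hash_table, the table has a fixed capacity chosen at creation time and does not free its keys on destruction, which is acceptable for a short-lived test program but worth knowing about.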




--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite
  2018-04-11  1:20   ` Alex Bennée
@ 2018-04-11  1:39     ` Alex Bennée
  2018-04-11 21:36     ` Emilio G. Cota
  1 sibling, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2018-04-11  1:39 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Aurelien Jarno, Peter Maydell, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland


Alex Bennée <alex.bennee@linaro.org> writes:

> Emilio G. Cota <cota@braap.org> writes:
>
>> This will allow us to run correctness tests against our
>> FP implementation. The test can be run in two modes (called
>> "testers"): host and soft. With the former we check the results
>> and FP flags on the host machine against the model.
>> With the latter we check QEMU's fpu primitives against the
>> model. Note that in soft mode we are not instantiating any
>> particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
>> for that we need to run the test in host mode under QEMU.
> <snip>
>
> So with the attached patch and my proposed cross build we can now get:
>
<snip>
> --- a/tests/tcg/Makefile
> +++ b/tests/tcg/Makefile
> @@ -24,6 +24,9 @@
>  VPATH = $(SRC_PATH)/tests/tcg/multiarch
>  TEST_SRCS = $(wildcard $(SRC_PATH)/tests/tcg/multiarch/*.c)
>
> +VPATH     += $(SRC_PATH)/tests/fp
> +TEST_SRCS += $(wildcard $(SRC_PATH)/tests/fp/*.c)
> +
>  VPATH     += $(SRC_PATH)/tests/tcg/$(ARCH)
>  TEST_SRCS += $(wildcard $(SRC_PATH)/tests/tcg/$(ARCH)/*.c)

It also needs:

fp-test: LDFLAGS+=-lm

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite
  2018-04-11  1:20   ` Alex Bennée
  2018-04-11  1:39     ` Alex Bennée
@ 2018-04-11 21:36     ` Emilio G. Cota
  1 sibling, 0 replies; 25+ messages in thread
From: Emilio G. Cota @ 2018-04-11 21:36 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Aurelien Jarno, Peter Maydell, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland

On Wed, Apr 11, 2018 at 02:20:49 +0100, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
> So with the attached patch and my proposed cross build we can now get:
> 
> 02:15:54 [alex@zen:~/l/q/qemu.git] softfloat-fixes-for-2.12-v1 ± find . -iname "fp-test" | xargs file
> ./ppc64-linux-user/tests/fp-test:      ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, not stripped
(snip)
> But it did mean having to hack about a little, mainly to get rid of
> glib.

That will let us build fp-test using a cross-compiler. My initial
thinking was that since we'd end up testing on a real host
(with "-t host" mode), cross-compiling wouldn't be necessary since we
could just compile natively on said host.

But since we seem to be moving towards supporting cross-compilers,
it takes little effort to cross-compile fp-test as well. The main
hurdle is to remove the glib dependence as you pointed out. I just
wrote a few patches to do this:

$ git log --oneline -5 --reverse
48e802b osdep: disable glib-compat.h include with QEMU_NO_GLIB
d3c78c7 softfloat: do not include glib headers
744a9c4 tests/tcg/Makefile: define _GNU_SOURCE
661c0e2 tests/fp: fixup
e057d45 tests/tcg/Makefile: fp-test build fixup

The main difference with your attached patch is that we remove ifdef's
from fp-test.c while keeping the osdep.h include.

You can fetch the patches from
  https://github.com/cota/qemu/tree/softfloat-fixes-for-2.12-v1

[BTW the name of the branch is just to keep your original branch name;
I'm in no way intending for this to be part of 2.12 :>]

Thanks,

		Emilio


end of thread, other threads:[~2018-04-11 21:36 UTC | newest]

Thread overview: 25+ messages
2018-04-04 23:11 [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 01/15] tests: add fp-test, a floating point test suite Emilio G. Cota
2018-04-11  1:20   ` Alex Bennée
2018-04-11  1:39     ` Alex Bennée
2018-04-11 21:36     ` Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 02/15] softfloat: fix {min, max}nummag for same-abs-value inputs Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 03/15] fp-test: add muladd variants Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 04/15] softfloat: add float{32, 64}_is_{de, }normal Emilio G. Cota
2018-04-06 12:01   ` Bastian Koppelmann
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 05/15] target/tricore: use float32_is_denormal Emilio G. Cota
2018-04-06 12:01   ` Bastian Koppelmann
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 06/15] tests/fp: add fp-bench, a collection of simple floating point microbenchmarks Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 07/15] softfloat: rename canonicalize to sf_canonicalize Emilio G. Cota
2018-04-06 12:02   ` Bastian Koppelmann
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 08/15] softfloat: add float{32, 64}_is_zero_or_normal Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 09/15] fpu: introduce hardfloat Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 10/15] hardfloat: support float32/64 addition and subtraction Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 11/15] hardfloat: support float32/64 multiplication Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 12/15] hardfloat: support float32/64 division Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 13/15] hardfloat: support float32/64 fused multiply-add Emilio G. Cota
2018-04-04 23:16   ` Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 14/15] hardfloat: support float32/64 square root Emilio G. Cota
2018-04-04 23:17   ` Emilio G. Cota
2018-04-04 23:11 ` [Qemu-devel] [PATCH v3 15/15] hardfloat: support float32/64 comparison Emilio G. Cota
2018-04-04 23:31 ` [Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat no-reply
