[Qemu-devel] [PATCH v3 00/15] fp-test + hardfloat

From: Emilio G. Cota @ 2018-04-04 23:11 UTC
  To: qemu-devel
  Cc: Aurelien Jarno, Peter Maydell, Alex Bennée, Laurent Vivier,
	Richard Henderson, Paolo Bonzini, Mark Cave-Ayland,
	Bastian Koppelmann

v2: https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06805.html

Changes since v2:

- Add R-b tags

- Add a patch to rename our canonicalize to sf_canonicalize,
  to avoid clashing with glibc's.

- Add a patch to define float{32,64}_is_zero_or_normal

- Simplify the float{32,64}_input_flushX macros -- now the
  macros are more verbose but the full function names are greppable.

- Move tests/fp-test to tests/fp, since now both fp-bench and fp-test
  are under tests/fp.
  + Use tests/fp/fp-test.h for helpers common to both fp-bench and fp-test.

- Complete rewrite of fp-bench:
  + We can now directly call the softfloat functions, thereby
    making the benchmark more sensitive to changes to those functions.
  + We can still use the native ops with "-t host".
  + The rewrite also has less macro trickery; we rely instead on
    constant propagation by the compiler.
  + Alex: dropped your R-b since this changed a lot. I think you'll
    like this version better though!

- Define a generic function to generate the hardfloat implementation
  for ops with 2 inputs; add, sub, mul and div depend on it.
  Instead of using macros, rely on the constant propagation done
  by the compiler. [Alex: I dropped your R-b for the addsub
  patch because it changed a lot]
  + I kept macros for other ops, because I think the subsequent
    code duplication savings are worth the pain.
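
  A minimal sketch of the "generic function + constant propagation"
  idea, not the code from the series. All names (f32_gen2, host_add,
  soft_add, soft_calls) are hypothetical, and the soft paths are
  stand-ins that only count calls instead of doing real soft-float
  work. Since f32_gen2 is always inlined and the callbacks are
  compile-time constants at each call site, the compiler can turn the
  indirect calls into direct ones:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not the code from the series. */
typedef float (*f32_op2)(float a, float b);

static int soft_calls; /* counts slow-path invocations, for illustration */

/* Host FPU versions of each op */
static inline float host_add(float a, float b) { return a + b; }
static inline float host_mul(float a, float b) { return a * b; }

/* Stand-ins for the soft-float slow paths (real ones would set FP flags) */
static float soft_add(float a, float b) { soft_calls++; return a + b; }
static float soft_mul(float a, float b) { soft_calls++; return a * b; }

static inline uint32_t f32_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof(u));
    return u;
}

/* Normal: biased exponent is neither 0 (zero/denormal) nor 0xff (inf/NaN) */
static inline bool f32_is_normal(float x)
{
    uint32_t exp = (f32_bits(x) >> 23) & 0xff;
    return exp != 0 && exp != 0xff;
}

static inline bool f32_is_zero_or_normal(float x)
{
    return (f32_bits(x) & 0x7fffffff) == 0 || f32_is_normal(x);
}

/*
 * Generic 2-input op. Take the host path only for zero/normal inputs,
 * and keep its result only if that result is normal; everything else
 * is conservatively redone on the slow path.
 */
static inline __attribute__((always_inline)) float
f32_gen2(float a, float b, f32_op2 hard, f32_op2 soft)
{
    if (f32_is_zero_or_normal(a) && f32_is_zero_or_normal(b)) {
        float r = hard(a, b);
        if (f32_is_normal(r)) {
            return r;
        }
    }
    return soft(a, b);
}

float f32_add(float a, float b) { return f32_gen2(a, b, host_add, soft_add); }
float f32_mul(float a, float b) { return f32_gen2(a, b, host_mul, soft_mul); }
```

  Each wrapper is a one-liner, so adding another 2-input op only needs
  a new pair of callbacks rather than another macro expansion.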

- Add #define's to select whether to use fpclassify etc. or
  float32_is_zero etc.
  + Benchmark perf differences on x86_64, aarch64 and IBM Power8 hosts.
  + For float32 we don't use fpclassify etc. on any of these hosts,
    so I was tempted to get rid of this option to save some code.
    It's possible however that this option might pay off on hosts
    I have not tested, so I decided to keep it there.

- Add a #define to select whether to use isinf() or floatX_is_infinity().
  Turns out this makes a big difference for power64.
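
  As a rough illustration of this kind of compile-time selection (the
  macro names USE_HOST_CLASSIFY and USE_HOST_ISINF and the helper
  names are made up for this sketch, not taken from the series), each
  predicate can be switched between the libc classification macros
  and a plain bit test:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical knobs: 1 = use the libc macros, 0 = test the bits directly */
#ifndef USE_HOST_CLASSIFY
#define USE_HOST_CLASSIFY 0
#endif
#ifndef USE_HOST_ISINF
#define USE_HOST_ISINF 0
#endif

static inline uint64_t f64_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof(u));
    return u;
}

static inline bool f64_is_zero_or_normal(double x)
{
#if USE_HOST_CLASSIFY
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
#else
    uint64_t u = f64_bits(x);
    uint64_t exp = (u >> 52) & 0x7ff;
    /* zero: all bits clear except the sign; normal: exponent not 0 or 0x7ff */
    return (u << 1) == 0 || (exp != 0 && exp != 0x7ff);
#endif
}

static inline bool f64_is_inf(double x)
{
#if USE_HOST_ISINF
    return isinf(x) != 0;
#else
    /* infinity: exponent all ones, fraction zero, either sign */
    return (f64_bits(x) & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
#endif
}
```

  Both variants of each helper return the same answer; which one is
  faster depends on the host, hence the per-host benchmarking.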

- Remove float32_to_float64 support in hardfloat, since nbench or
  SPEC actually showed a small yet measurable slowdown with it,
  despite fp-bench showing a significant speedup for this operation.

- Do not flatten soft-fp functions; these are now slow paths.
  This shrinks the size of the softfloat object below its original
  size (see last patch's log).

- Add a #define to disable hardfloat for some targets. I noticed that
  some targets (PPC at least; there might be others) clear the FP
  flags before calling softfloat. This precludes hardfloat, since it
  relies on the inexact flag already being set. In the long run we
  should fix these targets though.
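
  A sketch of why clearing the flags defeats the fast path, reading
  the item above as "the host FPU is only used once the guest's
  inexact flag is already set". The type and names below (fp_status,
  FLAG_INEXACT, NO_HARDFLOAT, can_use_host_fpu) are hypothetical, and
  any other conditions the real check may have are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified FP status; not QEMU's float_status. */
enum { FLAG_INEXACT = 1u << 0, FLAG_OVERFLOW = 1u << 1 };

typedef struct {
    uint8_t exception_flags;
} fp_status;

/* Hypothetical per-target switch to turn the fast path off entirely */
#ifndef NO_HARDFLOAT
#define NO_HARDFLOAT 0
#endif

/*
 * The host FPU does not report (cheaply) whether a result was inexact.
 * If the guest already has inexact set, nothing is lost by not
 * tracking it, so the fast path is safe to take.
 */
static inline bool can_use_host_fpu(const fp_status *s)
{
    if (NO_HARDFLOAT) {
        return false;
    }
    return (s->exception_flags & FLAG_INEXACT) != 0;
}

/* What a flag-clearing target effectively does before each FP op */
static inline void clear_flags(fp_status *s)
{
    s->exception_flags = 0;
}
```

  A target that calls the equivalent of clear_flags() before every op
  makes can_use_host_fpu() return false every time, so it would always
  end up on the soft-float path.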

Note: fp-bench can run _very_ slowly (~0.5 IPC) for -o fma on some x86_64
hosts. I have not pinned down what's going on, but from the few hosts
I have access to, it seems that machines that have been patched for
Spectre/Meltdown are susceptible to this slowdown.
Fortunately though:
1) when fma is run in QEMU (and not under a microbenchmark such as
   fp-bench), fma performance is still very good (much better than with
   soft-fp).
2) Compiling with -march=native gets rid of the problem.
I've reproduced this with both gcc 5.4.0 and gcc 7.1.0. The *very* same
fp-bench binary that performs very well for FMA on two machines (one
AMD, one Intel, neither patched against Meltdown/Spectre) performs
below soft-fp on another three machines (all Intel, all patched).

Note: there are some checkpatch errors, but they are false positives.

Perf numbers for fp-bench are in each commit log; numbers for several
benchmarks are in the last patch's commit log.

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v3

Thanks,

		Emilio

---
 configure                   |    2 +
 fpu/softfloat.c             |  945 ++++++++++++++++++++++++++++++--
 include/fpu/softfloat.h     |   30 +
 target/tricore/fpu_helper.c |    9 +-
 tests/Makefile.include      |    3 +
 tests/fp/.gitignore         |    4 +
 tests/fp/Makefile           |   36 ++
 tests/fp/fp-bench.c         |  528 ++++++++++++++++++
 tests/fp/fp-test.c          | 1183 ++++++++++++++++++++++++++++++++++++++++
 tests/fp/muladd.fptest      |   51 ++
 10 files changed, 2737 insertions(+), 54 deletions(-)
 create mode 100644 tests/fp/.gitignore
 create mode 100644 tests/fp/Makefile
 create mode 100644 tests/fp/fp-bench.c
 create mode 100644 tests/fp/fp-test.c
 create mode 100644 tests/fp/muladd.fptest
