Re: [Qemu-devel] [PATCH v2 00/14] fp-test + hardfloat

From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>,
	Peter Maydell <peter.maydell@linaro.org>,
	Laurent Vivier <laurent@vivier.eu>,
	Richard Henderson <richard.henderson@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Subject: Re: [Qemu-devel] [PATCH v2 00/14] fp-test + hardfloat
Date: Wed, 28 Mar 2018 14:36:38 +0100
Message-ID: <87h8p08gpl.fsf@linaro.org>
In-Reply-To: <1522128840-498-1-git-send-email-cota@braap.org>

Emilio G. Cota <cota@braap.org> writes:

> v1: https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05908.html
>
> Changes from v1:
>
> - Rename series from "hostfloat" to "hardfloat". The series already uses
>   "host" as an option for fp-test, so this change should make things clearer
>
> - Rebase on top of master (4c2c101590).
>
> - Move code from fpu/hostfloat.c to fpu/softfloat.c. I am not mentioning
>   anything about the license; I read the softfloat-2a license and I'm OK
>   with it. [ Laurent: thanks for the clarification on this. ]
>
> - Fix target-m68k build breakage
>
> - Merge is_normal and is_denormal additions into a single commit
>
> - Add tricore patch to use float32_is_denormal
>
> - Keep the flatten attribute for the soft-fp implementations that
>   have now become a slow path
>
> - Add the noinline attribute to the soft-fp primitives. Not doing
>   this reduces performance significantly

Yep, we want to keep the compiler from inlining the complex softfloat
code into the hardfloat fast path. However, I think we can keep the
non-macro style and still achieve this.

>
> - Add a comment about why dealing with denormals in hardfloat is
>   a bad idea
>
> - Keep separate float32 and float64 implementations for most ops. This
>   improves performance as shown in the commit logs.
>   + I'm keeping the macro-based definitions to make testing easier.
>   + In v1 I wrongly reported similar float/double results for fp-bench;
>   I noticed that in my testing I forgot to set -p single/double, so I was
>   benchmarking only with the default precision (single). Ouch!
>
> - Update commit logs with fresh (correct) numbers from fp-bench.
>
> - Move some zero-input detection (addsub/div) *after* checking for
>   <= min_normal. This makes the common case (i.e. not all inputs are zero)
>   faster, still allowing us to handle the 0-input cases in hardfloat
>
> - Update the commit log of the comparison patch to mention that
>   int64_to_float32/64 are still in soft-fp and take quite a bit of
>   execution time for fp-bench -o cmp.
>
> - fp-test:
>   + add *.txt to fp-test/.gitignore instead of just whitelist.txt
>
> - fp-bench
>   + generate only positive numbers for testing sqrt
>   + add -o cmp
>   + use g_strjoinv to print the list of available ops in the
>     help message
>   + remove libc headers except math.h
>   + use qemu/timer.h's get_clock_realtime instead of open-coding it
>   + add entry to tests/Makefile.include to call fp-test/Makefile
>     when building anything in tests/fp-test/
>
> Perf numbers are in the last patch. They are a little different than
> last week; I cannot replicate last week's performance (even with
> the very same binaries; might have to reboot the machine I'm using
> soon), but as of today v2 is certainly faster than v1 (e.g. 5% faster
> for nbench-fp).
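One reading of the zero-input change above, as a minimal C sketch under stated assumptions: the host `float` stands in for QEMU's `float32`, exception flags and `float_status` are ignored, and `soft_add32`/`zero_or_normal` are hypothetical placeholders, not code from the series. The zero-input test lives on the rare branch taken only when the result is at or below `FLT_MIN`, so the common case pays only for the magnitude check.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical placeholder for the soft-fp add. The real routine handles
 * every IEEE corner case and raises flags; noinline keeps its body out of
 * the fast path below. */
static __attribute__((noinline)) float soft_add32(float a, float b)
{
    return a + b;
}

/* Inputs the fast path accepts: zeros and normal numbers
 * (no denormals, infinities or NaNs). */
static inline bool zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

float add32(float a, float b)
{
    if (!zero_or_normal(a) || !zero_or_normal(b)) {
        return soft_add32(a, b);
    }
    float r = a + b;              /* host FPU does the work */
    if (isinf(r)) {
        return soft_add32(a, b);  /* overflow: let soft-fp handle it */
    }
    if (fabsf(r) <= FLT_MIN) {
        /* Rare branch: only now test for zero inputs. 0 + 0 is exact,
         * so that case can still use the host result. */
        if (a == 0.0f && b == 0.0f) {
            return r;
        }
        return soft_add32(a, b);  /* tiny result: may need underflow handling */
    }
    return r;                     /* common case: result is normal */
}
```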

And I made mul32 faster in my common-code (non-macro) variant:

mul32 Before:
  101.95 MFlops
  102.29 MFlops
  101.62 MFlops

mul32 After:
  154.26 MFlops
  154.42 MFlops
  154.58 MFlops
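As a back-of-the-envelope check (plain arithmetic on the three runs above, not an additional measurement), the mean throughputs and their ratio can be computed like this; `mean` and `mul32_speedup` are illustrative names only.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* mul32 throughput in MFlops, copied from the runs above. */
static const double mul32_before[] = { 101.95, 102.29, 101.62 };
static const double mul32_after[]  = { 154.26, 154.42, 154.58 };

/* Ratio of mean throughputs: after / before. */
double mul32_speedup(void)
{
    return mean(mul32_after, 3) / mean(mul32_before, 3);
}
```

The means come out at about 101.95 and 154.42 MFlops, a ratio of roughly 1.51 (about 51% more throughput).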

I don't think macros are needed for this; careful control of the
inline/flatten boundaries should be enough.
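A minimal, non-macro sketch of that idea, under these assumptions: GCC/Clang attributes, the host `float` in place of QEMU's `float32`, flags and rounding modes ignored, and every name (`gen2_f32`, `soft_add32`, `hard_mul32`, ...) hypothetical rather than taken from the series. The shared fast-path helper is `always_inline`; the soft-fp fallbacks are `noinline`.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

typedef float (*op2_f32)(float, float);

/* Hypothetical soft-fp placeholders (the real ones are large); noinline
 * stops them from being pulled into every fast path. */
static __attribute__((noinline)) float soft_add32(float a, float b) { return a + b; }
static __attribute__((noinline)) float soft_mul32(float a, float b) { return a * b; }

/* Host-FPU versions of the same operations. */
static float hard_add32(float a, float b) { return a + b; }
static float hard_mul32(float a, float b) { return a * b; }

static inline bool zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

/* One shared fast path for every float32 binary op. always_inline forces
 * a copy into each caller; with optimization enabled, hard/soft are then
 * constants there, so the indirect calls can become direct ones and
 * hard() can collapse to a single host instruction. */
static inline __attribute__((always_inline))
float gen2_f32(float a, float b, op2_f32 hard, op2_f32 soft)
{
    if (zero_or_normal(a) && zero_or_normal(b)) {
        float r = hard(a, b);
        /* Keep the host result unless it overflowed or may be tiny. */
        if (!isinf(r) && fabsf(r) > FLT_MIN) {
            return r;
        }
    }
    return soft(a, b);
}

/* Thin, macro-free wrappers: one line per operation. */
float add32(float a, float b) { return gen2_f32(a, b, hard_add32, soft_add32); }
float mul32(float a, float b) { return gen2_f32(a, b, hard_mul32, soft_mul32); }
```

If the wrappers were marked `flatten` instead, GCC would try to inline everything they call, transitively; the `noinline` on the soft-fp functions is what keeps those out even then. A float64 variant would need its own copy of the helper using `double`, `DBL_MIN` and `fabs`, which is the one duplication this style accepts in exchange for avoiding macros.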

What do you think?

>
> I have checked all checkpatch warnings; they're all false positives.
>
> You can fetch the series from:
>   https://github.com/cota/qemu/tree/hardfloat-v2
>
> Thanks,
>
> 		Emilio
>
> diffstat:
>  configure                   |    2 +
>  fpu/softfloat.c             |  619 ++++++++++++++++++--
>  include/fpu/softfloat.h     |   20 +
>  target/tricore/fpu_helper.c |    9 +-
>  tests/.gitignore            |    2 +
>  tests/Makefile.include      |    6 +-
>  tests/fp-bench.c            |  334 +++++++++++
>  tests/fp-test/.gitignore    |    3 +
>  tests/fp-test/Makefile      |   34 ++
>  tests/fp-test/fp-test.c     | 1183 ++++++++++++++++++++++++++++++++++++++
>  tests/fp-test/muladd.fptest |   51 ++
>  11 files changed, 2212 insertions(+), 51 deletions(-)
>  create mode 100644 tests/fp-bench.c
>  create mode 100644 tests/fp-test/.gitignore
>  create mode 100644 tests/fp-test/Makefile
>  create mode 100644 tests/fp-test/fp-test.c
>  create mode 100644 tests/fp-test/muladd.fptest

--
Alex Bennée