qemu-devel.nongnu.org archive mirror
* [RFC PATCH 1/2] target/ppc/cpu: Add hardfloat property
  2020-02-17  1:54 [RFC PATCH 0/2] Enable hardfloat for PPC BALATON Zoltan
@ 2020-02-17  0:14 ` BALATON Zoltan
  2020-02-17  1:19 ` [RFC PATCH 2/2] target/ppc: Enable hardfloat for PPC BALATON Zoltan
  2020-02-17  9:51 ` [RFC PATCH 0/2] " Peter Maydell
  2 siblings, 0 replies; 5+ messages in thread
From: BALATON Zoltan @ 2020-02-17  0:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Mark Cave-Ayland, John Arbuckle, qemu-ppc, Paul Clarke,
	Howard Spoelstra, David Gibson

Add a property to allow setting a flag in cpu env that will be used to
control if hardfloat is used for floating point ops (i.e. speed is
preferred over accuracy).

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 target/ppc/cpu.h                | 2 ++
 target/ppc/translate_init.inc.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index b283042515..1b258a5db5 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1033,6 +1033,7 @@ struct CPUPPCState {
     float_status vec_status;
     float_status fp_status; /* Floating point execution context */
     target_ulong fpscr;     /* Floating point status and control register */
+    bool hardfloat;         /* use hardfloat (this breaks FPSCR[FI] bit) */
 
     /* Internal devices resources */
     ppc_tb_t *tb_env;      /* Time base and decrementer */
@@ -1163,6 +1164,7 @@ struct PowerPCCPU {
     void *machine_data;
     int32_t node_id; /* NUMA node this CPU belongs to */
     PPCHash64Options *hash64_opts;
+    bool hardfloat; /* pass on property to env */
 
     /* Those resources are used only during code translation */
     /* opcode handlers */
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 53995f62ea..d6e1d66bc8 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10736,6 +10736,7 @@ static void ppc_cpu_reset(CPUState *s)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    env->hardfloat = cpu->hardfloat;
 
     for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
         ppc_spr_t *spr = &env->spr_cb[i];
@@ -10868,6 +10869,7 @@ static Property ppc_cpu_properties[] = {
                      false),
     DEFINE_PROP_BOOL("pre-3.0-migration", PowerPCCPU, pre_3_0_migration,
                      false),
+    DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.21.1




* [RFC PATCH 2/2] target/ppc: Enable hardfloat for PPC
  2020-02-17  1:54 [RFC PATCH 0/2] Enable hardfloat for PPC BALATON Zoltan
  2020-02-17  0:14 ` [RFC PATCH 1/2] target/ppc/cpu: Add hardfloat property BALATON Zoltan
@ 2020-02-17  1:19 ` BALATON Zoltan
  2020-02-17  9:51 ` [RFC PATCH 0/2] " Peter Maydell
  2 siblings, 0 replies; 5+ messages in thread
From: BALATON Zoltan @ 2020-02-17  1:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: Mark Cave-Ayland, John Arbuckle, qemu-ppc, Paul Clarke,
	Howard Spoelstra, David Gibson

While other targets take advantage of the host FPU to do floating
point computations, this was disabled for the PPC target because
clearing exception flags before every FP op made it slightly slower
than emulating everything with softfloat. To emulate some FPSCR bits,
clearing of fp_status may be necessary (unless these could be handled
e.g. using FP exceptions on the host, but there's no API for that in
QEMU yet), but preserving at least the inexact flag makes hardfloat
usable and faster than softfloat. Since most clients don't actually
care about this flag, we can gain some speed by trading away some
emulation accuracy.

This patch implements a simple way to keep the inexact flag set for
hardfloat while still allowing a fallback to softfloat for workloads
that need more accurate, albeit slower, emulation. (Set the hardfloat
property of the CPU to false for that, i.e. -cpu name,hardfloat=false.)
There are still more places where flags are reset, so there is room
for further improvement. Also, testing the hardfloat flag with a
conditional every time makes the softfloat case slower than before
this patch, so some other way (like setting a function pointer once
and using that instead, if possible) may be needed to avoid this.
Otherwise this patch only makes sense if the default is also set to
enable hardfloat.

Because of the above, this patch is at the moment mainly for testing
different workloads to evaluate how viable this would be in practice.
Thus, it is an RFC and not ready for merge yet.
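
A minimal sketch of the function pointer idea mentioned above, under
stated assumptions: ToyEnv, the toy_* functions and the flag values are
hypothetical stand-ins, not QEMU's CPUPPCState or softfloat API. The
reset policy is chosen once at CPU reset, so the per-op helper has no
conditional on the hardfloat property. Whether an indirect call is
actually cheaper than a well-predicted branch has not been measured
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions, not QEMU's. */
enum {
    TOY_FLAG_INVALID = 1,
    TOY_FLAG_INEXACT = 32,
};

/* Hypothetical, pared-down CPU state (not the real CPUPPCState). */
typedef struct ToyEnv ToyEnv;
struct ToyEnv {
    uint8_t exception_flags;
    void (*reset_fpstatus)(ToyEnv *env); /* chosen once at CPU reset */
};

/* softfloat policy: clear every exception flag */
static void toy_reset_all(ToyEnv *env)
{
    env->exception_flags = 0;
}

/* hardfloat policy: clear the other flags but leave inexact set */
static void toy_reset_keep_inexact(ToyEnv *env)
{
    env->exception_flags = TOY_FLAG_INEXACT;
}

/* Called once per CPU reset: pick the policy from the property value. */
static void toy_cpu_reset(ToyEnv *env, bool hardfloat)
{
    env->exception_flags = 0;
    env->reset_fpstatus = hardfloat ? toy_reset_keep_inexact
                                    : toy_reset_all;
}

/* Hot path: no test of the hardfloat property here. */
static void toy_helper_reset_fpstatus(ToyEnv *env)
{
    env->reset_fpstatus(env);
}
```

With hardfloat selected, calling toy_helper_reset_fpstatus() leaves only
the inexact bit set; with softfloat selected it leaves no bits set.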

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 fpu/softfloat.c                 | 14 +++++++-------
 target/ppc/fpu_helper.c         |  7 ++++++-
 target/ppc/translate_init.inc.c |  2 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 301ce3b537..6d3f4af72a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -216,15 +216,15 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
 #endif
 
 /*
- * Some targets clear the FP flags before most FP operations. This prevents
- * the use of hardfloat, since hardfloat relies on the inexact flag being
- * already set.
+ * Disable hardfloat for known problem cases.
+ * Additionally, some targets clear the FP flags before most FP operations.
+ * This prevents the use of hardfloat, since it relies on the inexact flag
+ * being already set and clearing it often may result in slower computations.
+ * Those targets could also be listed here.
  */
-#if defined(TARGET_PPC) || defined(__FAST_MATH__)
-# if defined(__FAST_MATH__)
-#  warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
+#if defined(__FAST_MATH__)
+# warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
     IEEE implementation
-# endif
 # define QEMU_NO_HARDFLOAT 1
 # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
 #else
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index ae43b08eb5..33aa977970 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -659,7 +659,12 @@ void helper_float_check_status(CPUPPCState *env)
 
 void helper_reset_fpstatus(CPUPPCState *env)
 {
-    set_float_exception_flags(0, &env->fp_status);
+    if (env->hardfloat) {
+        /* hardfloat needs inexact flag already set, clear only others */
+        set_float_exception_flags(float_flag_inexact, &env->fp_status);
+    } else {
+        set_float_exception_flags(0, &env->fp_status);
+    }
 }
 
 static void float_invalid_op_addsub(CPUPPCState *env, bool set_fpcc,
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index d6e1d66bc8..caac0c2d11 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10869,7 +10869,7 @@ static Property ppc_cpu_properties[] = {
                      false),
     DEFINE_PROP_BOOL("pre-3.0-migration", PowerPCCPU, pre_3_0_migration,
                      false),
-    DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, false),
+    DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.21.1




* [RFC PATCH 0/2] Enable hardfloat for PPC
@ 2020-02-17  1:54 BALATON Zoltan
  2020-02-17  0:14 ` [RFC PATCH 1/2] target/ppc/cpu: Add hardfloat property BALATON Zoltan
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: BALATON Zoltan @ 2020-02-17  1:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Mark Cave-Ayland, John Arbuckle, qemu-ppc, Paul Clarke,
	Howard Spoelstra, David Gibson

Hello,

This is an RFC series to start exploring the possibility of enabling
hardfloat for the PPC target, which hasn't progressed in the last two
years. Hopefully we can work out something now. Previously I've
explored this here:

https://lists.nongnu.org/archive/html/qemu-ppc/2018-07/msg00261.html

where some ad-hoc benchmarks using the lame mp3 encoder are also
explained. It has two versions: one using VMX and another using only
FP. Both are mostly floating point bound. I've run this test on mac99
under MorphOS before and after my patches, also verifying that the
md5sum of the resulting mp3 matches (this is no proof of correctness
but maybe shows it did not break too much, at least for those ops used
by this program).

I've got these measurements on an Intel i7-9700K CPU @ 3.60GHz (did
not bother to take multiple samples so these are just approximate):

1) before patch series using softfloat:
lame: 4:01, lame_vmx: 3:14

2) only enabling hardfloat in fpu/softfloat.c without other changes:
lame: 4:06, lame_vmx: 2:06
(this shows why hardfloat was disabled but VMX can benefit from this)

3) with this series, hardfloat=true:
lame: 3:15, lame_vmx: 1:59
(so the patch does something even if there should be more places to
preserve inexact flag to fully use hardfloat)

4) with this series but forcing softfloat with hardfloat=false:
lame: 4:11, lame_vmx: 2:08
(unfortunately it's slower than before, likely due to adding an if () to
helper_reset_fpstatus(), which should be avoided to at least get back to
the previous hardfloat enabled case (2), and even that is still slower
than softfloat for lame, so at the moment this series only makes sense
if the default can be hardfloat=true, but even that would need more
testing)
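
To put the timings above into ratios, here is a small sketch that
converts the m:ss readings to seconds and divides the softfloat baseline
(case 1) by each other case. The helper names are made up for this
example; the numbers are the single-sample measurements listed above, so
the ratios are only as approximate as those are.

```c
#include <stdio.h>

/* Convert an m:ss wall-clock reading into seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Ratio above 1.0 means the candidate ran faster than the baseline. */
static double speedup(int baseline_s, int candidate_s)
{
    return (double)baseline_s / (double)candidate_s;
}

/* Print the speedup of cases 2-4 relative to case 1 for both encoders. */
static void report_speedups(void)
{
    /* case 1: softfloat baseline before the series */
    const int lame_base = to_seconds(4, 1);
    const int vmx_base = to_seconds(3, 14);

    /* cases 2, 3, 4 in the order listed in the mail */
    const int lame[3] = { to_seconds(4, 6), to_seconds(3, 15),
                          to_seconds(4, 11) };
    const int vmx[3] = { to_seconds(2, 6), to_seconds(1, 59),
                         to_seconds(2, 8) };

    for (int i = 0; i < 3; i++) {
        printf("case %d: lame %.2fx, lame_vmx %.2fx\n", i + 2,
               speedup(lame_base, lame[i]), speedup(vmx_base, vmx[i]));
    }
}
```

For case 3 (this series with hardfloat=true) this works out to roughly
1.24x for lame and 1.63x for lame_vmx relative to case 1.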

I hope others can contribute to this by doing more testing to find out
what else this would break or give some ideas how this could be
improved.

Regards,
BALATON Zoltan

BALATON Zoltan (2):
  target/ppc/cpu: Add hardfloat property
  target/ppc: Enable hardfloat for PPC

 fpu/softfloat.c                 | 14 +++++++-------
 target/ppc/cpu.h                |  2 ++
 target/ppc/fpu_helper.c         |  7 ++++++-
 target/ppc/translate_init.inc.c |  2 ++
 4 files changed, 17 insertions(+), 8 deletions(-)

-- 
2.21.1




* Re: [RFC PATCH 0/2] Enable hardfloat for PPC
  2020-02-17  1:54 [RFC PATCH 0/2] Enable hardfloat for PPC BALATON Zoltan
  2020-02-17  0:14 ` [RFC PATCH 1/2] target/ppc/cpu: Add hardfloat property BALATON Zoltan
  2020-02-17  1:19 ` [RFC PATCH 2/2] target/ppc: Enable hardfloat for PPC BALATON Zoltan
@ 2020-02-17  9:51 ` Peter Maydell
  2020-02-17 11:26   ` BALATON Zoltan
  2 siblings, 1 reply; 5+ messages in thread
From: Peter Maydell @ 2020-02-17  9:51 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Mark Cave-Ayland, QEMU Developers, John Arbuckle, qemu-ppc,
	Paul Clarke, Howard Spoelstra, David Gibson

On Mon, 17 Feb 2020 at 02:43, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> Hello,
>
> This is an RFC series to start exploring the possibility of enabling
> hardfloat for PPC target that haven't progressed in the last two years.
> Hopefully we can work out something now. Previously I've explored this
> here:
>
> https://lists.nongnu.org/archive/html/qemu-ppc/2018-07/msg00261.html
>
> where some ad-hoc benchmarks using lame mp3 encoder is also explained
> that has two versions: one using VMX and another only using FP. Both
> are mostly floating point bounded. I've run this test on mac99 under
> MorphOS before and after my patches, also verifying that md5sum of
> resulting mp3 matches (this is no proof for correctness but maybe
> shows it did not break too much at least those ops used by this
> program).

> I hope others can contribute to this by doing more testing to find out
> what else this would break or give some ideas how this could be
> improved.

I think the ideal would be to test against a reference using
risu to see whether this changes behaviour (FP results should
be bit-for-bit identical; application level testing is often
not sufficient to detect this). You could test either
against real hardware or against the non-hardfloat QEMU.
I'm not sure how comprehensive the coverage for ppc insns
is but there are a fair number of fp insns covered already:
https://git.linaro.org/people/peter.maydell/risu.git/tree/

It's also worth testing any alternate/non-standard config
modes the FPU might have (eg different default rounding modes,
any flush-to-zero or alternate denormal handling, that kind
of thing), and not just the default how-the-CPU-boots-up mode.

thanks
-- PMM



* Re: [RFC PATCH 0/2] Enable hardfloat for PPC
  2020-02-17  9:51 ` [RFC PATCH 0/2] " Peter Maydell
@ 2020-02-17 11:26   ` BALATON Zoltan
  0 siblings, 0 replies; 5+ messages in thread
From: BALATON Zoltan @ 2020-02-17 11:26 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, John Arbuckle, qemu-ppc, Paul Clarke,
	Howard Spoelstra, David Gibson

On Mon, 17 Feb 2020, Peter Maydell wrote:
> On Mon, 17 Feb 2020 at 02:43, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>> Hello,
>>
>> This is an RFC series to start exploring the possibility of enabling
>> hardfloat for PPC target that haven't progressed in the last two years.
>> Hopefully we can work out something now. Previously I've explored this
>> here:
>>
>> https://lists.nongnu.org/archive/html/qemu-ppc/2018-07/msg00261.html
>>
>> where some ad-hoc benchmarks using lame mp3 encoder is also explained
>> that has two versions: one using VMX and another only using FP. Both
>> are mostly floating point bounded. I've run this test on mac99 under
>> MorphOS before and after my patches, also verifying that md5sum of
>> resulting mp3 matches (this is no proof for correctness but maybe
>> shows it did not break too much at least those ops used by this
>> program).
>
>> I hope others can contribute to this by doing more testing to find out
>> what else this would break or give some ideas how this could be
>> improved.
>
> I think the ideal would be to test against a reference using
> risu to see whether this changes behaviour (FP results should
> be bit-for-bit identical; usually application level testing is
> often not sufficient to detect this). You could test either

Sure, thanks. I did not mean to claim the simple test I've done was
sufficient, but I expect others who have an interest in this and are more
experienced in such testing (or are even being paid to work on QEMU, which
I'm not) to contribute to this. So I did not try testing it more thoroughly
than just showing that it could be considerably faster and still work for at
least some workloads, so it's worth working on. I'm surprised that in the two
years since hardfloat was merged nobody even tried this (or those who did
dropped the idea before getting any results, without letting us know). So I
tried to make a start with it to explore what it would take to fix this
eventually, but I don't want to do that alone. I hope this inspires others
to help, e.g. with testing, and we can reach a solution together.

> against real hardware or against the non-hardfloat QEMU.
> I'm not sure how comprehensive the coverage for ppc insns
> is but there are a fair number of fp insns covered already:
> https://git.linaro.org/people/peter.maydell/risu.git/tree/

I don't have real hardware, and testing against QEMU may take longer and
I'm not sure how useful it would be. There could also be preexisting bugs,
although some fixes were made to the PPC FP implementation recently. Maybe
I'll have a look if I have no better things to do, but I also have other
ongoing QEMU related projects that I might try to make some progress on.

> It's also worth testing any alternate/non-standard config
> modes the FPU might have (eg different default rounding modes,
> any flush-to-zero or alternate denormal handling, that kind
> of thing), and not just the default how-the-CPU-boots-up mode.

It is expected to break inexact exceptions currently until a better way 
can be found to handle those but I think hardfloat is already disabled for 
other than default rounding modes or FPU settings so maybe those should 
not break. According to:

https://git.qemu.org/?p=qemu.git;a=blob;f=fpu/softfloat.c;h=301ce3b537b6c0eee5dbbc358587b66a3a341d2a;hb=HEAD#l235

  235 static inline bool can_use_fpu(const float_status *s)
  236 {
  237     if (QEMU_NO_HARDFLOAT) {
  238         return false;
  239     }
  240     return likely(s->float_exception_flags & float_flag_inexact &&
  241                   s->float_rounding_mode == float_round_nearest_even);
  242 }
  243

and

https://git.qemu.org/?p=qemu.git;a=blob;f=fpu/softfloat.c;h=301ce3b537b6c0eee5dbbc358587b66a3a341d2a;hb=HEAD#l99

   99 /*
  100  * Hardfloat
  101  *
  102  * Fast emulation of guest FP instructions is challenging for two reasons.
  103  * First, FP instruction semantics are similar but not identical, particularly
  104  * when handling NaNs. Second, emulating at reasonable speed the guest FP
  105  * exception flags is not trivial: reading the host's flags register with a
  106  * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
  107  * and trapping on every FP exception is not fast nor pleasant to work with.
  108  *
  109  * We address these challenges by leveraging the host FPU for a subset of the
  110  * operations. To do this we expand on the idea presented in this paper:
  111  *
  112  * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
  113  * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
  114  *
  115  * The idea is thus to leverage the host FPU to (1) compute FP operations
  116  * and (2) identify whether FP exceptions occurred while avoiding
  117  * expensive exception flag register accesses.
  118  *
  119  * An important optimization shown in the paper is that given that exception
  120  * flags are rarely cleared by the guest, we can avoid recomputing some flags.
  121  * This is particularly useful for the inexact flag, which is very frequently
  122  * raised in floating-point workloads.
  123  *
  124  * We optimize the code further by deferring to soft-fp whenever FP exception
  125  * detection might get hairy. Two examples: (1) when at least one operand is
  126  * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
  127  * and the result is < the minimum normal.
  128  */
  129 #define GEN_INPUT_FLUSH__NOCHECK(name, soft_t)                          \
  130     static inline void name(soft_t *a, float_status *s)                 \
  131     {                                                                   \
  132         if (unlikely(soft_t ## _is_denormal(*a))) {                     \
  133             *a = soft_t ## _set_sign(soft_t ## _zero,                   \
  134                                      soft_t ## _is_neg(*a));            \
  135             s->float_exception_flags |= float_flag_input_denormal;      \
  136         }                                                               \
  137     }
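
As a rough illustration of the can_use_fpu() condition quoted above,
here is a toy model (not QEMU code; ToyFloatStatus, the toy_* names and
the constant values are assumptions made for this sketch). It counts how
many ops could take the host FPU path when the flags are reset before
every op, either to zero or to "inexact only" as in patch 2. It also
shows that a non-default rounding mode keeps the fast path off
regardless of the flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the values are assumptions, not QEMU's. */
enum { TOY_FLAG_INEXACT = 1 << 5 };
enum { TOY_ROUND_NEAREST_EVEN = 0, TOY_ROUND_TO_ZERO = 3 };

typedef struct {
    uint8_t exception_flags;
    int rounding_mode;
} ToyFloatStatus;

/* Same shape as the quoted check: inexact already set, default rounding. */
static bool toy_can_use_fpu(const ToyFloatStatus *s)
{
    return (s->exception_flags & TOY_FLAG_INEXACT) &&
           s->rounding_mode == TOY_ROUND_NEAREST_EVEN;
}

/*
 * Model n_ops FP ops, each preceded by a flag reset as the PPC helper does.
 * keep_inexact selects the reset policy. Returns how many ops were eligible
 * for the host FPU path.
 */
static int toy_count_fast_path_ops(bool keep_inexact, int rounding_mode,
                                   int n_ops)
{
    ToyFloatStatus s = { 0, rounding_mode };
    int fast = 0;

    for (int i = 0; i < n_ops; i++) {
        s.exception_flags = keep_inexact ? TOY_FLAG_INEXACT : 0;
        if (toy_can_use_fpu(&s)) {
            fast++;
        } else {
            /* assume the soft path raised inexact; the next reset drops it */
            s.exception_flags |= TOY_FLAG_INEXACT;
        }
    }
    return fast;
}
```

In this model, resetting to zero never reaches the fast path, resetting
to inexact-only always does under round-to-nearest-even, and any other
rounding mode falls back to the soft path in both cases.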

Regards,
BALATON Zoltan


