* [RFC PATCH v2 0/5] Idea for using hardfloat in PPC
@ 2022-10-26 19:25 Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 1/5] target/ppc: prepare instructions to work with caching last FP insn Víctor Colombo
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

As can be seen in the mailing list thread that added hardfloat support
to QEMU [1], a requirement for it to work is to have float_flag_inexact
set when entering the API in softfloat.c. However, in the same thread,
it was explained that the PPC target would not work by default with this
implementation.
The problem is that PPC has a non-sticky inexact bit (there is a
discussion about it in [2]), meaning that we can't just set the flag
and call the API in softfloat.c, as it would return the same flag set
to 1, and we wouldn't know if it is supposed to be updated on FPSCR or
not.
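
The loss of information described above can be shown with a tiny model.
This is only a sketch: TOY_FLAG_INEXACT and toy_op are hypothetical
stand-ins, not QEMU names, and the only assumption is that an operation
ORs its exception bits into the flags it was given.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for float_flag_inexact; not a QEMU definition. */
enum { TOY_FLAG_INEXACT = 1 };

/*
 * Model of an operation that accumulates exception flags: it ORs its
 * own inexact bit into the flags passed in and never clears anything.
 */
static unsigned toy_op(unsigned flags_in, bool op_is_inexact)
{
    return flags_in | (op_is_inexact ? TOY_FLAG_INEXACT : 0u);
}
```

With the flag clear on entry, toy_op(0, true) and toy_op(0, false)
differ, so a non-sticky bit can be derived from the result. With the
flag preset, both calls return the same value, which is the situation
PPC cannot handle directly.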
Over the last couple of years, there have been attempts to enable
hardfpu for Power, like [3], but none of them reached master.
[5] shows a suggestion by Yonggang Luo, with comments by Richard and
Zoltan, about caching the last FP instruction and reexecuting it when
necessary.

This patch set proposes an implementation of the idea of caching the
last FP insn, to be reexecuted later when the value of FPSCR is to be
read by a
program. When executed in hardfloat, the instruction "context" is saved
inside `env`, and is expected to be reexecuted later, in softfloat,
to calculate the correct value of the inexact flag in FPSCR.
The instruction to be cached is the last instruction that changes FI.
If an instruction does not change FI, it leaves the cache intact.
If it changes FI, it caches itself and tries to execute in hardfpu.
It might or might not use hardfloat, but as the inexact flag was
artificially set, it will need to be reexecuted later. 'Later'
means when FPSCR is to be read, like during a call to MFFS, or when
a signal occurs. There are probably other places, e.g. other mffs-like
instructions, but this RFC only addresses these two scenarios.
This is expected to be more efficient because programs seldom
read FPSCR, meaning the number of reexecutions will be low.
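
The flow described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: integer division stands in for
an FP operation (inexact means a nonzero remainder), and ToyEnv,
toy_div and toy_read_fi are hypothetical names that do not exist in
QEMU. The check of the sticky bit before taking the fast path follows
what patch 2 does with FP_XX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state; none of these fields exist in QEMU. */
typedef struct {
    bool fi;             /* non-sticky inexact: reflects the last op only */
    bool xx;             /* sticky inexact: once set, it stays set */
    bool has_cached;     /* is an operation waiting to be replayed? */
    int64_t cached_a;
    int64_t cached_b;
    unsigned slow_runs;  /* number of slow-path executions, for illustration */
} ToyEnv;

/* Slow path: computes the result and the exact inexact flag. b != 0. */
static int64_t div_slow(int64_t a, int64_t b, bool *inexact)
{
    *inexact = (a % b) != 0;
    return a / b;
}

/* Fast path: computes the result only, with no flag information. */
static int64_t div_fast(int64_t a, int64_t b)
{
    return a / b;
}

static int64_t toy_div(ToyEnv *env, int64_t a, int64_t b)
{
    bool inexact;
    int64_t r;

    if (env->xx) {
        /* Sticky bit already set: use the fast path and cache the op. */
        env->has_cached = true;
        env->cached_a = a;
        env->cached_b = b;
        return div_fast(a, b);
    }

    /* Sticky bit clear: the slow path is needed to learn the flag. */
    r = div_slow(a, b, &inexact);
    env->slow_runs++;
    env->fi = inexact;
    env->xx = env->xx || inexact;
    env->has_cached = false;
    return r;
}

/* Reading the status replays the cached op, if any, to recover FI. */
static bool toy_read_fi(ToyEnv *env)
{
    if (env->has_cached) {
        bool inexact;
        (void)div_slow(env->cached_a, env->cached_b, &inexact);
        env->slow_runs++;
        env->fi = inexact;
        env->has_cached = false;
    }
    return env->fi;
}
```

In this model only the most recent fast-path operation is replayed, and
only when the status is read, so a long run of operations followed by
one read costs a single extra slow-path execution.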

For now, this was implemented and tested for linux-user only; no
softmmu work or analysis was done.
I implemented the base code to keep all instructions working with
this new behavior (patch 1), and also implemented some instructions
as an example of what would be necessary to do for every instruction
to use hardfpu (patches 2, 3 and 4).

My tests with risu and other manual tests showed the behavior seems to
be correct. I tested mainly if FPSCR is the same after using softfloat
or hardfloat.

In v1 of this RFC I reported a performance regression with the
implementation. However, the test I crafted [4] was supposed to be a
mix of many hardfloats and some softfloat fallbacks (instructions
fall back to softfloat in special cases, e.g. a negative argument
for sqrt). What was actually happening was a huge number of
fallbacks and not many hardfloats. The expected
'normal scenario' is to have a lot of valid, 'happy path' instructions
that can use hardfloat.
So, what I did for v2 was to create two tests, one that hits 100%
hardfloat, and one that falls back 100% to softfloat. I present
the results below. The tests are not comparable with each other,
neither the two new ones nor the previous one from v1, so each
should be analyzed on its own.

100% hardfloat (1:1 mix of fsqrt and fmadd) [6]
|                 | min [s] | max [s] | avg [s] |
| before (master) | 30.731  | 31.420  | 31.186  |
| after changes   | 20.860  | 21.100  | 20.989  |
(approx. 1.5x speedup)

100% softfloat (1:1 mix of fsqrt and fmadd) [7]
|                 | min [s] | max [s] | avg [s] |
| before (master) | 22.684  | 23.152  | 22.868  |
| after changes   | 25.098  | 25.397  | 25.281  |
(approx. 0.9x of the old performance)

This is way better than what I previously reported, and is a result
that might justify going forward with this idea. The only problem
is the performance impact when hardfloat cannot be used. I expect
that most real-life use cases will hit hardfloat almost 100% of the
time, so this might not be a big issue. Opinions on this?

You can see that I actually added a new commit to this RFC,
implementing the idea also for add, sub, mul, and div. I tested the old
test with this new commit, and the result was not better. So the new
patch was not responsible for the performance gain, the test itself
was bad.

As I did not test the code in softmmu or bsd-user (does bsd-user work
for PPC?), I added some build time checks to only enable this RFC for
linux-user. I'm pretty confident that making this work for softmmu will
need changes in other places in the code. But I'm focusing on
linux-user for now.

Thank you very much!

[1] https://patchwork.kernel.org/project/qemu-devel/patch/20181124235553.17371-8-cota@braap.org/
[2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-05/msg00246.html
[3] https://patchwork.kernel.org/project/qemu-devel/patch/20200218171702.979F074637D@zero.eik.bme.hu/
[4] https://gist.github.com/vcoracolombo/6ad884a402f1bba531e2e3da7e196656
[5] https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg00064.html
[6] https://gist.github.com/vcoracolombo/f0d8b7c9f1cb63dac6ff0221209ec4ff
[7] https://gist.github.com/vcoracolombo/4b592644517c0efb3854872a4b30f6cc

Víctor Colombo (5):
  target/ppc: prepare instructions to work with caching last FP insn
  target/ppc: Implement instruction caching for fsqrt
  target/ppc: Implement instruction caching for muladd
  target/ppc: Implement instruction caching for add/sub/mul/div
  target/ppc: Enable hardfpu for Power

 fpu/softfloat.c                    |  10 +-
 target/ppc/cpu.h                   |  37 ++++++
 target/ppc/excp_helper.c           |   2 +
 target/ppc/fpu_helper.c            | 186 +++++++++++++++++++++++++++++
 target/ppc/helper.h                |   1 +
 target/ppc/translate/fp-impl.c.inc |   1 +
 6 files changed, 233 insertions(+), 4 deletions(-)

-- 
2.25.1




* [RFC PATCH v2 1/5] target/ppc: prepare instructions to work with caching last FP insn
  2022-10-26 19:25 [RFC PATCH v2 0/5] Idea for using hardfloat in PPC Víctor Colombo
@ 2022-10-26 19:25 ` Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 2/5] target/ppc: Implement instruction caching for fsqrt Víctor Colombo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

When enabling hardfpu for Power and adding the instruction caching
feature, it will be necessary to clear the cache when the instruction
is guaranteed to be executed in softfloat. If the cache is not cleared
in this situation, a previous instruction could be reexecuted and
yield a different result than when only softfloat was present.

This patch introduces the base code to allow for the implementation of
FP instruction caching, while also adding calls to a macro that clears
the cached instruction in every helper that has not been 'migrated' to
hardfpu-compliance yet.

In the future, the caching code will have to be implemented for each
FP instruction that wants to use hardfpu.

This implementation only works in linux-user. No testing was done
and no effort was made in this patch to make it work for softmmu.
Future work will be required to make it work correctly in that scenario.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
---
 target/ppc/cpu.h                   |  6 +++
 target/ppc/excp_helper.c           |  2 +
 target/ppc/fpu_helper.c            | 71 ++++++++++++++++++++++++++++++
 target/ppc/helper.h                |  1 +
 target/ppc/translate/fp-impl.c.inc |  1 +
 5 files changed, 81 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index cca6c4e51c..116ee639ff 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1080,6 +1080,10 @@ struct ppc_radix_page_info {
 #define PPC_CPU_OPCODES_LEN          0x40
 #define PPC_CPU_INDIRECT_OPCODES_LEN 0x20
 
+enum {
+    CACHED_FN_TYPE_NONE,
+};
+
 struct CPUArchState {
     /* Most commonly used resources during translated code execution first */
     target_ulong gpr[32];  /* general purpose registers */
@@ -1157,6 +1161,8 @@ struct CPUArchState {
     float_status fp_status; /* Floating point execution context */
     target_ulong fpscr;     /* Floating point status and control register */
 
+    int cached_fn_type;
+
     /* Internal devices resources */
     ppc_tb_t *tb_env;      /* Time base and decrementer */
     ppc_dcr_t *dcr_env;    /* Device control registers */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 43f2480e94..6de8c369b8 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1910,6 +1910,8 @@ void raise_exception_err_ra(CPUPPCState *env, uint32_t exception,
 {
     CPUState *cs = env_cpu(env);
 
+    helper_execute_fp_cached(env);
+
     cs->exception_index = exception;
     env->error_code = error_code;
     cpu_loop_exit_restore(cs, raddr);
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index ae25f32d6e..34b242c025 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -23,6 +23,17 @@
 #include "internal.h"
 #include "fpu/softfloat.h"
 
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX_USER)
+#define CACHE_FN_NONE(env)                                                    \
+    do {                                                                      \
+        assert(!(env->fp_status.float_exception_flags &                       \
+                 float_flag_inexact));                                        \
+        env->cached_fn_type = CACHED_FN_TYPE_NONE;                            \
+    } while (0)
+#else
+#define CACHE_FN_NONE(env)
+#endif
+
 static inline float128 float128_snan_to_qnan(float128 x)
 {
     float128 r;
@@ -514,6 +525,24 @@ void helper_reset_fpstatus(CPUPPCState *env)
     set_float_exception_flags(0, &env->fp_status);
 }
 
+void helper_execute_fp_cached(CPUPPCState *env)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX_USER)
+    switch (env->cached_fn_type) {
+    case CACHED_FN_TYPE_NONE:
+        /*
+         * the last fp instruction was executed in softfloat
+         * so no need to execute it again
+         */
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    env->cached_fn_type = CACHED_FN_TYPE_NONE;
+#endif
+}
+
 static void float_invalid_op_addsub(CPUPPCState *env, int flags,
                                     bool set_fpcc, uintptr_t retaddr)
 {
@@ -527,6 +556,7 @@ static void float_invalid_op_addsub(CPUPPCState *env, int flags,
 /* fadd - fadd. */
 float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64_add(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -540,6 +570,7 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fadds - fadds. */
 float64 helper_fadds(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64r32_add(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -552,6 +583,7 @@ float64 helper_fadds(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fsub - fsub. */
 float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64_sub(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -565,6 +597,7 @@ float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fsubs - fsubs. */
 float64 helper_fsubs(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64r32_sub(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -587,6 +620,7 @@ static void float_invalid_op_mul(CPUPPCState *env, int flags,
 /* fmul - fmul. */
 float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64_mul(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -600,6 +634,7 @@ float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fmuls - fmuls. */
 float64 helper_fmuls(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64r32_mul(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -624,6 +659,7 @@ static void float_invalid_op_div(CPUPPCState *env, int flags,
 /* fdiv - fdiv. */
 float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64_div(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -640,6 +676,7 @@ float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fdivs - fdivs. */
 float64 helper_fdivs(CPUPPCState *env, float64 arg1, float64 arg2)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64r32_div(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -672,6 +709,7 @@ static uint64_t float_invalid_cvt(CPUPPCState *env, int flags,
 #define FPU_FCTI(op, cvt, nanval)                                      \
 uint64_t helper_##op(CPUPPCState *env, float64 arg)                    \
 {                                                                      \
+    CACHE_FN_NONE(env);                                                \
     uint64_t ret = float64_to_##cvt(arg, &env->fp_status);             \
     int flags = get_float_exception_flags(&env->fp_status);            \
     if (unlikely(flags & float_flag_invalid)) {                        \
@@ -694,6 +732,8 @@ uint64_t helper_##op(CPUPPCState *env, uint64_t arg)       \
 {                                                          \
     CPU_DoubleU farg;                                      \
                                                            \
+    CACHE_FN_NONE(env);                                    \
+                                                           \
     if (is_single) {                                       \
         float32 tmp = cvtr(arg, &env->fp_status);          \
         farg.d = float32_to_float64(tmp, &env->fp_status); \
@@ -715,6 +755,8 @@ static uint64_t do_fri(CPUPPCState *env, uint64_t arg,
     FloatRoundMode old_rounding_mode = get_float_rounding_mode(&env->fp_status);
     int flags;
 
+    CACHE_FN_NONE(env);
+
     set_float_rounding_mode(rounding_mode, &env->fp_status);
     arg = float64_round_to_int(arg, &env->fp_status);
     set_float_rounding_mode(old_rounding_mode, &env->fp_status);
@@ -764,6 +806,7 @@ static void float_invalid_op_madd(CPUPPCState *env, int flags,
 static float64 do_fmadd(CPUPPCState *env, float64 a, float64 b,
                          float64 c, int madd_flags, uintptr_t retaddr)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64_muladd(a, b, c, madd_flags, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -776,6 +819,7 @@ static float64 do_fmadd(CPUPPCState *env, float64 a, float64 b,
 static uint64_t do_fmadds(CPUPPCState *env, float64 a, float64 b,
                           float64 c, int madd_flags, uintptr_t retaddr)
 {
+    CACHE_FN_NONE(env);
     float64 ret = float64r32_muladd(a, b, c, madd_flags, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -817,6 +861,7 @@ static uint64_t do_frsp(CPUPPCState *env, uint64_t arg, uintptr_t retaddr)
 
 uint64_t helper_frsp(CPUPPCState *env, uint64_t arg)
 {
+    CACHE_FN_NONE(env);
     return do_frsp(env, arg, GETPC());
 }
 
@@ -833,6 +878,7 @@ static void float_invalid_op_sqrt(CPUPPCState *env, int flags,
 #define FPU_FSQRT(name, op)                                                   \
 float64 helper_##name(CPUPPCState *env, float64 arg)                          \
 {                                                                             \
+    CACHE_FN_NONE(env);                                                       \
     float64 ret = op(arg, &env->fp_status);                                   \
     int flags = get_float_exception_flags(&env->fp_status);                   \
                                                                               \
@@ -849,6 +895,7 @@ FPU_FSQRT(FSQRTS, float64r32_sqrt)
 /* fre - fre. */
 float64 helper_fre(CPUPPCState *env, float64 arg)
 {
+    CACHE_FN_NONE(env);
     /* "Estimate" the reciprocal with actual division.  */
     float64 ret = float64_div(float64_one, arg, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
@@ -868,6 +915,7 @@ float64 helper_fre(CPUPPCState *env, float64 arg)
 /* fres - fres. */
 uint64_t helper_fres(CPUPPCState *env, uint64_t arg)
 {
+    CACHE_FN_NONE(env);
     /* "Estimate" the reciprocal with actual division.  */
     float64 ret = float64r32_div(float64_one, arg, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
@@ -887,6 +935,7 @@ uint64_t helper_fres(CPUPPCState *env, uint64_t arg)
 /* frsqrte  - frsqrte. */
 float64 helper_frsqrte(CPUPPCState *env, float64 arg)
 {
+    CACHE_FN_NONE(env);
     /* "Estimate" the reciprocal with actual division.  */
     float64 rets = float64_sqrt(arg, &env->fp_status);
     float64 retd = float64_div(float64_one, rets, &env->fp_status);
@@ -906,6 +955,7 @@ float64 helper_frsqrte(CPUPPCState *env, float64 arg)
 /* frsqrtes  - frsqrtes. */
 float64 helper_frsqrtes(CPUPPCState *env, float64 arg)
 {
+    CACHE_FN_NONE(env);
     /* "Estimate" the reciprocal with actual division.  */
     float64 rets = float64_sqrt(arg, &env->fp_status);
     float64 retd = float64r32_div(float64_one, rets, &env->fp_status);
@@ -1706,6 +1756,7 @@ void helper_##name(CPUPPCState *env, ppc_vsr_t *xt,                          \
     int i;                                                                   \
                                                                              \
     helper_reset_fpstatus(env);                                              \
+    CACHE_FN_NONE(env);                                                      \
                                                                              \
     for (i = 0; i < nels; i++) {                                             \
         float_status tstat = env->fp_status;                                 \
@@ -1746,6 +1797,7 @@ void helper_xsaddqp(CPUPPCState *env, uint32_t opcode,
     float_status tstat;
 
     helper_reset_fpstatus(env);
+    CACHE_FN_NONE(env);
 
     tstat = env->fp_status;
     if (unlikely(Rc(opcode) != 0)) {
@@ -1853,6 +1905,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     int i;                                                                    \
                                                                               \
     helper_reset_fpstatus(env);                                               \
+    CACHE_FN_NONE(env);                                                       \
                                                                               \
     for (i = 0; i < nels; i++) {                                              \
         float_status tstat = env->fp_status;                                  \
@@ -2684,6 +2737,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)   \
     int i;                                                         \
                                                                    \
     helper_reset_fpstatus(env);                                    \
+    CACHE_FN_NONE(env);                                            \
                                                                    \
     for (i = 0; i < nels; i++) {                                   \
         t.tfld = stp##_to_##ttp(xb->sfld, &env->fp_status);        \
@@ -2711,6 +2765,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)      \
     int i;                                                            \
                                                                       \
     helper_reset_fpstatus(env);                                       \
+    CACHE_FN_NONE(env);                                               \
                                                                       \
     for (i = 0; i < nels; i++) {                                      \
         t.VsrW(2 * i) = stp##_to_##ttp(xb->VsrD(i), &env->fp_status); \
@@ -2750,6 +2805,7 @@ void helper_##op(CPUPPCState *env, uint32_t opcode,                     \
     int i;                                                              \
                                                                         \
     helper_reset_fpstatus(env);                                         \
+    CACHE_FN_NONE(env);                                                 \
                                                                         \
     for (i = 0; i < nels; i++) {                                        \
         t.tfld = stp##_to_##ttp(xb->sfld, &env->fp_status);             \
@@ -2787,6 +2843,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)   \
     int i;                                                         \
                                                                    \
     helper_reset_fpstatus(env);                                    \
+    CACHE_FN_NONE(env);                                            \
                                                                    \
     for (i = 0; i < nels; i++) {                                   \
         t.tfld = stp##_to_##ttp(xb->sfld, 1, &env->fp_status);     \
@@ -2836,6 +2893,7 @@ void helper_XSCVQPDP(CPUPPCState *env, uint32_t ro, ppc_vsr_t *xt,
     float_status tstat;
 
     helper_reset_fpstatus(env);
+    CACHE_FN_NONE(env);
 
     tstat = env->fp_status;
     if (ro != 0) {
@@ -2862,6 +2920,8 @@ uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
     float_status tstat = env->fp_status;
     set_float_exception_flags(0, &tstat);
 
+    CACHE_FN_NONE(env);
+
     sign = extract64(xb, 63,  1);
     exp  = extract64(xb, 52, 11);
     frac = extract64(xb,  0, 52) | 0x10000000000000ULL;
@@ -2897,6 +2957,7 @@ uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
 
 uint64_t helper_XSCVSPDPN(uint64_t xb)
 {
+    /* TODO: missing env for CACHE_FN_NONE(env); */
     return helper_todouble(xb >> 32);
 }
 
@@ -2919,6 +2980,8 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)             \
                                                                              \
     helper_reset_fpstatus(env);                                              \
                                                                              \
+    CACHE_FN_NONE(env);                                                      \
+                                                                             \
     for (i = 0; i < nels; i++) {                                             \
         t.tfld = stp##_to_##ttp##_round_to_zero(xb->sfld, &env->fp_status);  \
         flags = env->fp_status.float_exception_flags;                        \
@@ -2953,6 +3016,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)               \
     int flags;                                                                 \
                                                                                \
     helper_reset_fpstatus(env);                                                \
+    CACHE_FN_NONE(env);                                                        \
     t.s128 = float128_to_##tp##_round_to_zero(xb->f128, &env->fp_status);      \
     flags = get_float_exception_flags(&env->fp_status);                        \
     if (unlikely(flags & float_flag_invalid)) {                                \
@@ -2984,6 +3048,8 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)             \
                                                                              \
     helper_reset_fpstatus(env);                                              \
                                                                              \
+    CACHE_FN_NONE(env);                                                      \
+                                                                             \
     for (i = 0; i < nels; i++) {                                             \
         t.VsrW(2 * i) = stp##_to_##ttp##_round_to_zero(xb->VsrD(i),          \
                                                        &env->fp_status);     \
@@ -3021,6 +3087,7 @@ void helper_##op(CPUPPCState *env, uint32_t opcode,                          \
     int flags;                                                               \
                                                                              \
     helper_reset_fpstatus(env);                                              \
+    CACHE_FN_NONE(env);                                                      \
                                                                              \
     t.tfld = stp##_to_##ttp##_round_to_zero(xb->sfld, &env->fp_status);      \
     flags = get_float_exception_flags(&env->fp_status);                      \
@@ -3057,6 +3124,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)        \
     int i;                                                              \
                                                                         \
     helper_reset_fpstatus(env);                                         \
+    CACHE_FN_NONE(env);                                                 \
                                                                         \
     for (i = 0; i < nels; i++) {                                        \
         t.tfld = stp##_to_##ttp(xb->sfld, &env->fp_status);             \
@@ -3105,6 +3173,7 @@ VSX_CVT_INT_TO_FP2(xvcvuxdsp, uint64, float32)
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)\
 {                                                               \
     helper_reset_fpstatus(env);                                 \
+    CACHE_FN_NONE(env);                                         \
     xt->f128 = tp##_to_float128(xb->s128, &env->fp_status);     \
     helper_compute_fprf_float128(env, xt->f128);                \
     do_float_check_status(env, true, GETPC());                  \
@@ -3128,6 +3197,8 @@ void helper_##op(CPUPPCState *env, uint32_t opcode,                     \
     ppc_vsr_t t = *xt;                                                  \
                                                                         \
     helper_reset_fpstatus(env);                                         \
+    CACHE_FN_NONE(env);                                                 \
+                                                                        \
     t.tfld = stp##_to_##ttp(xb->sfld, &env->fp_status);                 \
     helper_compute_fprf_##ttp(env, t.tfld);                             \
                                                                         \
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 57eee07256..88147b68a0 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -76,6 +76,7 @@ DEF_HELPER_FLAGS_2(brinc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_1(float_check_status, void, env)
 DEF_HELPER_1(fpscr_check_status, void, env)
 DEF_HELPER_1(reset_fpstatus, void, env)
+DEF_HELPER_1(execute_fp_cached, void, env)
 DEF_HELPER_2(compute_fprf_float64, void, env, i64)
 DEF_HELPER_3(store_fpscr, void, env, i64, i32)
 DEF_HELPER_2(fpscr_clrbit, void, env, i32)
diff --git a/target/ppc/translate/fp-impl.c.inc b/target/ppc/translate/fp-impl.c.inc
index 8d5cf0f982..10dbfb6edd 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -633,6 +633,7 @@ static bool trans_MFFS(DisasContext *ctx, arg_X_t_rc *a)
     REQUIRE_FPU(ctx);
 
     gen_reset_fpstatus();
+    gen_helper_execute_fp_cached(cpu_env);
     fpscr = place_from_fpscr(a->rt, UINT64_MAX);
     if (a->rc) {
         gen_set_cr1_from_fpscr(ctx);
-- 
2.25.1




* [RFC PATCH v2 2/5] target/ppc: Implement instruction caching for fsqrt
  2022-10-26 19:25 [RFC PATCH v2 0/5] Idea for using hardfloat in PPC Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 1/5] target/ppc: prepare instructions to work with caching last FP insn Víctor Colombo
@ 2022-10-26 19:25 ` Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 3/5] target/ppc: Implement instruction caching for muladd Víctor Colombo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

This patch adds the code necessary to cache fsqrt for use
with hardfpu in Power. It is also the first instruction to
use the new instruction caching system.

fsqrt is an instruction that receives two arguments, one f64 and
one status, and returns f64. This info will be cached inside a new
union in env, which will grow when other instructions with other
signatures are added.

Hardfpu in QEMU only works when the inexact flag is already set. So,
CACHE_FN_3 will check if FP_XX is set, and set float_flag_inexact
to enable the hardfpu behavior. When the instruction is later
reexecuted, it will be with float_flag_inexact cleared, forcing
softfloat and correctly updating the relevant flags, as is done today.
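
The cache-then-replay sequence described above can be sketched with
stand-in types. This is a simplified illustration, not the patch
itself: ToyStatus, ToyCachedFn, toy_trunc, toy_run and toy_replay are
hypothetical names, truncation toward zero stands in for fsqrt, and the
snapshot of the status is assumed to be taken before the inexact bit is
forced on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for float_status and float_flag_inexact. */
typedef struct { unsigned flags; } ToyStatus;
enum { TOY_INEXACT = 1 };

/* One cached call: function, value argument, and a status snapshot. */
typedef struct {
    double (*fn)(double, ToyStatus *);
    double arg1;
    ToyStatus arg2;   /* copied by value, with the inexact bit clear */
} ToyCachedFn;

/* Toy operation: truncate toward zero. Keep |x| small enough for long long. */
static double toy_trunc(double x, ToyStatus *s)
{
    double r = (double)(long long)x;

    if (s->flags & TOY_INEXACT) {
        return r;               /* fast path: flag bookkeeping is skipped */
    }
    if (r != x) {
        s->flags |= TOY_INEXACT; /* slow path: record the exact flag */
    }
    return r;
}

/* Cache the call, force the inexact bit, then run on the fast path. */
static double toy_run(ToyCachedFn *c, double (*fn)(double, ToyStatus *),
                      double arg, ToyStatus *live)
{
    c->fn = fn;
    c->arg1 = arg;
    c->arg2 = *live;
    c->arg2.flags &= ~(unsigned)TOY_INEXACT; /* snapshot must start clear */
    live->flags |= TOY_INEXACT;
    return fn(arg, live);
}

/* Replay on the slow path; returns what the non-sticky bit should be. */
static bool toy_replay(ToyCachedFn *c)
{
    assert((c->arg2.flags & TOY_INEXACT) == 0);
    (void)c->fn(c->arg1, &c->arg2);
    return (c->arg2.flags & TOY_INEXACT) != 0;
}
```

Because the snapshot is stored by value, the replay does not disturb
the live status; only the returned bit would be written back.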

This implementation only works in linux-user. No testing was done
and no effort was made in this patch to make it work for softmmu.
Future work will be required to make it work correctly in that scenario.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
---
 target/ppc/cpu.h        | 11 +++++++++++
 target/ppc/fpu_helper.c | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 116ee639ff..e55c10b0db 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1082,6 +1082,14 @@ struct ppc_radix_page_info {
 
 enum {
     CACHED_FN_TYPE_NONE,
+    CACHED_FN_TYPE_F64_F64_FSTATUS,
+
+};
+
+struct cached_fn_f64_f64_fstatus {
+    float64 (*fn)(float64, float_status*);
+    float64 arg1;
+    float_status arg2;
 };
 
 struct CPUArchState {
@@ -1162,6 +1170,9 @@ struct CPUArchState {
     target_ulong fpscr;     /* Floating point status and control register */
 
     int cached_fn_type;
+    union {
+        struct cached_fn_f64_f64_fstatus f64_f64_fstatus;
+    } cached_fn;
 
     /* Internal devices resources */
     ppc_tb_t *tb_env;      /* Time base and decrementer */
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 34b242c025..1756719664 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -30,8 +30,24 @@
                  float_flag_inexact));                                        \
         env->cached_fn_type = CACHED_FN_TYPE_NONE;                            \
     } while (0)
+
+#define CACHE_FN_3(env, FN, ARG1, ARG2, FIELD, TYPE)                          \
+    do {                                                                      \
+        if (env->fpscr & FP_XX) {                                             \
+            env->cached_fn_type = TYPE;                                       \
+            env->cached_fn.FIELD.fn = FN;                                     \
+            env->cached_fn.FIELD.arg1 = ARG1;                                 \
+            env->cached_fn.FIELD.arg2 = ARG2;                                 \
+            env->fp_status.float_exception_flags |= float_flag_inexact;       \
+        } else {                                                              \
+            assert(!(env->fp_status.float_exception_flags &                   \
+                     float_flag_inexact));                                    \
+            env->cached_fn_type = CACHED_FN_TYPE_NONE;                        \
+        }                                                                     \
+    } while (0)
 #else
 #define CACHE_FN_NONE(env)
+#define CACHE_FN_3(env, FN, ARG1, ARG2, FIELD, TYPE)
 #endif
 
 static inline float128 float128_snan_to_qnan(float128 x)
@@ -535,6 +551,27 @@ void helper_execute_fp_cached(CPUPPCState *env)
          * so no need to execute it again
          */
         break;
+    case CACHED_FN_TYPE_F64_F64_FSTATUS:
+        /*
+         * execute the cached insn. At this point, float_exception_flags
+         * should have FI not set, otherwise the result will not be correct
+         */
+        assert((env->cached_fn.f64_f64_fstatus.arg2.float_exception_flags &
+               float_flag_inexact) == 0);
+        env->cached_fn.f64_f64_fstatus.fn(
+            env->cached_fn.f64_f64_fstatus.arg1,
+            &env->cached_fn.f64_f64_fstatus.arg2);
+
+        env->fpscr &= ~FP_FI;
+        /*
+         * if the cached instruction resulted in FI being set
+         * then we update fpscr with this value
+         */
+        if (env->cached_fn.f64_f64_fstatus.arg2.float_exception_flags &
+            float_flag_inexact) {
+            env->fpscr |= FP_FI | FP_XX;
+        }
+        break;
     default:
         g_assert_not_reached();
     }
@@ -878,7 +915,8 @@ static void float_invalid_op_sqrt(CPUPPCState *env, int flags,
 #define FPU_FSQRT(name, op)                                                   \
 float64 helper_##name(CPUPPCState *env, float64 arg)                          \
 {                                                                             \
-    CACHE_FN_NONE(env);                                                       \
+    CACHE_FN_3(env, op, arg, env->fp_status, f64_f64_fstatus,                 \
+        CACHED_FN_TYPE_F64_F64_FSTATUS);                                      \
     float64 ret = op(arg, &env->fp_status);                                   \
     int flags = get_float_exception_flags(&env->fp_status);                   \
                                                                               \
-- 
2.25.1




* [RFC PATCH v2 3/5] target/ppc: Implement instruction caching for muladd
  2022-10-26 19:25 [RFC PATCH v2 0/5] Idea for using hardfloat in PPC Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 1/5] target/ppc: prepare instructions to work with caching last FP insn Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 2/5] target/ppc: Implement instruction caching for fsqrt Víctor Colombo
@ 2022-10-26 19:25 ` Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 4/5] target/ppc: Implement instruction caching for add/sub/mul/div Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 5/5] target/ppc: Enable hardfpu for Power Víctor Colombo
  4 siblings, 0 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

This patch adds the code necessary to cache muladd instructions
for usage with hardfpu in Power.

muladd is an instruction that receives five arguments (three f64, one
int carrying the muladd flags, and one status) and returns f64. This
info will be cached inside the union in env, which grows when other
instructions with other signatures are added.
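
A minimal sketch of how a type tag plus a union can hold calls with
different signatures and dispatch on replay is shown below. All demo_*
names, including DEMO_NEGATE_RESULT, are hypothetical and chosen only
for illustration; the real structs are the ones added to cpu.h in this
series.

```c
#include <assert.h>

typedef struct {
    int exception_flags;             /* stand-in for float_status */
} demo_status;

enum { DEMO_NEGATE_RESULT = 1 };     /* hypothetical muladd flag */

enum demo_fn_type {
    DEMO_FN_NONE,
    DEMO_FN_UNARY,                   /* fn(x, status) */
    DEMO_FN_MULADD,                  /* fn(a, b, c, flags, status) */
};

struct demo_unary {
    double (*fn)(double, demo_status *);
    double arg1;
    demo_status arg2;
};

struct demo_muladd {
    double (*fn)(double, double, double, int, demo_status *);
    double arg1, arg2, arg3;
    int arg4;
    demo_status arg5;
};

/* One slot in total: the tag says which union member is live. */
typedef struct {
    enum demo_fn_type type;
    union {
        struct demo_unary unary;
        struct demo_muladd muladd;
    } u;
} demo_cached_fn;

static double demo_neg(double x, demo_status *st)
{
    (void)st;
    return -x;
}

static double demo_muladd_op(double a, double b, double c, int flags,
                             demo_status *st)
{
    double r = a * b + c;
    (void)st;
    return (flags & DEMO_NEGATE_RESULT) ? -r : r;
}

/* Replay whichever call is cached; returns 0.0 when nothing is cached. */
static double demo_replay(demo_cached_fn *c)
{
    switch (c->type) {
    case DEMO_FN_UNARY:
        return c->u.unary.fn(c->u.unary.arg1, &c->u.unary.arg2);
    case DEMO_FN_MULADD: {
        struct demo_muladd *m = &c->u.muladd;
        return m->fn(m->arg1, m->arg2, m->arg3, m->arg4, &m->arg5);
    }
    default:
        return 0.0;
    }
}
```

The sketch opens a block scope after the case label so that a local
declaration is legal there, and it uses a pointer into the union rather
than a copy of the struct.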

Hardfpu in QEMU only works when the inexact flag is already set. So,
CACHE_FN_5 checks whether FP_XX is set and, if so, sets
float_flag_inexact to enable the hardfpu behavior. When the instruction
is later reexecuted, it runs with float_flag_inexact cleared, forcing
softfloat and correctly updating the relevant flags, as is done today.

This implementation only works in linux-user. No testing was done and
no effort was made in this patch to make it work for softmmu. Future
work will be required to make it work correctly in that scenario.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
---
 target/ppc/cpu.h        | 11 +++++++++++
 target/ppc/fpu_helper.c | 35 +++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e55c10b0db..f6803bf37b 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1083,6 +1083,7 @@ struct ppc_radix_page_info {
 enum {
     CACHED_FN_TYPE_NONE,
     CACHED_FN_TYPE_F64_F64_FSTATUS,
+    CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS,
 
 };
 
@@ -1092,6 +1093,15 @@ struct cached_fn_f64_f64_fstatus {
     float_status arg2;
 };
 
+struct cached_fn_f64_f64_f64_f64_i_fstatus {
+    float64 (*fn)(float64, float64, float64, int, float_status*);
+    float64 arg1;
+    float64 arg2;
+    float64 arg3;
+    int arg4;
+    float_status arg5;
+};
+
 struct CPUArchState {
     /* Most commonly used resources during translated code execution first */
     target_ulong gpr[32];  /* general purpose registers */
@@ -1172,6 +1182,7 @@ struct CPUArchState {
     int cached_fn_type;
     union {
         struct cached_fn_f64_f64_fstatus f64_f64_fstatus;
+        struct cached_fn_f64_f64_f64_f64_i_fstatus f64_f64_f64_f64_i_fstatus;
     } cached_fn;
 
     /* Internal devices resources */
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 1756719664..a152c018b2 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -45,9 +45,27 @@
             env->cached_fn_type = CACHED_FN_TYPE_NONE;                        \
         }                                                                     \
     } while (0)
+
+#define CACHE_FN_5(env, FN, ARG1, ARG2, ARG3, ARG4, FIELD, TYPE)              \
+    do {                                                                      \
+        if (env->fpscr & FP_XX) {                                             \
+            env->cached_fn_type = TYPE;                                       \
+            env->cached_fn.FIELD.fn = FN;                                     \
+            env->cached_fn.FIELD.arg1 = ARG1;                                 \
+            env->cached_fn.FIELD.arg2 = ARG2;                                 \
+            env->cached_fn.FIELD.arg3 = ARG3;                                 \
+            env->cached_fn.FIELD.arg4 = ARG4;                                 \
+            env->fp_status.float_exception_flags |= float_flag_inexact;       \
+        } else {                                                              \
+            assert(!(env->fp_status.float_exception_flags &                   \
+                     float_flag_inexact));                                    \
+            env->cached_fn_type = CACHED_FN_TYPE_NONE;                        \
+        }                                                                     \
+    } while (0)
 #else
 #define CACHE_FN_NONE(env)
 #define CACHE_FN_3(env, FN, ARG1, ARG2, FIELD, TYPE)
+#define CACHE_FN_5(env, FN, ARG1, ARG2, ARG3, ARG4, FIELD, TYPE)
 #endif
 
 static inline float128 float128_snan_to_qnan(float128 x)
@@ -572,6 +590,17 @@ void helper_execute_fp_cached(CPUPPCState *env)
             env->fpscr |= FP_FI | FP_XX;
         }
         break;
+    case CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS:
+        ; /* hack to allow declaration below */
+        struct cached_fn_f64_f64_f64_f64_i_fstatus args =
+            env->cached_fn.f64_f64_f64_f64_i_fstatus;
+        assert(!(args.arg5.float_exception_flags & float_flag_inexact));
+        args.fn(args.arg1, args.arg2, args.arg3, args.arg4, &args.arg5);
+        env->fpscr &= ~FP_FI;
+        if (args.arg5.float_exception_flags & float_flag_inexact) {
+            env->fpscr |= FP_FI | FP_XX;
+        }
+        break;
     default:
         g_assert_not_reached();
     }
@@ -843,7 +872,8 @@ static void float_invalid_op_madd(CPUPPCState *env, int flags,
 static float64 do_fmadd(CPUPPCState *env, float64 a, float64 b,
                          float64 c, int madd_flags, uintptr_t retaddr)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_5(env, float64_muladd, a, b, c, madd_flags,
+        f64_f64_f64_f64_i_fstatus, CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS);
     float64 ret = float64_muladd(a, b, c, madd_flags, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -856,7 +886,8 @@ static float64 do_fmadd(CPUPPCState *env, float64 a, float64 b,
 static uint64_t do_fmadds(CPUPPCState *env, float64 a, float64 b,
                           float64 c, int madd_flags, uintptr_t retaddr)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_5(env, float64r32_muladd, a, b, c, madd_flags,
+        f64_f64_f64_f64_i_fstatus, CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS);
     float64 ret = float64r32_muladd(a, b, c, madd_flags, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
-- 
2.25.1




* [RFC PATCH v2 4/5] target/ppc: Implement instruction caching for add/sub/mul/div
  2022-10-26 19:25 [RFC PATCH v2 0/5] Idea for using hardfloat in PPC Víctor Colombo
                   ` (2 preceding siblings ...)
  2022-10-26 19:25 ` [RFC PATCH v2 3/5] target/ppc: Implement instruction caching for muladd Víctor Colombo
@ 2022-10-26 19:25 ` Víctor Colombo
  2022-10-26 19:25 ` [RFC PATCH v2 5/5] target/ppc: Enable hardfpu for Power Víctor Colombo
  4 siblings, 0 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

This patch adds the code necessary to cache add/sub/mul/div
instructions for usage with hardfpu in Power.

These instructions receive three arguments, two f64 and
one status, and return f64. This info will be cached inside the
union in env, which grows when other instructions with other
signatures are added.

Hardfpu in QEMU only works when the inexact flag is already set. So,
CACHE_FN_4 checks whether FP_XX is set and, if so, sets
float_flag_inexact to enable the hardfpu behavior. When the instruction
is later reexecuted, it runs with float_flag_inexact cleared, forcing
softfloat and correctly updating the relevant flags, as is done today.
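
The flag update performed after a replay can be sketched in isolation
as below: FI is cleared and then set again only if the replayed
operation was inexact, while XX is only ever set. The bit positions and
the demo_* names are assumptions made for this sketch and are not the
FP_FI and FP_XX definitions from cpu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FP_FI (1u << 0)         /* non-sticky: last instruction only */
#define DEMO_FP_XX (1u << 1)         /* sticky: stays set once raised */

/* Return the new FPSCR value given whether the replayed op was inexact. */
static uint32_t demo_update_fpscr(uint32_t fpscr, bool inexact)
{
    fpscr &= ~DEMO_FP_FI;
    if (inexact) {
        fpscr |= DEMO_FP_FI | DEMO_FP_XX;
    }
    return fpscr;
}
```

An exact operation following an inexact one therefore leaves XX set and
FI clear, which is the case a sticky-only inexact flag cannot express.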

This implementation only works in linux-user. No testing was done and
no effort was made in this patch to make it work for softmmu. Future
work will be required to make it work correctly in that scenario.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
---
 target/ppc/cpu.h        |  9 +++++++
 target/ppc/fpu_helper.c | 56 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index f6803bf37b..a25787d939 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1083,6 +1083,7 @@ struct ppc_radix_page_info {
 enum {
     CACHED_FN_TYPE_NONE,
     CACHED_FN_TYPE_F64_F64_FSTATUS,
+    CACHED_FN_TYPE_F64_F64_F64_FSTATUS,
     CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS,
 
 };
@@ -1093,6 +1094,13 @@ struct cached_fn_f64_f64_fstatus {
     float_status arg2;
 };
 
+struct cached_fn_f64_f64_f64_fstatus {
+    float64 (*fn)(float64, float64, float_status*);
+    float64 arg1;
+    float64 arg2;
+    float_status arg3;
+};
+
 struct cached_fn_f64_f64_f64_f64_i_fstatus {
     float64 (*fn)(float64, float64, float64, int, float_status*);
     float64 arg1;
@@ -1182,6 +1190,7 @@ struct CPUArchState {
     int cached_fn_type;
     union {
         struct cached_fn_f64_f64_fstatus f64_f64_fstatus;
+        struct cached_fn_f64_f64_f64_fstatus f64_f64_f64_fstatus;
         struct cached_fn_f64_f64_f64_f64_i_fstatus f64_f64_f64_f64_i_fstatus;
     } cached_fn;
 
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index a152c018b2..0bea9df361 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -46,6 +46,22 @@
         }                                                                     \
     } while (0)
 
+#define CACHE_FN_4(env, FN, ARG1, ARG2, ARG3, FIELD, TYPE)                    \
+    do {                                                                      \
+        if (env->fpscr & FP_XX) {                                             \
+            env->cached_fn_type = TYPE;                                       \
+            env->cached_fn.FIELD.fn = FN;                                     \
+            env->cached_fn.FIELD.arg1 = ARG1;                                 \
+            env->cached_fn.FIELD.arg2 = ARG2;                                 \
+            env->cached_fn.FIELD.arg3 = ARG3;                                 \
+            env->fp_status.float_exception_flags |= float_flag_inexact;       \
+        } else {                                                              \
+            assert(!(env->fp_status.float_exception_flags &                   \
+                     float_flag_inexact));                                    \
+            env->cached_fn_type = CACHED_FN_TYPE_NONE;                        \
+        }                                                                     \
+    } while (0)
+
 #define CACHE_FN_5(env, FN, ARG1, ARG2, ARG3, ARG4, FIELD, TYPE)              \
     do {                                                                      \
         if (env->fpscr & FP_XX) {                                             \
@@ -65,6 +81,7 @@
 #else
 #define CACHE_FN_NONE(env)
 #define CACHE_FN_3(env, FN, ARG1, ARG2, FIELD, TYPE)
+#define CACHE_FN_4(env, FN, ARG1, ARG2, ARG3, FIELD, TYPE)
 #define CACHE_FN_5(env, FN, ARG1, ARG2, ARG3, ARG4, FIELD, TYPE)
 #endif
 
@@ -590,6 +607,24 @@ void helper_execute_fp_cached(CPUPPCState *env)
             env->fpscr |= FP_FI | FP_XX;
         }
         break;
+    case CACHED_FN_TYPE_F64_F64_F64_FSTATUS:
+        assert((env->cached_fn.f64_f64_f64_fstatus.arg3.float_exception_flags &
+               float_flag_inexact) == 0);
+        env->cached_fn.f64_f64_f64_fstatus.fn(
+            env->cached_fn.f64_f64_f64_fstatus.arg1,
+            env->cached_fn.f64_f64_f64_fstatus.arg2,
+            &env->cached_fn.f64_f64_f64_fstatus.arg3);
+
+        env->fpscr &= ~FP_FI;
+        /*
+         * if the cached instruction resulted in FI being set
+         * then we update fpscr with this value
+         */
+        if (env->cached_fn.f64_f64_f64_fstatus.arg3.float_exception_flags &
+            float_flag_inexact) {
+            env->fpscr |= FP_FI | FP_XX;
+        }
+        break;
     case CACHED_FN_TYPE_F64_F64_F64_F64_I_FSTATUS:
         ; /* hack to allow declaration below */
         struct cached_fn_f64_f64_f64_f64_i_fstatus args =
@@ -622,7 +657,8 @@ static void float_invalid_op_addsub(CPUPPCState *env, int flags,
 /* fadd - fadd. */
 float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64_add, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64_add(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -636,7 +672,8 @@ float64 helper_fadd(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fadds - fadds. */
 float64 helper_fadds(CPUPPCState *env, float64 arg1, float64 arg2)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64r32_add, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64r32_add(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -649,7 +686,8 @@ float64 helper_fadds(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fsub - fsub. */
 float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64_sub, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64_sub(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -663,7 +701,8 @@ float64 helper_fsub(CPUPPCState *env, float64 arg1, float64 arg2)
 /* fsubs - fsubs. */
 float64 helper_fsubs(CPUPPCState *env, float64 arg1, float64 arg2)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64r32_sub, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64r32_sub(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -686,7 +725,8 @@ static void float_invalid_op_mul(CPUPPCState *env, int flags,
 /* fmul - fmul. */
 float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 {
-    CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64_mul, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64_mul(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -701,6 +741,8 @@ float64 helper_fmul(CPUPPCState *env, float64 arg1, float64 arg2)
 float64 helper_fmuls(CPUPPCState *env, float64 arg1, float64 arg2)
 {
     CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64r32_mul, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64r32_mul(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -726,6 +768,8 @@ static void float_invalid_op_div(CPUPPCState *env, int flags,
 float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 {
     CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64_div, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64_div(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
@@ -743,6 +787,8 @@ float64 helper_fdiv(CPUPPCState *env, float64 arg1, float64 arg2)
 float64 helper_fdivs(CPUPPCState *env, float64 arg1, float64 arg2)
 {
     CACHE_FN_NONE(env);
+    CACHE_FN_4(env, float64r32_div, arg1, arg2, env->fp_status,
+        f64_f64_f64_fstatus, CACHED_FN_TYPE_F64_F64_F64_FSTATUS);
     float64 ret = float64r32_div(arg1, arg2, &env->fp_status);
     int flags = get_float_exception_flags(&env->fp_status);
 
-- 
2.25.1




* [RFC PATCH v2 5/5] target/ppc: Enable hardfpu for Power
  2022-10-26 19:25 [RFC PATCH v2 0/5] Idea for using hardfloat in PPC Víctor Colombo
                   ` (3 preceding siblings ...)
  2022-10-26 19:25 ` [RFC PATCH v2 4/5] target/ppc: Implement instruction caching for add/sub/mul/div Víctor Colombo
@ 2022-10-26 19:25 ` Víctor Colombo
  4 siblings, 0 replies; 6+ messages in thread
From: Víctor Colombo @ 2022-10-26 19:25 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: clg, danielhb413, david, groug, richard.henderson, aurelien,
	peter.maydell, alex.bennee, balaton, victor.colombo,
	matheus.ferst, lucas.araujo, leandro.lupori, lucas.coutinho

Change the build conditional in softfloat.c, allowing TARGET_PPC
to use hardfpu. For PPC, this is only implemented in linux-user.
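
The new preprocessor condition can be modelled as a plain function to
make the resulting matrix easier to read. This is a sketch that assumes
the #else branch, which the hunk below does not show in full, is the
one that leaves hardfloat enabled; demo_hardfloat_enabled and its
parameters are hypothetical names standing for whether __FAST_MATH__,
TARGET_PPC, CONFIG_USER_ONLY and CONFIG_LINUX_USER are defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the #if / #elif chain: true means hardfloat stays enabled. */
static bool demo_hardfloat_enabled(bool fast_math, bool target_ppc,
                                   bool user_only, bool linux_user)
{
    if (fast_math) {
        return false;                /* -ffast-math always disables it */
    }
    if (target_ppc && (!user_only || !linux_user)) {
        return false;                /* PPC outside linux-user */
    }
    return true;
}
```

Under this model a PPC build gets hardfloat only when both user-only
and linux-user are set; every non-PPC target keeps its previous
behavior.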

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
---
 fpu/softfloat.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c7454c3eb1..f395096275 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -220,11 +220,13 @@ GEN_INPUT_FLUSH3(float64_input_flush3, float64)
  * the use of hardfloat, since hardfloat relies on the inexact flag being
  * already set.
  */
-#if defined(TARGET_PPC) || defined(__FAST_MATH__)
-# if defined(__FAST_MATH__)
-#  warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
+#if defined(__FAST_MATH__)
+# warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
     IEEE implementation
-# endif
+# define QEMU_NO_HARDFLOAT 1
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
+#elif defined(TARGET_PPC) && (!defined(CONFIG_USER_ONLY) || !defined(CONFIG_LINUX_USER))
+/* In PPC hardfloat only works for linux-user */
 # define QEMU_NO_HARDFLOAT 1
 # define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
 #else
-- 
2.25.1


