* [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations
@ 2012-10-09 20:27 Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 01/14] softfloat: implement fused multiply-add NaN propagation for MIPS Aurelien Jarno
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

This patch series fixes some bugs and cleans up code in the MIPS
target, then adds some optimizations.

Aurelien Jarno (14):
  softfloat: implement fused multiply-add NaN propagation for MIPS
  target-mips: use the softfloat floatXX_muladd functions
  target-mips: fix FPU exceptions
  target-mips: use softfloat constants when possible
  target-mips: cleanup load/store operations
  target-mips: optimize load operations
  target-mips: simplify load/store microMIPS helpers
  target-mips: implement unaligned loads using TCG
  target-mips: don't use local temps for store conditional
  target-mips: implement movn/movz using movcond
  target-mips: optimize ddiv/ddivu/div/divu with movcond
  target-mips: use deposit instead of hardcoded version
  target-mips: fix TLBR wrt SEGMask
  target-mips: don't flush extra TLB on permissions upgrade

 fpu/softfloat-specialize.h |   27 +++
 target-mips/helper.h       |   12 +-
 target-mips/op_helper.c    |  573 +++++++++++++++-----------------------------
 target-mips/translate.c    |  364 ++++++++++++++--------------
 4 files changed, 409 insertions(+), 567 deletions(-)

-- 
1.7.10.4


* [Qemu-devel] [PATCH 01/14] softfloat: implement fused multiply-add NaN propagation for MIPS
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions Aurelien Jarno
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Add a pickNaNMulAdd function for MIPS, implementing NaN propagation
rules for MIPS fused multiply-add instructions.

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 fpu/softfloat-specialize.h |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index a1d489e..518f694 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -486,6 +486,33 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
         return 1;
     }
 }
+#elif defined(TARGET_MIPS)
+static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
+                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
+{
+    /* For MIPS, the (inf,zero,qnan) case sets InvalidOp and returns
+     * the default NaN
+     */
+    if (infzero) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return 3;
+    }
+
+    /* Prefer sNaN over qNaN, in the a, b, c order. */
+    if (aIsSNaN) {
+        return 0;
+    } else if (bIsSNaN) {
+        return 1;
+    } else if (cIsSNaN) {
+        return 2;
+    } else if (aIsQNaN) {
+        return 0;
+    } else if (bIsQNaN) {
+        return 1;
+    } else {
+        return 2;
+    }
+}
 #elif defined(TARGET_PPC)
 static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
                          flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
-- 
1.7.10.4
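
The selection order implemented by the MIPS pickNaNMulAdd() in this patch
can be modelled outside of QEMU. What follows is a minimal standalone
sketch, not the softfloat code: the nan_class enum, pick_nan_muladd() and
the "invalid" out-parameter are hypothetical names standing in for the
flag arguments and for float_raise(). The return values 0, 1, 2 (operand
a, b, c) and 3 (default NaN) follow the patch.

```c
#include <stdbool.h>

/* Hypothetical classification of one operand (not a softfloat type). */
enum nan_class { NOT_NAN, QUIET_NAN, SIGNALING_NAN };

/*
 * Model of the MIPS rule from the patch: returns 0, 1 or 2 to select
 * operand a, b or c, or 3 to select the default NaN.  *invalid is set
 * when the (inf * zero) case is hit and is left untouched otherwise.
 * Like the original, it assumes at least one operand is a NaN unless
 * infzero is set.
 */
static int pick_nan_muladd(enum nan_class a, enum nan_class b,
                           enum nan_class c, bool infzero, bool *invalid)
{
    if (infzero) {
        *invalid = true;
        return 3;
    }

    /* Signaling NaNs win over quiet ones, scanning a, b, c in order. */
    if (a == SIGNALING_NAN) {
        return 0;
    } else if (b == SIGNALING_NAN) {
        return 1;
    } else if (c == SIGNALING_NAN) {
        return 2;
    } else if (a == QUIET_NAN) {
        return 0;
    } else if (b == QUIET_NAN) {
        return 1;
    } else {
        return 2;
    }
}
```

With this model, a quiet NaN in a and a signaling NaN in c selects c,
because any signaling NaN takes priority over every quiet NaN.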


* [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 01/14] softfloat: implement fused multiply-add NaN propagation for MIPS Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 19:58   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions Aurelien Jarno
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Use the new softfloat floatXX_muladd() functions to implement the madd,
msub, nmadd and nmsub instructions. At the same time, rename the helpers
after the instructions they implement; the previous names existed only
to keep the macros simple.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/helper.h    |    8 +--
 target-mips/op_helper.c |  137 +++++++++++++++++------------------------------
 target-mips/translate.c |   24 ++++-----
 3 files changed, 64 insertions(+), 105 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index f35ed78..740178f 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -254,10 +254,10 @@ FOP_PROTO(rsqrt2)
 DEF_HELPER_4(float_ ## op ## _s, i32, env, i32, i32, i32)  \
 DEF_HELPER_4(float_ ## op ## _d, i64, env, i64, i64, i64)  \
 DEF_HELPER_4(float_ ## op ## _ps, i64, env, i64, i64, i64)
-FOP_PROTO(muladd)
-FOP_PROTO(mulsub)
-FOP_PROTO(nmuladd)
-FOP_PROTO(nmulsub)
+FOP_PROTO(madd)
+FOP_PROTO(msub)
+FOP_PROTO(nmadd)
+FOP_PROTO(nmsub)
 #undef FOP_PROTO
 
 #define FOP_PROTO(op)                                    \
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index ce5ddaf..9d6d54a 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -3016,95 +3016,54 @@ FLOAT_BINOP(mul)
 FLOAT_BINOP(div)
 #undef FLOAT_BINOP
 
-/* ternary operations */
-#define FLOAT_TERNOP(name1, name2)                                        \
-uint64_t helper_float_ ## name1 ## name2 ## _d(CPUMIPSState *env,         \
-                                               uint64_t fdt0,             \
-                                               uint64_t fdt1,             \
-                                               uint64_t fdt2)             \
-{                                                                         \
-    fdt0 = float64_ ## name1 (fdt0, fdt1, &env->active_fpu.fp_status);          \
-    return float64_ ## name2 (fdt0, fdt2, &env->active_fpu.fp_status);          \
-}                                                                         \
-                                                                          \
-uint32_t helper_float_ ## name1 ## name2 ## _s(CPUMIPSState *env,         \
-                                               uint32_t fst0,             \
-                                               uint32_t fst1,             \
-                                               uint32_t fst2)             \
-{                                                                         \
-    fst0 = float32_ ## name1 (fst0, fst1, &env->active_fpu.fp_status);          \
-    return float32_ ## name2 (fst0, fst2, &env->active_fpu.fp_status);          \
-}                                                                         \
-                                                                          \
-uint64_t helper_float_ ## name1 ## name2 ## _ps(CPUMIPSState *env,        \
-                                                uint64_t fdt0,            \
-                                                uint64_t fdt1,            \
-                                                uint64_t fdt2)            \
-{                                                                         \
-    uint32_t fst0 = fdt0 & 0XFFFFFFFF;                                    \
-    uint32_t fsth0 = fdt0 >> 32;                                          \
-    uint32_t fst1 = fdt1 & 0XFFFFFFFF;                                    \
-    uint32_t fsth1 = fdt1 >> 32;                                          \
-    uint32_t fst2 = fdt2 & 0XFFFFFFFF;                                    \
-    uint32_t fsth2 = fdt2 >> 32;                                          \
-                                                                          \
-    fst0 = float32_ ## name1 (fst0, fst1, &env->active_fpu.fp_status);          \
-    fsth0 = float32_ ## name1 (fsth0, fsth1, &env->active_fpu.fp_status);       \
-    fst2 = float32_ ## name2 (fst0, fst2, &env->active_fpu.fp_status);          \
-    fsth2 = float32_ ## name2 (fsth0, fsth2, &env->active_fpu.fp_status);       \
-    return ((uint64_t)fsth2 << 32) | fst2;                                \
-}
-
-FLOAT_TERNOP(mul, add)
-FLOAT_TERNOP(mul, sub)
-#undef FLOAT_TERNOP
-
-/* negated ternary operations */
-#define FLOAT_NTERNOP(name1, name2)                                       \
-uint64_t helper_float_n ## name1 ## name2 ## _d(CPUMIPSState *env,        \
-                                                uint64_t fdt0,            \
-                                                uint64_t fdt1,            \
-                                                uint64_t fdt2)            \
-{                                                                         \
-    fdt0 = float64_ ## name1 (fdt0, fdt1, &env->active_fpu.fp_status);          \
-    fdt2 = float64_ ## name2 (fdt0, fdt2, &env->active_fpu.fp_status);          \
-    return float64_chs(fdt2);                                             \
-}                                                                         \
-                                                                          \
-uint32_t helper_float_n ## name1 ## name2 ## _s(CPUMIPSState *env,        \
-                                                uint32_t fst0,            \
-                                                uint32_t fst1,            \
-                                                uint32_t fst2)            \
-{                                                                         \
-    fst0 = float32_ ## name1 (fst0, fst1, &env->active_fpu.fp_status);          \
-    fst2 = float32_ ## name2 (fst0, fst2, &env->active_fpu.fp_status);          \
-    return float32_chs(fst2);                                             \
-}                                                                         \
-                                                                          \
-uint64_t helper_float_n ## name1 ## name2 ## _ps(CPUMIPSState *env,       \
-                                                 uint64_t fdt0,           \
-                                                 uint64_t fdt1,           \
-                                                 uint64_t fdt2)           \
-{                                                                         \
-    uint32_t fst0 = fdt0 & 0XFFFFFFFF;                                    \
-    uint32_t fsth0 = fdt0 >> 32;                                          \
-    uint32_t fst1 = fdt1 & 0XFFFFFFFF;                                    \
-    uint32_t fsth1 = fdt1 >> 32;                                          \
-    uint32_t fst2 = fdt2 & 0XFFFFFFFF;                                    \
-    uint32_t fsth2 = fdt2 >> 32;                                          \
-                                                                          \
-    fst0 = float32_ ## name1 (fst0, fst1, &env->active_fpu.fp_status);          \
-    fsth0 = float32_ ## name1 (fsth0, fsth1, &env->active_fpu.fp_status);       \
-    fst2 = float32_ ## name2 (fst0, fst2, &env->active_fpu.fp_status);          \
-    fsth2 = float32_ ## name2 (fsth0, fsth2, &env->active_fpu.fp_status);       \
-    fst2 = float32_chs(fst2);                                             \
-    fsth2 = float32_chs(fsth2);                                           \
-    return ((uint64_t)fsth2 << 32) | fst2;                                \
-}
-
-FLOAT_NTERNOP(mul, add)
-FLOAT_NTERNOP(mul, sub)
-#undef FLOAT_NTERNOP
+/* FMA based operations */
+#define FLOAT_FMA(name, type)                                        \
+uint64_t helper_float_ ## name ## _d(CPUMIPSState *env,              \
+                                     uint64_t fdt0, uint64_t fdt1,   \
+                                     uint64_t fdt2)                  \
+{                                                                    \
+    set_float_exception_flags(0, &env->active_fpu.fp_status);        \
+    fdt0 = float64_muladd(fdt0, fdt1, fdt2, type,                    \
+                         &env->active_fpu.fp_status);                \
+    update_fcr31(env);                                               \
+    return fdt0;                                                     \
+}                                                                    \
+                                                                     \
+uint32_t helper_float_ ## name ## _s(CPUMIPSState *env,              \
+                                     uint32_t fst0, uint32_t fst1,   \
+                                     uint32_t fst2)                  \
+{                                                                    \
+    set_float_exception_flags(0, &env->active_fpu.fp_status);        \
+    fst0 = float32_muladd(fst0, fst1, fst2, type,                    \
+                         &env->active_fpu.fp_status);                \
+    update_fcr31(env);                                               \
+    return fst0;                                                     \
+}                                                                    \
+                                                                     \
+uint64_t helper_float_ ## name ## _ps(CPUMIPSState *env,             \
+                                      uint64_t fdt0, uint64_t fdt1,  \
+                                      uint64_t fdt2)                 \
+{                                                                    \
+    uint32_t fst0 = fdt0 & 0XFFFFFFFF;                               \
+    uint32_t fsth0 = fdt0 >> 32;                                     \
+    uint32_t fst1 = fdt1 & 0XFFFFFFFF;                               \
+    uint32_t fsth1 = fdt1 >> 32;                                     \
+    uint32_t fst2 = fdt2 & 0XFFFFFFFF;                               \
+    uint32_t fsth2 = fdt2 >> 32;                                     \
+                                                                     \
+    set_float_exception_flags(0, &env->active_fpu.fp_status);        \
+    fst0 = float32_muladd(fst0, fst1, fst2, type,                    \
+                          &env->active_fpu.fp_status);               \
+    fsth0 = float32_muladd(fsth0, fsth1, fsth2, type,                \
+                           &env->active_fpu.fp_status);              \
+    update_fcr31(env);                                               \
+    return ((uint64_t)fsth0 << 32) | fst0;                           \
+}
+FLOAT_FMA(madd, 0)
+FLOAT_FMA(msub, float_muladd_negate_c)
+FLOAT_FMA(nmadd, float_muladd_negate_result)
+FLOAT_FMA(nmsub, float_muladd_negate_result | float_muladd_negate_c)
+#undef FLOAT_FMA
 
 /* MIPS specific binary operations */
 uint64_t helper_float_recip2_d(CPUMIPSState *env, uint64_t fdt0, uint64_t fdt2)
diff --git a/target-mips/translate.c b/target-mips/translate.c
index ed55e26..8183854 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -8288,7 +8288,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32(fp1, ft);
             gen_load_fpr32(fp2, fr);
-            gen_helper_float_muladd_s(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_madd_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
             gen_store_fpr32(fp2, fd);
@@ -8307,7 +8307,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_muladd_d(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_madd_d(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8325,7 +8325,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_muladd_ps(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_madd_ps(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8343,7 +8343,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32(fp1, ft);
             gen_load_fpr32(fp2, fr);
-            gen_helper_float_mulsub_s(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_msub_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
             gen_store_fpr32(fp2, fd);
@@ -8362,7 +8362,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_mulsub_d(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_msub_d(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8380,7 +8380,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_mulsub_ps(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_msub_ps(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8398,7 +8398,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32(fp1, ft);
             gen_load_fpr32(fp2, fr);
-            gen_helper_float_nmuladd_s(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmadd_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
             gen_store_fpr32(fp2, fd);
@@ -8417,7 +8417,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_nmuladd_d(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmadd_d(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8435,7 +8435,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_nmuladd_ps(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmadd_ps(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8453,7 +8453,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr32(fp0, fs);
             gen_load_fpr32(fp1, ft);
             gen_load_fpr32(fp2, fr);
-            gen_helper_float_nmulsub_s(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmsub_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
             gen_store_fpr32(fp2, fd);
@@ -8472,7 +8472,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_nmulsub_d(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmsub_d(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
@@ -8490,7 +8490,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             gen_load_fpr64(ctx, fp0, fs);
             gen_load_fpr64(ctx, fp1, ft);
             gen_load_fpr64(ctx, fp2, fr);
-            gen_helper_float_nmulsub_ps(fp2, cpu_env, fp0, fp1, fp2);
+            gen_helper_float_nmsub_ps(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i64(fp0);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp2, fd);
-- 
1.7.10.4
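
The way the four FLOAT_FMA() instantiations in this patch differ only in
their "type" argument can be illustrated with a small model. This is a
sketch under stated assumptions: the MULADD_* values and the model_*
functions are hypothetical, and the arithmetic is ordinary double
arithmetic, so it shows only the sign handling and not the single
rounding step that a real fused multiply-add provides.

```c
/* Hypothetical flag values; only the meaning mirrors the patch. */
enum {
    MULADD_NEGATE_C      = 1,   /* use -c instead of c            */
    MULADD_NEGATE_RESULT = 2,   /* negate the final a * b + c     */
};

/* Sign handling only; this is NOT a fused (single-rounding) operation. */
static double muladd_model(double a, double b, double c, int flags)
{
    double r;

    if (flags & MULADD_NEGATE_C) {
        c = -c;
    }
    r = a * b + c;
    if (flags & MULADD_NEGATE_RESULT) {
        r = -r;
    }
    return r;
}

/* The four instructions, in the same order as the FLOAT_FMA() lines. */
static double model_madd(double fs, double ft, double fr)
{
    return muladd_model(fs, ft, fr, 0);                    /*   fs*ft + fr  */
}

static double model_msub(double fs, double ft, double fr)
{
    return muladd_model(fs, ft, fr, MULADD_NEGATE_C);      /*   fs*ft - fr  */
}

static double model_nmadd(double fs, double ft, double fr)
{
    return muladd_model(fs, ft, fr, MULADD_NEGATE_RESULT); /* -(fs*ft + fr) */
}

static double model_nmsub(double fs, double ft, double fr)
{
    return muladd_model(fs, ft, fr,
                        MULADD_NEGATE_RESULT | MULADD_NEGATE_C);
                                                           /* -(fs*ft - fr) */
}
```

For fs = 2, ft = 3, fr = 4 the four models give 10, 2, -10 and -2.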


* [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 01/14] softfloat: implement fused multiply-add NaN propagation for MIPS Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:05   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible Aurelien Jarno
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Each FPU instruction that can trigger an FPU exception needs to reset
the softfloat exception flags before the operation and call
update_fcr31() after it.

Remove the manual NaN assignment for float to float operations, as
softfloat already takes care of that. For float to int operations,
however, the value still has to be changed to the MIPS one. In the
cvtpw_ps case, the two results have to be handled separately to
guarantee a correct final value in each of them.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/op_helper.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 9d6d54a..bd3c37c 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -2444,12 +2444,18 @@ static inline void update_fcr31(CPUMIPSState *env)
 /* unary operations, modifying fp status  */
 uint64_t helper_float_sqrt_d(CPUMIPSState *env, uint64_t fdt0)
 {
-    return float64_sqrt(fdt0, &env->active_fpu.fp_status);
+    set_float_exception_flags(0, &env->active_fpu.fp_status);
+    fdt0 = float64_sqrt(fdt0, &env->active_fpu.fp_status);
+    update_fcr31(env);
+    return fdt0;
 }
 
 uint32_t helper_float_sqrt_s(CPUMIPSState *env, uint32_t fst0)
 {
-    return float32_sqrt(fst0, &env->active_fpu.fp_status);
+    set_float_exception_flags(0, &env->active_fpu.fp_status);
+    fst0 = float32_sqrt(fst0, &env->active_fpu.fp_status);
+    update_fcr31(env);
+    return fst0;
 }
 
 uint64_t helper_float_cvtd_s(CPUMIPSState *env, uint32_t fst0)
@@ -2522,15 +2528,25 @@ uint64_t helper_float_cvtpw_ps(CPUMIPSState *env, uint64_t fdt0)
 {
     uint32_t wt2;
     uint32_t wth2;
+    int excp, excph;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wt2 = float32_to_int32(fdt0 & 0XFFFFFFFF, &env->active_fpu.fp_status);
-    wth2 = float32_to_int32(fdt0 >> 32, &env->active_fpu.fp_status);
-    update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+    excp = get_float_exception_flags(&env->active_fpu.fp_status);
+    if (excp & (float_flag_overflow | float_flag_invalid)) {
         wt2 = FLOAT_SNAN32;
+    }
+
+    set_float_exception_flags(0, &env->active_fpu.fp_status);
+    wth2 = float32_to_int32(fdt0 >> 32, &env->active_fpu.fp_status);
+    excph = get_float_exception_flags(&env->active_fpu.fp_status);
+    if (excph & (float_flag_overflow | float_flag_invalid)) {
         wth2 = FLOAT_SNAN32;
     }
+
+    set_float_exception_flags(excp | excph, &env->active_fpu.fp_status);
+    update_fcr31(env);
+
     return ((uint64_t)wth2 << 32) | wt2;
 }
 
@@ -2970,8 +2986,6 @@ uint64_t helper_float_ ## name ## _d(CPUMIPSState *env,            \
     set_float_exception_flags(0, &env->active_fpu.fp_status);            \
     dt2 = float64_ ## name (fdt0, fdt1, &env->active_fpu.fp_status);     \
     update_fcr31(env);                                             \
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & FP_INVALID)                \
-        dt2 = FLOAT_QNAN64;                                        \
     return dt2;                                                    \
 }                                                                  \
                                                                    \
@@ -2983,8 +2997,6 @@ uint32_t helper_float_ ## name ## _s(CPUMIPSState *env,            \
     set_float_exception_flags(0, &env->active_fpu.fp_status);            \
     wt2 = float32_ ## name (fst0, fst1, &env->active_fpu.fp_status);     \
     update_fcr31(env);                                             \
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & FP_INVALID)                \
-        wt2 = FLOAT_QNAN32;                                        \
     return wt2;                                                    \
 }                                                                  \
                                                                    \
@@ -3003,10 +3015,6 @@ uint64_t helper_float_ ## name ## _ps(CPUMIPSState *env,           \
     wt2 = float32_ ## name (fst0, fst1, &env->active_fpu.fp_status);     \
     wth2 = float32_ ## name (fsth0, fsth1, &env->active_fpu.fp_status);  \
     update_fcr31(env);                                             \
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & FP_INVALID) {              \
-        wt2 = FLOAT_QNAN32;                                        \
-        wth2 = FLOAT_QNAN32;                                       \
-    }                                                              \
     return ((uint64_t)wth2 << 32) | wt2;                           \
 }
 
-- 
1.7.10.4
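
The reason the cvtpw_ps change in this patch converts each half with its
own flag snapshot can be shown with a toy model. Everything below is a
sketch and not QEMU code: struct fp_status, the FLAG_* bits,
toy_to_int32() and toy_cvtpw_ps() are hypothetical stand-ins, the
conversion truncates instead of honouring a rounding mode, and only the
0x7fffffff replacement value is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical exception bits; softfloat uses its own float_flag_* values. */
enum { FLAG_INVALID = 1, FLAG_INEXACT = 2 };

/* Value stored when the conversion overflows or is invalid. */
#define INT32_CONV_OVERFLOW 0x7fffffff

/* Toy status word standing in for env->active_fpu.fp_status. */
struct fp_status { int flags; };

/* Toy truncating float-to-int32 conversion that accumulates flags. */
static int32_t toy_to_int32(float f, struct fp_status *st)
{
    /* NaN (f != f) or outside the int32 range: flag it, return a dummy. */
    if (f != f || f >= 2147483648.0f || f < -2147483648.0f) {
        st->flags |= FLAG_INVALID;
        return 0;
    }
    int32_t i = (int32_t)f;
    if ((float)i != f) {
        st->flags |= FLAG_INEXACT;
    }
    return i;
}

/* Convert two values, fixing up each half from its own flags. */
static uint64_t toy_cvtpw_ps(float lo, float hi, struct fp_status *st)
{
    uint32_t wt2, wth2;
    int excp, excph;

    st->flags = 0;
    wt2 = (uint32_t)toy_to_int32(lo, st);
    excp = st->flags;
    if (excp & FLAG_INVALID) {
        wt2 = INT32_CONV_OVERFLOW;
    }

    st->flags = 0;
    wth2 = (uint32_t)toy_to_int32(hi, st);
    excph = st->flags;
    if (excph & FLAG_INVALID) {
        wth2 = INT32_CONV_OVERFLOW;
    }

    /* Report the union of both halves' exceptions to the caller. */
    st->flags = excp | excph;
    return ((uint64_t)wth2 << 32) | wt2;
}
```

Converting the pair (1.5f, 1e10f) with this model leaves the low half at
1 and replaces only the high half, while the merged flags still record
both the inexact and the invalid condition. Testing a single combined
flag word, as the old code did, would have overwritten both halves.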


* [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (2 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:09   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations Aurelien Jarno
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

softfloat already defines a few constants; use them instead of
redefining them in target-mips.

Rename FLOAT_SNAN32 and FLOAT_SNAN64 to FP_TO_INT32_OVERFLOW and
FP_TO_INT64_OVERFLOW: although the values are the same, the two are
technically different (and are defined differently in the MIPS ISA).

Remove the unused constants.
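
As an illustration of why the local definitions are redundant, the
following standalone sketch checks that the hand-written bit patterns
are simply the IEEE 754 encodings of 1.0 and 2.0. It assumes a host with
IEEE 754 binary32/binary64 float and double; the OLD_* macros copy the
shift expressions from target-mips without the make_floatXX() wrappers,
and float_bits()/double_bits() are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Shift expressions from target-mips, as plain integers. */
#define OLD_FLOAT_ONE32 (0x3f8u << 20)
#define OLD_FLOAT_ONE64 (0x3ffULL << 52)
#define OLD_FLOAT_TWO32 (1u << 30)
#define OLD_FLOAT_TWO64 (1ULL << 62)

/* Return the raw encoding of a float (host assumed IEEE 754 binary32). */
static uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Return the raw encoding of a double (host assumed IEEE 754 binary64). */
static uint64_t double_bits(double d)
{
    uint64_t u;

    memcpy(&u, &d, sizeof(u));
    return u;
}
```

On such a host, float_bits(1.0f) equals OLD_FLOAT_ONE32 and
double_bits(2.0) equals OLD_FLOAT_TWO64, and likewise for the other two.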

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/op_helper.c |  157 +++++++++++++++++++++++++++--------------------
 1 file changed, 89 insertions(+), 68 deletions(-)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index bd3c37c..647858d 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -2317,14 +2317,10 @@ void cpu_unassigned_access(CPUMIPSState *env, target_phys_addr_t addr,
 
 /* Complex FPU operations which may need stack space. */
 
-#define FLOAT_ONE32 make_float32(0x3f8 << 20)
-#define FLOAT_ONE64 make_float64(0x3ffULL << 52)
 #define FLOAT_TWO32 make_float32(1 << 30)
 #define FLOAT_TWO64 make_float64(1ULL << 62)
-#define FLOAT_QNAN32 0x7fbfffff
-#define FLOAT_QNAN64 0x7ff7ffffffffffffULL
-#define FLOAT_SNAN32 0x7fffffff
-#define FLOAT_SNAN64 0x7fffffffffffffffULL
+#define FP_TO_INT32_OVERFLOW 0x7fffffff
+#define FP_TO_INT64_OVERFLOW 0x7fffffffffffffffULL
 
 /* convert MIPS rounding mode in FCR31 to IEEE library */
 static unsigned int ieee_rm[] = {
@@ -2495,8 +2491,9 @@ uint64_t helper_float_cvtl_d(CPUMIPSState *env, uint64_t fdt0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2507,8 +2504,9 @@ uint64_t helper_float_cvtl_s(CPUMIPSState *env, uint32_t fst0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     dt2 = float32_to_int64(fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2534,14 +2532,14 @@ uint64_t helper_float_cvtpw_ps(CPUMIPSState *env, uint64_t fdt0)
     wt2 = float32_to_int32(fdt0 & 0XFFFFFFFF, &env->active_fpu.fp_status);
     excp = get_float_exception_flags(&env->active_fpu.fp_status);
     if (excp & (float_flag_overflow | float_flag_invalid)) {
-        wt2 = FLOAT_SNAN32;
+        wt2 = FP_TO_INT32_OVERFLOW;
     }
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wth2 = float32_to_int32(fdt0 >> 32, &env->active_fpu.fp_status);
     excph = get_float_exception_flags(&env->active_fpu.fp_status);
     if (excph & (float_flag_overflow | float_flag_invalid)) {
-        wth2 = FLOAT_SNAN32;
+        wth2 = FP_TO_INT32_OVERFLOW;
     }
 
     set_float_exception_flags(excp | excph, &env->active_fpu.fp_status);
@@ -2607,8 +2605,9 @@ uint32_t helper_float_cvtw_s(CPUMIPSState *env, uint32_t fst0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wt2 = float32_to_int32(fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2619,8 +2618,9 @@ uint32_t helper_float_cvtw_d(CPUMIPSState *env, uint64_t fdt0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wt2 = float64_to_int32(fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2633,8 +2633,9 @@ uint64_t helper_float_roundl_d(CPUMIPSState *env, uint64_t fdt0)
     dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2647,8 +2648,9 @@ uint64_t helper_float_roundl_s(CPUMIPSState *env, uint32_t fst0)
     dt2 = float32_to_int64(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2661,8 +2663,9 @@ uint32_t helper_float_roundw_d(CPUMIPSState *env, uint64_t fdt0)
     wt2 = float64_to_int32(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2675,8 +2678,9 @@ uint32_t helper_float_roundw_s(CPUMIPSState *env, uint32_t fst0)
     wt2 = float32_to_int32(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2687,8 +2691,9 @@ uint64_t helper_float_truncl_d(CPUMIPSState *env, uint64_t fdt0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     dt2 = float64_to_int64_round_to_zero(fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2699,8 +2704,9 @@ uint64_t helper_float_truncl_s(CPUMIPSState *env, uint32_t fst0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     dt2 = float32_to_int64_round_to_zero(fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2711,8 +2717,9 @@ uint32_t helper_float_truncw_d(CPUMIPSState *env, uint64_t fdt0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wt2 = float64_to_int32_round_to_zero(fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2723,8 +2730,9 @@ uint32_t helper_float_truncw_s(CPUMIPSState *env, uint32_t fst0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     wt2 = float32_to_int32_round_to_zero(fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2737,8 +2745,9 @@ uint64_t helper_float_ceill_d(CPUMIPSState *env, uint64_t fdt0)
     dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2751,8 +2760,9 @@ uint64_t helper_float_ceill_s(CPUMIPSState *env, uint32_t fst0)
     dt2 = float32_to_int64(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2765,8 +2775,9 @@ uint32_t helper_float_ceilw_d(CPUMIPSState *env, uint64_t fdt0)
     wt2 = float64_to_int32(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2779,8 +2790,9 @@ uint32_t helper_float_ceilw_s(CPUMIPSState *env, uint32_t fst0)
     wt2 = float32_to_int32(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2793,8 +2805,9 @@ uint64_t helper_float_floorl_d(CPUMIPSState *env, uint64_t fdt0)
     dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2807,8 +2820,9 @@ uint64_t helper_float_floorl_s(CPUMIPSState *env, uint32_t fst0)
     dt2 = float32_to_int64(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        dt2 = FLOAT_SNAN64;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        dt2 = FP_TO_INT64_OVERFLOW;
+    }
     return dt2;
 }
 
@@ -2821,8 +2835,9 @@ uint32_t helper_float_floorw_d(CPUMIPSState *env, uint64_t fdt0)
     wt2 = float64_to_int32(fdt0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2835,8 +2850,9 @@ uint32_t helper_float_floorw_s(CPUMIPSState *env, uint32_t fst0)
     wt2 = float32_to_int32(fst0, &env->active_fpu.fp_status);
     RESTORE_ROUNDING_MODE;
     update_fcr31(env);
-    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
-        wt2 = FLOAT_SNAN32;
+    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
+        wt2 = FP_TO_INT32_OVERFLOW;
+    }
     return wt2;
 }
 
@@ -2869,7 +2885,7 @@ uint64_t helper_float_recip_d(CPUMIPSState *env, uint64_t fdt0)
     uint64_t fdt2;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
-    fdt2 = float64_div(FLOAT_ONE64, fdt0, &env->active_fpu.fp_status);
+    fdt2 = float64_div(float64_one, fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fdt2;
 }
@@ -2879,7 +2895,7 @@ uint32_t helper_float_recip_s(CPUMIPSState *env, uint32_t fst0)
     uint32_t fst2;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fst0, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fst2;
 }
@@ -2890,7 +2906,7 @@ uint64_t helper_float_rsqrt_d(CPUMIPSState *env, uint64_t fdt0)
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fdt2 = float64_sqrt(fdt0, &env->active_fpu.fp_status);
-    fdt2 = float64_div(FLOAT_ONE64, fdt2, &env->active_fpu.fp_status);
+    fdt2 = float64_div(float64_one, fdt2, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fdt2;
 }
@@ -2901,7 +2917,7 @@ uint32_t helper_float_rsqrt_s(CPUMIPSState *env, uint32_t fst0)
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_sqrt(fst0, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fst2, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fst2, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fst2;
 }
@@ -2911,7 +2927,7 @@ uint64_t helper_float_recip1_d(CPUMIPSState *env, uint64_t fdt0)
     uint64_t fdt2;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
-    fdt2 = float64_div(FLOAT_ONE64, fdt0, &env->active_fpu.fp_status);
+    fdt2 = float64_div(float64_one, fdt0, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fdt2;
 }
@@ -2921,7 +2937,7 @@ uint32_t helper_float_recip1_s(CPUMIPSState *env, uint32_t fst0)
     uint32_t fst2;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fst0, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fst0, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fst2;
 }
@@ -2932,8 +2948,9 @@ uint64_t helper_float_recip1_ps(CPUMIPSState *env, uint64_t fdt0)
     uint32_t fsth2;
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fdt0 & 0XFFFFFFFF, &env->active_fpu.fp_status);
-    fsth2 = float32_div(FLOAT_ONE32, fdt0 >> 32, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fdt0 & 0XFFFFFFFF,
+                       &env->active_fpu.fp_status);
+    fsth2 = float32_div(float32_one, fdt0 >> 32, &env->active_fpu.fp_status);
     update_fcr31(env);
     return ((uint64_t)fsth2 << 32) | fst2;
 }
@@ -2944,7 +2961,7 @@ uint64_t helper_float_rsqrt1_d(CPUMIPSState *env, uint64_t fdt0)
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fdt2 = float64_sqrt(fdt0, &env->active_fpu.fp_status);
-    fdt2 = float64_div(FLOAT_ONE64, fdt2, &env->active_fpu.fp_status);
+    fdt2 = float64_div(float64_one, fdt2, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fdt2;
 }
@@ -2955,7 +2972,7 @@ uint32_t helper_float_rsqrt1_s(CPUMIPSState *env, uint32_t fst0)
 
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_sqrt(fst0, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fst2, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fst2, &env->active_fpu.fp_status);
     update_fcr31(env);
     return fst2;
 }
@@ -2968,8 +2985,8 @@ uint64_t helper_float_rsqrt1_ps(CPUMIPSState *env, uint64_t fdt0)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_sqrt(fdt0 & 0XFFFFFFFF, &env->active_fpu.fp_status);
     fsth2 = float32_sqrt(fdt0 >> 32, &env->active_fpu.fp_status);
-    fst2 = float32_div(FLOAT_ONE32, fst2, &env->active_fpu.fp_status);
-    fsth2 = float32_div(FLOAT_ONE32, fsth2, &env->active_fpu.fp_status);
+    fst2 = float32_div(float32_one, fst2, &env->active_fpu.fp_status);
+    fsth2 = float32_div(float32_one, fsth2, &env->active_fpu.fp_status);
     update_fcr31(env);
     return ((uint64_t)fsth2 << 32) | fst2;
 }
@@ -3078,7 +3095,8 @@ uint64_t helper_float_recip2_d(CPUMIPSState *env, uint64_t fdt0, uint64_t fdt2)
 {
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fdt2 = float64_mul(fdt0, fdt2, &env->active_fpu.fp_status);
-    fdt2 = float64_chs(float64_sub(fdt2, FLOAT_ONE64, &env->active_fpu.fp_status));
+    fdt2 = float64_chs(float64_sub(fdt2, float64_one,
+                       &env->active_fpu.fp_status));
     update_fcr31(env);
     return fdt2;
 }
@@ -3087,7 +3105,8 @@ uint32_t helper_float_recip2_s(CPUMIPSState *env, uint32_t fst0, uint32_t fst2)
 {
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_mul(fst0, fst2, &env->active_fpu.fp_status);
-    fst2 = float32_chs(float32_sub(fst2, FLOAT_ONE32, &env->active_fpu.fp_status));
+    fst2 = float32_chs(float32_sub(fst2, float32_one,
+                       &env->active_fpu.fp_status));
     update_fcr31(env);
     return fst2;
 }
@@ -3102,8 +3121,10 @@ uint64_t helper_float_recip2_ps(CPUMIPSState *env, uint64_t fdt0, uint64_t fdt2)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_mul(fst0, fst2, &env->active_fpu.fp_status);
     fsth2 = float32_mul(fsth0, fsth2, &env->active_fpu.fp_status);
-    fst2 = float32_chs(float32_sub(fst2, FLOAT_ONE32, &env->active_fpu.fp_status));
-    fsth2 = float32_chs(float32_sub(fsth2, FLOAT_ONE32, &env->active_fpu.fp_status));
+    fst2 = float32_chs(float32_sub(fst2, float32_one,
+                       &env->active_fpu.fp_status));
+    fsth2 = float32_chs(float32_sub(fsth2, float32_one,
+                        &env->active_fpu.fp_status));
     update_fcr31(env);
     return ((uint64_t)fsth2 << 32) | fst2;
 }
@@ -3112,7 +3133,7 @@ uint64_t helper_float_rsqrt2_d(CPUMIPSState *env, uint64_t fdt0, uint64_t fdt2)
 {
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fdt2 = float64_mul(fdt0, fdt2, &env->active_fpu.fp_status);
-    fdt2 = float64_sub(fdt2, FLOAT_ONE64, &env->active_fpu.fp_status);
+    fdt2 = float64_sub(fdt2, float64_one, &env->active_fpu.fp_status);
     fdt2 = float64_chs(float64_div(fdt2, FLOAT_TWO64, &env->active_fpu.fp_status));
     update_fcr31(env);
     return fdt2;
@@ -3122,7 +3143,7 @@ uint32_t helper_float_rsqrt2_s(CPUMIPSState *env, uint32_t fst0, uint32_t fst2)
 {
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_mul(fst0, fst2, &env->active_fpu.fp_status);
-    fst2 = float32_sub(fst2, FLOAT_ONE32, &env->active_fpu.fp_status);
+    fst2 = float32_sub(fst2, float32_one, &env->active_fpu.fp_status);
     fst2 = float32_chs(float32_div(fst2, FLOAT_TWO32, &env->active_fpu.fp_status));
     update_fcr31(env);
     return fst2;
@@ -3138,8 +3159,8 @@ uint64_t helper_float_rsqrt2_ps(CPUMIPSState *env, uint64_t fdt0, uint64_t fdt2)
     set_float_exception_flags(0, &env->active_fpu.fp_status);
     fst2 = float32_mul(fst0, fst2, &env->active_fpu.fp_status);
     fsth2 = float32_mul(fsth0, fsth2, &env->active_fpu.fp_status);
-    fst2 = float32_sub(fst2, FLOAT_ONE32, &env->active_fpu.fp_status);
-    fsth2 = float32_sub(fsth2, FLOAT_ONE32, &env->active_fpu.fp_status);
+    fst2 = float32_sub(fst2, float32_one, &env->active_fpu.fp_status);
+    fsth2 = float32_sub(fsth2, float32_one, &env->active_fpu.fp_status);
     fst2 = float32_chs(float32_div(fst2, FLOAT_TWO32, &env->active_fpu.fp_status));
     fsth2 = float32_chs(float32_div(fsth2, FLOAT_TWO32, &env->active_fpu.fp_status));
     update_fcr31(env);
-- 
1.7.10.4

* [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (3 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:10   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations Aurelien Jarno
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Load/store operations are wrapped in macros for historical reasons. There
is no longer any point in keeping them, so replace them with direct calls
to tcg_gen_qemu_ld*/st*.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
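Note for readers: the shape of this cleanup can be sketched outside of
QEMU. The block below is a standalone illustration only. demo_ld8s,
demo_ld8u, demo_mem and DemoContext are hypothetical stand-ins that read
from a byte array; they are not the TCG generators touched by this patch.
It shows that a macro-stamped wrapper adds nothing over calling the
underlying function with ctx->mem_idx directly.

```c
#include <stdint.h>

/* Hypothetical context: only the field the wrappers forward. */
typedef struct {
    int mem_idx;
} DemoContext;

/* Toy memory used instead of guest memory (assumption for this sketch). */
static const uint8_t demo_mem[4] = { 0x80, 0x01, 0x02, 0x03 };

/* Stand-in for a sign-extending 8-bit load. */
static void demo_ld8s(int64_t *ret, unsigned addr, int mem_idx)
{
    (void)mem_idx;
    /* Portable sign extension of an 8-bit value. */
    *ret = (int64_t)(demo_mem[addr] ^ 0x80) - 0x80;
}

/* Stand-in for a zero-extending 8-bit load. */
static void demo_ld8u(int64_t *ret, unsigned addr, int mem_idx)
{
    (void)mem_idx;
    *ret = demo_mem[addr];
}

/* Old style: one thin wrapper per instruction, stamped out by a macro. */
#define OP_LD(insn, fname)                                            \
static inline void op_ld_##insn(int64_t *ret, unsigned addr,          \
                                DemoContext *ctx)                     \
{                                                                     \
    demo_##fname(ret, addr, ctx->mem_idx);                            \
}
OP_LD(lb, ld8s)
OP_LD(lbu, ld8u)
#undef OP_LD

/* New style: call the underlying function at the point of use. */
static void demo_load_byte_signed(int64_t *ret, unsigned addr,
                                  DemoContext *ctx)
{
    demo_ld8s(ret, addr, ctx->mem_idx);
}
```

In this toy, op_ld_lb() and demo_load_byte_signed() return the same
value for every address. The direct form also keeps the signed or
unsigned choice visible at the call site.
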
 target-mips/translate.c |   91 ++++++++++++++++-------------------------------
 1 file changed, 31 insertions(+), 60 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 8183854..c1438ff 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1028,35 +1028,6 @@ FOP_CONDS(abs, 1, ps, FMT_PS, 64)
 #undef gen_ldcmp_fpr64
 
 /* load/store instructions. */
-#define OP_LD(insn,fname)                                                 \
-static inline void op_ld_##insn(TCGv ret, TCGv arg1, DisasContext *ctx)   \
-{                                                                         \
-    tcg_gen_qemu_##fname(ret, arg1, ctx->mem_idx);                        \
-}
-OP_LD(lb,ld8s);
-OP_LD(lbu,ld8u);
-OP_LD(lh,ld16s);
-OP_LD(lhu,ld16u);
-OP_LD(lw,ld32s);
-#if defined(TARGET_MIPS64)
-OP_LD(lwu,ld32u);
-OP_LD(ld,ld64);
-#endif
-#undef OP_LD
-
-#define OP_ST(insn,fname)                                                  \
-static inline void op_st_##insn(TCGv arg1, TCGv arg2, DisasContext *ctx)   \
-{                                                                          \
-    tcg_gen_qemu_##fname(arg1, arg2, ctx->mem_idx);                        \
-}
-OP_ST(sb,st8);
-OP_ST(sh,st16);
-OP_ST(sw,st32);
-#if defined(TARGET_MIPS64)
-OP_ST(sd,st64);
-#endif
-#undef OP_ST
-
 #ifdef CONFIG_USER_ONLY
 #define OP_LD_ATOMIC(insn,fname)                                           \
 static inline void op_ld_##insn(TCGv ret, TCGv arg1, DisasContext *ctx)    \
@@ -1171,13 +1142,13 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
 #if defined(TARGET_MIPS64)
     case OPC_LWU:
         save_cpu_state(ctx, 0);
-        op_ld_lwu(t0, t0, ctx);
+        tcg_gen_qemu_ld32u(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lwu";
         break;
     case OPC_LD:
         save_cpu_state(ctx, 0);
-        op_ld_ld(t0, t0, ctx);
+        tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "ld";
         break;
@@ -1205,7 +1176,7 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         save_cpu_state(ctx, 0);
         tcg_gen_movi_tl(t1, pc_relative_pc(ctx));
         gen_op_addr_add(ctx, t0, t0, t1);
-        op_ld_ld(t0, t0, ctx);
+        tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "ldpc";
         break;
@@ -1214,37 +1185,37 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         save_cpu_state(ctx, 0);
         tcg_gen_movi_tl(t1, pc_relative_pc(ctx));
         gen_op_addr_add(ctx, t0, t0, t1);
-        op_ld_lw(t0, t0, ctx);
+        tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lwpc";
         break;
     case OPC_LW:
         save_cpu_state(ctx, 0);
-        op_ld_lw(t0, t0, ctx);
+        tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lw";
         break;
     case OPC_LH:
         save_cpu_state(ctx, 0);
-        op_ld_lh(t0, t0, ctx);
+        tcg_gen_qemu_ld16s(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lh";
         break;
     case OPC_LHU:
         save_cpu_state(ctx, 0);
-        op_ld_lhu(t0, t0, ctx);
+        tcg_gen_qemu_ld16u(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lhu";
         break;
     case OPC_LB:
         save_cpu_state(ctx, 0);
-        op_ld_lb(t0, t0, ctx);
+        tcg_gen_qemu_ld8s(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lb";
         break;
     case OPC_LBU:
         save_cpu_state(ctx, 0);
-        op_ld_lbu(t0, t0, ctx);
+        tcg_gen_qemu_ld8u(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lbu";
         break;
@@ -1289,7 +1260,7 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
 #if defined(TARGET_MIPS64)
     case OPC_SD:
         save_cpu_state(ctx, 0);
-        op_st_sd(t1, t0, ctx);
+        tcg_gen_qemu_st64(t1, t0, ctx->mem_idx);
         opn = "sd";
         break;
     case OPC_SDL:
@@ -1305,17 +1276,17 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
 #endif
     case OPC_SW:
         save_cpu_state(ctx, 0);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         opn = "sw";
         break;
     case OPC_SH:
         save_cpu_state(ctx, 0);
-        op_st_sh(t1, t0, ctx);
+        tcg_gen_qemu_st16(t1, t0, ctx->mem_idx);
         opn = "sh";
         break;
     case OPC_SB:
         save_cpu_state(ctx, 0);
-        op_st_sb(t1, t0, ctx);
+        tcg_gen_qemu_st8(t1, t0, ctx->mem_idx);
         opn = "sb";
         break;
     case OPC_SWL:
@@ -8791,22 +8762,22 @@ static void gen_mips16_save (DisasContext *ctx,
     case 4:
         gen_base_offset_addr(ctx, t0, 29, 12);
         gen_load_gpr(t1, 7);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         /* Fall through */
     case 3:
         gen_base_offset_addr(ctx, t0, 29, 8);
         gen_load_gpr(t1, 6);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         /* Fall through */
     case 2:
         gen_base_offset_addr(ctx, t0, 29, 4);
         gen_load_gpr(t1, 5);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         /* Fall through */
     case 1:
         gen_base_offset_addr(ctx, t0, 29, 0);
         gen_load_gpr(t1, 4);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
     }
 
     gen_load_gpr(t0, 29);
@@ -8814,7 +8785,7 @@ static void gen_mips16_save (DisasContext *ctx,
 #define DECR_AND_STORE(reg) do {                \
         tcg_gen_subi_tl(t0, t0, 4);             \
         gen_load_gpr(t1, reg);                  \
-        op_st_sw(t1, t0, ctx);                  \
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);                  \
     } while (0)
 
     if (do_ra) {
@@ -8912,10 +8883,10 @@ static void gen_mips16_restore (DisasContext *ctx,
 
     tcg_gen_addi_tl(t0, cpu_gpr[29], framesize);
 
-#define DECR_AND_LOAD(reg) do {                 \
-        tcg_gen_subi_tl(t0, t0, 4);             \
-        op_ld_lw(t1, t0, ctx);                  \
-        gen_store_gpr(t1, reg);                 \
+#define DECR_AND_LOAD(reg) do {                   \
+        tcg_gen_subi_tl(t0, t0, 4);               \
+        tcg_gen_qemu_ld32s(t1, t0, ctx->mem_idx); \
+        gen_store_gpr(t1, reg);                   \
     } while (0)
 
     if (do_ra) {
@@ -10422,7 +10393,7 @@ static void gen_ldxs (DisasContext *ctx, int base, int index, int rd)
     }
 
     save_cpu_state(ctx, 0);
-    op_ld_lw(t1, t0, ctx);
+    tcg_gen_qemu_ld32s(t1, t0, ctx->mem_idx);
     gen_store_gpr(t1, rd);
 
     tcg_temp_free(t0);
@@ -10452,22 +10423,22 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
             return;
         }
         save_cpu_state(ctx, 0);
-        op_ld_lw(t1, t0, ctx);
+        tcg_gen_qemu_ld32s(t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rd);
         tcg_gen_movi_tl(t1, 4);
         gen_op_addr_add(ctx, t0, t0, t1);
-        op_ld_lw(t1, t0, ctx);
+        tcg_gen_qemu_ld32s(t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rd+1);
         opn = "lwp";
         break;
     case SWP:
         save_cpu_state(ctx, 0);
         gen_load_gpr(t1, rd);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         tcg_gen_movi_tl(t1, 4);
         gen_op_addr_add(ctx, t0, t0, t1);
         gen_load_gpr(t1, rd+1);
-        op_st_sw(t1, t0, ctx);
+        tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
         opn = "swp";
         break;
 #ifdef TARGET_MIPS64
@@ -10477,22 +10448,22 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
             return;
         }
         save_cpu_state(ctx, 0);
-        op_ld_ld(t1, t0, ctx);
+        tcg_gen_qemu_ld64(t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rd);
         tcg_gen_movi_tl(t1, 8);
         gen_op_addr_add(ctx, t0, t0, t1);
-        op_ld_ld(t1, t0, ctx);
+        tcg_gen_qemu_ld64(t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rd+1);
         opn = "ldp";
         break;
     case SDP:
         save_cpu_state(ctx, 0);
         gen_load_gpr(t1, rd);
-        op_st_sd(t1, t0, ctx);
+        tcg_gen_qemu_st64(t1, t0, ctx->mem_idx);
         tcg_gen_movi_tl(t1, 8);
         gen_op_addr_add(ctx, t0, t0, t1);
         gen_load_gpr(t1, rd+1);
-        op_st_sd(t1, t0, ctx);
+        tcg_gen_qemu_st64(t1, t0, ctx->mem_idx);
         opn = "sdp";
         break;
 #endif
-- 
1.7.10.4

* [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (4 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:11   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers Aurelien Jarno
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Only allocate t1 when needed.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
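Note for readers: a minimal standalone sketch of the idea, assuming a
toy temp pool. demo_temp_new, demo_temp_free, demo_gen_ld and the
DEMO_* opcodes are hypothetical names invented for the illustration and
only model the allocation pattern, not real TCG temporaries. The address
temp is allocated for every opcode; the second temp is allocated and
freed inside the cases that merge with the old register value.

```c
/* Hypothetical temp pool standing in for a temp allocator. */
static int temps_allocated;   /* total allocations so far */
static int temps_live;        /* temps not yet freed */

static int demo_temp_new(void)
{
    temps_live++;
    return temps_allocated++;
}

static void demo_temp_free(int t)
{
    (void)t;
    temps_live--;
}

enum demo_opc { DEMO_LW, DEMO_LWL };

static void demo_gen_ld(enum demo_opc opc)
{
    int t0 = demo_temp_new();   /* address temp: every case needs it */
    int t1;

    switch (opc) {
    case DEMO_LW:
        /* Plain load: the result can reuse t0, no second temp. */
        break;
    case DEMO_LWL:
        /* Unaligned load merges with the old value: needs t1. */
        t1 = demo_temp_new();
        demo_temp_free(t1);
        break;
    }

    demo_temp_free(t0);
}
```

With this layout a plain load costs one temp and only the merging cases
pay for two, at the price of a new/free pair repeated in each such case.
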
 target-mips/translate.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index c1438ff..f7d9467 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1135,7 +1135,6 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
     }
 
     t0 = tcg_temp_new();
-    t1 = tcg_temp_new();
     gen_base_offset_addr(ctx, t0, base, offset);
 
     switch (opc) {
@@ -1160,22 +1159,27 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         break;
     case OPC_LDL:
         save_cpu_state(ctx, 1);
+        t1 = tcg_temp_new();
         gen_load_gpr(t1, rt);
         gen_helper_1e2i(ldl, t1, t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rt);
+        tcg_temp_free(t1);
         opn = "ldl";
         break;
     case OPC_LDR:
         save_cpu_state(ctx, 1);
+        t1 = tcg_temp_new();
         gen_load_gpr(t1, rt);
         gen_helper_1e2i(ldr, t1, t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rt);
+        tcg_temp_free(t1);
         opn = "ldr";
         break;
     case OPC_LDPC:
         save_cpu_state(ctx, 0);
-        tcg_gen_movi_tl(t1, pc_relative_pc(ctx));
+        t1 = tcg_const_tl(pc_relative_pc(ctx));
         gen_op_addr_add(ctx, t0, t0, t1);
+        tcg_temp_free(t1);
         tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "ldpc";
@@ -1183,8 +1187,9 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
 #endif
     case OPC_LWPC:
         save_cpu_state(ctx, 0);
-        tcg_gen_movi_tl(t1, pc_relative_pc(ctx));
+        t1 = tcg_const_tl(pc_relative_pc(ctx));
         gen_op_addr_add(ctx, t0, t0, t1);
+        tcg_temp_free(t1);
         tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
         gen_store_gpr(t0, rt);
         opn = "lwpc";
@@ -1221,16 +1226,20 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         break;
     case OPC_LWL:
         save_cpu_state(ctx, 1);
+        t1 = tcg_temp_new();
         gen_load_gpr(t1, rt);
         gen_helper_1e2i(lwl, t1, t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rt);
+        tcg_temp_free(t1);
         opn = "lwl";
         break;
     case OPC_LWR:
         save_cpu_state(ctx, 1);
+        t1 = tcg_temp_new();
         gen_load_gpr(t1, rt);
         gen_helper_1e2i(lwr, t1, t1, t0, ctx->mem_idx);
         gen_store_gpr(t1, rt);
+        tcg_temp_free(t1);
         opn = "lwr";
         break;
     case OPC_LL:
@@ -1243,7 +1252,6 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
     (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
     tcg_temp_free(t0);
-    tcg_temp_free(t1);
 }
 
 /* Store */
-- 
1.7.10.4

* [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (5 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:15   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG Aurelien Jarno
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The load/store microMIPS helpers are reinventing the wheel. Call do_lw,
do_ld, do_sw and do_sd instead of using a macro calling the cpu_*
load/store functions.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
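Note for readers: a standalone sketch of the resulting structure, under
stated assumptions. DemoEnv, demo_do_lw, demo_lwm and the toy
word-addressed memory are hypothetical, and the register order in
demo_multiple_regs is assumed for the illustration. The point is that
the privilege-level dispatch lives in one accessor taking mem_idx, so
the multi-register helper no longer picks a load function itself.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef struct {
    uint32_t gpr[32];
    uint32_t mem[16];   /* toy memory, indexed by addr / 4 */
} DemoEnv;

/* Register order is an assumption made for this sketch. */
static const int demo_multiple_regs[] = {
    16, 17, 18, 19, 20, 21, 22, 23, 30
};

/* Single accessor that hides the per-privilege dispatch. */
static uint32_t demo_do_lw(DemoEnv *env, uint32_t addr, int mem_idx)
{
    switch (mem_idx) {
    case 0:     /* kernel */
    case 1:     /* supervisor */
    default:    /* user */
        /* The toy model shares one memory across all levels. */
        return env->mem[addr / 4];
    }
}

/* Load-multiple helper: low 4 bits count registers, bit 4 adds r31. */
static void demo_lwm(DemoEnv *env, uint32_t addr, uint32_t reglist,
                     int mem_idx)
{
    uint32_t base_reglist = reglist & 0xf;
    uint32_t do_r31 = reglist & 0x10;
    uint32_t i;

    if (base_reglist > 0 &&
        base_reglist <= DEMO_ARRAY_SIZE(demo_multiple_regs)) {
        for (i = 0; i < base_reglist; i++) {
            env->gpr[demo_multiple_regs[i]] =
                demo_do_lw(env, addr, mem_idx);
            addr += 4;
        }
    }

    if (do_r31) {
        env->gpr[31] = demo_do_lw(env, addr, mem_idx);
    }
}
```

A store-multiple helper would follow the same pattern with a matching
store accessor, so no CONFIG_USER_ONLY special case is needed in either.
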
 target-mips/op_helper.c |   73 ++++++-----------------------------------------
 1 file changed, 9 insertions(+), 64 deletions(-)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 647858d..d88ac24 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -579,32 +579,19 @@ void helper_lwm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
 {
     target_ulong base_reglist = reglist & 0xf;
     target_ulong do_r31 = reglist & 0x10;
-#ifdef CONFIG_USER_ONLY
-#undef ldfun
-#define ldfun(env, addr) ldl_raw(addr)
-#else
-    uint32_t (*ldfun)(CPUMIPSState *env, target_ulong);
-
-    switch (mem_idx)
-    {
-    case 0: ldfun = cpu_ldl_kernel; break;
-    case 1: ldfun = cpu_ldl_super; break;
-    default:
-    case 2: ldfun = cpu_ldl_user; break;
-    }
-#endif
 
     if (base_reglist > 0 && base_reglist <= ARRAY_SIZE (multiple_regs)) {
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            env->active_tc.gpr[multiple_regs[i]] = (target_long)ldfun(env, addr);
+            env->active_tc.gpr[multiple_regs[i]] =
+                (target_long)do_lw(env, addr, mem_idx);
             addr += 4;
         }
     }
 
     if (do_r31) {
-        env->active_tc.gpr[31] = (target_long)ldfun(env, addr);
+        env->active_tc.gpr[31] = (target_long)do_lw(env, addr, mem_idx);
     }
 }
 
@@ -613,32 +600,18 @@ void helper_swm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
 {
     target_ulong base_reglist = reglist & 0xf;
     target_ulong do_r31 = reglist & 0x10;
-#ifdef CONFIG_USER_ONLY
-#undef stfun
-#define stfun(env, addr, val) stl_raw(addr, val)
-#else
-    void (*stfun)(CPUMIPSState *env, target_ulong, uint32_t);
-
-    switch (mem_idx)
-    {
-    case 0: stfun = cpu_stl_kernel; break;
-    case 1: stfun = cpu_stl_super; break;
-     default:
-    case 2: stfun = cpu_stl_user; break;
-    }
-#endif
 
     if (base_reglist > 0 && base_reglist <= ARRAY_SIZE (multiple_regs)) {
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            stfun(env, addr, env->active_tc.gpr[multiple_regs[i]]);
+            do_sw(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx);
             addr += 4;
         }
     }
 
     if (do_r31) {
-        stfun(env, addr, env->active_tc.gpr[31]);
+        do_sw(env, addr, env->active_tc.gpr[31], mem_idx);
     }
 }
 
@@ -648,32 +621,18 @@ void helper_ldm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
 {
     target_ulong base_reglist = reglist & 0xf;
     target_ulong do_r31 = reglist & 0x10;
-#ifdef CONFIG_USER_ONLY
-#undef ldfun
-#define ldfun(env, addr) ldq_raw(addr)
-#else
-    uint64_t (*ldfun)(CPUMIPSState *env, target_ulong);
-
-    switch (mem_idx)
-    {
-    case 0: ldfun = cpu_ldq_kernel; break;
-    case 1: ldfun = cpu_ldq_super; break;
-    default:
-    case 2: ldfun = cpu_ldq_user; break;
-    }
-#endif
 
     if (base_reglist > 0 && base_reglist <= ARRAY_SIZE (multiple_regs)) {
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            env->active_tc.gpr[multiple_regs[i]] = ldfun(env, addr);
+            env->active_tc.gpr[multiple_regs[i]] = do_ld(env, addr, mem_idx);
             addr += 8;
         }
     }
 
     if (do_r31) {
-        env->active_tc.gpr[31] = ldfun(env, addr);
+        env->active_tc.gpr[31] = do_ld(env, addr, mem_idx);
     }
 }
 
@@ -682,32 +641,18 @@ void helper_sdm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
 {
     target_ulong base_reglist = reglist & 0xf;
     target_ulong do_r31 = reglist & 0x10;
-#ifdef CONFIG_USER_ONLY
-#undef stfun
-#define stfun(env, addr, val) stq_raw(addr, val)
-#else
-    void (*stfun)(CPUMIPSState *env, target_ulong, uint64_t);
-
-    switch (mem_idx)
-    {
-    case 0: stfun = cpu_stq_kernel; break;
-    case 1: stfun = cpu_stq_super; break;
-     default:
-    case 2: stfun = cpu_stq_user; break;
-    }
-#endif
 
     if (base_reglist > 0 && base_reglist <= ARRAY_SIZE (multiple_regs)) {
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            stfun(env, addr, env->active_tc.gpr[multiple_regs[i]]);
+            do_sd(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx);
             addr += 8;
         }
     }
 
     if (do_r31) {
-        stfun(env, addr, env->active_tc.gpr[31]);
+        do_sd(env, addr, env->active_tc.gpr[31], mem_idx);
     }
 }
 #endif
-- 
1.7.10.4

* [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (6 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:28   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional Aurelien Jarno
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Loads and stores from helpers should be avoided, as they are quite
inefficient. Rewrite the unaligned load instructions (LWL, LWR, LDL,
LDR) using TCG ops and a single aligned load. The number of actual
load operations needed to implement an unaligned load instruction is
reduced from up to 8 to 1.

Note: as shifting by 32 or 64 is undefined behaviour and can't be
relied upon, the masks are built from constants that are already
shifted by one bit (e.g. 0x7fffffff), so the shift count never
exceeds 31 or 63.
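
For illustration only, the shift-and-mask scheme can be modelled in
plain C for a 32-bit little-endian guest. This is a sketch, not the
TCG code from the patch: the names lwl_le and lwr_le are hypothetical,
and the aligned word is assumed to have been loaded by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LWL on a little-endian guest.
 * aligned_word is the 32-bit word read from (addr & ~3). */
uint32_t lwl_le(uint32_t rt, uint32_t aligned_word, uint32_t addr)
{
    uint32_t shift = ((addr & 3) ^ 3) * 8;        /* 24, 16, 8 or 0 bits */
    /* (1 << shift) - 1, built from a pre-shifted constant so that the
     * shift count stays in the range 7..31 and never reaches 32. */
    uint32_t keep = 0x7fffffffu >> (shift ^ 31);

    assert(shift <= 24);
    return (aligned_word << shift) | (rt & keep);
}

/* Hypothetical model of LWR on a little-endian guest. */
uint32_t lwr_le(uint32_t rt, uint32_t aligned_word, uint32_t addr)
{
    uint32_t shift = (addr & 3) * 8;              /* 0, 8, 16 or 24 bits */
    /* Mask of the high bits of rt that must be preserved. */
    uint32_t keep = 0xfffffffeu << (shift ^ 31);

    assert(shift <= 24);
    return (aligned_word >> shift) | (rt & keep);
}
```

In this model, an unaligned word at address 1 is assembled with
lwr_le() on the word at address 0 followed by lwl_le() on the word at
address 4, mirroring the usual little-endian LWR/LWL pair.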

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/helper.h    |    4 --
 target-mips/op_helper.c |  142 -----------------------------------------------
 target-mips/translate.c |   79 +++++++++++++++++++++-----
 3 files changed, 66 insertions(+), 159 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 740178f..843c561 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -4,13 +4,9 @@ DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
 DEF_HELPER_2(raise_exception, noreturn, env, i32)
 
 #ifdef TARGET_MIPS64
-DEF_HELPER_4(ldl, tl, env, tl, tl, int)
-DEF_HELPER_4(ldr, tl, env, tl, tl, int)
 DEF_HELPER_4(sdl, void, env, tl, tl, int)
 DEF_HELPER_4(sdr, void, env, tl, tl, int)
 #endif
-DEF_HELPER_4(lwl, tl, env, tl, tl, int)
-DEF_HELPER_4(lwr, tl, env, tl, tl, int)
 DEF_HELPER_4(swl, void, env, tl, tl, int)
 DEF_HELPER_4(swr, void, env, tl, tl, int)
 
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index d88ac24..6ce27c1 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -335,56 +335,6 @@ HELPER_ST_ATOMIC(scd, ld, sd, 0x7)
 #define GET_OFFSET(addr, offset) (addr - (offset))
 #endif
 
-target_ulong helper_lwl(CPUMIPSState *env, target_ulong arg1,
-                        target_ulong arg2, int mem_idx)
-{
-    target_ulong tmp;
-
-    tmp = do_lbu(env, arg2, mem_idx);
-    arg1 = (arg1 & 0x00FFFFFF) | (tmp << 24);
-
-    if (GET_LMASK(arg2) <= 2) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 1), mem_idx);
-        arg1 = (arg1 & 0xFF00FFFF) | (tmp << 16);
-    }
-
-    if (GET_LMASK(arg2) <= 1) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 2), mem_idx);
-        arg1 = (arg1 & 0xFFFF00FF) | (tmp << 8);
-    }
-
-    if (GET_LMASK(arg2) == 0) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 3), mem_idx);
-        arg1 = (arg1 & 0xFFFFFF00) | tmp;
-    }
-    return (int32_t)arg1;
-}
-
-target_ulong helper_lwr(CPUMIPSState *env, target_ulong arg1,
-                        target_ulong arg2, int mem_idx)
-{
-    target_ulong tmp;
-
-    tmp = do_lbu(env, arg2, mem_idx);
-    arg1 = (arg1 & 0xFFFFFF00) | tmp;
-
-    if (GET_LMASK(arg2) >= 1) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -1), mem_idx);
-        arg1 = (arg1 & 0xFFFF00FF) | (tmp << 8);
-    }
-
-    if (GET_LMASK(arg2) >= 2) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -2), mem_idx);
-        arg1 = (arg1 & 0xFF00FFFF) | (tmp << 16);
-    }
-
-    if (GET_LMASK(arg2) == 3) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -3), mem_idx);
-        arg1 = (arg1 & 0x00FFFFFF) | (tmp << 24);
-    }
-    return (int32_t)arg1;
-}
-
 void helper_swl(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
@@ -425,98 +375,6 @@ void helper_swr(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
 #define GET_LMASK64(v) (((v) & 7) ^ 7)
 #endif
 
-target_ulong helper_ldl(CPUMIPSState *env, target_ulong arg1,
-                        target_ulong arg2, int mem_idx)
-{
-    uint64_t tmp;
-
-    tmp = do_lbu(env, arg2, mem_idx);
-    arg1 = (arg1 & 0x00FFFFFFFFFFFFFFULL) | (tmp << 56);
-
-    if (GET_LMASK64(arg2) <= 6) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 1), mem_idx);
-        arg1 = (arg1 & 0xFF00FFFFFFFFFFFFULL) | (tmp << 48);
-    }
-
-    if (GET_LMASK64(arg2) <= 5) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 2), mem_idx);
-        arg1 = (arg1 & 0xFFFF00FFFFFFFFFFULL) | (tmp << 40);
-    }
-
-    if (GET_LMASK64(arg2) <= 4) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 3), mem_idx);
-        arg1 = (arg1 & 0xFFFFFF00FFFFFFFFULL) | (tmp << 32);
-    }
-
-    if (GET_LMASK64(arg2) <= 3) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 4), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFF00FFFFFFULL) | (tmp << 24);
-    }
-
-    if (GET_LMASK64(arg2) <= 2) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 5), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFFFF00FFFFULL) | (tmp << 16);
-    }
-
-    if (GET_LMASK64(arg2) <= 1) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 6), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFFFFFF00FFULL) | (tmp << 8);
-    }
-
-    if (GET_LMASK64(arg2) == 0) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, 7), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFFFFFFFF00ULL) | tmp;
-    }
-
-    return arg1;
-}
-
-target_ulong helper_ldr(CPUMIPSState *env, target_ulong arg1,
-                        target_ulong arg2, int mem_idx)
-{
-    uint64_t tmp;
-
-    tmp = do_lbu(env, arg2, mem_idx);
-    arg1 = (arg1 & 0xFFFFFFFFFFFFFF00ULL) | tmp;
-
-    if (GET_LMASK64(arg2) >= 1) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -1), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFFFFFF00FFULL) | (tmp  << 8);
-    }
-
-    if (GET_LMASK64(arg2) >= 2) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -2), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFFFF00FFFFULL) | (tmp << 16);
-    }
-
-    if (GET_LMASK64(arg2) >= 3) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -3), mem_idx);
-        arg1 = (arg1 & 0xFFFFFFFF00FFFFFFULL) | (tmp << 24);
-    }
-
-    if (GET_LMASK64(arg2) >= 4) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -4), mem_idx);
-        arg1 = (arg1 & 0xFFFFFF00FFFFFFFFULL) | (tmp << 32);
-    }
-
-    if (GET_LMASK64(arg2) >= 5) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -5), mem_idx);
-        arg1 = (arg1 & 0xFFFF00FFFFFFFFFFULL) | (tmp << 40);
-    }
-
-    if (GET_LMASK64(arg2) >= 6) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -6), mem_idx);
-        arg1 = (arg1 & 0xFF00FFFFFFFFFFFFULL) | (tmp << 48);
-    }
-
-    if (GET_LMASK64(arg2) == 7) {
-        tmp = do_lbu(env, GET_OFFSET(arg2, -7), mem_idx);
-        arg1 = (arg1 & 0x00FFFFFFFFFFFFFFULL) | (tmp << 56);
-    }
-
-    return arg1;
-}
-
 void helper_sdl(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
diff --git a/target-mips/translate.c b/target-mips/translate.c
index f7d9467..8a7462b 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1124,7 +1124,7 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
                     int rt, int base, int16_t offset)
 {
     const char *opn = "ld";
-    TCGv t0, t1;
+    TCGv t0, t1, t2;
 
     if (rt == 0 && env->insn_flags & (INSN_LOONGSON2E | INSN_LOONGSON2F)) {
         /* Loongson CPU uses a load to zero register for prefetch.
@@ -1158,21 +1158,47 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "lld";
         break;
     case OPC_LDL:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         t1 = tcg_temp_new();
+        tcg_gen_andi_tl(t1, t0, 7);
+#ifndef TARGET_WORDS_BIGENDIAN
+        tcg_gen_xori_tl(t1, t1, 7);
+#endif
+        tcg_gen_shli_tl(t1, t1, 3);
+        tcg_gen_andi_tl(t0, t0, ~7);
+        tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
+        tcg_gen_shl_tl(t0, t0, t1);
+        tcg_gen_xori_tl(t1, t1, 63);
+        t2 = tcg_const_tl(0x7fffffffffffffffull);
+        tcg_gen_shr_tl(t2, t2, t1);
         gen_load_gpr(t1, rt);
-        gen_helper_1e2i(ldl, t1, t1, t0, ctx->mem_idx);
-        gen_store_gpr(t1, rt);
+        tcg_gen_and_tl(t1, t1, t2);
+        tcg_temp_free(t2);
+        tcg_gen_or_tl(t0, t0, t1);
         tcg_temp_free(t1);
+        gen_store_gpr(t0, rt);
         opn = "ldl";
         break;
     case OPC_LDR:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         t1 = tcg_temp_new();
+        tcg_gen_andi_tl(t1, t0, 7);
+#ifdef TARGET_WORDS_BIGENDIAN
+        tcg_gen_xori_tl(t1, t1, 7);
+#endif
+        tcg_gen_shli_tl(t1, t1, 3);
+        tcg_gen_andi_tl(t0, t0, ~7);
+        tcg_gen_qemu_ld64(t0, t0, ctx->mem_idx);
+        tcg_gen_shr_tl(t0, t0, t1);
+        tcg_gen_xori_tl(t1, t1, 63);
+        t2 = tcg_const_tl(0xfffffffffffffffeull);
+        tcg_gen_shl_tl(t2, t2, t1);
         gen_load_gpr(t1, rt);
-        gen_helper_1e2i(ldr, t1, t1, t0, ctx->mem_idx);
-        gen_store_gpr(t1, rt);
+        tcg_gen_and_tl(t1, t1, t2);
+        tcg_temp_free(t2);
+        tcg_gen_or_tl(t0, t0, t1);
         tcg_temp_free(t1);
+        gen_store_gpr(t0, rt);
         opn = "ldr";
         break;
     case OPC_LDPC:
@@ -1225,21 +1251,48 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "lbu";
         break;
     case OPC_LWL:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         t1 = tcg_temp_new();
+        tcg_gen_andi_tl(t1, t0, 3);
+#ifndef TARGET_WORDS_BIGENDIAN
+        tcg_gen_xori_tl(t1, t1, 3);
+#endif
+        tcg_gen_shli_tl(t1, t1, 3);
+        tcg_gen_andi_tl(t0, t0, ~3);
+        tcg_gen_qemu_ld32u(t0, t0, ctx->mem_idx);
+        tcg_gen_shl_tl(t0, t0, t1);
+        tcg_gen_xori_tl(t1, t1, 31);
+        t2 = tcg_const_tl(0x7fffffffull);
+        tcg_gen_shr_tl(t2, t2, t1);
         gen_load_gpr(t1, rt);
-        gen_helper_1e2i(lwl, t1, t1, t0, ctx->mem_idx);
-        gen_store_gpr(t1, rt);
+        tcg_gen_and_tl(t1, t1, t2);
+        tcg_temp_free(t2);
+        tcg_gen_or_tl(t0, t0, t1);
         tcg_temp_free(t1);
+        tcg_gen_ext32s_tl(t0, t0);
+        gen_store_gpr(t0, rt);
         opn = "lwl";
         break;
     case OPC_LWR:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         t1 = tcg_temp_new();
+        tcg_gen_andi_tl(t1, t0, 3);
+#ifdef TARGET_WORDS_BIGENDIAN
+        tcg_gen_xori_tl(t1, t1, 3);
+#endif
+        tcg_gen_shli_tl(t1, t1, 3);
+        tcg_gen_andi_tl(t0, t0, ~3);
+        tcg_gen_qemu_ld32u(t0, t0, ctx->mem_idx);
+        tcg_gen_shr_tl(t0, t0, t1);
+        tcg_gen_xori_tl(t1, t1, 31);
+        t2 = tcg_const_tl(0xfffffffeull);
+        tcg_gen_shl_tl(t2, t2, t1);
         gen_load_gpr(t1, rt);
-        gen_helper_1e2i(lwr, t1, t1, t0, ctx->mem_idx);
-        gen_store_gpr(t1, rt);
+        tcg_gen_and_tl(t1, t1, t2);
+        tcg_temp_free(t2);
+        tcg_gen_or_tl(t0, t0, t1);
         tcg_temp_free(t1);
+        gen_store_gpr(t0, rt);
         opn = "lwr";
         break;
     case OPC_LL:
-- 
1.7.10.4

* [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (7 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:31   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond Aurelien Jarno
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Store conditional operations only need local temps in user mode. Fix
the code to use local temps only in user mode; this spares two memory
stores in system mode.

At the same time, remove a wrongly copied & pasted comment: store
operations don't have a destination register.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 8a7462b..b6eb46a 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1375,13 +1375,14 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
     const char *opn = "st_cond";
     TCGv t0, t1;
 
+#ifdef CONFIG_USER_ONLY
     t0 = tcg_temp_local_new();
-
-    gen_base_offset_addr(ctx, t0, base, offset);
-    /* Don't do NOP if destination is zero: we must perform the actual
-       memory access. */
-
     t1 = tcg_temp_local_new();
+#else
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
+#endif
+    gen_base_offset_addr(ctx, t0, base, offset);
     gen_load_gpr(t1, rt);
     switch (opc) {
 #if defined(TARGET_MIPS64)
-- 
1.7.10.4

* [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (8 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:33   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond Aurelien Jarno
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Avoid the branches in the movn/movz implementation and replace them
with movcond. Also fix a wrong comment.
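
As a rough illustration, the select semantics that movcond provides
can be modelled in plain C. This is a sketch under assumptions, not
the TCG code: select_ne, select_eq, movn and movz are hypothetical
names, and the register file is a plain array.

```c
#include <assert.h>
#include <stdint.h>

/* Models a movcond-style op: pick v1 when c1 != c2, otherwise v2. */
uint32_t select_ne(uint32_t c1, uint32_t c2, uint32_t v1, uint32_t v2)
{
    return c1 != c2 ? v1 : v2;
}

/* Same, with the EQ condition. */
uint32_t select_eq(uint32_t c1, uint32_t c2, uint32_t v1, uint32_t v2)
{
    return c1 == c2 ? v1 : v2;
}

/* MOVN rd, rs, rt: rd = rs if rt != 0, otherwise rd is unchanged. */
void movn(uint32_t gpr[32], int rd, int rs, int rt)
{
    assert(rd >= 0 && rd < 32 && rs >= 0 && rs < 32 && rt >= 0 && rt < 32);
    if (rd == 0) {
        return;                 /* no destination: treated as a NOP */
    }
    gpr[rd] = select_ne(gpr[rt], 0, gpr[rs], gpr[rd]);
}

/* MOVZ rd, rs, rt: rd = rs if rt == 0, otherwise rd is unchanged. */
void movz(uint32_t gpr[32], int rd, int rs, int rt)
{
    assert(rd >= 0 && rd < 32 && rs >= 0 && rs < 32 && rt >= 0 && rt < 32);
    if (rd == 0) {
        return;                 /* no destination: treated as a NOP */
    }
    gpr[rd] = select_eq(gpr[rt], 0, gpr[rs], gpr[rd]);
}
```

Passing the old value of rd as the "else" operand is what lets a
single select replace the branch around the move.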

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index b6eb46a..9a22432 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1920,35 +1920,32 @@ static void gen_cond_move(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
                           int rd, int rs, int rt)
 {
     const char *opn = "cond move";
-    int l1;
+    TCGv t0, t1, t2;
 
     if (rd == 0) {
-        /* If no destination, treat it as a NOP.
-           For add & sub, we must generate the overflow exception when needed. */
+        /* If no destination, treat it as a NOP. */
         MIPS_DEBUG("NOP");
         return;
     }
 
-    l1 = gen_new_label();
+    t0 = tcg_temp_new();
+    gen_load_gpr(t0, rt);
+    t1 = tcg_const_tl(0);
+    t2 = tcg_temp_new();
+    gen_load_gpr(t2, rs);
     switch (opc) {
     case OPC_MOVN:
-        if (likely(rt != 0))
-            tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[rt], 0, l1);
-        else
-            tcg_gen_br(l1);
+        tcg_gen_movcond_tl(TCG_COND_NE, cpu_gpr[rd], t0, t1, t2, cpu_gpr[rd]);
         opn = "movn";
         break;
     case OPC_MOVZ:
-        if (likely(rt != 0))
-            tcg_gen_brcondi_tl(TCG_COND_NE, cpu_gpr[rt], 0, l1);
+        tcg_gen_movcond_tl(TCG_COND_EQ, cpu_gpr[rd], t0, t1, t2, cpu_gpr[rd]);
         opn = "movz";
         break;
     }
-    if (rs != 0)
-        tcg_gen_mov_tl(cpu_gpr[rd], cpu_gpr[rs]);
-    else
-        tcg_gen_movi_tl(cpu_gpr[rd], 0);
-    gen_set_label(l1);
+    tcg_temp_free(t2);
+    tcg_temp_free(t1);
+    tcg_temp_free(t0);
 
     (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
-- 
1.7.10.4

* [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (9 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:38   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version Aurelien Jarno
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The result of a division by 0, or a division of INT_MIN by -1 in the
signed case, is unpredictable. Just replace the divisor by 1 in those
cases so that it doesn't trigger a floating point exception on the host.
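
The idea can be sketched in plain C for the 32-bit signed case. This
is only an illustrative model: struct hilo and mips_div are
hypothetical names, and the value produced for the unpredictable
cases is simply whatever dividing by 1 yields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the LO (quotient) and HI (remainder) pair. */
struct hilo {
    int32_t lo;
    int32_t hi;
};

/* Branchless-style model of DIV: compute a flag for the two
 * problematic cases, then substitute 1 for the divisor. */
struct hilo mips_div(int32_t rs, int32_t rt)
{
    int bad = ((rs == INT32_MIN) & (rt == -1)) | (rt == 0);
    int32_t divisor = bad ? 1 : rt;   /* plays the role of movcond */
    struct hilo r;

    assert(divisor != 0);
    r.lo = rs / divisor;
    r.hi = rs % divisor;
    return r;
}
```

Because the architecture leaves the result unpredictable in those
cases, any value is acceptable; the only requirement is that the host
division never traps.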

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   89 ++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 48 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 9a22432..7d87f66 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -2171,60 +2171,50 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
     const char *opn = "mul/div";
     TCGv t0, t1;
 
-    switch (opc) {
-    case OPC_DIV:
-    case OPC_DIVU:
-#if defined(TARGET_MIPS64)
-    case OPC_DDIV:
-    case OPC_DDIVU:
-#endif
-        t0 = tcg_temp_local_new();
-        t1 = tcg_temp_local_new();
-        break;
-    default:
-        t0 = tcg_temp_new();
-        t1 = tcg_temp_new();
-        break;
-    }
+    t0 = tcg_temp_new();
+    t1 = tcg_temp_new();
 
     gen_load_gpr(t0, rs);
     gen_load_gpr(t1, rt);
+
     switch (opc) {
     case OPC_DIV:
         {
-            int l1 = gen_new_label();
-            int l2 = gen_new_label();
-
+            TCGv t2 = tcg_temp_new();
+            TCGv t3 = tcg_temp_new();
+            TCGv t4 = tcg_const_tl(0);
             tcg_gen_ext32s_tl(t0, t0);
             tcg_gen_ext32s_tl(t1, t1);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, INT_MIN, l2);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t1, -1, l2);
-
-            tcg_gen_mov_tl(cpu_LO[0], t0);
-            tcg_gen_movi_tl(cpu_HI[0], 0);
-            tcg_gen_br(l1);
-            gen_set_label(l2);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, INT_MIN);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1);
+            tcg_gen_and_tl(t2, t2, t3);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+            tcg_gen_or_tl(t2, t2, t3);
+            tcg_gen_movi_tl(t3, 1);
+            tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t4, t3, t1);
             tcg_gen_div_tl(cpu_LO[0], t0, t1);
             tcg_gen_rem_tl(cpu_HI[0], t0, t1);
             tcg_gen_ext32s_tl(cpu_LO[0], cpu_LO[0]);
             tcg_gen_ext32s_tl(cpu_HI[0], cpu_HI[0]);
-            gen_set_label(l1);
+            tcg_temp_free(t4);
+            tcg_temp_free(t3);
+            tcg_temp_free(t2);
         }
         opn = "div";
         break;
     case OPC_DIVU:
         {
-            int l1 = gen_new_label();
-
+            TCGv t2 = tcg_const_tl(0);
+            TCGv t3 = tcg_const_tl(1);
             tcg_gen_ext32u_tl(t0, t0);
             tcg_gen_ext32u_tl(t1, t1);
-            tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
+            tcg_gen_movcond_tl(TCG_COND_EQ, t1, t1, t2, t3, t1);
             tcg_gen_divu_tl(cpu_LO[0], t0, t1);
             tcg_gen_remu_tl(cpu_HI[0], t0, t1);
             tcg_gen_ext32s_tl(cpu_LO[0], cpu_LO[0]);
             tcg_gen_ext32s_tl(cpu_HI[0], cpu_HI[0]);
-            gen_set_label(l1);
+            tcg_temp_free(t3);
+            tcg_temp_free(t2);
         }
         opn = "divu";
         break;
@@ -2269,30 +2259,33 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
 #if defined(TARGET_MIPS64)
     case OPC_DDIV:
         {
-            int l1 = gen_new_label();
-            int l2 = gen_new_label();
-
-            tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t0, -1LL << 63, l2);
-            tcg_gen_brcondi_tl(TCG_COND_NE, t1, -1LL, l2);
-            tcg_gen_mov_tl(cpu_LO[0], t0);
-            tcg_gen_movi_tl(cpu_HI[0], 0);
-            tcg_gen_br(l1);
-            gen_set_label(l2);
-            tcg_gen_div_i64(cpu_LO[0], t0, t1);
-            tcg_gen_rem_i64(cpu_HI[0], t0, t1);
-            gen_set_label(l1);
+            TCGv t2 = tcg_temp_new();
+            TCGv t3 = tcg_temp_new();
+            TCGv t4 = tcg_const_tl(0);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, -1LL << 63);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1LL);
+            tcg_gen_and_tl(t2, t2, t3);
+            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
+            tcg_gen_or_tl(t2, t2, t3);
+            tcg_gen_movi_tl(t3, 1);
+            tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t4, t3, t1);
+            tcg_gen_div_tl(cpu_LO[0], t0, t1);
+            tcg_gen_rem_tl(cpu_HI[0], t0, t1);
+            tcg_temp_free(t4);
+            tcg_temp_free(t3);
+            tcg_temp_free(t2);
         }
         opn = "ddiv";
         break;
     case OPC_DDIVU:
         {
-            int l1 = gen_new_label();
-
-            tcg_gen_brcondi_tl(TCG_COND_EQ, t1, 0, l1);
+            TCGv t2 = tcg_const_tl(0);
+            TCGv t3 = tcg_const_tl(1);
+            tcg_gen_movcond_tl(TCG_COND_EQ, t1, t1, t2, t3, t1);
             tcg_gen_divu_i64(cpu_LO[0], t0, t1);
             tcg_gen_remu_i64(cpu_HI[0], t0, t1);
-            gen_set_label(l1);
+            tcg_temp_free(t3);
+            tcg_temp_free(t2);
         }
         opn = "ddivu";
         break;
-- 
1.7.10.4

* [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (10 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:43   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask Aurelien Jarno
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 14/14] target-mips: don't flush extra TLB on permissions upgrade Aurelien Jarno
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Use the deposit op instead of a hardcoded bit field insertion. It
allows the host to emit the corresponding instruction if available.
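
For reference, a bit field deposit can be modelled in plain C as
below. This is a sketch of the operation's meaning, not QEMU code:
deposit_bits is a hypothetical name, and the 64-bit width is an
assumption chosen to cover the DINS variants.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low `len` bits of `src` into `dst` at bit position `pos`,
 * leaving all other bits of `dst` untouched. */
uint64_t deposit_bits(uint64_t dst, uint64_t src, unsigned pos, unsigned len)
{
    uint64_t mask;

    assert(len > 0 && len <= 64 && pos <= 64 - len);
    /* Build the field mask without ever shifting by 64. */
    mask = (~0ULL >> (64 - len)) << pos;
    return (dst & ~mask) | ((src << pos) & mask);
}
```

With this model, INS with fields lsb and msb corresponds to a deposit
at position lsb of length msb - lsb + 1, followed by sign extension
of the 32-bit result.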

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/translate.c |   27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 7d87f66..d996fd2 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -3406,7 +3406,6 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
 {
     TCGv t0 = tcg_temp_new();
     TCGv t1 = tcg_temp_new();
-    target_ulong mask;
 
     gen_load_gpr(t1, rs);
     switch (opc) {
@@ -3439,45 +3438,27 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
     case OPC_INS:
         if (lsb > msb)
             goto fail;
-        mask = ((msb - lsb + 1 < 32) ? ((1 << (msb - lsb + 1)) - 1) : ~0) << lsb;
         gen_load_gpr(t0, rt);
-        tcg_gen_andi_tl(t0, t0, ~mask);
-        tcg_gen_shli_tl(t1, t1, lsb);
-        tcg_gen_andi_tl(t1, t1, mask);
-        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_deposit_tl(t0, t0, t1, lsb, msb - lsb + 1);
         tcg_gen_ext32s_tl(t0, t0);
         break;
 #if defined(TARGET_MIPS64)
     case OPC_DINSM:
         if (lsb > msb)
             goto fail;
-        mask = ((msb - lsb + 1 + 32 < 64) ? ((1ULL << (msb - lsb + 1 + 32)) - 1) : ~0ULL) << lsb;
-        gen_load_gpr(t0, rt);
-        tcg_gen_andi_tl(t0, t0, ~mask);
-        tcg_gen_shli_tl(t1, t1, lsb);
-        tcg_gen_andi_tl(t1, t1, mask);
-        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_deposit_tl(t0, t0, t1, lsb, msb + 32 - lsb + 1);
         break;
     case OPC_DINSU:
         if (lsb > msb)
             goto fail;
-        mask = ((1ULL << (msb - lsb + 1)) - 1) << (lsb + 32);
         gen_load_gpr(t0, rt);
-        tcg_gen_andi_tl(t0, t0, ~mask);
-        tcg_gen_shli_tl(t1, t1, lsb + 32);
-        tcg_gen_andi_tl(t1, t1, mask);
-        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_deposit_tl(t0, t0, t1, lsb + 32, msb - lsb + 1);
         break;
     case OPC_DINS:
         if (lsb > msb)
             goto fail;
         gen_load_gpr(t0, rt);
-        mask = ((1ULL << (msb - lsb + 1)) - 1) << lsb;
-        gen_load_gpr(t0, rt);
-        tcg_gen_andi_tl(t0, t0, ~mask);
-        tcg_gen_shli_tl(t1, t1, lsb);
-        tcg_gen_andi_tl(t1, t1, mask);
-        tcg_gen_or_tl(t0, t0, t1);
+        tcg_gen_deposit_tl(t0, t0, t1, lsb, msb - lsb + 1);
         break;
 #endif
     default:
-- 
1.7.10.4

* [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (11 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  2012-10-10 20:44   ` Richard Henderson
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 14/14] target-mips: don't flush extra TLB on permissions upgrade Aurelien Jarno
  13 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Like r4k_map_address(), r4k_helper_tlbp() should use SEGMask to mask the
address.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/op_helper.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 6ce27c1..ad5d1c2 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -1826,6 +1826,9 @@ void r4k_helper_tlbp(CPUMIPSState *env)
         mask = tlb->PageMask | ~(TARGET_PAGE_MASK << 1);
         tag = env->CP0_EntryHi & ~mask;
         VPN = tlb->VPN & ~mask;
+#if defined(TARGET_MIPS64)
+        tag &= env->SEGMask;
+#endif
         /* Check ASID, virtual page number & size */
         if ((tlb->G == 1 || tlb->ASID == ASID) && VPN == tag) {
             /* TLB match */
@@ -1841,6 +1844,9 @@ void r4k_helper_tlbp(CPUMIPSState *env)
             mask = tlb->PageMask | ~(TARGET_PAGE_MASK << 1);
             tag = env->CP0_EntryHi & ~mask;
             VPN = tlb->VPN & ~mask;
+#if defined(TARGET_MIPS64)
+        tag &= env->SEGMask;
+#endif
             /* Check ASID, virtual page number & size */
             if ((tlb->G == 1 || tlb->ASID == ASID) && VPN == tag) {
                 r4k_mips_tlb_flush_extra (env, i);
-- 
1.7.10.4

* [Qemu-devel] [PATCH 14/14] target-mips: don't flush extra TLB on permissions upgrade
  2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
                   ` (12 preceding siblings ...)
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask Aurelien Jarno
@ 2012-10-09 20:27 ` Aurelien Jarno
  13 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-09 20:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

If the guest uses a TLBWI instruction for upgrading permissions, we
don't need to flush the extra TLBs. This improves boot time performance
by about 10%.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-mips/op_helper.c |   28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index ad5d1c2..7b0b9fa 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -1789,14 +1789,32 @@ static void r4k_fill_tlb(CPUMIPSState *env, int idx)
 
 void r4k_helper_tlbwi(CPUMIPSState *env)
 {
+    r4k_tlb_t *tlb;
     int idx;
+    target_ulong VPN;
+    uint8_t ASID;
+    bool G, V0, D0, V1, D1;
 
     idx = (env->CP0_Index & ~0x80000000) % env->tlb->nb_tlb;
-
-    /* Discard cached TLB entries.  We could avoid doing this if the
-       tlbwi is just upgrading access permissions on the current entry;
-       that might be a further win.  */
-    r4k_mips_tlb_flush_extra (env, env->tlb->nb_tlb);
+    tlb = &env->tlb->mmu.r4k.tlb[idx];
+    VPN = env->CP0_EntryHi & (TARGET_PAGE_MASK << 1);
+#if defined(TARGET_MIPS64)
+    VPN &= env->SEGMask;
+#endif
+    ASID = env->CP0_EntryHi & 0xff;
+    G = env->CP0_EntryLo0 & env->CP0_EntryLo1 & 1;
+    V0 = (env->CP0_EntryLo0 & 2) != 0;
+    D0 = (env->CP0_EntryLo0 & 4) != 0;
+    V1 = (env->CP0_EntryLo1 & 2) != 0;
+    D1 = (env->CP0_EntryLo1 & 4) != 0;
+
+    /* Discard cached TLB entries, unless tlbwi is just upgrading access
+       permissions on the current entry. */
+    if (tlb->VPN != VPN || tlb->ASID != ASID || tlb->G != G ||
+        (tlb->V0 && !V0) || (tlb->D0 && !D0) ||
+        (tlb->V1 && !V1) || (tlb->D1 && !D1)) {
+        r4k_mips_tlb_flush_extra(env, env->tlb->nb_tlb);
+    }
 
     r4k_invalidate_tlb(env, idx, 0);
     r4k_fill_tlb(env, idx);
-- 
1.7.10.4

* Re: [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions Aurelien Jarno
@ 2012-10-10 19:58   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 19:58 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Use the new softfloat floatXX_muladd() functions to implement the madd,
> msub, nmadd and nmsub instructions. At the same time replace the name of
> the helpers by the name of the instruction, as the only reason for the
> previous names was to keep the macros simple.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions Aurelien Jarno
@ 2012-10-10 20:05   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:05 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> -    return float64_sqrt(fdt0, &env->active_fpu.fp_status);
> +    set_float_exception_flags(0, &env->active_fpu.fp_status);
> +    fdt0 = float64_sqrt(fdt0, &env->active_fpu.fp_status);
> +    update_fcr31(env);
> +    return fdt0;

While accurate, I can't help but think there'd be less memory traffic
if the invariant "fp_status == 0" were maintained between insns.  Then
you wouldn't need to reset the flags to 0 in each insn; merely change
update_fcr31() to:

static inline void update_fcr31(CPUMIPSState *env)
{
    int tmp = ieee_ex_to_mips(get_float_exception_flags(&env->active_fpu.fp_status));
    if (tmp) {
        set_float_exception_flags(0, &env->active_fpu.fp_status);
        SET_FP_CAUSE(env->active_fpu.fcr31, tmp);
        if (GET_FP_ENABLE(env->active_fpu.fcr31) & tmp) {
            helper_raise_exception(env, EXCP_FPE);
        } else {
            UPDATE_FP_FLAGS(env->active_fpu.fcr31, tmp);
        }
    }
}

I'll also note that we don't get the proper PC for the trap there.
We don't save the PC in the translator before the insn or, more
properly, invoke do_restore_state.  That clearly ought to be a
separate change, however.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible Aurelien Jarno
@ 2012-10-10 20:09   ` Richard Henderson
  2012-10-16 23:26     ` Aurelien Jarno
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:09 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> softfloat already has a few constants defined, use them instead of
> redefining them in target-mips.
> 
> Rename FLOAT_SNAN32 and FLOAT_SNAN64 to FP_TO_INT32_OVERFLOW and
> FP_TO_INT64_OVERFLOW as even if they have the same value, they are
> technically different (and defined differently in the MIPS ISA).
> 
> Remove the unused constants.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>

> @@ -2495,8 +2491,9 @@ uint64_t helper_float_cvtl_d(CPUMIPSState *env, uint64_t fdt0)
>      set_float_exception_flags(0, &env->active_fpu.fp_status);
>      dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
>      update_fcr31(env);
> -    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
> -        dt2 = FLOAT_SNAN64;
> +    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
> +        dt2 = FP_TO_INT64_OVERFLOW;
> +    }
>      return dt2;

That said, the existing code you're patching is incorrect.

This code will fold to OVERFLOW if any previous operation caused an overflow,
not checking that the *current* operation caused an overflow.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations Aurelien Jarno
@ 2012-10-10 20:10   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:10 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Load/store operations use macros for historical reasons. Now that there
> is no point in keeping them, replace them by direct calls to qemu_ld/st.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target-mips/translate.c |   91 ++++++++++++++++-------------------------------
>  1 file changed, 31 insertions(+), 60 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations Aurelien Jarno
@ 2012-10-10 20:11   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:11 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Only allocate t1 when needed.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers Aurelien Jarno
@ 2012-10-10 20:15   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:15 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> load/store microMIPS helpers are reinventing the wheel. Call do_lw,
> do_ll, do_sw and do_sl instead of using a macro calling the cpu_*
> load/store functions.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG Aurelien Jarno
@ 2012-10-10 20:28   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:28 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Load/store from helpers should be avoided as they are quite
> inefficient. Rewrite unaligned load instructions using TCG and
> aligned loads. The number of actual load operations needed to
> implement an unaligned load instruction is reduced from up to 8 to 1.
> 
> Note: As we can't rely on the undefined behaviour of a shift by 32 or
> 64, the code loads constants that are already shifted by one.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~
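
The "already shifted by one" note in the commit message can be sketched
in plain C. This is a standalone model of the merge step of a
little-endian LWR built on one aligned load, not the TCG sequence from
the patch (which is not quoted here); lwr_merge_le() and its parameters
are hypothetical names.

```c
#include <stdint.h>

/* Merge step of a little-endian LWR, given the aligned word that
   contains the address. byte_off is (address & 3). */
static uint32_t lwr_merge_le(uint32_t aligned_word, unsigned byte_off,
                             uint32_t old_rt)
{
    unsigned shift = (byte_off & 3) * 8;    /* 0, 8, 16 or 24 */
    uint32_t loaded = aligned_word >> shift;

    /* Wanted: keep = 0xffffffff << (32 - shift). For shift == 0 that
       is a shift by 32, which is undefined. Starting from a constant
       that is already shifted left by one keeps the shift count in
       the range 7..31. */
    uint32_t keep = 0xfffffffeu << (31 - shift);

    /* Upper bytes come from the old register, lower bytes from memory. */
    return (old_rt & keep) | loaded;
}
```

With byte_off == 0 the mask evaluates to 0 and the whole register is
replaced by the loaded word, which is the case a naive shift by 32
would get wrong.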

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional Aurelien Jarno
@ 2012-10-10 20:31   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:31 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Store conditional operations only need local temps in user mode. Fix
> the code to use local temps only in user mode; this spares two memory
> stores in system mode.
> 
> At the same time remove a wrongly copied & pasted comment:
> store operations don't have a destination register.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond Aurelien Jarno
@ 2012-10-10 20:33   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:33 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Avoid the branches in the movn/movz implementation and replace them
> with movcond. Also update a wrong comment.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~
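
For reference, the select semantics that movcond provides for these two
instructions can be modeled in C as below. The function names are
hypothetical and the registers are passed by value; this is a sketch of
the instruction semantics, not of the generated TCG ops.

```c
#include <stdint.h>

/* MOVN rd, rs, rt: rd takes the value of rs when rt is non-zero,
   otherwise rd keeps its old value. */
static uint32_t movn_model(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt != 0 ? rs : rd;
}

/* MOVZ rd, rs, rt: rd takes the value of rs when rt is zero,
   otherwise rd keeps its old value. */
static uint32_t movz_model(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt == 0 ? rs : rd;
}
```

Each function is a single conditional select, which is why no branch is
needed in the translated code.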

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond Aurelien Jarno
@ 2012-10-10 20:38   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:38 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> +            tcg_gen_setcondi_tl(TCG_COND_EQ, t2, t0, INT_MIN);
> +            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, -1);
> +            tcg_gen_and_tl(t2, t2, t3);
> +            tcg_gen_setcondi_tl(TCG_COND_EQ, t3, t1, 0);
> +            tcg_gen_or_tl(t2, t2, t3);
> +            tcg_gen_movi_tl(t3, 1);
> +            tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t4, t3, t1);

You don't need t3.

  tcg_gen_movcond_tl(TCG_COND_NE, t1, t2, t4, t2, t1);

I.e. if true, t2 already contains 1.

That said, the patch isn't incorrect.
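
A plain C model of the suggested sequence, as a sketch: it assumes that
t4 holds 0 (its initialisation is not part of the quoted hunk), and the
names movcond_ne() and safe_divisor() are hypothetical.

```c
#include <stdint.h>

/* movcond with TCG_COND_NE: returns v1 if c1 != c2, else v2. */
static int32_t movcond_ne(int32_t c1, int32_t c2, int32_t v1, int32_t v2)
{
    return c1 != c2 ? v1 : v2;
}

/* Replace the divisor by 1 when the division would trap on the host:
   divisor == 0, or INT32_MIN / -1. */
static int32_t safe_divisor(int32_t t0, int32_t t1)
{
    int32_t t2 = (t0 == INT32_MIN);     /* setcond results are 0 or 1 */
    int32_t t3 = (t1 == -1);
    t2 &= t3;
    t3 = (t1 == 0);
    t2 |= t3;
    /* When the condition holds, t2 is already 1, so it can be used
       as the replacement value instead of a separate constant. */
    return movcond_ne(t2, 0, t2, t1);
}
```

Both variants produce the same divisor; the second one only saves the
extra movi.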

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version Aurelien Jarno
@ 2012-10-10 20:43   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:43 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> Use the deposit op instead of a hardcoded bit field insertion. It
> allows the host to emit the corresponding instruction if available.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~
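
The bit field insertion that a deposit op expresses can be written out
by hand as follows. This is a standalone sketch of the semantics only;
deposit32_sketch() is a hypothetical name, and the caller is assumed to
pass 0 < len and pos + len <= 32.

```c
#include <stdint.h>

/* Insert the low `len` bits of `field` into `base` at bit position
   `pos`, leaving all other bits of `base` unchanged. */
static uint32_t deposit32_sketch(uint32_t base, uint32_t field,
                                 unsigned pos, unsigned len)
{
    /* Build the mask without ever shifting by 32. */
    uint32_t mask = (len == 32 ? ~0u : ((1u << len) - 1)) << pos;

    return (base & ~mask) | ((field << pos) & mask);
}
```

This mask, shift and merge sequence is what a host with a native bit
field insert instruction can replace with a single instruction.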

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask
  2012-10-09 20:27 ` [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask Aurelien Jarno
@ 2012-10-10 20:44   ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2012-10-10 20:44 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
>              tag = env->CP0_EntryHi & ~mask;
>              VPN = tlb->VPN & ~mask;
> +#if defined(TARGET_MIPS64)
> +        tag &= env->SEGMask;
> +#endif
>              /* Check ASID, virtual page number & size */

Indentation.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible
  2012-10-10 20:09   ` Richard Henderson
@ 2012-10-16 23:26     ` Aurelien Jarno
  0 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2012-10-16 23:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Oct 10, 2012 at 01:09:49PM -0700, Richard Henderson wrote:
> On 10/09/2012 01:27 PM, Aurelien Jarno wrote:
> > softfloat already has a few constants defined, use them instead of
> > redefining them in target-mips.
> > 
> > Rename FLOAT_SNAN32 and FLOAT_SNAN64 to FP_TO_INT32_OVERFLOW and
> > FP_TO_INT64_OVERFLOW as even if they have the same value, they are
> > technically different (and defined differently in the MIPS ISA).
> > 
> > Remove the unused constants.
> > 
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> 
> > @@ -2495,8 +2491,9 @@ uint64_t helper_float_cvtl_d(CPUMIPSState *env, uint64_t fdt0)
> >      set_float_exception_flags(0, &env->active_fpu.fp_status);
> >      dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
> >      update_fcr31(env);
> > -    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID))
> > -        dt2 = FLOAT_SNAN64;
> > +    if (GET_FP_CAUSE(env->active_fpu.fcr31) & (FP_OVERFLOW | FP_INVALID)) {
> > +        dt2 = FP_TO_INT64_OVERFLOW;
> > +    }
> >      return dt2;
> 
> That said, the existing code you're patching is incorrect.
> 
> This code will fold to OVERFLOW if any previous operation caused an overflow,
> not checking that the *current* operation caused an overflow.
> 

While I agree it should check the softfloat flags instead, I disagree it
is wrong. The part that GET_FP_CAUSE() is looking at is not the
accumulated flags, but the flags for the last instruction.
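
To illustrate the distinction, here is a small standalone model of the
two FCSR fields. It assumes the usual MIPS layout (Cause in bits 17..12,
Flags in bits 6..2), that the Cause field is rewritten at the end of
every FPU instruction, and that no exception is enabled. The helper
names and FP_OVERFLOW_BIT are hypothetical; they only mirror the macros
quoted in this thread.

```c
#include <stdint.h>

#define FP_OVERFLOW_BIT 4u   /* hypothetical name for the overflow bit */

static uint32_t get_cause(uint32_t fcr31) { return (fcr31 >> 12) & 0x3f; }
static uint32_t get_flags(uint32_t fcr31) { return (fcr31 >> 2) & 0x1f; }

/* Cause is overwritten: it describes the last instruction only. */
static uint32_t set_cause(uint32_t fcr31, uint32_t v)
{
    return (fcr31 & ~(0x3fu << 12)) | ((v & 0x3f) << 12);
}

/* Flags are ORed in: they accumulate until software clears them. */
static uint32_t or_flags(uint32_t fcr31, uint32_t v)
{
    return fcr31 | ((v & 0x1f) << 2);
}

/* End of one FPU instruction that raised the exceptions in `raised`. */
static uint32_t finish_insn(uint32_t fcr31, uint32_t raised)
{
    fcr31 = set_cause(fcr31, raised);
    return or_flags(fcr31, raised);
}
```

After an overflowing instruction followed by a clean one, get_cause()
returns 0 while get_flags() still reports the overflow, so a test on
the Cause field only sees the current instruction under these
assumptions.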

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions
@ 2013-01-11 13:23 Tom de Vries
  0 siblings, 0 replies; 29+ messages in thread
From: Tom de Vries @ 2013-01-11 13:23 UTC (permalink / raw)
  To: aurelien; +Cc: qemu-devel, Richard Henderson

Aurelien,

> @@ -8307,7 +8307,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              gen_load_fpr64(ctx, fp0, fs);
>              gen_load_fpr64(ctx, fp1, ft);
>              gen_load_fpr64(ctx, fp2, fr);
> -            gen_helper_float_muladd_d(fp2, cpu_env, fp0, fp1, fp2);
> +            gen_helper_float_madd_d(fp2, cpu_env, fp0, fp1, fp2);
>              tcg_temp_free_i64(fp0);
>              tcg_temp_free_i64(fp1);
>              gen_store_fpr64(ctx, fp2, fd);

AFAIU:
- you're replacing a non-fused MAC with a fused MAC here, and
- for all MIPS cores (except the R8000) madd.d is non-fused.
So shouldn't we use a non-fused MAC here?

Thanks,
- Tom
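
The difference between the two behaviours is observable. The sketch
below uses single precision so that the single-rounding result can be
emulated in double without any library call; both function names are
hypothetical. The emulation is exact for the values discussed here but
is not a general fused multiply-add, since the double sum may itself be
rounded before the final conversion to float.

```c
/* Non-fused: the product is rounded to float before the addition.
   The volatile store forces that intermediate rounding and stops the
   compiler from contracting the expression into a fused operation. */
static float madd_unfused(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Single rounding at the end, emulated in double: the product of two
   floats is exact in double (48 significant bits at most). */
static float madd_single_rounding(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is
1 - 2^-26. The non-fused version rounds it to 1.0f and returns 0, while
the single-rounding version returns -2^-26.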

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-01-11 13:24 UTC | newest]

Thread overview: 29+ messages
2012-10-09 20:27 [Qemu-devel] [PATCH 00/14] target-mips: misc fixes and optimizations Aurelien Jarno
2012-10-09 20:27 ` [Qemu-devel] [PATCH 01/14] softfloat: implement fused multiply-add NaN propagation for MIPS Aurelien Jarno
2012-10-09 20:27 ` [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions Aurelien Jarno
2012-10-10 19:58   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 03/14] target-mips: fix FPU exceptions Aurelien Jarno
2012-10-10 20:05   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 04/14] target-mips: use softfloat constants when possible Aurelien Jarno
2012-10-10 20:09   ` Richard Henderson
2012-10-16 23:26     ` Aurelien Jarno
2012-10-09 20:27 ` [Qemu-devel] [PATCH 05/14] target-mips: cleanup load/store operations Aurelien Jarno
2012-10-10 20:10   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 06/14] target-mips: optimize load operations Aurelien Jarno
2012-10-10 20:11   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 07/14] target-mips: simplify load/store microMIPS helpers Aurelien Jarno
2012-10-10 20:15   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 08/14] target-mips: implement unaligned loads using TCG Aurelien Jarno
2012-10-10 20:28   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 09/14] target-mips: don't use local temps for store conditional Aurelien Jarno
2012-10-10 20:31   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 10/14] target-mips: implement movn/movz using movcond Aurelien Jarno
2012-10-10 20:33   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 11/14] target-mips: optimize ddiv/ddivu/div/divu with movcond Aurelien Jarno
2012-10-10 20:38   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 12/14] target-mips: use deposit instead of hardcoded version Aurelien Jarno
2012-10-10 20:43   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 13/14] target-mips: fix TLBR wrt SEGMask Aurelien Jarno
2012-10-10 20:44   ` Richard Henderson
2012-10-09 20:27 ` [Qemu-devel] [PATCH 14/14] target-mips: don't flush extra TLB on permissions upgrade Aurelien Jarno
2013-01-11 13:23 [Qemu-devel] [PATCH 02/14] target-mips: use the softfloat floatXX_muladd functions Tom de Vries
