* [PATCH 0/6] target/riscv: NaN-boxing for multiple precision
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

Multiple precisions should be supported via NaN-boxing. That means we should
flush improperly NaN-boxed inputs to the canonical NaN before the effective
calculation, and NaN-box the result after the effective calculation.

In this patch set, the implementation is split into three steps for compute,
sign-injection, and some convert insns: check_nanboxed, the effective
calculation, and gen_nanbox_fpr.

check_nanboxed checks the inputs and flushes improperly boxed inputs to the
canonical NaN.
The effective calculation operates directly on fp32 values.
gen_nanbox_fpr does the NaN-boxing, writing 1s to the upper 32 bits.
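
As a rough host-side illustration (plain C, not QEMU/TCG code; the helper
names here are illustrative only), the check and box steps amount to:

    #include <stdint.h>
    #include <stdio.h>

    #define BOX_MASK      0xffffffff00000000ull  /* upper 32 bits all 1s */
    #define CANONICAL_NAN 0x7fc00000u            /* fp32 canonical NaN */

    /* Step 1 (check_nanboxed): use the low 32 bits if legally boxed,
     * otherwise treat the input as the canonical NaN. */
    static uint32_t unbox(uint64_t r)
    {
        return (r & BOX_MASK) == BOX_MASK ? (uint32_t)r : CANONICAL_NAN;
    }

    /* Step 3 (gen_nanbox_fpr): box a 32-bit result back into the 64-bit
     * register; step 2 works on the unboxed fp32 values in between. */
    static uint64_t nanbox(uint32_t f)
    {
        return BOX_MASK | f;
    }

    int main(void)
    {
        uint64_t ok  = nanbox(0x3f800000);     /* properly boxed 1.0f */
        uint64_t bad = 0x000000003f800000ull;  /* upper bits not all 1s */
        printf("%08x %08x\n", (unsigned)unbox(ok),
               (unsigned)unbox(bad));          /* 3f800000 7fc00000 */
        return 0;
    }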

LIU Zhiwei (6):
  target/riscv: move gen_nanbox_fpr to translate.c
  target/riscv: NaN-boxing compute, sign-injection and convert
    instructions.
  target/riscv: Check for LEGAL NaN-boxing
  target/riscv: check before allocating TCG temps
  target/riscv: Flush invalid NaN-boxing input to canonical NaN
  target/riscv: clean up fmv.w.x

 target/riscv/insn_trans/trans_rvd.inc.c |  16 +-
 target/riscv/insn_trans/trans_rvf.inc.c | 317 +++++++++++++++++++-----
 target/riscv/translate.c                |  43 ++++
 3 files changed, 306 insertions(+), 70 deletions(-)

-- 
2.23.0




* [PATCH 1/6] target/riscv: move gen_nanbox_fpr to translate.c
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

As this function will also be used by fcvt.d.s in trans_rvd.inc.c,
make it visible to both the RVF and RVD translation code.
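
For context, gen_nanbox_fpr ORs the register with MAKE_64BIT_MASK(32, 32),
i.e. the constant with the upper 32 bits set. A quick sanity check of that
constant (the macro body below is assumed to match QEMU's definition in
include/qemu/bitops.h):

    #include <stdint.h>
    #include <assert.h>

    /* Assumed QEMU definition: a mask of `length` bits starting at `shift`. */
    #define MAKE_64BIT_MASK(shift, length) \
        (((~0ULL) >> (64 - (length))) << (shift))

    int main(void)
    {
        assert(MAKE_64BIT_MASK(32, 32) == 0xffffffff00000000ull);
        return 0;
    }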

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 14 --------------
 target/riscv/translate.c                | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 3bfd8881e7..0d5ce373cb 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -23,20 +23,6 @@
         return false;                       \
 } while (0)
 
-/*
- * RISC-V requires NaN-boxing of narrower width floating
- * point values.  This applies when a 32-bit value is
- * assigned to a 64-bit FP register.  Thus this does not
- * apply when the RVD extension is not present.
- */
-static void gen_nanbox_fpr(DisasContext *ctx, int regno)
-{
-    if (has_ext(ctx, RVD)) {
-        tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
-                        MAKE_64BIT_MASK(32, 32));
-    }
-}
-
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
     TCGv t0 = tcg_temp_new();
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 9632e79cf3..4b1534c9a6 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -90,6 +90,20 @@ static inline bool has_ext(DisasContext *ctx, uint32_t ext)
     return ctx->misa & ext;
 }
 
+/*
+ * RISC-V requires NaN-boxing of narrower width floating
+ * point values.  This applies when a 32-bit value is
+ * assigned to a 64-bit FP register.  Thus this does not
+ * apply when the RVD extension is not present.
+ */
+static void gen_nanbox_fpr(DisasContext *ctx, int regno)
+{
+    if (has_ext(ctx, RVD)) {
+        tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
+                        MAKE_64BIT_MASK(32, 32));
+    }
+}
+
 static void generate_exception(DisasContext *ctx, int excp)
 {
     tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-- 
2.23.0




* [PATCH 2/6] target/riscv: NaN-boxing compute, sign-injection and convert instructions.
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

An n-bit floating-point result is written to the n least-significant bits
of the destination f register, with all 1s written to the uppermost
FLEN - n bits to yield a legal NaN-boxed value.
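
For example, with FLEN = 64 and n = 32, the result 1.0f (0x3f800000) is held
in the f register as 0xffffffff3f800000. Such a boxed value reads back as a
quiet NaN when interpreted as a double (host-side sketch, not QEMU code):

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        uint64_t boxed = 0xffffffff00000000ull | 0x3f800000u; /* boxed 1.0f */
        double d;
        memcpy(&d, &boxed, sizeof(d));
        /* Upper 32 bits all 1s => exponent field all 1s, mantissa != 0. */
        printf("isnan(d) = %d\n", isnan(d)); /* prints 1 */
        return 0;
    }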

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/insn_trans/trans_rvd.inc.c |  1 +
 target/riscv/insn_trans/trans_rvf.inc.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/target/riscv/insn_trans/trans_rvd.inc.c b/target/riscv/insn_trans/trans_rvd.inc.c
index ea1044f13b..cd73a326f4 100644
--- a/target/riscv/insn_trans/trans_rvd.inc.c
+++ b/target/riscv/insn_trans/trans_rvd.inc.c
@@ -230,6 +230,7 @@ static bool trans_fcvt_s_d(DisasContext *ctx, arg_fcvt_s_d *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fcvt_s_d(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     return true;
diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 0d5ce373cb..a3d74dd83d 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -61,6 +61,7 @@ static bool trans_fmadd_s(DisasContext *ctx, arg_fmadd_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -72,6 +73,7 @@ static bool trans_fmsub_s(DisasContext *ctx, arg_fmsub_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -83,6 +85,7 @@ static bool trans_fnmsub_s(DisasContext *ctx, arg_fnmsub_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                         cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -95,6 +98,7 @@ static bool trans_fnmadd_s(DisasContext *ctx, arg_fnmadd_s *a)
     gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                         cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
     mark_fs_dirty(ctx);
+    gen_nanbox_fpr(ctx, a->rd);
     return true;
 }
 
@@ -106,6 +110,7 @@ static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
                       cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -118,6 +123,7 @@ static bool trans_fsub_s(DisasContext *ctx, arg_fsub_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
                       cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -130,6 +136,7 @@ static bool trans_fmul_s(DisasContext *ctx, arg_fmul_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
                       cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -142,6 +149,7 @@ static bool trans_fdiv_s(DisasContext *ctx, arg_fdiv_s *a)
     gen_set_rm(ctx, a->rm);
     gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
                       cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -153,6 +161,7 @@ static bool trans_fsqrt_s(DisasContext *ctx, arg_fsqrt_s *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -167,6 +176,7 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
         tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
                             0, 31);
     }
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -183,6 +193,7 @@ static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
         tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
         tcg_temp_free_i64(t0);
     }
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -199,6 +210,7 @@ static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
         tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
         tcg_temp_free_i64(t0);
     }
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -210,6 +222,7 @@ static bool trans_fmin_s(DisasContext *ctx, arg_fmin_s *a)
 
     gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                       cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -221,6 +234,7 @@ static bool trans_fmax_s(DisasContext *ctx, arg_fmax_s *a)
 
     gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
                       cpu_fpr[a->rs2]);
+    gen_nanbox_fpr(ctx, a->rd);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -331,6 +345,7 @@ static bool trans_fcvt_s_w(DisasContext *ctx, arg_fcvt_s_w *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fcvt_s_w(cpu_fpr[a->rd], cpu_env, t0);
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
@@ -348,6 +363,7 @@ static bool trans_fcvt_s_wu(DisasContext *ctx, arg_fcvt_s_wu *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fcvt_s_wu(cpu_fpr[a->rd], cpu_env, t0);
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
@@ -369,6 +385,7 @@ static bool trans_fmv_w_x(DisasContext *ctx, arg_fmv_w_x *a)
 #else
     tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
 #endif
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
@@ -413,6 +430,7 @@ static bool trans_fcvt_s_l(DisasContext *ctx, arg_fcvt_s_l *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fcvt_s_l(cpu_fpr[a->rd], cpu_env, t0);
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
@@ -429,6 +447,7 @@ static bool trans_fcvt_s_lu(DisasContext *ctx, arg_fcvt_s_lu *a)
 
     gen_set_rm(ctx, a->rm);
     gen_helper_fcvt_s_lu(cpu_fpr[a->rd], cpu_env, t0);
+    gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
-- 
2.23.0




* [PATCH 3/6] target/riscv: Check for LEGAL NaN-boxing
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

A narrow n-bit operation, where n < FLEN, checks that input operands
are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
If so, the n least-significant bits of the input are used as the input value,
otherwise the input value is treated as an n-bit canonical NaN.
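
The patch below performs this check with a single tcg_gen_movcond_i64 per
operand; in host C terms the predicate is (a sketch under the FLEN = 64,
n = 32 assumption):

    #include <stdint.h>

    /* Mirror of the movcond: t = (t >= t_max) ? t : t_nan.
     * Unsigned >= against MAKE_64BIT_MASK(32, 32) holds exactly when all
     * upper 32 bits of t are 1, i.e. when t is legally NaN-boxed. */
    static uint64_t flush_if_unboxed(uint64_t t)
    {
        const uint64_t t_max = 0xffffffff00000000ull;
        const uint64_t t_nan = 0x7fc00000ull; /* fp32 canonical NaN */
        return t >= t_max ? t : t_nan;
    }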

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/translate.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 4b1534c9a6..1c9b809d4a 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -104,6 +104,35 @@ static void gen_nanbox_fpr(DisasContext *ctx, int regno)
     }
 }
 
+/*
+ * A narrow n-bit operation, where n < FLEN, checks that input operands
+ * are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
+ * If so, the n least-significant bits of the input are used as the input value,
+ * otherwise the input value is treated as an n-bit canonical NaN.
+ * (riscv-spec-v2.2 Section 9.2).
+ */
+static void check_nanboxed(DisasContext *ctx, int num, ...)
+{
+    if (has_ext(ctx, RVD)) {
+        int i;
+        TCGv_i64 cond1 = tcg_temp_new_i64();
+        TCGv_i64 t_nan = tcg_const_i64(0x7fc00000);
+        TCGv_i64 t_max = tcg_const_i64(MAKE_64BIT_MASK(32, 32));
+        va_list valist;
+        va_start(valist, num);
+
+        for (i = 0; i < num; i++) {
+            TCGv_i64 t = va_arg(valist, TCGv_i64);
+            tcg_gen_movcond_i64(TCG_COND_GEU, t, t, t_max, t, t_nan);
+        }
+
+        va_end(valist);
+        tcg_temp_free_i64(cond1);
+        tcg_temp_free_i64(t_nan);
+        tcg_temp_free_i64(t_max);
+    }
+}
+
 static void generate_exception(DisasContext *ctx, int excp)
 {
     tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-- 
2.23.0




* [PATCH 4/6] target/riscv: check before allocating TCG temps
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

REQUIRE_FPU and REQUIRE_EXT may return false early. Allocate TCG
temporaries only after these checks, so that an early return does not
leak them.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/insn_trans/trans_rvd.inc.c | 8 ++++----
 target/riscv/insn_trans/trans_rvf.inc.c | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.inc.c b/target/riscv/insn_trans/trans_rvd.inc.c
index cd73a326f4..c0f4a0c789 100644
--- a/target/riscv/insn_trans/trans_rvd.inc.c
+++ b/target/riscv/insn_trans/trans_rvd.inc.c
@@ -20,10 +20,10 @@
 
 static bool trans_fld(DisasContext *ctx, arg_fld *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVD);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ);
@@ -35,10 +35,10 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a)
 
 static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVD);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ);
diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index a3d74dd83d..04bc8e5cb5 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -25,10 +25,10 @@
 
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
@@ -41,11 +41,11 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
 
 static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 {
+    REQUIRE_FPU;
+    REQUIRE_EXT(ctx, RVF);
     TCGv t0 = tcg_temp_new();
     gen_get_gpr(t0, a->rs1);
 
-    REQUIRE_FPU;
-    REQUIRE_EXT(ctx, RVF);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL);
-- 
2.23.0




* [PATCH 5/6] target/riscv: Flush invalid NaN-boxing input to canonical NaN
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

Copy the single-precision source operands into temporaries and flush
improperly NaN-boxed inputs to the canonical NaN before the effective
calculation.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/insn_trans/trans_rvd.inc.c |   7 +-
 target/riscv/insn_trans/trans_rvf.inc.c | 272 ++++++++++++++++++++----
 2 files changed, 235 insertions(+), 44 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.inc.c b/target/riscv/insn_trans/trans_rvd.inc.c
index c0f4a0c789..16947ea6da 100644
--- a/target/riscv/insn_trans/trans_rvd.inc.c
+++ b/target/riscv/insn_trans/trans_rvd.inc.c
@@ -241,10 +241,15 @@ static bool trans_fcvt_d_s(DisasContext *ctx, arg_fcvt_d_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVD);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, t1);
 
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 04bc8e5cb5..b0379b9d1f 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -58,11 +58,23 @@ static bool trans_fmadd_s(DisasContext *ctx, arg_fmadd_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
+    check_nanboxed(ctx, 3, t1, t2, t3);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
     return true;
 }
 
@@ -70,11 +82,23 @@ static bool trans_fmsub_s(DisasContext *ctx, arg_fmsub_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
+    check_nanboxed(ctx, 3, t1, t2, t3);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
     return true;
 }
 
@@ -82,11 +106,23 @@ static bool trans_fnmsub_s(DisasContext *ctx, arg_fnmsub_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
+    check_nanboxed(ctx, 3, t1, t2, t3);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
+    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
     return true;
 }
 
@@ -94,11 +130,23 @@ static bool trans_fnmadd_s(DisasContext *ctx, arg_fnmadd_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
+    check_nanboxed(ctx, 3, t1, t2, t3);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
-    mark_fs_dirty(ctx);
+    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
     gen_nanbox_fpr(ctx, a->rd);
+
+    mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
     return true;
 }
 
@@ -107,11 +155,19 @@ static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
-                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -120,11 +176,19 @@ static bool trans_fsub_s(DisasContext *ctx, arg_fsub_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
-                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -133,11 +197,19 @@ static bool trans_fmul_s(DisasContext *ctx, arg_fmul_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
-                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -146,11 +218,19 @@ static bool trans_fdiv_s(DisasContext *ctx, arg_fdiv_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
-                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -159,10 +239,16 @@ static bool trans_fsqrt_s(DisasContext *ctx, arg_fsqrt_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -170,14 +256,23 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     if (a->rs1 == a->rs2) { /* FMOV */
-        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
+        tcg_gen_mov_i64(cpu_fpr[a->rd], t1);
     } else { /* FSGNJ */
-        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
-                            0, 31);
+        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);
     }
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -185,16 +280,26 @@ static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     if (a->rs1 == a->rs2) { /* FNEG */
-        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
+        tcg_gen_xori_i64(cpu_fpr[a->rd], t1, INT32_MIN);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
-        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
-        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
+        tcg_gen_not_i64(t0, t2);
+        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, t1, 0, 31);
         tcg_temp_free_i64(t0);
     }
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -202,16 +307,26 @@ static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
     if (a->rs1 == a->rs2) { /* FABS */
-        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
+        tcg_gen_andi_i64(cpu_fpr[a->rd], t1, ~INT32_MIN);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
-        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
-        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
+        tcg_gen_andi_i64(t0, t2, INT32_MIN);
+        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t0);
         tcg_temp_free_i64(t0);
     }
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -220,10 +335,18 @@ static bool trans_fmin_s(DisasContext *ctx, arg_fmin_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
-    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                      cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -232,10 +355,18 @@ static bool trans_fmax_s(DisasContext *ctx, arg_fmax_s *a)
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
 
-    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
-                      cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, t1, t2);
     gen_nanbox_fpr(ctx, a->rd);
+
     mark_fs_dirty(ctx);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -245,11 +376,16 @@ static bool trans_fcvt_w_s(DisasContext *ctx, arg_fcvt_w_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_w_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
-    tcg_temp_free(t0);
 
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -259,11 +395,16 @@ static bool trans_fcvt_wu_s(DisasContext *ctx, arg_fcvt_wu_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
-    tcg_temp_free(t0);
 
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_feq_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_flt_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_fle_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
 
-    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
-
+    gen_helper_fclass_s(t0, t1);
     gen_set_gpr(a->rd, t0);
-    tcg_temp_free(t0);
 
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext *ctx, arg_fcvt_l_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_l_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext *ctx, arg_fcvt_lu_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
-- 
2.23.0



-    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
-    tcg_temp_free(t0);
 
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_feq_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_flt_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     TCGv t0 = tcg_temp_new();
-    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
+    check_nanboxed(ctx, 2, t1, t2);
+
+    gen_helper_fle_s(t0, cpu_env, t1, t2);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
     return true;
 }
 
@@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
 
-    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
-
+    gen_helper_fclass_s(t0, t1);
     gen_set_gpr(a->rd, t0);
-    tcg_temp_free(t0);
 
+    tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext *ctx, arg_fcvt_l_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_l_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
@@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext *ctx, arg_fcvt_lu_s *a)
     REQUIRE_EXT(ctx, RVF);
 
     TCGv t0 = tcg_temp_new();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
+    check_nanboxed(ctx, 1, t1);
+
     gen_set_rm(ctx, a->rm);
-    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
+    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
     gen_set_gpr(a->rd, t0);
+
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
     return true;
 }
 
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 40+ messages in thread
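
A note on the pattern above: each translation function copies its operands
into fresh temporaries before the check because check_nanboxed rewrites its
arguments in place. Running it directly on cpu_fpr[a->rs1] would flush the
architectural source register itself, not just the value handed to the
helper. Schematically (a minimal sketch of the pattern, not literal code
from the tree):

    /* Wrong: the movcond inside check_nanboxed would clobber the
       guest register rs1 itself. */
    check_nanboxed(ctx, 1, cpu_fpr[a->rs1]);

    /* Right: flush only a scratch copy of the input. */
    TCGv_i64 t1 = tcg_temp_new_i64();
    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
    check_nanboxed(ctx, 1, t1);
    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
    tcg_temp_free_i64(t1);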

* [PATCH 6/6] target/riscv: clean up fmv.w.x
  2020-06-26 20:59 ` LIU Zhiwei
@ 2020-06-26 20:59   ` LIU Zhiwei
  -1 siblings, 0 replies; 40+ messages in thread
From: LIU Zhiwei @ 2020-06-26 20:59 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: richard.henderson, wxy194768, wenmeng_zhang, Alistair.Francis,
	palmer, LIU Zhiwei, ianjiang.ict

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index b0379b9d1f..fabcd0eccf 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -554,11 +554,7 @@ static bool trans_fmv_w_x(DisasContext *ctx, arg_fmv_w_x *a)
     TCGv t0 = tcg_temp_new();
     gen_get_gpr(t0, a->rs1);
 
-#if defined(TARGET_RISCV64)
-    tcg_gen_mov_i64(cpu_fpr[a->rd], t0);
-#else
-    tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
-#endif
+    tcg_gen_extu_tl_i64(cpu_fpr[a->rd], t0);
     gen_nanbox_fpr(ctx, a->rd);
 
     mark_fs_dirty(ctx);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 40+ messages in thread
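
The cleanup works because tcg_gen_extu_tl_i64 expands to exactly what the
removed #if selected by hand: a plain 64-bit move when the target long is
64 bits, and a zero-extension from 32 to 64 bits otherwise. Together with
gen_nanbox_fpr, the net effect on the destination register is, in host-side
C terms (a sketch assuming FLEN=64; nanbox_float32 is an illustrative name,
not a QEMU function):

    /* Box a 32-bit value into a 64-bit FPR: force the upper 32 bits to 1. */
    static uint64_t nanbox_float32(uint32_t value)
    {
        return 0xffffffff00000000ULL | (uint64_t)value;
    }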

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
  2020-06-26 20:59 ` LIU Zhiwei
@ 2020-06-26 21:21   ` no-reply
  -1 siblings, 0 replies; 40+ messages in thread
From: no-reply @ 2020-06-26 21:21 UTC (permalink / raw)
  To: zhiwei_liu
  Cc: qemu-riscv, richard.henderson, qemu-devel, wxy194768,
	wenmeng_zhang, Alistair.Francis, palmer, zhiwei_liu,
	ianjiang.ict

Patchew URL: https://patchew.org/QEMU/20200626205917.4545-1-zhiwei_liu@c-sky.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=c28f1fcf68b949df96a6698f1d098a1e', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-2v7la0ef/src/docker-src.2020-06-26-17.16.29.17531:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=c28f1fcf68b949df96a6698f1d098a1e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-2v7la0ef/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    5m18.805s
user    0m9.201s


The full log is available at
http://patchew.org/logs/20200626205917.4545-1-zhiwei_liu@c-sky.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/6] target/riscv: Check for LEGAL NaN-boxing
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-06-30  7:20     ` Chih-Min Chao
  -1 siblings, 0 replies; 40+ messages in thread
From: Chih-Min Chao @ 2020-06-30  7:20 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, wenmeng_zhang,
	Alistair Francis, Palmer Dabbelt, Ian Jiang

On Sat, Jun 27, 2020 at 5:05 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

> A narrow n-bit operation, where n < FLEN, checks that input operands
> are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
> If so, the n least-significant bits of the input are used as the input
> value,
> otherwise the input value is treated as an n-bit canonical NaN.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/translate.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 4b1534c9a6..1c9b809d4a 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -104,6 +104,35 @@ static void gen_nanbox_fpr(DisasContext *ctx, int
> regno)
>      }
>  }
>
> +/*
> + * A narrow n-bit operation, where n < FLEN, checks that input operands
> + * are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
> + * If so, the n least-significant bits of the input are used as the input
> value,
> + * otherwise the input value is treated as an n-bit canonical NaN.
> + * (riscv-spec-v2.2 Section 9.2).
> + */
> +static void check_nanboxed(DisasContext *ctx, int num, ...)
> +{
> +    if (has_ext(ctx, RVD)) {
> +        int i;
> +        TCGv_i64 cond1 = tcg_temp_new_i64();
>
Forgot to remove this?

> +        TCGv_i64 t_nan = tcg_const_i64(0x7fc00000);
> +        TCGv_i64 t_max = tcg_const_i64(MAKE_64BIT_MASK(32, 32));
> +        va_list valist;
> +        va_start(valist, num);
> +
> +        for (i = 0; i < num; i++) {
> +            TCGv_i64 t = va_arg(valist, TCGv_i64);
> +            tcg_gen_movcond_i64(TCG_COND_GEU, t, t, t_max, t, t_nan);
> +        }
> +
> +        va_end(valist);
> +        tcg_temp_free_i64(cond1);
>
Forgot to remove this?

> +        tcg_temp_free_i64(t_nan);
> +        tcg_temp_free_i64(t_max);
> +    }
> +}
> +
>  static void generate_exception(DisasContext *ctx, int excp)
>  {
>      tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
> --
> 2.23.0
>
>

 Chih-Min Chao

^ permalink raw reply	[flat|nested] 40+ messages in thread
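
For reference, the guarded movcond in check_nanboxed implements, per
operand, the following selection (a host-side C sketch of the generated
code's effect, assuming FLEN=64 and n=32; unbox_float32 is an illustrative
name, not a QEMU function):

    static uint64_t unbox_float32(uint64_t reg)
    {
        /* Properly NaN-boxed: all upper 32 bits are 1.  The unsigned
           comparison against MAKE_64BIT_MASK(32, 32), i.e.
           0xffffffff00000000, is true exactly in that case. */
        if (reg >= 0xffffffff00000000ULL) {
            return reg;            /* low 32 bits are the operand */
        }
        return 0x7fc00000;         /* otherwise treat as canonical NaN */
    }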

* Re: [PATCH 3/6] target/riscv: Check for LEGAL NaN-boxing
  2020-06-30  7:20     ` Chih-Min Chao
@ 2020-06-30  7:31       ` LIU Zhiwei
  -1 siblings, 0 replies; 40+ messages in thread
From: LIU Zhiwei @ 2020-06-30  7:31 UTC (permalink / raw)
  To: Chih-Min Chao
  Cc: open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, wenmeng_zhang,
	Alistair Francis, Palmer Dabbelt, Ian Jiang



On 2020/6/30 15:20, Chih-Min Chao wrote:
>
>
> On Sat, Jun 27, 2020 at 5:05 AM LIU Zhiwei <zhiwei_liu@c-sky.com 
> <mailto:zhiwei_liu@c-sky.com>> wrote:
>
>     A narrow n-bit operation, where n < FLEN, checks that input operands
>     are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
>     If so, the n least-significant bits of the input are used as the
>     input value,
>     otherwise the input value is treated as an n-bit canonical NaN.
>
>     Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com
>     <mailto:zhiwei_liu@c-sky.com>>
>     ---
>      target/riscv/translate.c | 29 +++++++++++++++++++++++++++++
>      1 file changed, 29 insertions(+)
>
>     diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>     index 4b1534c9a6..1c9b809d4a 100644
>     --- a/target/riscv/translate.c
>     +++ b/target/riscv/translate.c
>     @@ -104,6 +104,35 @@ static void gen_nanbox_fpr(DisasContext *ctx,
>     int regno)
>          }
>      }
>
>     +/*
>     + * A narrow n-bit operation, where n < FLEN, checks that input
>     operands
>     + * are correctly NaN-boxed, i.e., all upper FLEN - n bits are 1.
>     + * If so, the n least-significant bits of the input are used as the
>     input value,
>     + * otherwise the input value is treated as an n-bit canonical NaN.
>     + * (riscv-spec-v2.2 Section 9.2).
>     + */
>     +static void check_nanboxed(DisasContext *ctx, int num, ...)
>     +{
>     +    if (has_ext(ctx, RVD)) {
>     +        int i;
>     +        TCGv_i64 cond1 = tcg_temp_new_i64();
>
> Forgot to remove this?
Oops! At one point I intended to use tcg_gen_setcond_i64.

Thanks for pointing it out. I will fix it in the next patch set.

Zhiwei

>     +        TCGv_i64 t_nan = tcg_const_i64(0x7fc00000);
>     +        TCGv_i64 t_max = tcg_const_i64(MAKE_64BIT_MASK(32, 32));
>     +        va_list valist;
>     +        va_start(valist, num);
>     +
>     +        for (i = 0; i < num; i++) {
>     +            TCGv_i64 t = va_arg(valist, TCGv_i64);
>     +            tcg_gen_movcond_i64(TCG_COND_GEU, t, t, t_max, t, t_nan);
>     +        }
>     +
>     +        va_end(valist);
>     +        tcg_temp_free_i64(cond1);
>
> Forgot to remove this?
>
>     +        tcg_temp_free_i64(t_nan);
>     +        tcg_temp_free_i64(t_max);
>     +    }
>     +}
>     +
>      static void generate_exception(DisasContext *ctx, int excp)
>      {
>          tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
>     -- 
>     2.23.0
>
>
>  Chih-Min Chao


^ permalink raw reply	[flat|nested] 40+ messages in thread
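
The stray cond1 temporary is a leftover of the tcg_gen_setcond_i64 approach
mentioned above. For comparison, such a variant might have looked like the
following (a hypothetical reconstruction, not code from the series); the
single movcond in the posted patch performs the same selection in one op:

    TCGv_i64 mask = tcg_temp_new_i64();
    tcg_gen_setcond_i64(TCG_COND_GEU, mask, t, t_max); /* 1 if boxed, else 0 */
    tcg_gen_neg_i64(mask, mask);                       /* all-ones or zero */
    tcg_gen_and_i64(t, t, mask);                       /* keep t when boxed */
    tcg_gen_andc_i64(mask, t_nan, mask);               /* t_nan when not boxed */
    tcg_gen_or_i64(t, t, mask);
    tcg_temp_free_i64(mask);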

* Re: [PATCH 5/6] target/riscv: Flush not valid NaN-boxing input to canonical NaN
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-06-30  7:31     ` Chih-Min Chao
  -1 siblings, 0 replies; 40+ messages in thread
From: Chih-Min Chao @ 2020-06-30  7:31 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, wenmeng_zhang,
	Alistair Francis, Palmer Dabbelt, Ian Jiang

On Sat, Jun 27, 2020 at 5:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvd.inc.c |   7 +-
>  target/riscv/insn_trans/trans_rvf.inc.c | 272 ++++++++++++++++++++----
>  2 files changed, 235 insertions(+), 44 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvd.inc.c
> b/target/riscv/insn_trans/trans_rvd.inc.c
> index c0f4a0c789..16947ea6da 100644
> --- a/target/riscv/insn_trans/trans_rvd.inc.c
> +++ b/target/riscv/insn_trans/trans_rvd.inc.c
> @@ -241,10 +241,15 @@ static bool trans_fcvt_d_s(DisasContext *ctx,
> arg_fcvt_d_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVD);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, t1);
>
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 04bc8e5cb5..b0379b9d1f 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -58,11 +58,23 @@ static bool trans_fmadd_s(DisasContext *ctx,
> arg_fmadd_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -70,11 +82,23 @@ static bool trans_fmsub_s(DisasContext *ctx,
> arg_fmsub_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -82,11 +106,23 @@ static bool trans_fnmsub_s(DisasContext *ctx,
> arg_fnmsub_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -94,11 +130,23 @@ static bool trans_fnmadd_s(DisasContext *ctx,
> arg_fnmadd_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> -    mark_fs_dirty(ctx);
> +    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
> +    mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -107,11 +155,19 @@ static bool trans_fadd_s(DisasContext *ctx,
> arg_fadd_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -120,11 +176,19 @@ static bool trans_fsub_s(DisasContext *ctx,
> arg_fsub_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -133,11 +197,19 @@ static bool trans_fmul_s(DisasContext *ctx,
> arg_fmul_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -146,11 +218,19 @@ static bool trans_fdiv_s(DisasContext *ctx,
> arg_fdiv_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -159,10 +239,16 @@ static bool trans_fsqrt_s(DisasContext *ctx,
> arg_fsqrt_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -170,14 +256,23 @@ static bool trans_fsgnj_s(DisasContext *ctx,
> arg_fsgnj_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FMOV */
> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
> +        tcg_gen_mov_i64(cpu_fpr[a->rd], t1);
>      } else { /* FSGNJ */
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
> cpu_fpr[a->rs1],
> -                            0, 31);
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -185,16 +280,26 @@ static bool trans_fsgnjn_s(DisasContext *ctx,
> arg_fsgnjn_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FNEG */
> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
> +        tcg_gen_xori_i64(cpu_fpr[a->rd], t1, INT32_MIN);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
> +        tcg_gen_not_i64(t0, t2);
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, t1, 0, 31);
>          tcg_temp_free_i64(t0);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -202,16 +307,26 @@ static bool trans_fsgnjx_s(DisasContext *ctx,
> arg_fsgnjx_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FABS */
> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
> +        tcg_gen_andi_i64(cpu_fpr[a->rd], t1, ~INT32_MIN);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
> +        tcg_gen_andi_i64(t0, t2, INT32_MIN);
> +        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t0);
>          tcg_temp_free_i64(t0);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -220,10 +335,18 @@ static bool trans_fmin_s(DisasContext *ctx,
> arg_fmin_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> -    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                      cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -232,10 +355,18 @@ static bool trans_fmax_s(DisasContext *ctx,
> arg_fmax_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> -    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                      cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -245,11 +376,16 @@ static bool trans_fcvt_w_s(DisasContext *ctx,
> arg_fcvt_w_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_w_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -259,11 +395,16 @@ static bool trans_fcvt_wu_s(DisasContext *ctx,
> arg_fcvt_wu_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s
> *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_feq_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s
> *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_flt_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s
> *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fle_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext *ctx,
> arg_fclass_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
>
> -    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
> -
> +    gen_helper_fclass_s(t0, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext *ctx,
> arg_fcvt_l_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_l_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext *ctx,
> arg_fcvt_lu_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> --
> 2.23.0
>
>
It may be more readable to use local macros to wrap the allocation and
freeing of TCG temp variables. Most functions take two operands; some
require one and others need three. They could look like:

#define GEN_ONE_OPERAND \
    TCGv_i64 t1 = tcg_temp_new_i64(); \
    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
    check_nanboxed(ctx, 1, t1);

#define GEN_TWO_OPERAND \
    TCGv_i64 t1 = tcg_temp_new_i64(); \
    TCGv_i64 t2 = tcg_temp_new_i64(); \
    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
    check_nanboxed(ctx, 2, t1, t2);

#define GEN_THREE_OPERAND \
    TCGv_i64 t1 = tcg_temp_new_i64(); \
    TCGv_i64 t2 = tcg_temp_new_i64(); \
    TCGv_i64 t3 = tcg_temp_new_i64(); \
    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]); \
    check_nanboxed(ctx, 3, t1, t2, t3);

#define FREE_ONE_OPERAND \
    tcg_temp_free_i64(t1);

#define FREE_TWO_OPERAND \
    tcg_temp_free_i64(t1); \
    tcg_temp_free_i64(t2);

#define FREE_THREE_OPERAND \
    tcg_temp_free_i64(t1); \
    tcg_temp_free_i64(t2); \
    tcg_temp_free_i64(t3);

Chih-Min Chao

^ permalink raw reply	[flat|nested] 40+ messages in thread
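
With macros along those lines, each translation function would collapse to a
few lines; trans_fadd_s, for instance, might become (a sketch built on the
macros suggested above, not code from the series):

    static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s *a)
    {
        REQUIRE_FPU;
        REQUIRE_EXT(ctx, RVF);

        GEN_TWO_OPERAND
        gen_set_rm(ctx, a->rm);
        gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
        gen_nanbox_fpr(ctx, a->rd);
        mark_fs_dirty(ctx);
        FREE_TWO_OPERAND
        return true;
    }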

* Re: [PATCH 5/6] target/riscv: Flush not valid NaN-boxing input to canonical NaN
@ 2020-06-30  7:31     ` Chih-Min Chao
  0 siblings, 0 replies; 40+ messages in thread
From: Chih-Min Chao @ 2020-06-30  7:31 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel@nongnu.org Developers, open list:RISC-V,
	Richard Henderson, wxy194768, wenmeng_zhang, Alistair Francis,
	Palmer Dabbelt, Ian Jiang

On Sat, Jun 27, 2020 at 5:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvd.inc.c |   7 +-
>  target/riscv/insn_trans/trans_rvf.inc.c | 272 ++++++++++++++++++++----
>  2 files changed, 235 insertions(+), 44 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvd.inc.c
> b/target/riscv/insn_trans/trans_rvd.inc.c
> index c0f4a0c789..16947ea6da 100644
> --- a/target/riscv/insn_trans/trans_rvd.inc.c
> +++ b/target/riscv/insn_trans/trans_rvd.inc.c
> @@ -241,10 +241,15 @@ static bool trans_fcvt_d_s(DisasContext *ctx,
> arg_fcvt_d_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVD);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, t1);
>
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 04bc8e5cb5..b0379b9d1f 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -58,11 +58,23 @@ static bool trans_fmadd_s(DisasContext *ctx,
> arg_fmadd_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -70,11 +82,23 @@ static bool trans_fmsub_s(DisasContext *ctx,
> arg_fmsub_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -82,11 +106,23 @@ static bool trans_fnmsub_s(DisasContext *ctx,
> arg_fnmsub_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> +    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -94,11 +130,23 @@ static bool trans_fnmadd_s(DisasContext *ctx,
> arg_fnmadd_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    TCGv_i64 t3 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
> +    check_nanboxed(ctx, 3, t1, t2, t3);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
> -    mark_fs_dirty(ctx);
> +    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>      gen_nanbox_fpr(ctx, a->rd);
> +
> +    mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +    tcg_temp_free_i64(t3);
>      return true;
>  }
>
> @@ -107,11 +155,19 @@ static bool trans_fadd_s(DisasContext *ctx,
> arg_fadd_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -120,11 +176,19 @@ static bool trans_fsub_s(DisasContext *ctx,
> arg_fsub_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -133,11 +197,19 @@ static bool trans_fmul_s(DisasContext *ctx,
> arg_fmul_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -146,11 +218,19 @@ static bool trans_fdiv_s(DisasContext *ctx,
> arg_fdiv_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -159,10 +239,16 @@ static bool trans_fsqrt_s(DisasContext *ctx,
> arg_fsqrt_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -170,14 +256,23 @@ static bool trans_fsgnj_s(DisasContext *ctx,
> arg_fsgnj_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FMOV */
> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
> +        tcg_gen_mov_i64(cpu_fpr[a->rd], t1);
>      } else { /* FSGNJ */
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
> cpu_fpr[a->rs1],
> -                            0, 31);
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -185,16 +280,26 @@ static bool trans_fsgnjn_s(DisasContext *ctx,
> arg_fsgnjn_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FNEG */
> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
> +        tcg_gen_xori_i64(cpu_fpr[a->rd], t1, INT32_MIN);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
> +        tcg_gen_not_i64(t0, t2);
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, t1, 0, 31);
>          tcg_temp_free_i64(t0);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -202,16 +307,26 @@ static bool trans_fsgnjx_s(DisasContext *ctx,
> arg_fsgnjx_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
>      if (a->rs1 == a->rs2) { /* FABS */
> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
> +        tcg_gen_andi_i64(cpu_fpr[a->rd], t1, ~INT32_MIN);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
> +        tcg_gen_andi_i64(t0, t2, INT32_MIN);
> +        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t0);
>          tcg_temp_free_i64(t0);
>      }
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -220,10 +335,18 @@ static bool trans_fmin_s(DisasContext *ctx,
> arg_fmin_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> -    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                      cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -232,10 +355,18 @@ static bool trans_fmax_s(DisasContext *ctx,
> arg_fmax_s *a)
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
>
> -    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
> -                      cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>      gen_nanbox_fpr(ctx, a->rd);
> +
>      mark_fs_dirty(ctx);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -245,11 +376,16 @@ static bool trans_fcvt_w_s(DisasContext *ctx,
> arg_fcvt_w_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_w_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -259,11 +395,16 @@ static bool trans_fcvt_wu_s(DisasContext *ctx, arg_fcvt_wu_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_feq_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_flt_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      TCGv t0 = tcg_temp_new();
> -    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
> +    check_nanboxed(ctx, 2, t1, t2);
> +
> +    gen_helper_fle_s(t0, cpu_env, t1, t2);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>      return true;
>  }
>
> @@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
>
> -    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
> -
> +    gen_helper_fclass_s(t0, t1);
>      gen_set_gpr(a->rd, t0);
> -    tcg_temp_free(t0);
>
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext *ctx, arg_fcvt_l_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_l_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> @@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext *ctx, arg_fcvt_lu_s *a)
>      REQUIRE_EXT(ctx, RVF);
>
>      TCGv t0 = tcg_temp_new();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
> +    check_nanboxed(ctx, 1, t1);
> +
>      gen_set_rm(ctx, a->rm);
> -    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
> +    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
>      gen_set_gpr(a->rd, t0);
> +
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>      return true;
>  }
>
> --
> 2.23.0
>
>
It may be more readable to use a local macro to wrap the allocation and
freeing of the TCG temp variables. Most functions take two operands,
some take one, and others need three. They may be like

#define GEN_ONE_OPERAND \
      TCGv_i64 t1 = tcg_temp_new_i64(); \
      tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
      check_nanboxed(ctx, 1, t1);

#define GEN_TWO_OPERAND \
      TCGv_i64 t1 = tcg_temp_new_i64(); \
      TCGv_i64 t2 = tcg_temp_new_i64(); \
      tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
      tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
      check_nanboxed(ctx, 2, t1, t2);

#define GEN_THREE_OPERAND \
      TCGv_i64 t1 = tcg_temp_new_i64(); \
      TCGv_i64 t2 = tcg_temp_new_i64(); \
      TCGv_i64 t3 = tcg_temp_new_i64(); \
      tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
      tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
      tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]); \
      check_nanboxed(ctx, 3, t1, t2, t3);

#define FREE_ONE_OPERAND \
      tcg_temp_free_i64(t1);

#define FREE_TWO_OPERAND \
      tcg_temp_free_i64(t1); \
      tcg_temp_free_i64(t2);

#define FREE_THREE_OPERAND \
      tcg_temp_free_i64(t1); \
      tcg_temp_free_i64(t2); \
      tcg_temp_free_i64(t3);
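
For concreteness, a two-operand helper rewritten with the macros might
look like the sketch below (illustration only; GEN_TWO_OPERAND and
FREE_TWO_OPERAND are the proposed macros above, not anything existing
in the tree):

static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s *a)
{
    REQUIRE_FPU;
    REQUIRE_EXT(ctx, RVF);

    GEN_TWO_OPERAND    /* declares t1/t2, copies rs1/rs2, flushes inputs */

    gen_set_rm(ctx, a->rm);
    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
    gen_nanbox_fpr(ctx, a->rd);

    mark_fs_dirty(ctx);
    FREE_TWO_OPERAND   /* frees t1/t2 */
    return true;
}

Note that since the GEN_* macros declare variables, they cannot be
wrapped in the usual do { ... } while (0) and have to expand directly
at block scope.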

Chih-Min Chao


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/6] target/riscv: Flush not valid NaN-boxing input to canonical NaN
  2020-06-30  7:31     ` Chih-Min Chao
@ 2020-06-30  7:37       ` LIU Zhiwei
  -1 siblings, 0 replies; 40+ messages in thread
From: LIU Zhiwei @ 2020-06-30  7:37 UTC (permalink / raw)
  To: Chih-Min Chao
  Cc: open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, wenmeng_zhang,
	Alistair Francis, Palmer Dabbelt, Ian Jiang




On 2020/6/30 15:31, Chih-Min Chao wrote:
> On Sat, Jun 27, 2020 at 5:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>     Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>     ---
>      target/riscv/insn_trans/trans_rvd.inc.c |   7 +-
>      target/riscv/insn_trans/trans_rvf.inc.c | 272
>     ++++++++++++++++++++----
>      2 files changed, 235 insertions(+), 44 deletions(-)
>
>     diff --git a/target/riscv/insn_trans/trans_rvd.inc.c
>     b/target/riscv/insn_trans/trans_rvd.inc.c
>     index c0f4a0c789..16947ea6da 100644
>     --- a/target/riscv/insn_trans/trans_rvd.inc.c
>     +++ b/target/riscv/insn_trans/trans_rvd.inc.c
>     @@ -241,10 +241,15 @@ static bool trans_fcvt_d_s(DisasContext
>     *ctx, arg_fcvt_d_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVD);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, t1);
>
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
>     b/target/riscv/insn_trans/trans_rvf.inc.c
>     index 04bc8e5cb5..b0379b9d1f 100644
>     --- a/target/riscv/insn_trans/trans_rvf.inc.c
>     +++ b/target/riscv/insn_trans/trans_rvf.inc.c
>     @@ -58,11 +58,23 @@ static bool trans_fmadd_s(DisasContext *ctx,
>     arg_fmadd_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    TCGv_i64 t3 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>     +    check_nanboxed(ctx, 3, t1, t2, t3);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>     +    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>     +    tcg_temp_free_i64(t3);
>          return true;
>      }
>
>     @@ -70,11 +82,23 @@ static bool trans_fmsub_s(DisasContext *ctx,
>     arg_fmsub_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    TCGv_i64 t3 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>     +    check_nanboxed(ctx, 3, t1, t2, t3);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>     +    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>     +    tcg_temp_free_i64(t3);
>          return true;
>      }
>
>     @@ -82,11 +106,23 @@ static bool trans_fnmsub_s(DisasContext *ctx,
>     arg_fnmsub_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    TCGv_i64 t3 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>     +    check_nanboxed(ctx, 3, t1, t2, t3);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>     +    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>     +    tcg_temp_free_i64(t3);
>          return true;
>      }
>
>     @@ -94,11 +130,23 @@ static bool trans_fnmadd_s(DisasContext *ctx,
>     arg_fnmadd_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    TCGv_i64 t3 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>     +    check_nanboxed(ctx, 3, t1, t2, t3);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>     -    mark_fs_dirty(ctx);
>     +    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>     +    mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>     +    tcg_temp_free_i64(t3);
>          return true;
>      }
>
>     @@ -107,11 +155,19 @@ static bool trans_fadd_s(DisasContext *ctx,
>     arg_fadd_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
>     -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -120,11 +176,19 @@ static bool trans_fsub_s(DisasContext *ctx,
>     arg_fsub_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
>     -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -133,11 +197,19 @@ static bool trans_fmul_s(DisasContext *ctx,
>     arg_fmul_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
>     -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -146,11 +218,19 @@ static bool trans_fdiv_s(DisasContext *ctx,
>     arg_fdiv_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
>     -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -159,10 +239,16 @@ static bool trans_fsqrt_s(DisasContext *ctx,
>     arg_fsqrt_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     @@ -170,14 +256,23 @@ static bool trans_fsgnj_s(DisasContext *ctx,
>     arg_fsgnj_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          if (a->rs1 == a->rs2) { /* FMOV */
>     -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>     +        tcg_gen_mov_i64(cpu_fpr[a->rd], t1);
>          } else { /* FSGNJ */
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
>     cpu_fpr[a->rs1],
>     -                            0, 31);
>     +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);
>          }
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -185,16 +280,26 @@ static bool trans_fsgnjn_s(DisasContext
>     *ctx, arg_fsgnjn_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          if (a->rs1 == a->rs2) { /* FNEG */
>     -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
>     +        tcg_gen_xori_i64(cpu_fpr[a->rd], t1, INT32_MIN);
>          } else {
>              TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1],
>     0, 31);
>     +        tcg_gen_not_i64(t0, t2);
>     +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, t1, 0, 31);
>              tcg_temp_free_i64(t0);
>          }
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -202,16 +307,26 @@ static bool trans_fsgnjx_s(DisasContext
>     *ctx, arg_fsgnjx_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>          if (a->rs1 == a->rs2) { /* FABS */
>     -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1],
>     ~INT32_MIN);
>     +        tcg_gen_andi_i64(cpu_fpr[a->rd], t1, ~INT32_MIN);
>          } else {
>              TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
>     -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>     +        tcg_gen_andi_i64(t0, t2, INT32_MIN);
>     +        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t0);
>              tcg_temp_free_i64(t0);
>          }
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -220,10 +335,18 @@ static bool trans_fmin_s(DisasContext *ctx,
>     arg_fmin_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     -    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                      cpu_fpr[a->rs2]);
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>     +    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -232,10 +355,18 @@ static bool trans_fmax_s(DisasContext *ctx,
>     arg_fmax_s *a)
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>
>     -    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>     -                      cpu_fpr[a->rs2]);
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>     +    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>          gen_nanbox_fpr(ctx, a->rd);
>     +
>          mark_fs_dirty(ctx);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -245,11 +376,16 @@ static bool trans_fcvt_w_s(DisasContext
>     *ctx, arg_fcvt_w_s *a)
>          REQUIRE_EXT(ctx, RVF);
>
>          TCGv t0 = tcg_temp_new();
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fcvt_w_s(t0, cpu_env, t1);
>          gen_set_gpr(a->rd, t0);
>     -    tcg_temp_free(t0);
>
>     +    tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     @@ -259,11 +395,16 @@ static bool trans_fcvt_wu_s(DisasContext
>     *ctx, arg_fcvt_wu_s *a)
>          REQUIRE_EXT(ctx, RVF);
>
>          TCGv t0 = tcg_temp_new();
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
>          gen_set_gpr(a->rd, t0);
>     -    tcg_temp_free(t0);
>
>     +    tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     @@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx,
>     arg_feq_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>          TCGv t0 = tcg_temp_new();
>     -    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>     +    gen_helper_feq_s(t0, cpu_env, t1, t2);
>          gen_set_gpr(a->rd, t0);
>     +
>          tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx,
>     arg_flt_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>          TCGv t0 = tcg_temp_new();
>     -    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>     +    gen_helper_flt_s(t0, cpu_env, t1, t2);
>          gen_set_gpr(a->rd, t0);
>     +
>          tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx,
>     arg_fle_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>          TCGv t0 = tcg_temp_new();
>     -    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    TCGv_i64 t2 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>     +    check_nanboxed(ctx, 2, t1, t2);
>     +
>     +    gen_helper_fle_s(t0, cpu_env, t1, t2);
>          gen_set_gpr(a->rd, t0);
>     +
>          tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>     +    tcg_temp_free_i64(t2);
>          return true;
>      }
>
>     @@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext
>     *ctx, arg_fclass_s *a)
>          REQUIRE_EXT(ctx, RVF);
>
>          TCGv t0 = tcg_temp_new();
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>
>     -    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
>     -
>     +    gen_helper_fclass_s(t0, t1);
>          gen_set_gpr(a->rd, t0);
>     -    tcg_temp_free(t0);
>
>     +    tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     @@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext
>     *ctx, arg_fcvt_l_s *a)
>          REQUIRE_EXT(ctx, RVF);
>
>          TCGv t0 = tcg_temp_new();
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fcvt_l_s(t0, cpu_env, t1);
>          gen_set_gpr(a->rd, t0);
>     +
>          tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     @@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext
>     *ctx, arg_fcvt_lu_s *a)
>          REQUIRE_EXT(ctx, RVF);
>
>          TCGv t0 = tcg_temp_new();
>     +    TCGv_i64 t1 = tcg_temp_new_i64();
>     +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>     +    check_nanboxed(ctx, 1, t1);
>     +
>          gen_set_rm(ctx, a->rm);
>     -    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
>     +    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
>          gen_set_gpr(a->rd, t0);
>     +
>          tcg_temp_free(t0);
>     +    tcg_temp_free_i64(t1);
>          return true;
>      }
>
>     -- 
>     2.23.0
>
>
> It may be more readable to use a local macro to wrap the allocation and
> freeing of the TCG temp variables. Most functions take two operands,
> some take one, and others need three. They may be like
>
> #define GEN_ONE_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       check_nanboxed(ctx, 1, t1);
>
>  #define GEN_TWO_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       TCGv_i64 t2 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
>       check_nanboxed(ctx, 2, t1, t2);
>
>   #define GEN_THREE_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       TCGv_i64 t2 = tcg_temp_new_i64(); \
>       TCGv_i64 t3 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
>       tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]); \
>       check_nanboxed(ctx, 3, t1, t2, t3);
>
>   #define FREE_ONE_OPERAND \
>       tcg_temp_free_i64(t1);
>
>   #define FREE_TWO_OPERAND \
>       tcg_temp_free_i64(t1); \
>       tcg_temp_free_i64(t2);
>
>   #define FREE_THREE_OPERAND \
>       tcg_temp_free_i64(t1); \
>       tcg_temp_free_i64(t2); \
>       tcg_temp_free_i64(t3);
>
Good.

Do you think an inline function would be better? I just don't like
having many macros in one function.
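
(For illustration, an inline-function version might look like the
sketch below; check_nanboxed and cpu_fpr are from this series, while
the helper name get_nanboxed_2 is made up here. Since a C function
cannot declare variables in its caller's scope, the allocation and
free would stay in each trans_* function:)

static inline void get_nanboxed_2(DisasContext *ctx, TCGv_i64 t1,
                                  TCGv_i64 t2, int rs1, int rs2)
{
    /* Copy the source registers and flush non-NaN-boxed inputs. */
    tcg_gen_mov_i64(t1, cpu_fpr[rs1]);
    tcg_gen_mov_i64(t2, cpu_fpr[rs2]);
    check_nanboxed(ctx, 2, t1, t2);
}

/* In a caller such as trans_fadd_s: */
    TCGv_i64 t1 = tcg_temp_new_i64();
    TCGv_i64 t2 = tcg_temp_new_i64();
    get_nanboxed_2(ctx, t1, t2, a->rs1, a->rs2);
    ...
    tcg_temp_free_i64(t1);
    tcg_temp_free_i64(t2);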

Zhiwei
> Chih-Min Chao



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/6] target/riscv: Flush not valid NaN-boxing input to canonical NaN
  2020-06-30  7:37       ` LIU Zhiwei
@ 2020-07-02  6:29         ` Chih-Min Chao
  -1 siblings, 0 replies; 40+ messages in thread
From: Chih-Min Chao @ 2020-07-02  6:29 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: open list:RISC-V, Richard Henderson,
	qemu-devel@nongnu.org Developers, wxy194768, wenmeng_zhang,
	Alistair Francis, Palmer Dabbelt, Ian Jiang


On Tue, Jun 30, 2020 at 3:37 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

>
>
> On 2020/6/30 15:31, Chih-Min Chao wrote:
>
> On Sat, Jun 27, 2020 at 5:09 AM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
>> ---
>>  target/riscv/insn_trans/trans_rvd.inc.c |   7 +-
>>  target/riscv/insn_trans/trans_rvf.inc.c | 272 ++++++++++++++++++++----
>>  2 files changed, 235 insertions(+), 44 deletions(-)
>>
>> diff --git a/target/riscv/insn_trans/trans_rvd.inc.c
>> b/target/riscv/insn_trans/trans_rvd.inc.c
>> index c0f4a0c789..16947ea6da 100644
>> --- a/target/riscv/insn_trans/trans_rvd.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvd.inc.c
>> @@ -241,10 +241,15 @@ static bool trans_fcvt_d_s(DisasContext *ctx,
>> arg_fcvt_d_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVD);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fcvt_d_s(cpu_fpr[a->rd], cpu_env, t1);
>>
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
>> b/target/riscv/insn_trans/trans_rvf.inc.c
>> index 04bc8e5cb5..b0379b9d1f 100644
>> --- a/target/riscv/insn_trans/trans_rvf.inc.c
>> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
>> @@ -58,11 +58,23 @@ static bool trans_fmadd_s(DisasContext *ctx,
>> arg_fmadd_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    TCGv_i64 t3 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>> +    check_nanboxed(ctx, 3, t1, t2, t3);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>> +    gen_helper_fmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>> +    tcg_temp_free_i64(t3);
>>      return true;
>>  }
>>
>> @@ -70,11 +82,23 @@ static bool trans_fmsub_s(DisasContext *ctx,
>> arg_fmsub_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    TCGv_i64 t3 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>> +    check_nanboxed(ctx, 3, t1, t2, t3);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                       cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>> +    gen_helper_fmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>> +    tcg_temp_free_i64(t3);
>>      return true;
>>  }
>>
>> @@ -82,11 +106,23 @@ static bool trans_fnmsub_s(DisasContext *ctx,
>> arg_fnmsub_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    TCGv_i64 t3 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>> +    check_nanboxed(ctx, 3, t1, t2, t3);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>> +    gen_helper_fnmsub_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>> +    tcg_temp_free_i64(t3);
>>      return true;
>>  }
>>
>> @@ -94,11 +130,23 @@ static bool trans_fnmadd_s(DisasContext *ctx,
>> arg_fnmadd_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    TCGv_i64 t3 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]);
>> +    check_nanboxed(ctx, 3, t1, t2, t3);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                        cpu_fpr[a->rs2], cpu_fpr[a->rs3]);
>> -    mark_fs_dirty(ctx);
>> +    gen_helper_fnmadd_s(cpu_fpr[a->rd], cpu_env, t1, t2, t3);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>> +    mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>> +    tcg_temp_free_i64(t3);
>>      return true;
>>  }
>>
>> @@ -107,11 +155,19 @@ static bool trans_fadd_s(DisasContext *ctx,
>> arg_fadd_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env,
>> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    gen_helper_fadd_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -120,11 +176,19 @@ static bool trans_fsub_s(DisasContext *ctx,
>> arg_fsub_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env,
>> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    gen_helper_fsub_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -133,11 +197,19 @@ static bool trans_fmul_s(DisasContext *ctx,
>> arg_fmul_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env,
>> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    gen_helper_fmul_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -146,11 +218,19 @@ static bool trans_fdiv_s(DisasContext *ctx,
>> arg_fdiv_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env,
>> -                      cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    gen_helper_fdiv_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -159,10 +239,16 @@ static bool trans_fsqrt_s(DisasContext *ctx,
>> arg_fsqrt_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fsqrt_s(cpu_fpr[a->rd], cpu_env, t1);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> @@ -170,14 +256,23 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      if (a->rs1 == a->rs2) { /* FMOV */
>> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>> +        tcg_gen_mov_i64(cpu_fpr[a->rd], t1);
>>      } else { /* FSGNJ */
>> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
>> -                            0, 31);
>> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);
>>      }
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -185,16 +280,26 @@ static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      if (a->rs1 == a->rs2) { /* FNEG */
>> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
>> +        tcg_gen_xori_i64(cpu_fpr[a->rd], t1, INT32_MIN);
>>      } else {
>>          TCGv_i64 t0 = tcg_temp_new_i64();
>> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
>> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
>> +        tcg_gen_not_i64(t0, t2);
>> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, t1, 0, 31);
>
>  t0 is not necessary but t2 could be leveraged directly.

+        tcg_gen_not_i64(t2, t2);
+        tcg_gen_deposit_i64(cpu_fpr[a->rd], t2, t1, 0, 31);

>          tcg_temp_free_i64(t0);
>>      }
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -202,16 +307,26 @@ static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>>      if (a->rs1 == a->rs2) { /* FABS */
>> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
>> +        tcg_gen_andi_i64(cpu_fpr[a->rd], t1, ~INT32_MIN);
>>      } else {
>>          TCGv_i64 t0 = tcg_temp_new_i64();
>> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
>> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>> +        tcg_gen_andi_i64(t0, t2, INT32_MIN);
>> +        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t0);
>>          tcg_temp_free_i64(t0);
>>
> the same as above: t0 is not necessary and could be removed
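
For example (a sketch mirroring the substitution suggested above, with t2
reused in place of t0):

+        tcg_gen_andi_i64(t2, t2, INT32_MIN);
+        tcg_gen_xor_i64(cpu_fpr[a->rd], t1, t2);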

>      }
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -220,10 +335,18 @@ static bool trans_fmin_s(DisasContext *ctx, arg_fmin_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> -    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                      cpu_fpr[a->rs2]);
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>> +    gen_helper_fmin_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -232,10 +355,18 @@ static bool trans_fmax_s(DisasContext *ctx, arg_fmax_s *a)
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>>
>> -    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, cpu_fpr[a->rs1],
>> -                      cpu_fpr[a->rs2]);
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>> +    gen_helper_fmax_s(cpu_fpr[a->rd], cpu_env, t1, t2);
>>      gen_nanbox_fpr(ctx, a->rd);
>> +
>>      mark_fs_dirty(ctx);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -245,11 +376,16 @@ static bool trans_fcvt_w_s(DisasContext *ctx, arg_fcvt_w_s *a)
>>      REQUIRE_EXT(ctx, RVF);
>>
>>      TCGv t0 = tcg_temp_new();
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fcvt_w_s(t0, cpu_env, t1);
>>      gen_set_gpr(a->rd, t0);
>> -    tcg_temp_free(t0);
>>
>> +    tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> @@ -259,11 +395,16 @@ static bool trans_fcvt_wu_s(DisasContext *ctx, arg_fcvt_wu_s *a)
>>      REQUIRE_EXT(ctx, RVF);
>>
>>      TCGv t0 = tcg_temp_new();
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fcvt_wu_s(t0, cpu_env, t1);
>>      gen_set_gpr(a->rd, t0);
>> -    tcg_temp_free(t0);
>>
>> +    tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> @@ -291,10 +432,20 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>>      TCGv t0 = tcg_temp_new();
>> -    gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>> +    gen_helper_feq_s(t0, cpu_env, t1, t2);
>>      gen_set_gpr(a->rd, t0);
>> +
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -302,10 +453,20 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>>      TCGv t0 = tcg_temp_new();
>> -    gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>> +    gen_helper_flt_s(t0, cpu_env, t1, t2);
>>      gen_set_gpr(a->rd, t0);
>> +
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -313,10 +474,20 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
>>  {
>>      REQUIRE_FPU;
>>      REQUIRE_EXT(ctx, RVF);
>> +
>>      TCGv t0 = tcg_temp_new();
>> -    gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    TCGv_i64 t2 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]);
>> +    check_nanboxed(ctx, 2, t1, t2);
>> +
>> +    gen_helper_fle_s(t0, cpu_env, t1, t2);
>>      gen_set_gpr(a->rd, t0);
>> +
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>      return true;
>>  }
>>
>> @@ -326,12 +497,15 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
>>      REQUIRE_EXT(ctx, RVF);
>>
>>      TCGv t0 = tcg_temp_new();
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>>
>> -    gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
>> -
>> +    gen_helper_fclass_s(t0, t1);
>>      gen_set_gpr(a->rd, t0);
>> -    tcg_temp_free(t0);
>>
>> +    tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> @@ -400,10 +574,16 @@ static bool trans_fcvt_l_s(DisasContext *ctx, arg_fcvt_l_s *a)
>>      REQUIRE_EXT(ctx, RVF);
>>
>>      TCGv t0 = tcg_temp_new();
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fcvt_l_s(t0, cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fcvt_l_s(t0, cpu_env, t1);
>>      gen_set_gpr(a->rd, t0);
>> +
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> @@ -413,10 +593,16 @@ static bool trans_fcvt_lu_s(DisasContext *ctx, arg_fcvt_lu_s *a)
>>      REQUIRE_EXT(ctx, RVF);
>>
>>      TCGv t0 = tcg_temp_new();
>> +    TCGv_i64 t1 = tcg_temp_new_i64();
>> +    tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]);
>> +    check_nanboxed(ctx, 1, t1);
>> +
>>      gen_set_rm(ctx, a->rm);
>> -    gen_helper_fcvt_lu_s(t0, cpu_env, cpu_fpr[a->rs1]);
>> +    gen_helper_fcvt_lu_s(t0, cpu_env, t1);
>>      gen_set_gpr(a->rd, t0);
>> +
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>      return true;
>>  }
>>
>> --
>> 2.23.0
>>
>>
> It may be more readable to use local macros to wrap the allocation and free of
> TCG temp variables. Most functions take two operands;
> some require one and others need three. They may look like:
>
> #define GEN_ONE_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       check_nanboxed(ctx, 1, t1);
>
> #define GEN_TWO_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       TCGv_i64 t2 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
>       check_nanboxed(ctx, 2, t1, t2);
>
> #define GEN_THREE_OPERAND \
>       TCGv_i64 t1 = tcg_temp_new_i64(); \
>       TCGv_i64 t2 = tcg_temp_new_i64(); \
>       TCGv_i64 t3 = tcg_temp_new_i64(); \
>       tcg_gen_mov_i64(t1, cpu_fpr[a->rs1]); \
>       tcg_gen_mov_i64(t2, cpu_fpr[a->rs2]); \
>       tcg_gen_mov_i64(t3, cpu_fpr[a->rs3]); \
>       check_nanboxed(ctx, 3, t1, t2, t3);
>
> #define FREE_ONE_OPERAND \
>       tcg_temp_free_i64(t1);
>
> #define FREE_TWO_OPERAND \
>       tcg_temp_free_i64(t1); \
>       tcg_temp_free_i64(t2);
>
> #define FREE_THREE_OPERAND \
>       tcg_temp_free_i64(t1); \
>       tcg_temp_free_i64(t2); \
>       tcg_temp_free_i64(t3);
>
> Good.
>
> Do you think an inline function would be better? I just don't like many macros
> in one function.
>
> Zhiwei
>
> Chih-Min Chao
>
Either macro or inline is OK to me. I just want a simple wrapping of the
duplicated TCG temp new and free.
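
As a rough sketch of the inline-function alternative (the name is
illustrative, and it assumes check_nanboxed treats each operand
independently, so per-operand calls are equivalent to the variadic form):

static inline TCGv_i64 get_checked_fpr(DisasContext *ctx, int rs)
{
    /* Copy the source register, flushing non-boxed values to canonical NaN. */
    TCGv_i64 t = tcg_temp_new_i64();
    tcg_gen_mov_i64(t, cpu_fpr[rs]);
    check_nanboxed(ctx, 1, t);
    return t;
}

Each trans_* function would then open with t1 = get_checked_fpr(ctx, a->rs1);
and keep only the tcg_temp_free_i64() calls at the end.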

Chih-Min Chao


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/6] target/riscv: move gen_nanbox_fpr to translate.c
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-07-02 17:13     ` Richard Henderson
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-02 17:13 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: palmer, wenmeng_zhang, Alistair.Francis, ianjiang.ict, wxy194768

On 6/26/20 1:59 PM, LIU Zhiwei wrote:
> As this function will be used by fcvt.d.s in trans_rvd.inc.c,
> make it a visible function for RVF and RVD.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 14 --------------
>  target/riscv/translate.c                | 14 ++++++++++++++
>  2 files changed, 14 insertions(+), 14 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/6] target/riscv: check before allocating TCG temps
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-07-02 17:13     ` Richard Henderson
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-02 17:13 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: palmer, wenmeng_zhang, Alistair.Francis, ianjiang.ict, wxy194768

On 6/26/20 1:59 PM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvd.inc.c | 8 ++++----
>  target/riscv/insn_trans/trans_rvf.inc.c | 8 ++++----
>  2 files changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/6] target/riscv: NaN-boxing compute, sign-injection and convert instructions.
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-07-02 17:15     ` Richard Henderson
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-02 17:15 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: palmer, wenmeng_zhang, Alistair.Francis, ianjiang.ict, wxy194768

On 6/26/20 1:59 PM, LIU Zhiwei wrote:
> An n-bit floating-point result is written to the n least-significant bits
> of the destination f register, with all 1s written to the uppermost
> FLEN - n bits to yield a legal NaN-boxed value.
> 
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvd.inc.c |  1 +
>  target/riscv/insn_trans/trans_rvf.inc.c | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+)
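
For FLEN=64 and n=32 that rule amounts to OR-ing all ones into the upper half
of the register. A sketch of what the translate-time write-back might look
like (the exact body of gen_nanbox_fpr in this series may differ;
MAKE_64BIT_MASK(32, 32) is simply 0xffffffff00000000ull):

static void gen_nanbox_fpr(DisasContext *ctx, int regno)
{
    /* Box the low 32 bits by setting bits 63:32 to all ones. */
    tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
                    MAKE_64BIT_MASK(32, 32));
}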

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
  2020-06-26 20:59 ` LIU Zhiwei
@ 2020-07-02 17:37   ` Richard Henderson
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-02 17:37 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: palmer, wenmeng_zhang, Alistair.Francis, ianjiang.ict, wxy194768

On 6/26/20 1:59 PM, LIU Zhiwei wrote:
> Multiple precision should be supported by NaN-boxing. That means we should
> flush invalid NaN-boxed input to canonical NaN before the effective
> calculation, and we should NaN-box the result after the effective
> calculation.
> 
> In this patch set, the implementation is split into three steps for compute,
> sign-injection, and some convert insns: check_nanboxed,
> effective calculation, and gen_nanbox_fpr.
> 
> Check_nanboxed checks the inputs and flushes invalid inputs to canonical NaN.
> Effective calculation is direct calculation on fp32 values.
> Gen_nanbox_fpr does the NaN-boxing, writing 1s to the upper 32 bits.

I know I just reviewed a couple of these, but then I got to thinking about
patch 3 more closely.

I think it would be better to do all of the nan-boxing work inside of the
helpers, including the return values.

Since we must have a helper call for the actual fp arithmetic, we might as well
put the rest of the logic in there too.  That way the JIT code is smaller.

If, for RVF && !RVD, we always maintain the invariant that the values are
nanboxed anyway, then we do not even have to check for RVD at runtime.
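
A minimal sketch of such an unboxing check (the name is illustrative;
MAKE_64BIT_MASK(32, 32) is 0xffffffff00000000ull, and 0x7fc00000 is the
RISC-V canonical NaN for float32):

static float32 check_nanbox_s(uint64_t f)
{
    /* A value is legally boxed iff bits 63:32 are all ones. */
    if ((f & MAKE_64BIT_MASK(32, 32)) == MAKE_64BIT_MASK(32, 32)) {
        return (uint32_t)f;
    }
    /* Otherwise hand the operation the canonical NaN instead. */
    return 0x7fc00000u;
}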

Thoughts?


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/6] target/riscv: clean up fmv.w.x
  2020-06-26 20:59   ` LIU Zhiwei
@ 2020-07-02 17:38     ` Richard Henderson
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-02 17:38 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel, qemu-riscv
  Cc: palmer, wenmeng_zhang, Alistair.Francis, ianjiang.ict, wxy194768

On 6/26/20 1:59 PM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
       [not found]   ` <3c139607-9cac-a28a-c296-b0e147b3b20f@c-sky.com>
@ 2020-07-07 21:45     ` LIU Zhiwei
  2020-07-08 15:35       ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: LIU Zhiwei @ 2020-07-07 21:45 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis,
	qemu-devel@nongnu.org Developers, Chih-Min Chao

Hi Richard,

Ping for other patches in this patch set.

I may not have fully understood your ideas. Could you give more information?

Zhiwei

On 2020/7/3 20:33, LIU Zhiwei wrote:
>
>
> On 2020/7/3 1:37, Richard Henderson wrote:
>> On 6/26/20 1:59 PM, LIU Zhiwei wrote:
>>> Multiple precision should be supported by NaN-boxing. That means we
>>> should
>>> flush invalid NaN-boxed input to canonical NaN before the effective
>>> calculation, and we should NaN-box the result after the effective
>>> calculation.
>>>
>>> In this patch set, the implementation is split into three steps for compute,
>>> sign-injection, and some convert insns: check_nanboxed,
>>> effective calculation, and gen_nanbox_fpr.
>>>
>>> Check_nanboxed checks the inputs and flushes invalid inputs to
>>> canonical NaN.
>>> Effective calculation is direct calculation on fp32 values.
>>> Gen_nanbox_fpr does the NaN-boxing, writing 1s to the upper 32 bits.
>> I know I just reviewed a couple of these, but then I got to thinking about
>> patch 3 more closely.
>>
>> I think it would be better to do all of the nan-boxing work inside of the
>> helpers, including the return values.
> Do you mean a helper function just for nan-boxing work?
>
> I don't think so.
>
> The inputs are flushed to canonical NaN only when they are
> not legal nan-boxed values.
>
> The result is nan-boxed before writing to the destination register.
>
> Both of them have some relation to nan-boxing, but they are not the
> same.
>> Since we must have a helper call for the actual fp arithmetic, we might as
>> well put the rest of the logic in there too.  That way the JIT code is
>> smaller.
> Yes, we can. But I think it is clearer to just let the helper do the
> calculation.
>
> By the way, are there any advantages to smaller JIT code?
>> If, for RVF && !RVD, we always maintain the invariant that the values are
>> nanboxed anyway, then we do not even have to check for RVD at runtime.
> Do you mean if FMV.X.S and FLW are nan-boxed, then we will not get
> invalid values?
>
> I don't think so.
>
> First, FMV.X.D can transfer any 64-bit value to a float register.
> Second, users may set invalid values in a float register via GDB.
>
> I think it's necessary to do the input check and result nan-boxing.
>
>
> Zhiwei
>> Thoughts?
>>
>>
>> r~
>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
  2020-07-07 21:45     ` LIU Zhiwei
@ 2020-07-08 15:35       ` Richard Henderson
  2020-07-10  7:03         ` LIU Zhiwei
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2020-07-08 15:35 UTC (permalink / raw)
  To: LIU Zhiwei, Alistair Francis, qemu-devel@nongnu.org Developers,
	Chih-Min Chao

On 7/7/20 2:45 PM, LIU Zhiwei wrote:
>> On 2020/7/3 1:37, Richard Henderson wrote:
>>> I think it would be better to do all of the nan-boxing work inside of the
>>> helpers, including the return values.
>> Do you mean a helper function just for nan-boxing work?

No, that's not what I mean.

>> I don't think so.
>>
>> The inputs are flushed to canonical NAN only when they are
>> not legal nan-boxed values.
>>
>> The result is nan-boxed before writing  to  destination register.
>>
>> Both of them have some relations to nan-boxing, but they are not the same.

I mean

uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1,
                       uint64_t frs2)
{
    float32 in1 = check_nanbox(frs1);
    float32 in2 = check_nanbox(frs2);
    float32 res = float32_add(in1, in2, &env->fp_status);

    return gen_nanbox(res);
}

I.e., always require nan-boxed inputs and return a nan-boxed output.

>>> If, for RVF && !RVD, we always maintain the invariant that the values are
>>> nanboxed anyway, then we do not even have to check for RVD at runtime.
>> Do you mean if FMV.X.S and FLW are nan-boxed, then we will not get the
>> invalid values?

No, I mean that if !RVD, there is no way to put an unboxed value into the fp
registers because...

>> First, FMV.X.D can transfer any 64 bits value to float register.
>> Second, users may set  invalid values  to float register by GDB.

... FMV.X.D does not exist for !RVD, nor does FLD.

The check_nanbox test will always succeed for !RVD, so we do not need to check
that RVD is set before performing check_nanbox.

Because the check is inexpensive, and because we expect !RVD to be an unusual
configuration, we do not bother to provide a second set of helpers that do not
perform the nan-boxing.


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
  2020-07-08 15:35       ` Richard Henderson
@ 2020-07-10  7:03         ` LIU Zhiwei
  2020-07-10 16:03           ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: LIU Zhiwei @ 2020-07-10  7:03 UTC (permalink / raw)
  To: Richard Henderson, Alistair Francis,
	qemu-devel@nongnu.org Developers, Chih-Min Chao



On 2020/7/8 23:35, Richard Henderson wrote:
> On 7/7/20 2:45 PM, LIU Zhiwei wrote:
>>> On 2020/7/3 1:37, Richard Henderson wrote:
>>>> I think it would be better to do all of the nan-boxing work inside of the
>>>> helpers, including the return values.
>>> Do you mean a helper function just for nan-boxing work?
> No, that's not what I mean.
>
>>> I don't think so.
>>>
>>> The inputs are flushed to canonical NAN only when they are
>>> not legal nan-boxed values.
>>>
>>> The result is nan-boxed before writing  to  destination register.
>>>
>>> Both of them have some relations to nan-boxing, but they are not the same.
> I mean
>
> uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1,
>                         uint64_t frs2)
> {
>      float32 in1 = check_nanbox(frs1);
>      float32 in2 = check_nanbox(frs2);
>      float32 res = float32_add(in1, in2, &env->fp_status);
>
>      return gen_nanbox(res);
> }
>
> I.e., always require nan-boxed inputs and return a nan-boxed output.
>
>>>> If, for RVF && !RVD, we always maintain the invariant that the values are
>>>> nanboxed anyway, then we do not even have to check for RVD at runtime.
>>> Do you mean if FMV.X.S and FLW are nan-boxed, then we will not get the
>>> invalid values?
> No, I mean that if !RVD, there is no way to put an unboxed value into the fp
> registers because...
>
>>> First, FMV.X.D can transfer any 64 bits value to float register.
>>> Second, users may set  invalid values  to float register by GDB.
> ... FMV.X.D does not exist for !RVD, nor does FLD.
>
> The check_nanbox test will always succeed for !RVD, so we do not need to check
> that RVD is set before performing check_nanbox.
>
> Because the check is inexpensive, and because we expect !RVD to be an unusual
> configuration, we do not bother to provide a second set of helpers that do not
> perform the nan-boxing.
Got it.

The comment suggests moving both the input check and the result nan-boxing
code into helper functions.

In my opinion, it doesn't matter whether we put them into helper functions
or into translation functions.
More importantly, we should add the input check and result nan-boxing for
all single-precision floating-point instructions.

If you insist that we should move them into helper functions, I'd like to. :-)

Zhiwei
>
> r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/6] target/riscv: NaN-boxing for multiple precison
  2020-07-10  7:03         ` LIU Zhiwei
@ 2020-07-10 16:03           ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2020-07-10 16:03 UTC (permalink / raw)
  To: LIU Zhiwei, Alistair Francis, qemu-devel@nongnu.org Developers,
	Chih-Min Chao

On 7/10/20 12:03 AM, LIU Zhiwei wrote:
> The comment is moving both inputs check and the result nan-boxing code to
> helper functions.
> 
> In my opinion, it doesn't matter whether put them into helper functions or into
> translation functions.
> More importantly, we should add inputs check and result nan-boxing for all
> single float point instructions.
> 
> If you insist on we should move it to helper functions, I'd like to.:-)

I don't insist, but I think it makes sense to do so.

Less code in translate means less time in the JIT, and more sharing of the icache.

Sometimes it's a tradeoff, but in this case, because we will always call a
helper, I think the benefits of moving the extra code into the helper are all
positive.


r~



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2020-07-10 16:04 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-26 20:59 [PATCH 0/6] target/riscv: NaN-boxing for multiple precison LIU Zhiwei
2020-06-26 20:59 ` LIU Zhiwei
2020-06-26 20:59 ` [PATCH 1/6] target/riscv: move gen_nanbox_fpr to translate.c LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-07-02 17:13   ` Richard Henderson
2020-07-02 17:13     ` Richard Henderson
2020-06-26 20:59 ` [PATCH 2/6] target/riscv: NaN-boxing compute, sign-injection and convert instructions LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-07-02 17:15   ` Richard Henderson
2020-07-02 17:15     ` Richard Henderson
2020-06-26 20:59 ` [PATCH 3/6] target/riscv: Check for LEGAL NaN-boxing LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-06-30  7:20   ` Chih-Min Chao
2020-06-30  7:20     ` Chih-Min Chao
2020-06-30  7:31     ` LIU Zhiwei
2020-06-30  7:31       ` LIU Zhiwei
2020-06-26 20:59 ` [PATCH 4/6] target/riscv: check before allocating TCG temps LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-07-02 17:13   ` Richard Henderson
2020-07-02 17:13     ` Richard Henderson
2020-06-26 20:59 ` [PATCH 5/6] target/riscv: Flush not valid NaN-boxing input to canonical NaN LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-06-30  7:31   ` Chih-Min Chao
2020-06-30  7:31     ` Chih-Min Chao
2020-06-30  7:37     ` LIU Zhiwei
2020-06-30  7:37       ` LIU Zhiwei
2020-07-02  6:29       ` Chih-Min Chao
2020-07-02  6:29         ` Chih-Min Chao
2020-06-26 20:59 ` [PATCH 6/6] target/riscv: clean up fmv.w.x LIU Zhiwei
2020-06-26 20:59   ` LIU Zhiwei
2020-07-02 17:38   ` Richard Henderson
2020-07-02 17:38     ` Richard Henderson
2020-06-26 21:21 ` [PATCH 0/6] target/riscv: NaN-boxing for multiple precison no-reply
2020-06-26 21:21   ` no-reply
2020-07-02 17:37 ` Richard Henderson
2020-07-02 17:37   ` Richard Henderson
     [not found]   ` <3c139607-9cac-a28a-c296-b0e147b3b20f@c-sky.com>
2020-07-07 21:45     ` LIU Zhiwei
2020-07-08 15:35       ` Richard Henderson
2020-07-10  7:03         ` LIU Zhiwei
2020-07-10 16:03           ` Richard Henderson
