* [PATCH v2 0/7] target/riscv: NaN-boxing for multiple precision
@ 2020-07-24  0:28 ` Richard Henderson
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

This is my take on Liu Zhiwei's patch set:
https://patchew.org/QEMU/20200626205917.4545-1-zhiwei_liu@c-sky.com

This differs from Zhiwei's v1 in:

 * If a helper is involved, the helper does the boxing and unboxing.

 * This leaves FLW and FSGN*.S as the only instructions expanded
   inline that need to handle nanboxing.

 * All dependence on RVD is dropped from the boxing logic.  This means
   that an RVF-only cpu will still generate and check nanboxes in the
   64-bit cpu_fpr slots.  There should be no way an RVF-only cpu
   can generate an unboxed cpu_fpr value.

   This choice is made to speed up the common case: RVF+RVD, so
   that we do not have to check whether RVD is enabled.
   (See the sketch after this list.)

 * The translate.c primitives take TCGv values rather than fpu
   regno, which will make it possible to use them with RVV,
   since v0.9 does proper nanboxing.

 * I have adjusted the current naming to be float32 specific ("*_s"),
   to avoid confusion with the float16 data type supported by RVV.
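
As a rough, standalone illustration of the convention (this sketch merely
mirrors the nanbox_s/check_nanbox_s helpers added later in the series and
is not itself part of the patches; the example values are arbitrary):

    #include <inttypes.h>
    #include <stdio.h>

    /* Box a 32-bit value into a 64-bit FP slot: upper 32 bits all ones. */
    static uint64_t nanbox_s(uint32_t f)
    {
        return f | 0xffffffff00000000ull;
    }

    /* Unbox: anything not properly boxed reads back as the default qNaN. */
    static uint32_t check_nanbox_s(uint64_t f)
    {
        return (f >> 32) == 0xffffffffu ? (uint32_t)f : 0x7fc00000u;
    }

    int main(void)
    {
        uint64_t boxed = nanbox_s(0x3f800000);   /* 1.0f, properly boxed   */
        uint64_t stale = 0x3f800000ull;          /* 1.0f, upper bits clear */

        printf("%08x\n", check_nanbox_s(boxed)); /* prints 3f800000 */
        printf("%08x\n", check_nanbox_s(stale)); /* prints 7fc00000 */
        return 0;
    }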


r~


LIU Zhiwei (2):
  target/riscv: Clean up fmv.w.x
  target/riscv: check before allocating TCG temps

Richard Henderson (5):
  target/riscv: Generate nanboxed results from fp helpers
  target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
  target/riscv: Generate nanboxed results from trans_rvf.inc.c
  target/riscv: Check nanboxed inputs to fp helpers
  target/riscv: Check nanboxed inputs in trans_rvf.inc.c

 target/riscv/internals.h                |  16 ++++
 target/riscv/fpu_helper.c               | 102 ++++++++++++++++--------
 target/riscv/insn_trans/trans_rvd.inc.c |   8 +-
 target/riscv/insn_trans/trans_rvf.inc.c |  99 ++++++++++++++---------
 target/riscv/translate.c                |  29 +++++++
 5 files changed, 178 insertions(+), 76 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 62+ messages in thread


* [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

Make sure that all results from single-precision scalar helpers
are properly nan-boxed to 64 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/internals.h  |  5 +++++
 target/riscv/fpu_helper.c | 42 +++++++++++++++++++++------------------
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index 37d33820ad..9f4ba7d617 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
 #define SEW32 2
 #define SEW64 3
 
+static inline uint64_t nanbox_s(float32 f)
+{
+    return f | MAKE_64BIT_MASK(32, 32);
+}
+
 #endif
diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
index 4379756dc4..72541958a7 100644
--- a/target/riscv/fpu_helper.c
+++ b/target/riscv/fpu_helper.c
@@ -81,10 +81,16 @@ void helper_set_rounding_mode(CPURISCVState *env, uint32_t rm)
     set_float_rounding_mode(softrm, &env->fp_status);
 }
 
+static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
+                           uint64_t frs3, int flags)
+{
+    return nanbox_s(float32_muladd(frs1, frs2, frs3, flags, &env->fp_status));
+}
+
 uint64_t helper_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
                         uint64_t frs3)
 {
-    return float32_muladd(frs1, frs2, frs3, 0, &env->fp_status);
+    return do_fmadd_s(env, frs1, frs2, frs3, 0);
 }
 
 uint64_t helper_fmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
@@ -96,8 +102,7 @@ uint64_t helper_fmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
 uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
                         uint64_t frs3)
 {
-    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_c,
-                          &env->fp_status);
+    return do_fmadd_s(env, frs1, frs2, frs3, float_muladd_negate_c);
 }
 
 uint64_t helper_fmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
@@ -110,8 +115,7 @@ uint64_t helper_fmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
 uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
                          uint64_t frs3)
 {
-    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_product,
-                          &env->fp_status);
+    return do_fmadd_s(env, frs1, frs2, frs3, float_muladd_negate_product);
 }
 
 uint64_t helper_fnmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
@@ -124,8 +128,8 @@ uint64_t helper_fnmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
 uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
                          uint64_t frs3)
 {
-    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_c |
-                          float_muladd_negate_product, &env->fp_status);
+    return do_fmadd_s(env, frs1, frs2, frs3,
+                      float_muladd_negate_c | float_muladd_negate_product);
 }
 
 uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
@@ -137,37 +141,37 @@ uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
 
 uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_add(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_add(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_sub(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_sub(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fmul_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_mul(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_mul(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_div(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_div(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_minnum(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-    return float32_maxnum(frs1, frs2, &env->fp_status);
+    return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t frs1)
 {
-    return float32_sqrt(frs1, &env->fp_status);
+    return nanbox_s(float32_sqrt(frs1, &env->fp_status));
 }
 
 target_ulong helper_fle_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
@@ -209,23 +213,23 @@ uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t frs1)
 
 uint64_t helper_fcvt_s_w(CPURISCVState *env, target_ulong rs1)
 {
-    return int32_to_float32((int32_t)rs1, &env->fp_status);
+    return nanbox_s(int32_to_float32((int32_t)rs1, &env->fp_status));
 }
 
 uint64_t helper_fcvt_s_wu(CPURISCVState *env, target_ulong rs1)
 {
-    return uint32_to_float32((uint32_t)rs1, &env->fp_status);
+    return nanbox_s(uint32_to_float32((uint32_t)rs1, &env->fp_status));
 }
 
 #if defined(TARGET_RISCV64)
 uint64_t helper_fcvt_s_l(CPURISCVState *env, uint64_t rs1)
 {
-    return int64_to_float32(rs1, &env->fp_status);
+    return nanbox_s(int64_to_float32(rs1, &env->fp_status));
 }
 
 uint64_t helper_fcvt_s_lu(CPURISCVState *env, uint64_t rs1)
 {
-    return uint64_to_float32(rs1, &env->fp_status);
+    return nanbox_s(uint64_to_float32(rs1, &env->fp_status));
 }
 #endif
 
@@ -266,7 +270,7 @@ uint64_t helper_fmax_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 
 uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t rs1)
 {
-    return float64_to_float32(rs1, &env->fp_status);
+    return nanbox_s(float64_to_float32(rs1, &env->fp_status));
 }
 
 uint64_t helper_fcvt_d_s(CPURISCVState *env, uint64_t rs1)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 2/7] target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

Do not depend on the RVD extension; take input and output via a
TCGv_i64 instead of an fpu regno.  Move the function to translate.c
so that it can be used in multiple trans_*.inc.c files.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 16 +---------------
 target/riscv/translate.c                | 11 +++++++++++
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 3bfd8881e7..c7057482e8 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -23,20 +23,6 @@
         return false;                       \
 } while (0)
 
-/*
- * RISC-V requires NaN-boxing of narrower width floating
- * point values.  This applies when a 32-bit value is
- * assigned to a 64-bit FP register.  Thus this does not
- * apply when the RVD extension is not present.
- */
-static void gen_nanbox_fpr(DisasContext *ctx, int regno)
-{
-    if (has_ext(ctx, RVD)) {
-        tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
-                        MAKE_64BIT_MASK(32, 32));
-    }
-}
-
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
     TCGv t0 = tcg_temp_new();
@@ -46,7 +32,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
-    gen_nanbox_fpr(ctx, a->rd);
+    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
 
     tcg_temp_free(t0);
     mark_fs_dirty(ctx);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 9632e79cf3..12a746da97 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -90,6 +90,17 @@ static inline bool has_ext(DisasContext *ctx, uint32_t ext)
     return ctx->misa & ext;
 }
 
+/*
+ * RISC-V requires NaN-boxing of narrower width floating point values.
+ * This applies when a 32-bit value is assigned to a 64-bit FP register.
+ * For consistency and simplicity, we nanbox results even when the RVD
+ * extension is not present.
+ */
+static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
+{
+    tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
+}
+
 static void generate_exception(DisasContext *ctx, int excp)
 {
     tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 3/7] target/riscv: Generate nanboxed results from trans_rvf.inc.c
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

Make sure that all results from inline single-precision scalar
operations are properly nan-boxed to 64 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index c7057482e8..264d3139f1 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -167,6 +167,7 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
         tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
                             0, 31);
     }
+    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -183,6 +184,7 @@ static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
         tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
         tcg_temp_free_i64(t0);
     }
+    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -199,6 +201,7 @@ static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
         tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
         tcg_temp_free_i64(t0);
     }
+    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
     mark_fs_dirty(ctx);
     return true;
 }
@@ -369,6 +372,7 @@ static bool trans_fmv_w_x(DisasContext *ctx, arg_fmv_w_x *a)
 #else
     tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
 #endif
+    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
 
     mark_fs_dirty(ctx);
     tcg_temp_free(t0);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 4/7] target/riscv: Check nanboxed inputs to fp helpers
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

If a 32-bit input is not properly nanboxed, then the input is
replaced with the default qnan.
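
For example (illustrative values only, not taken from the patch): an fadd.s
whose rs1 register holds the unboxed value 0x00000000_3f800000 (the bit
pattern of 1.0f with the upper word clear) previously computed with 1.0f
directly; with this change that operand is treated as the default qNaN, so
with rs2 holding a boxed 2.0f the nanboxed result is 0xffffffff_7fc00000
rather than 3.0f.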

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/internals.h  | 11 +++++++
 target/riscv/fpu_helper.c | 64 ++++++++++++++++++++++++++++-----------
 2 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index 9f4ba7d617..f1a546dba6 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -43,4 +43,15 @@ static inline uint64_t nanbox_s(float32 f)
     return f | MAKE_64BIT_MASK(32, 32);
 }
 
+static inline float32 check_nanbox_s(uint64_t f)
+{
+    uint64_t mask = MAKE_64BIT_MASK(32, 32);
+
+    if (likely((f & mask) == mask)) {
+        return (uint32_t)f;
+    } else {
+        return 0x7fc00000u; /* default qnan */
+    }
+}
+
 #endif
diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
index 72541958a7..bb346a8249 100644
--- a/target/riscv/fpu_helper.c
+++ b/target/riscv/fpu_helper.c
@@ -81,9 +81,12 @@ void helper_set_rounding_mode(CPURISCVState *env, uint32_t rm)
     set_float_rounding_mode(softrm, &env->fp_status);
 }
 
-static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
-                           uint64_t frs3, int flags)
+static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2,
+                           uint64_t rs3, int flags)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
+    float32 frs3 = check_nanbox_s(rs3);
     return nanbox_s(float32_muladd(frs1, frs2, frs3, flags, &env->fp_status));
 }
 
@@ -139,74 +142,97 @@ uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
                           float_muladd_negate_product, &env->fp_status);
 }
 
-uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_add(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fsub_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_sub(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fmul_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fmul_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_mul(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_div(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fmin_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
 }
 
-uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t frs1)
+uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return nanbox_s(float32_sqrt(frs1, &env->fp_status));
 }
 
-target_ulong helper_fle_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+target_ulong helper_fle_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return float32_le(frs1, frs2, &env->fp_status);
 }
 
-target_ulong helper_flt_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+target_ulong helper_flt_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return float32_lt(frs1, frs2, &env->fp_status);
 }
 
-target_ulong helper_feq_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+target_ulong helper_feq_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
+    float32 frs1 = check_nanbox_s(rs1);
+    float32 frs2 = check_nanbox_s(rs2);
     return float32_eq_quiet(frs1, frs2, &env->fp_status);
 }
 
-target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t frs1)
+target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return float32_to_int32(frs1, &env->fp_status);
 }
 
-target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t frs1)
+target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return (int32_t)float32_to_uint32(frs1, &env->fp_status);
 }
 
 #if defined(TARGET_RISCV64)
-uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t frs1)
+uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return float32_to_int64(frs1, &env->fp_status);
 }
 
-uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t frs1)
+uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return float32_to_uint64(frs1, &env->fp_status);
 }
 #endif
@@ -233,8 +259,9 @@ uint64_t helper_fcvt_s_lu(CPURISCVState *env, uint64_t rs1)
 }
 #endif
 
-target_ulong helper_fclass_s(uint64_t frs1)
+target_ulong helper_fclass_s(uint64_t rs1)
 {
+    float32 frs1 = check_nanbox_s(rs1);
     return fclass_s(frs1);
 }
 
@@ -275,7 +302,8 @@ uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t rs1)
 
 uint64_t helper_fcvt_d_s(CPURISCVState *env, uint64_t rs1)
 {
-    return float32_to_float64(rs1, &env->fp_status);
+    float32 frs1 = check_nanbox_s(rs1);
+    return float32_to_float64(frs1, &env->fp_status);
 }
 
 uint64_t helper_fsqrt_d(CPURISCVState *env, uint64_t frs1)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

If a 32-bit input is not properly nanboxed, then the input is replaced
with the default qnan.  The only inline expansion is for the sign-changing
set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 71 +++++++++++++++++++------
 target/riscv/translate.c                | 18 +++++++
 2 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 264d3139f1..f9a9e0643a 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
 {
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
     if (a->rs1 == a->rs2) { /* FMOV */
-        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
+        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
     } else { /* FSGNJ */
-        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
-                            0, 31);
+        TCGv_i64 rs1 = tcg_temp_new_i64();
+        TCGv_i64 rs2 = tcg_temp_new_i64();
+
+        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
+        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
+
+        /* This formulation retains the nanboxing of rs2. */
+        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
+        tcg_temp_free_i64(rs1);
+        tcg_temp_free_i64(rs2);
     }
-    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
     mark_fs_dirty(ctx);
     return true;
 }
 
 static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
 {
+    TCGv_i64 rs1, rs2, mask;
+
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    rs1 = tcg_temp_new_i64();
+    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
+
     if (a->rs1 == a->rs2) { /* FNEG */
-        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
+        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31, 1));
     } else {
-        TCGv_i64 t0 = tcg_temp_new_i64();
-        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
-        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
-        tcg_temp_free_i64(t0);
+        rs2 = tcg_temp_new_i64();
+        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
+
+        /*
+         * Replace bit 31 in rs1 with inverse in rs2.
+         * This formulation retains the nanboxing of rs1.
+         */
+        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
+        tcg_gen_andc_i64(rs2, mask, rs2);
+        tcg_gen_and_i64(rs1, mask, rs1);
+        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
+
+        tcg_temp_free_i64(mask);
+        tcg_temp_free_i64(rs2);
     }
-    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
+    tcg_temp_free_i64(rs1);
+
     mark_fs_dirty(ctx);
     return true;
 }
 
 static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
 {
+    TCGv_i64 rs1, rs2;
+
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+
+    rs1 = tcg_temp_new_i64();
+    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
+
     if (a->rs1 == a->rs2) { /* FABS */
-        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
+        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1, ~MAKE_64BIT_MASK(31, 1));
     } else {
-        TCGv_i64 t0 = tcg_temp_new_i64();
-        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
-        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
-        tcg_temp_free_i64(t0);
+        rs2 = tcg_temp_new_i64();
+        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
+
+        /*
+         * Xor bit 31 in rs1 with that in rs2.
+         * This formulation retains the nanboxing of rs1.
+         */
+        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
+        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
+
+        tcg_temp_free_i64(rs2);
     }
-    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
+    tcg_temp_free_i64(rs1);
+
     mark_fs_dirty(ctx);
     return true;
 }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 12a746da97..bf35182776 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
     tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
 }
 
+/*
+ * A narrow n-bit operation, where n < FLEN, checks that input operands
+ * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
+ * If so, the least-significant bits of the input are used, otherwise the
+ * input value is treated as an n-bit canonical NaN (v2.2 section 9.2).
+ *
+ * Here, the result is always nan-boxed, even the canonical nan.
+ */
+static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
+{
+    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
+    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
+
+    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
+    tcg_temp_free_i64(t_max);
+    tcg_temp_free_i64(t_nan);
+}
+
 static void generate_exception(DisasContext *ctx, int excp)
 {
     tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 6/7] target/riscv: Clean up fmv.w.x
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

From: LIU Zhiwei <zhiwei_liu@c-sky.com>

Use tcg_gen_extu_tl_i64 to avoid the ifdef.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Message-Id: <20200626205917.4545-7-zhiwei_liu@c-sky.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/insn_trans/trans_rvf.inc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index f9a9e0643a..0d04677a02 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -406,11 +406,7 @@ static bool trans_fmv_w_x(DisasContext *ctx, arg_fmv_w_x *a)
     TCGv t0 = tcg_temp_new();
     gen_get_gpr(t0, a->rs1);
 
-#if defined(TARGET_RISCV64)
-    tcg_gen_mov_i64(cpu_fpr[a->rd], t0);
-#else
-    tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
-#endif
+    tcg_gen_extu_tl_i64(cpu_fpr[a->rd], t0);
     gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
 
     mark_fs_dirty(ctx);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* [PATCH v2 7/7] target/riscv: check before allocating TCG temps
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  0:28   ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  0:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv, zhiwei_liu

From: LIU Zhiwei <zhiwei_liu@c-sky.com>

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Message-Id: <20200626205917.4545-5-zhiwei_liu@c-sky.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/insn_trans/trans_rvd.inc.c | 8 ++++----
 target/riscv/insn_trans/trans_rvf.inc.c | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.inc.c b/target/riscv/insn_trans/trans_rvd.inc.c
index ea1044f13b..4f832637fa 100644
--- a/target/riscv/insn_trans/trans_rvd.inc.c
+++ b/target/riscv/insn_trans/trans_rvd.inc.c
@@ -20,10 +20,10 @@
 
 static bool trans_fld(DisasContext *ctx, arg_fld *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVD);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ);
@@ -35,10 +35,10 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a)
 
 static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVD);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ);
diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
index 0d04677a02..16df9c5ee2 100644
--- a/target/riscv/insn_trans/trans_rvf.inc.c
+++ b/target/riscv/insn_trans/trans_rvf.inc.c
@@ -25,10 +25,10 @@
 
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
-    TCGv t0 = tcg_temp_new();
-    gen_get_gpr(t0, a->rs1);
     REQUIRE_FPU;
     REQUIRE_EXT(ctx, RVF);
+    TCGv t0 = tcg_temp_new();
+    gen_get_gpr(t0, a->rs1);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
@@ -41,11 +41,11 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
 
 static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 {
+    REQUIRE_FPU;
+    REQUIRE_EXT(ctx, RVF);
     TCGv t0 = tcg_temp_new();
     gen_get_gpr(t0, a->rs1);
 
-    REQUIRE_FPU;
-    REQUIRE_EXT(ctx, RVF);
     tcg_gen_addi_tl(t0, t0, a->imm);
 
     tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 62+ messages in thread


* Re: [PATCH v2 0/7] target/riscv: NaN-boxing for multiple precison
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-24  2:31   ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  2:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> This is my take on Liu Zhiwei's patch set:
> https://patchew.org/QEMU/20200626205917.4545-1-zhiwei_liu@c-sky.com
>
> This differs from Zhiwei's v1 in:
>
>   * If a helper is involved, the helper does the boxing and unboxing.
>
>   * Which leaves only LDW and FSGN*.S as the only instructions that
>     are expanded inline which need to handle nanboxing.
>
>   * All mention of RVD is dropped vs boxing.  This means that an
>     RVF-only cpu will still generate and check nanboxes into the
>     64-bit cpu_fpu slots.  There should be no way an RVF-only cpu
>     can generate an unboxed cpu_fpu value.
>
>     This choice is made to speed up the common case: RVF+RVD, so
>     that we do not have to check whether RVD is enabled.
>
>   * The translate.c primitives take TCGv values rather than fpu
>     regno, which will make it possible to use them with RVV,
>     since v0.9 does proper nanboxing.
Agree.

And I think this patch set should be applied if possible, because it is a
bug fix.
>   * I have adjusted the current naming to be float32 specific ("*_s"),
>     to avoid confusion with the float16 data type supported by RVV.
It's OK.

A more general function with flen would be better in my opinion, so that it
can be used everywhere, both in scalar and vector instructions, and even in
future fp16 or bf16 instructions.

Zhiwei
>
> r~
>
>
> LIU Zhiwei (2):
>    target/riscv: Clean up fmv.w.x
>    target/riscv: check before allocating TCG temps
>
> Richard Henderson (5):
>    target/riscv: Generate nanboxed results from fp helpers
>    target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
>    target/riscv: Generate nanboxed results from trans_rvf.inc.c
>    target/riscv: Check nanboxed inputs to fp helpers
>    target/riscv: Check nanboxed inputs in trans_rvf.inc.c
>
>   target/riscv/internals.h                |  16 ++++
>   target/riscv/fpu_helper.c               | 102 ++++++++++++++++--------
>   target/riscv/insn_trans/trans_rvd.inc.c |   8 +-
>   target/riscv/insn_trans/trans_rvf.inc.c |  99 ++++++++++++++---------
>   target/riscv/translate.c                |  29 +++++++
>   5 files changed, 178 insertions(+), 76 deletions(-)
>



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-07-24  0:28   ` Richard Henderson
@ 2020-07-24  2:35     ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  2:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> Make sure that all results from single-precision scalar helpers
> are properly nan-boxed to 64-bits.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/riscv/internals.h  |  5 +++++
>   target/riscv/fpu_helper.c | 42 +++++++++++++++++++++------------------
>   2 files changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index 37d33820ad..9f4ba7d617 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>   #define SEW32 2
>   #define SEW64 3
>   
> +static inline uint64_t nanbox_s(float32 f)
> +{
> +    return f | MAKE_64BIT_MASK(32, 32);
> +}
> +
If we define it here, we can also define a more general function with flen.

+static inline uint64_t nanbox_s(float32 f, uint32_t flen)
+{
+    return f | MAKE_64BIT_MASK(flen, 64 - flen);
+}
+

So we can reuse it in fp16 or bf16 scalar instructions and in vector
instructions.

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Zhiwei
>   #endif
> diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> index 4379756dc4..72541958a7 100644
> --- a/target/riscv/fpu_helper.c
> +++ b/target/riscv/fpu_helper.c
> @@ -81,10 +81,16 @@ void helper_set_rounding_mode(CPURISCVState *env, uint32_t rm)
>       set_float_rounding_mode(softrm, &env->fp_status);
>   }
>   
> +static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> +                           uint64_t frs3, int flags)
> +{
> +    return nanbox_s(float32_muladd(frs1, frs2, frs3, flags, &env->fp_status));
> +}
> +
>   uint64_t helper_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>                           uint64_t frs3)
>   {
> -    return float32_muladd(frs1, frs2, frs3, 0, &env->fp_status);
> +    return do_fmadd_s(env, frs1, frs2, frs3, 0);
>   }
>   
>   uint64_t helper_fmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> @@ -96,8 +102,7 @@ uint64_t helper_fmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>   uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>                           uint64_t frs3)
>   {
> -    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_c,
> -                          &env->fp_status);
> +    return do_fmadd_s(env, frs1, frs2, frs3, float_muladd_negate_c);
>   }
>   
>   uint64_t helper_fmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> @@ -110,8 +115,7 @@ uint64_t helper_fmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>   uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>                            uint64_t frs3)
>   {
> -    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_product,
> -                          &env->fp_status);
> +    return do_fmadd_s(env, frs1, frs2, frs3, float_muladd_negate_product);
>   }
>   
>   uint64_t helper_fnmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> @@ -124,8 +128,8 @@ uint64_t helper_fnmsub_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>   uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>                            uint64_t frs3)
>   {
> -    return float32_muladd(frs1, frs2, frs3, float_muladd_negate_c |
> -                          float_muladd_negate_product, &env->fp_status);
> +    return do_fmadd_s(env, frs1, frs2, frs3,
> +                      float_muladd_negate_c | float_muladd_negate_product);
>   }
>   
>   uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> @@ -137,37 +141,37 @@ uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>   
>   uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_add(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_add(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_sub(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_sub(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fmul_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_mul(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_mul(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_div(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_div(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_minnum(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   {
> -    return float32_maxnum(frs1, frs2, &env->fp_status);
> +    return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
>   }
>   
>   uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t frs1)
>   {
> -    return float32_sqrt(frs1, &env->fp_status);
> +    return nanbox_s(float32_sqrt(frs1, &env->fp_status));
>   }
>   
>   target_ulong helper_fle_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> @@ -209,23 +213,23 @@ uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t frs1)
>   
>   uint64_t helper_fcvt_s_w(CPURISCVState *env, target_ulong rs1)
>   {
> -    return int32_to_float32((int32_t)rs1, &env->fp_status);
> +    return nanbox_s(int32_to_float32((int32_t)rs1, &env->fp_status));
>   }
>   
>   uint64_t helper_fcvt_s_wu(CPURISCVState *env, target_ulong rs1)
>   {
> -    return uint32_to_float32((uint32_t)rs1, &env->fp_status);
> +    return nanbox_s(uint32_to_float32((uint32_t)rs1, &env->fp_status));
>   }
>   
>   #if defined(TARGET_RISCV64)
>   uint64_t helper_fcvt_s_l(CPURISCVState *env, uint64_t rs1)
>   {
> -    return int64_to_float32(rs1, &env->fp_status);
> +    return nanbox_s(int64_to_float32(rs1, &env->fp_status));
>   }
>   
>   uint64_t helper_fcvt_s_lu(CPURISCVState *env, uint64_t rs1)
>   {
> -    return uint64_to_float32(rs1, &env->fp_status);
> +    return nanbox_s(uint64_to_float32(rs1, &env->fp_status));
>   }
>   #endif
>   
> @@ -266,7 +270,7 @@ uint64_t helper_fmax_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
>   
>   uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t rs1)
>   {
> -    return float64_to_float32(rs1, &env->fp_status);
> +    return nanbox_s(float64_to_float32(rs1, &env->fp_status));
>   }
>   
>   uint64_t helper_fcvt_d_s(CPURISCVState *env, uint64_t rs1)



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/7] target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
  2020-07-24  0:28   ` Richard Henderson
@ 2020-07-24  2:39     ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  2:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> Do not depend on the RVD extension, take input and output via
> TCGv_i64 instead of fpu regno.  Move the function to translate.c
> so that it can be used in multiple trans_*.inc.c files.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/riscv/insn_trans/trans_rvf.inc.c | 16 +---------------
>   target/riscv/translate.c                | 11 +++++++++++
>   2 files changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
> index 3bfd8881e7..c7057482e8 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -23,20 +23,6 @@
>           return false;                       \
>   } while (0)
>   
> -/*
> - * RISC-V requires NaN-boxing of narrower width floating
> - * point values.  This applies when a 32-bit value is
> - * assigned to a 64-bit FP register.  Thus this does not
> - * apply when the RVD extension is not present.
> - */
> -static void gen_nanbox_fpr(DisasContext *ctx, int regno)
> -{
> -    if (has_ext(ctx, RVD)) {
> -        tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
> -                        MAKE_64BIT_MASK(32, 32));
> -    }
> -}
> -
>   static bool trans_flw(DisasContext *ctx, arg_flw *a)
>   {
>       TCGv t0 = tcg_temp_new();
> @@ -46,7 +32,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
>       tcg_gen_addi_tl(t0, t0, a->imm);
>   
>       tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
> -    gen_nanbox_fpr(ctx, a->rd);
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>   
>       tcg_temp_free(t0);
>       mark_fs_dirty(ctx);
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 9632e79cf3..12a746da97 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -90,6 +90,17 @@ static inline bool has_ext(DisasContext *ctx, uint32_t ext)
>       return ctx->misa & ext;
>   }
>   
> +/*
> + * RISC-V requires NaN-boxing of narrower width floating point values.
> + * This applies when a 32-bit value is assigned to a 64-bit FP register.
> + * For consistency and simplicity, we nanbox results even when the RVD
> + * extension is not present.
> + */
> +static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
> +{
> +    tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
> +}
> +
If possible,

+static void gen_nanbox(TCGv_i64 out, TCGv_i64 in, uint32_t flen)
+{
+    tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(flen, 64 - flen));
+}
+

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Zhiwei
>   static void generate_exception(DisasContext *ctx, int excp)
>   {
>       tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 3/7] target/riscv: Generate nanboxed results from trans_rvf.inc.c
  2020-07-24  0:28   ` Richard Henderson
@ 2020-07-24  2:41     ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  2:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> Make sure that all results from inline single-precision scalar
> operations are properly nan-boxed to 64-bits.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/riscv/insn_trans/trans_rvf.inc.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
> index c7057482e8..264d3139f1 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -167,6 +167,7 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
>           tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
>                               0, 31);
>       }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>       mark_fs_dirty(ctx);
>       return true;
>   }
> @@ -183,6 +184,7 @@ static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>           tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
>           tcg_temp_free_i64(t0);
>       }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>       mark_fs_dirty(ctx);
>       return true;
>   }
> @@ -199,6 +201,7 @@ static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>           tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>           tcg_temp_free_i64(t0);
>       }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>       mark_fs_dirty(ctx);
>       return true;
>   }
> @@ -369,6 +372,7 @@ static bool trans_fmv_w_x(DisasContext *ctx, arg_fmv_w_x *a)
>   #else
>       tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
>   #endif
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>   
>       mark_fs_dirty(ctx);
>       tcg_temp_free(t0);
Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Zhiwei



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 4/7] target/riscv: Check nanboxed inputs to fp helpers
  2020-07-24  0:28   ` Richard Henderson
@ 2020-07-24  2:47     ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  2:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> If a 32-bit input is not properly nanboxed, then the input is
> replaced with the default qnan.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/riscv/internals.h  | 11 +++++++
>   target/riscv/fpu_helper.c | 64 ++++++++++++++++++++++++++++-----------
>   2 files changed, 57 insertions(+), 18 deletions(-)
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index 9f4ba7d617..f1a546dba6 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -43,4 +43,15 @@ static inline uint64_t nanbox_s(float32 f)
>       return f | MAKE_64BIT_MASK(32, 32);
>   }
>   
> +static inline float32 check_nanbox_s(uint64_t f)
> +{
> +    uint64_t mask = MAKE_64BIT_MASK(32, 32);
> +
> +    if (likely((f & mask) == mask)) {
> +        return (uint32_t)f;
> +    } else {
> +        return 0x7fc00000u; /* default qnan */
> +    }
> +}
> +
If possible,

+static inline float32 check_nanbox(uint64_t f, uint32_t flen)
+{
+    uint64_t mask = MAKE_64BIT_MASK(flen, 64 - flen);
+
+    if (likely((f & mask) == mask)) {
+        return (uint32_t)f;
+    } else {
+        return (flen == 32) ? 0x7fc00000u : 0x7e00u; /* default qnan */
+    }
+}
+

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Zhiwei
>   #endif
> diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> index 72541958a7..bb346a8249 100644
> --- a/target/riscv/fpu_helper.c
> +++ b/target/riscv/fpu_helper.c
> @@ -81,9 +81,12 @@ void helper_set_rounding_mode(CPURISCVState *env, uint32_t rm)
>       set_float_rounding_mode(softrm, &env->fp_status);
>   }
>   
> -static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> -                           uint64_t frs3, int flags)
> +static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2,
> +                           uint64_t rs3, int flags)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
> +    float32 frs3 = check_nanbox_s(rs3);
>       return nanbox_s(float32_muladd(frs1, frs2, frs3, flags, &env->fp_status));
>   }
>   
> @@ -139,74 +142,97 @@ uint64_t helper_fnmadd_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>                             float_muladd_negate_product, &env->fp_status);
>   }
>   
> -uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_add(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fsub_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_sub(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fmul_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmul_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_mul(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_div(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
>   }
>   
> -uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return nanbox_s(float32_sqrt(frs1, &env->fp_status));
>   }
>   
> -target_ulong helper_fle_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +target_ulong helper_fle_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return float32_le(frs1, frs2, &env->fp_status);
>   }
>   
> -target_ulong helper_flt_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +target_ulong helper_flt_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return float32_lt(frs1, frs2, &env->fp_status);
>   }
>   
> -target_ulong helper_feq_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +target_ulong helper_feq_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>       return float32_eq_quiet(frs1, frs2, &env->fp_status);
>   }
>   
> -target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t frs1)
> +target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return float32_to_int32(frs1, &env->fp_status);
>   }
>   
> -target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t frs1)
> +target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return (int32_t)float32_to_uint32(frs1, &env->fp_status);
>   }
>   
>   #if defined(TARGET_RISCV64)
> -uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return float32_to_int64(frs1, &env->fp_status);
>   }
>   
> -uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return float32_to_uint64(frs1, &env->fp_status);
>   }
>   #endif
> @@ -233,8 +259,9 @@ uint64_t helper_fcvt_s_lu(CPURISCVState *env, uint64_t rs1)
>   }
>   #endif
>   
> -target_ulong helper_fclass_s(uint64_t frs1)
> +target_ulong helper_fclass_s(uint64_t rs1)
>   {
> +    float32 frs1 = check_nanbox_s(rs1);
>       return fclass_s(frs1);
>   }
>   
> @@ -275,7 +302,8 @@ uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t rs1)
>   
>   uint64_t helper_fcvt_d_s(CPURISCVState *env, uint64_t rs1)
>   {
> -    return float32_to_float64(rs1, &env->fp_status);
> +    float32 frs1 = check_nanbox_s(rs1);
> +    return float32_to_float64(frs1, &env->fp_status);
>   }
>   
>   uint64_t helper_fsqrt_d(CPURISCVState *env, uint64_t frs1)



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-07-24  2:35     ` LIU Zhiwei
@ 2020-07-24  3:55       ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  3:55 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv

On 7/23/20 7:35 PM, LIU Zhiwei wrote:
> 
> 
> On 2020/7/24 8:28, Richard Henderson wrote:
>> Make sure that all results from single-precision scalar helpers
>> are properly nan-boxed to 64-bits.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/riscv/internals.h  |  5 +++++
>>   target/riscv/fpu_helper.c | 42 +++++++++++++++++++++------------------
>>   2 files changed, 28 insertions(+), 19 deletions(-)
>>
>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
>> index 37d33820ad..9f4ba7d617 100644
>> --- a/target/riscv/internals.h
>> +++ b/target/riscv/internals.h
>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>>   #define SEW32 2
>>   #define SEW64 3
>>   +static inline uint64_t nanbox_s(float32 f)
>> +{
>> +    return f | MAKE_64BIT_MASK(32, 32);
>> +}
>> +
> If define it here,  we can also define a more general  function with flen.
> 
> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
> +{
> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
> +}
> +
> 
> So we can reuse it in fp16 or bf16 scalar instruction and in vector instructions.

While we could do that, we will not encounter all possible lengths.  In the
cover letter, I mentioned defining a second function,

static inline uint64_t nanbox_h(float16 f)
{
   return f | MAKE_64BIT_MASK(16, 48);
}

Two separate functions will, I believe, be easier to use in practice.
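
For concreteness, a hypothetical half-precision helper would then be a
one-line mirror of helper_fadd_s (a sketch only, assuming nanbox_h as above
and QEMU's float16_add; it is not code from this series):

uint64_t helper_fadd_h(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
{
    /* nanbox_h fills the upper 48 bits with ones, as nanbox_s does for 32. */
    return nanbox_h(float16_add(frs1, frs2, &env->fp_status));
}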


r~


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 4/7] target/riscv: Check nanboxed inputs to fp helpers
  2020-07-24  2:47     ` LIU Zhiwei
@ 2020-07-24  3:59       ` Richard Henderson
  -1 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2020-07-24  3:59 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv

On 7/23/20 7:47 PM, LIU Zhiwei wrote:
> 
> 
> On 2020/7/24 8:28, Richard Henderson wrote:
>> If a 32-bit input is not properly nanboxed, then the input is
>> replaced with the default qnan.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/riscv/internals.h  | 11 +++++++
>>   target/riscv/fpu_helper.c | 64 ++++++++++++++++++++++++++++-----------
>>   2 files changed, 57 insertions(+), 18 deletions(-)
>>
>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
>> index 9f4ba7d617..f1a546dba6 100644
>> --- a/target/riscv/internals.h
>> +++ b/target/riscv/internals.h
>> @@ -43,4 +43,15 @@ static inline uint64_t nanbox_s(float32 f)
>>       return f | MAKE_64BIT_MASK(32, 32);
>>   }
>>   +static inline float32 check_nanbox_s(uint64_t f)
>> +{
>> +    uint64_t mask = MAKE_64BIT_MASK(32, 32);
>> +
>> +    if (likely((f & mask) == mask)) {
>> +        return (uint32_t)f;
>> +    } else {
>> +        return 0x7fc00000u; /* default qnan */
>> +    }
>> +}
>> +
> If possible,
> 
> +static inline float32 check_nanbox(uint64_t f, uint32_t flen)
> +{
> +    uint64_t mask = MAKE_64BIT_MASK(flen, 64 - flen);
> +
> +    if (likely((f & mask) == mask)) {
> +        return (uint32_t)f;
> +    } else {
> +        return (flen == 32) ? 0x7fc00000u : 0x7e00u; /* default qnan */
> +    }
> +}

The difficulty of choosing the proper default qnan is an example of why we
should *not* attempt to make this function fully general, but should instead
define separate functions for each type.  E.g.

static inline float16 check_nanbox_h(uint64_t f);
static inline bfloat16 check_nanbox_b(uint64_t f);
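
A minimal sketch of what check_nanbox_h could look like, mirroring the
check_nanbox_s shown above (the 0x7e00 float16 default qnan is an assumption,
matching Zhiwei's suggestion; this is not code from the series):

static inline float16 check_nanbox_h(uint64_t f)
{
    uint64_t mask = MAKE_64BIT_MASK(16, 48);

    if (likely((f & mask) == mask)) {
        /* Properly boxed: the upper 48 bits are all ones. */
        return (uint16_t)f;
    } else {
        return 0x7e00u; /* default qnan for float16 */
    }
}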


r~


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-07-24  0:28   ` Richard Henderson
@ 2020-07-24  6:04     ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  6:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 8:28, Richard Henderson wrote:
> If a 32-bit input is not properly nanboxed, then the input is replaced
> with the default qnan.  The only inline expansion is for the sign-changing
> set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/riscv/insn_trans/trans_rvf.inc.c | 71 +++++++++++++++++++------
>   target/riscv/translate.c                | 18 +++++++
>   2 files changed, 73 insertions(+), 16 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c b/target/riscv/insn_trans/trans_rvf.inc.c
> index 264d3139f1..f9a9e0643a 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx, arg_fsgnj_s *a)
>   {
>       REQUIRE_FPU;
>       REQUIRE_EXT(ctx, RVF);
> +
>       if (a->rs1 == a->rs2) { /* FMOV */
> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>       } else { /* FSGNJ */
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2], cpu_fpr[a->rs1],
> -                            0, 31);
> +        TCGv_i64 rs1 = tcg_temp_new_i64();
> +        TCGv_i64 rs2 = tcg_temp_new_i64();
> +
> +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /* This formulation retains the nanboxing of rs2. */
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
> +        tcg_temp_free_i64(rs1);
> +        tcg_temp_free_i64(rs2);
>       }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>       mark_fs_dirty(ctx);
>       return true;
>   }
>   
>   static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>   {
> +    TCGv_i64 rs1, rs2, mask;
> +
>       REQUIRE_FPU;
>       REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>       if (a->rs1 == a->rs2) { /* FNEG */
> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
> +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31, 1));
>       } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Replace bit 31 in rs1 with inverse in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_andc_i64(rs2, mask, rs2);
> +        tcg_gen_and_i64(rs1, mask, rs1);
> +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(mask);
> +        tcg_temp_free_i64(rs2);
>       }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>       mark_fs_dirty(ctx);
>       return true;
>   }
>   
>   static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>   {
> +    TCGv_i64 rs1, rs2;
> +
>       REQUIRE_FPU;
>       REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>       if (a->rs1 == a->rs2) { /* FABS */
> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
> +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1, ~MAKE_64BIT_MASK(31, 1));
>       } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Xor bit 31 in rs1 with that in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(rs2);
>       }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>       mark_fs_dirty(ctx);
>       return true;
>   }
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 12a746da97..bf35182776 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>       tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>   }
>   
> +/*
> + * A narrow n-bit operation, where n < FLEN, checks that input operands
> + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
> + * If so, the least-significant bits of the input are used, otherwise the
> + * input value is treated as an n-bit canonical NaN (v2.2 section 9.2).
> + *
> + * Here, the result is always nan-boxed, even the canonical nan.
> + */
> +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
> +{
> +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
> +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
> +
> +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
> +    tcg_temp_free_i64(t_max);
> +    tcg_temp_free_i64(t_nan);
> +}
> +
Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>

Zhiwei
>   static void generate_exception(DisasContext *ctx, int excp)
>   {
>       tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
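
In plain C terms, the movcond in gen_check_nanbox_s above computes the
following (a sketch with an illustrative name, not part of the patch):

static inline uint64_t check_nanbox_s_boxed(uint64_t in)
{
    /* in >= 0xffffffff00000000 iff the upper 32 bits are all ones. */
    return in >= 0xffffffff00000000ull ? in : 0xffffffff7fc00000ull;
}

That is, a properly boxed value passes through unchanged and anything else is
replaced by the nanboxed default qNaN.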



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-07-24  3:55       ` Richard Henderson
@ 2020-07-24  6:05         ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-07-24  6:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: frank.chang, alistair23, qemu-riscv



On 2020/7/24 11:55, Richard Henderson wrote:
> On 7/23/20 7:35 PM, LIU Zhiwei wrote:
>>
>> On 2020/7/24 8:28, Richard Henderson wrote:
>>> Make sure that all results from single-precision scalar helpers
>>> are properly nan-boxed to 64-bits.
>>>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>    target/riscv/internals.h  |  5 +++++
>>>    target/riscv/fpu_helper.c | 42 +++++++++++++++++++++------------------
>>>    2 files changed, 28 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
>>> index 37d33820ad..9f4ba7d617 100644
>>> --- a/target/riscv/internals.h
>>> +++ b/target/riscv/internals.h
>>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>>>    #define SEW32 2
>>>    #define SEW64 3
>>>    +static inline uint64_t nanbox_s(float32 f)
>>> +{
>>> +    return f | MAKE_64BIT_MASK(32, 32);
>>> +}
>>> +
>> If define it here,  we can also define a more general  function with flen.
>>
>> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
>> +{
>> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
>> +}
>> +
>>
>> So we can reuse it in fp16 or bf16 scalar instruction and in vector instructions.
> While we could do that, we will not encounter all possible lengths.  In the
> cover letter, I mentioned defining a second function,
>
> static inline uint64_t nanbox_h(float16 f)
> {
>     return f | MAKE_64BIT_MASK(16, 48);
> }
>
> Having two separate functions will, I believe, be easier to use in practice.
>
Got it. Thanks.

Zhiwei
>
> r~
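
As a concrete illustration of why the pair of helpers is convenient, a
hypothetical half-precision helper written after the pattern of helper_fadd_s
in patch 4/7 would simply be:

uint64_t helper_fadd_h(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
{
    float16 frs1 = check_nanbox_h(rs1);
    float16 frs2 = check_nanbox_h(rs2);
    return nanbox_h(float16_add(frs1, frs2, &env->fp_status));
}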



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 0/7] target/riscv: NaN-boxing for multiple precison
  2020-07-24  0:28 ` Richard Henderson
@ 2020-07-27 23:37   ` Alistair Francis
  -1 siblings, 0 replies; 62+ messages in thread
From: Alistair Francis @ 2020-07-27 23:37 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, open list:RISC-V, qemu-devel@nongnu.org Developers,
	liuzhiwei

On Thu, Jul 23, 2020 at 5:28 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is my take on Liu Zhiwei's patch set:
> https://patchew.org/QEMU/20200626205917.4545-1-zhiwei_liu@c-sky.com
>
> This differs from Zhiwei's v1 in:
>
>  * If a helper is involved, the helper does the boxing and unboxing.
>
>  * Which leaves only LDW and FSGN*.S as the only instructions that
>    are expanded inline which need to handle nanboxing.
>
>  * All mention of RVD is dropped vs boxing.  This means that an
>    RVF-only cpu will still generate and check nanboxes into the
>    64-bit cpu_fpu slots.  There should be no way an RVF-only cpu
>    can generate an unboxed cpu_fpu value.
>
>    This choice is made to speed up the common case: RVF+RVD, so
>    that we do not have to check whether RVD is enabled.
>
>  * The translate.c primitives take TCGv values rather than fpu
>    regno, which will make it possible to use them with RVV,
>    since v0.9 does proper nanboxing.
>
>  * I have adjusted the current naming to be float32 specific ("*_s"),
>    to avoid confusion with the float16 data type supported by RVV.

Thanks, Richard. As Zhiwei has reviewed all of these, I have applied
them to the riscv-to-apply.next tree for 5.2.

Alistair

>
>
> r~
>
>
> LIU Zhiwei (2):
>   target/riscv: Clean up fmv.w.x
>   target/riscv: check before allocating TCG temps
>
> Richard Henderson (5):
>   target/riscv: Generate nanboxed results from fp helpers
>   target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
>   target/riscv: Generate nanboxed results from trans_rvf.inc.c
>   target/riscv: Check nanboxed inputs to fp helpers
>   target/riscv: Check nanboxed inputs in trans_rvf.inc.c
>
>  target/riscv/internals.h                |  16 ++++
>  target/riscv/fpu_helper.c               | 102 ++++++++++++++++--------
>  target/riscv/insn_trans/trans_rvd.inc.c |   8 +-
>  target/riscv/insn_trans/trans_rvf.inc.c |  99 ++++++++++++++---------
>  target/riscv/translate.c                |  29 +++++++
>  5 files changed, 178 insertions(+), 76 deletions(-)
>
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-07-24  6:05         ` LIU Zhiwei
@ 2020-08-06  6:09           ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:09 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Frank Chang, Alistair Francis, Richard Henderson,
	qemu-devel@nongnu.org Developers, open list:RISC-V

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

On Fri, Jul 24, 2020 at 2:06 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

>
>
> On 2020/7/24 11:55, Richard Henderson wrote:
> > On 7/23/20 7:35 PM, LIU Zhiwei wrote:
> >>
> >> On 2020/7/24 8:28, Richard Henderson wrote:
> >>> Make sure that all results from single-precision scalar helpers
> >>> are properly nan-boxed to 64-bits.
> >>>
> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >>> ---
> >>>    target/riscv/internals.h  |  5 +++++
> >>>    target/riscv/fpu_helper.c | 42
> +++++++++++++++++++++------------------
> >>>    2 files changed, 28 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> >>> index 37d33820ad..9f4ba7d617 100644
> >>> --- a/target/riscv/internals.h
> >>> +++ b/target/riscv/internals.h
> >>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
> >>>    #define SEW32 2
> >>>    #define SEW64 3
> >>>    +static inline uint64_t nanbox_s(float32 f)
> >>> +{
> >>> +    return f | MAKE_64BIT_MASK(32, 32);
> >>> +}
> >>> +
> >> If define it here,  we can also define a more general  function with
> flen.
> >>
> >> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
> >> +{
> >> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
> >> +}
> >> +
> >>
> >> So we can reuse it in fp16 or bf16 scalar instruction and in vector
> instructions.
> > While we could do that, we will not encounter all possible lengths.  In
> the
> > cover letter, I mentioned defining a second function,
> >
> > static inline uint64_t nanbox_h(float16 f)
> > {
> >     return f | MAKE_64BIT_MASK(16, 48);
> > }
> >
> > Having two separate functions will, I believe, be easier to use in
> practice.
> >
> Get  it. Thanks.
>
> Zhiwei
> >
> > r~
>
>
>
That is what has been implemented in Spike.  It fills in the NaN-box when a
value is stored back into the internal register structure, and unboxes the
value according to the floating-point type (half/single/double/quad) on read.
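
A rough sketch of that box-on-write / unbox-on-read pattern, using the helpers
from this series (write_fpr_s/read_fpr_s are illustrative names, not an
existing QEMU or Spike API):

static inline void write_fpr_s(uint64_t *fpr, float32 v)
{
    *fpr = nanbox_s(v);          /* always store the value fully boxed */
}

static inline float32 read_fpr_s(uint64_t fpr)
{
    return check_nanbox_s(fpr);  /* unbox, or default qNaN if not boxed */
}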

By the way, I prefer keeping the suffix to indicate the floating-point type
rather than passing an arbitrary width, since each floating-point type
belongs to its own extension.

Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

[-- Attachment #2: Type: text/html, Size: 3199 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/7] target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:24     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:24 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]

On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> Do not depend on the RVD extension, take input and output via
> TCGv_i64 instead of fpu regno.  Move the function to translate.c
> so that it can be used in multiple trans_*.inc.c files.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 16 +---------------
>  target/riscv/translate.c                | 11 +++++++++++
>  2 files changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 3bfd8881e7..c7057482e8 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -23,20 +23,6 @@
>          return false;                       \
>  } while (0)
>
> -/*
> - * RISC-V requires NaN-boxing of narrower width floating
> - * point values.  This applies when a 32-bit value is
> - * assigned to a 64-bit FP register.  Thus this does not
> - * apply when the RVD extension is not present.
> - */
> -static void gen_nanbox_fpr(DisasContext *ctx, int regno)
> -{
> -    if (has_ext(ctx, RVD)) {
> -        tcg_gen_ori_i64(cpu_fpr[regno], cpu_fpr[regno],
> -                        MAKE_64BIT_MASK(32, 32));
> -    }
> -}
> -
>  static bool trans_flw(DisasContext *ctx, arg_flw *a)
>  {
>      TCGv t0 = tcg_temp_new();
> @@ -46,7 +32,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
>      tcg_gen_addi_tl(t0, t0, a->imm);
>
>      tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
> -    gen_nanbox_fpr(ctx, a->rd);
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>
>      tcg_temp_free(t0);
>      mark_fs_dirty(ctx);
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 9632e79cf3..12a746da97 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -90,6 +90,17 @@ static inline bool has_ext(DisasContext *ctx, uint32_t
> ext)
>      return ctx->misa & ext;
>  }
>
> +/*
> + * RISC-V requires NaN-boxing of narrower width floating point values.
> + * This applies when a 32-bit value is assigned to a 64-bit FP register.
> + * For consistency and simplicity, we nanbox results even when the RVD
> + * extension is not present.
> + */
> +static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
> +{
> +    tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
> +}
> +
>  static void generate_exception(DisasContext *ctx, int excp)
>  {
>      tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

[-- Attachment #2: Type: text/html, Size: 3520 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 3/7] target/riscv: Generate nanboxed results from trans_rvf.inc.c
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:24     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:24 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> Make sure that all results from inline single-precision scalar
> operations are properly nan-boxed to 64-bits.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index c7057482e8..264d3139f1 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -167,6 +167,7 @@ static bool trans_fsgnj_s(DisasContext *ctx,
> arg_fsgnj_s *a)
>          tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
> cpu_fpr[a->rs1],
>                              0, 31);
>      }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>      mark_fs_dirty(ctx);
>      return true;
>  }
> @@ -183,6 +184,7 @@ static bool trans_fsgnjn_s(DisasContext *ctx,
> arg_fsgnjn_s *a)
>          tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
>          tcg_temp_free_i64(t0);
>      }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>      mark_fs_dirty(ctx);
>      return true;
>  }
> @@ -199,6 +201,7 @@ static bool trans_fsgnjx_s(DisasContext *ctx,
> arg_fsgnjx_s *a)
>          tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>          tcg_temp_free_i64(t0);
>      }
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>      mark_fs_dirty(ctx);
>      return true;
>  }
> @@ -369,6 +372,7 @@ static bool trans_fmv_w_x(DisasContext *ctx,
> arg_fmv_w_x *a)
>  #else
>      tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
>  #endif
> +    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>
>      mark_fs_dirty(ctx);
>      tcg_temp_free(t0);
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

Chih-Min Chao

[-- Attachment #2: Type: text/html, Size: 2811 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 4/7] target/riscv: Check nanboxed inputs to fp helpers
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:26     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:26 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 6821 bytes --]

On Fri, Jul 24, 2020 at 8:29 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> If a 32-bit input is not properly nanboxed, then the input is
> replaced with the default qnan.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/internals.h  | 11 +++++++
>  target/riscv/fpu_helper.c | 64 ++++++++++++++++++++++++++++-----------
>  2 files changed, 57 insertions(+), 18 deletions(-)
>
> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
> index 9f4ba7d617..f1a546dba6 100644
> --- a/target/riscv/internals.h
> +++ b/target/riscv/internals.h
> @@ -43,4 +43,15 @@ static inline uint64_t nanbox_s(float32 f)
>      return f | MAKE_64BIT_MASK(32, 32);
>  }
>
> +static inline float32 check_nanbox_s(uint64_t f)
> +{
> +    uint64_t mask = MAKE_64BIT_MASK(32, 32);
> +
> +    if (likely((f & mask) == mask)) {
> +        return (uint32_t)f;
> +    } else {
> +        return 0x7fc00000u; /* default qnan */
> +    }
> +}
> +
>  #endif
> diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> index 72541958a7..bb346a8249 100644
> --- a/target/riscv/fpu_helper.c
> +++ b/target/riscv/fpu_helper.c
> @@ -81,9 +81,12 @@ void helper_set_rounding_mode(CPURISCVState *env,
> uint32_t rm)
>      set_float_rounding_mode(softrm, &env->fp_status);
>  }
>
> -static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> -                           uint64_t frs3, int flags)
> +static uint64_t do_fmadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2,
> +                           uint64_t rs3, int flags)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
> +    float32 frs3 = check_nanbox_s(rs3);
>      return nanbox_s(float32_muladd(frs1, frs2, frs3, flags,
> &env->fp_status));
>  }
>
> @@ -139,74 +142,97 @@ uint64_t helper_fnmadd_d(CPURISCVState *env,
> uint64_t frs1, uint64_t frs2,
>                            float_muladd_negate_product, &env->fp_status);
>  }
>
> -uint64_t helper_fadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fadd_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_add(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fsub_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_sub(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fmul_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmul_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_mul(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fdiv_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_div(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
>  }
>
> -uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return nanbox_s(float32_sqrt(frs1, &env->fp_status));
>  }
>
> -target_ulong helper_fle_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2)
> +target_ulong helper_fle_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return float32_le(frs1, frs2, &env->fp_status);
>  }
>
> -target_ulong helper_flt_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2)
> +target_ulong helper_flt_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return float32_lt(frs1, frs2, &env->fp_status);
>  }
>
> -target_ulong helper_feq_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2)
> +target_ulong helper_feq_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
> +    float32 frs2 = check_nanbox_s(rs2);
>      return float32_eq_quiet(frs1, frs2, &env->fp_status);
>  }
>
> -target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t frs1)
> +target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return float32_to_int32(frs1, &env->fp_status);
>  }
>
> -target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t frs1)
> +target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return (int32_t)float32_to_uint32(frs1, &env->fp_status);
>  }
>
>  #if defined(TARGET_RISCV64)
> -uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fcvt_l_s(CPURISCVState *env, uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return float32_to_int64(frs1, &env->fp_status);
>  }
>
> -uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t frs1)
> +uint64_t helper_fcvt_lu_s(CPURISCVState *env, uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return float32_to_uint64(frs1, &env->fp_status);
>  }
>  #endif
> @@ -233,8 +259,9 @@ uint64_t helper_fcvt_s_lu(CPURISCVState *env, uint64_t
> rs1)
>  }
>  #endif
>
> -target_ulong helper_fclass_s(uint64_t frs1)
> +target_ulong helper_fclass_s(uint64_t rs1)
>  {
> +    float32 frs1 = check_nanbox_s(rs1);
>      return fclass_s(frs1);
>  }
>
> @@ -275,7 +302,8 @@ uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t
> rs1)
>
>  uint64_t helper_fcvt_d_s(CPURISCVState *env, uint64_t rs1)
>  {
> -    return float32_to_float64(rs1, &env->fp_status);
> +    float32 frs1 = check_nanbox_s(rs1);
> +    return float32_to_float64(frs1, &env->fp_status);
>  }
>
>  uint64_t helper_fsqrt_d(CPURISCVState *env, uint64_t frs1)
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

Chih-Min Chao

[-- Attachment #2: Type: text/html, Size: 8450 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:27     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:27 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 5522 bytes --]

On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> If a 32-bit input is not properly nanboxed, then the input is replaced
> with the default qnan.  The only inline expansion is for the sign-changing
> set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 71 +++++++++++++++++++------
>  target/riscv/translate.c                | 18 +++++++
>  2 files changed, 73 insertions(+), 16 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 264d3139f1..f9a9e0643a 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx,
> arg_fsgnj_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      if (a->rs1 == a->rs2) { /* FMOV */
> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>      } else { /* FSGNJ */
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
> cpu_fpr[a->rs1],
> -                            0, 31);
> +        TCGv_i64 rs1 = tcg_temp_new_i64();
> +        TCGv_i64 rs2 = tcg_temp_new_i64();
> +
> +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /* This formulation retains the nanboxing of rs2. */
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
> +        tcg_temp_free_i64(rs1);
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>      mark_fs_dirty(ctx);
>      return true;
>  }
>
>  static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>  {
> +    TCGv_i64 rs1, rs2, mask;
> +
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>      if (a->rs1 == a->rs2) { /* FNEG */
> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
> +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31, 1));
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Replace bit 31 in rs1 with inverse in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_andc_i64(rs2, mask, rs2);
> +        tcg_gen_and_i64(rs1, mask, rs1);
> +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(mask);
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>      mark_fs_dirty(ctx);
>      return true;
>  }
>
>  static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>  {
> +    TCGv_i64 rs1, rs2;
> +
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>      if (a->rs1 == a->rs2) { /* FABS */
> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
> +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1, ~MAKE_64BIT_MASK(31, 1));
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Xor bit 31 in rs1 with that in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>      mark_fs_dirty(ctx);
>      return true;
>  }
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 12a746da97..bf35182776 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>      tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>  }
>
> +/*
> + * A narrow n-bit operation, where n < FLEN, checks that input operands
> + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
> + * If so, the least-significant bits of the input are used, otherwise the
> + * input value is treated as an n-bit canonical NaN (v2.2 section 9.2).
> + *
> + * Here, the result is always nan-boxed, even the canonical nan.
> + */
> +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
> +{
> +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
> +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
> +
> +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
> +    tcg_temp_free_i64(t_max);
> +    tcg_temp_free_i64(t_nan);
> +}
> +
>  static void generate_exception(DisasContext *ctx, int excp)
>  {
>      tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

Chih-Min Chao

[-- Attachment #2: Type: text/html, Size: 7173 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 6/7] target/riscv: Clean up fmv.w.x
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:28     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 1217 bytes --]

On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> From: LIU Zhiwei <zhiwei_liu@c-sky.com>
>
> Use tcg_gen_extu_tl_i64 to avoid the ifdef.
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> Message-Id: <20200626205917.4545-7-zhiwei_liu@c-sky.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index f9a9e0643a..0d04677a02 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -406,11 +406,7 @@ static bool trans_fmv_w_x(DisasContext *ctx,
> arg_fmv_w_x *a)
>      TCGv t0 = tcg_temp_new();
>      gen_get_gpr(t0, a->rs1);
>
> -#if defined(TARGET_RISCV64)
> -    tcg_gen_mov_i64(cpu_fpr[a->rd], t0);
> -#else
> -    tcg_gen_extu_i32_i64(cpu_fpr[a->rd], t0);
> -#endif
> +    tcg_gen_extu_tl_i64(cpu_fpr[a->rd], t0);
>      gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>
>      mark_fs_dirty(ctx);
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

Chih-Min Chao

[-- Attachment #2: Type: text/html, Size: 2227 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
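
For context on why the single call is enough here: tcg_gen_extu_tl_i64 is
selected in tcg-op.h according to the target register width, so (roughly;
the exact spelling in the tree may differ) it already covers the two cases
the removed #ifdef wrote out by hand:

    #if TARGET_LONG_BITS == 64
    #define tcg_gen_extu_tl_i64  tcg_gen_mov_i64        /* TCGv is already 64-bit */
    #else
    #define tcg_gen_extu_tl_i64  tcg_gen_extu_i32_i64   /* zero-extend the 32-bit TCGv */
    #endif

Either way the value then goes through gen_nanbox_s, which sets bits 63..32,
so the zero-extension in the 32-bit case is harmless.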

* Re: [PATCH v2 7/7] target/riscv: check before allocating TCG temps
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-06  6:28     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  6:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

On Fri, Jul 24, 2020 at 8:32 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> From: LIU Zhiwei <zhiwei_liu@c-sky.com>
>
> Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
> Message-Id: <20200626205917.4545-5-zhiwei_liu@c-sky.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvd.inc.c | 8 ++++----
>  target/riscv/insn_trans/trans_rvf.inc.c | 8 ++++----
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvd.inc.c
> b/target/riscv/insn_trans/trans_rvd.inc.c
> index ea1044f13b..4f832637fa 100644
> --- a/target/riscv/insn_trans/trans_rvd.inc.c
> +++ b/target/riscv/insn_trans/trans_rvd.inc.c
> @@ -20,10 +20,10 @@
>
>  static bool trans_fld(DisasContext *ctx, arg_fld *a)
>  {
> -    TCGv t0 = tcg_temp_new();
> -    gen_get_gpr(t0, a->rs1);
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVD);
> +    TCGv t0 = tcg_temp_new();
> +    gen_get_gpr(t0, a->rs1);
>      tcg_gen_addi_tl(t0, t0, a->imm);
>
>      tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ);
> @@ -35,10 +35,10 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a)
>
>  static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
>  {
> -    TCGv t0 = tcg_temp_new();
> -    gen_get_gpr(t0, a->rs1);
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVD);
> +    TCGv t0 = tcg_temp_new();
> +    gen_get_gpr(t0, a->rs1);
>      tcg_gen_addi_tl(t0, t0, a->imm);
>
>      tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ);
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 0d04677a02..16df9c5ee2 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -25,10 +25,10 @@
>
>  static bool trans_flw(DisasContext *ctx, arg_flw *a)
>  {
> -    TCGv t0 = tcg_temp_new();
> -    gen_get_gpr(t0, a->rs1);
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +    TCGv t0 = tcg_temp_new();
> +    gen_get_gpr(t0, a->rs1);
>      tcg_gen_addi_tl(t0, t0, a->imm);
>
>      tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
> @@ -41,11 +41,11 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
>
>  static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
>  {
> +    REQUIRE_FPU;
> +    REQUIRE_EXT(ctx, RVF);
>      TCGv t0 = tcg_temp_new();
>      gen_get_gpr(t0, a->rs1);
>
> -    REQUIRE_FPU;
> -    REQUIRE_EXT(ctx, RVF);
>      tcg_gen_addi_tl(t0, t0, a->imm);
>
>      tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL);
> --
> 2.25.1
>
>
>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>

Chih-Min Chao

[-- Attachment #2: Type: text/html, Size: 3991 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
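
The point of the reordering is that REQUIRE_FPU and REQUIRE_EXT are not
plain assertions: they leave the trans_* function with "return false" when
the FPU is disabled or the extension is missing, so a TCG temp allocated
before them would never reach tcg_temp_free().  A rough sketch of the shape
of such a macro (the real definitions live in target/riscv/translate.c and
may differ in detail):

    #define REQUIRE_EXT(ctx, ext) do {  \
        if (!has_ext(ctx, ext)) {       \
            return false;               \
        }                               \
    } while (0)

With the checks first, tcg_temp_new() only runs on paths that also free the
temp.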

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-08-06  6:09           ` Chih-Min Chao
@ 2020-08-06  7:05             ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-08-06  7:05 UTC (permalink / raw)
  To: Chih-Min Chao
  Cc: Frank Chang, Alistair Francis, Richard Henderson,
	qemu-devel@nongnu.org Developers, open list:RISC-V

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]



On 2020/8/6 14:09, Chih-Min Chao wrote:
> On Fri, Jul 24, 2020 at 2:06 PM LIU Zhiwei <zhiwei_liu@c-sky.com 
> <mailto:zhiwei_liu@c-sky.com>> wrote:
>
>
>
>     On 2020/7/24 11:55, Richard Henderson wrote:
>     > On 7/23/20 7:35 PM, LIU Zhiwei wrote:
>     >>
>     >> On 2020/7/24 8:28, Richard Henderson wrote:
>     >>> Make sure that all results from single-precision scalar helpers
>     >>> are properly nan-boxed to 64-bits.
>     >>>
>     >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org
>     <mailto:richard.henderson@linaro.org>>
>     >>> ---
>     >>>    target/riscv/internals.h  |  5 +++++
>     >>>    target/riscv/fpu_helper.c | 42
>     +++++++++++++++++++++------------------
>     >>>    2 files changed, 28 insertions(+), 19 deletions(-)
>     >>>
>     >>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
>     >>> index 37d33820ad..9f4ba7d617 100644
>     >>> --- a/target/riscv/internals.h
>     >>> +++ b/target/riscv/internals.h
>     >>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>     >>>    #define SEW32 2
>     >>>    #define SEW64 3
>     >>>    +static inline uint64_t nanbox_s(float32 f)
>     >>> +{
>     >>> +    return f | MAKE_64BIT_MASK(32, 32);
>     >>> +}
>     >>> +
>     >> If define it here,  we can also define a more general  function
>     with flen.
>     >>
>     >> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
>     >> +{
>     >> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
>     >> +}
>     >> +
>     >>
>     >> So we can reuse it in fp16 or bf16 scalar instruction and in
>     vector instructions.
>     > While we could do that, we will not encounter all possible
>     lengths.  In the
>     > cover letter, I mentioned defining a second function,
>     >
>     > static inline uint64_t nanbox_h(float16 f)
>     > {
>     >     return f | MAKE_64BIT_MASK(16, 48);
>     > }
>     >
>     > Having two separate functions will, I believe, be easier to use
>     in practice.
>     >
>     Get  it. Thanks.
>
>     Zhiwei
>     >
>     > r~
>
>
>
> That is what has been implemented in spike.  It fills up the Nan-Box 
> when value is stored back internal structure and
> unbox the value with difference floating type (half/single/double/quad).
Hi Chih-Min,

Is half-precision already part of RVV? Or do you know the ISA
abbreviation for half-precision?

Thanks very much.

Best Regards,
Zhiwei
>
> By the way,  I prefer to keeping the suffix to tell different floating 
> type rather than pass arbitrary
> since each floating type belong to each extension.
>
> Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com 
> <mailto:chihmin.chao@sifive.com>>


[-- Attachment #2: Type: text/html, Size: 5302 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
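
For the half-precision question raised here, the nanbox_h sketched in the
quoted discussion would presumably be paired with a matching input-side
check, one pair of helpers per width rather than an flen parameter.  The
check below is purely hypothetical (nothing in this series defines an fp16
variant; float16 is QEMU's 16-bit float typedef and 0x7e00 the fp16 default
qnan):

    static inline uint64_t nanbox_h(float16 f)
    {
        return f | MAKE_64BIT_MASK(16, 48);      /* set bits 63..16 */
    }

    static inline float16 check_nanbox_h(uint64_t f)
    {
        uint64_t mask = MAKE_64BIT_MASK(16, 48);

        /* Properly boxed means bits 63..16 all ones; otherwise treat the
         * input as the fp16 default qnan. */
        return (f & mask) == mask ? (uint16_t)f : 0x7e00;
    }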

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-08-06  7:05             ` LIU Zhiwei
@ 2020-08-06  8:42               ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-06  8:42 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: Frank Chang, Alistair Francis, Richard Henderson,
	qemu-devel@nongnu.org Developers, open list:RISC-V

[-- Attachment #1: Type: text/plain, Size: 2760 bytes --]

On Thu, Aug 6, 2020 at 3:05 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:

>
>
> On 2020/8/6 14:09, Chih-Min Chao wrote:
>
> On Fri, Jul 24, 2020 at 2:06 PM LIU Zhiwei <zhiwei_liu@c-sky.com> wrote:
>
>>
>>
>> On 2020/7/24 11:55, Richard Henderson wrote:
>> > On 7/23/20 7:35 PM, LIU Zhiwei wrote:
>> >>
>> >> On 2020/7/24 8:28, Richard Henderson wrote:
>> >>> Make sure that all results from single-precision scalar helpers
>> >>> are properly nan-boxed to 64-bits.
>> >>>
>> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> >>> ---
>> >>>    target/riscv/internals.h  |  5 +++++
>> >>>    target/riscv/fpu_helper.c | 42
>> +++++++++++++++++++++------------------
>> >>>    2 files changed, 28 insertions(+), 19 deletions(-)
>> >>>
>> >>> diff --git a/target/riscv/internals.h b/target/riscv/internals.h
>> >>> index 37d33820ad..9f4ba7d617 100644
>> >>> --- a/target/riscv/internals.h
>> >>> +++ b/target/riscv/internals.h
>> >>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>> >>>    #define SEW32 2
>> >>>    #define SEW64 3
>> >>>    +static inline uint64_t nanbox_s(float32 f)
>> >>> +{
>> >>> +    return f | MAKE_64BIT_MASK(32, 32);
>> >>> +}
>> >>> +
>> >> If define it here,  we can also define a more general  function with
>> flen.
>> >>
>> >> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
>> >> +{
>> >> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
>> >> +}
>> >> +
>> >>
>> >> So we can reuse it in fp16 or bf16 scalar instruction and in vector
>> instructions.
>> > While we could do that, we will not encounter all possible lengths.  In
>> the
>> > cover letter, I mentioned defining a second function,
>> >
>> > static inline uint64_t nanbox_h(float16 f)
>> > {
>> >     return f | MAKE_64BIT_MASK(16, 48);
>> > }
>> >
>> > Having two separate functions will, I believe, be easier to use in
>> practice.
>> >
>> Get  it. Thanks.
>>
>> Zhiwei
>> >
>> > r~
>>
>>
>>
> That is what has been implemented in spike.  It fills up the Nan-Box when
> value is stored back internal structure and
> unbox the value with difference floating type (half/single/double/quad).
>
> Hi Chih-Min,
>
> Has half-precision been a part of RVV? Or do you know the ISA abbreviation
> of half-precision?
>
> Thanks very much.
>
> Best Regards,
> Zhiwei
>
>
> By the way,  I prefer to keeping the suffix to tell different floating
> type rather than pass arbitrary
> since each floating type belong to each extension.
>
> Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>
>
>
Hi Zhiwei,

It is still on the branch https://github.com/riscv/riscv-isa-manual/tree/zfh and
I am not sure about the working group's progress.
I have an implementation based on this draft and will send it as an RFC patch
next week.

Thanks
Chih-Min

[-- Attachment #2: Type: text/html, Size: 5715 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers
  2020-08-06  8:42               ` Chih-Min Chao
@ 2020-08-06 10:02                 ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-08-06 10:02 UTC (permalink / raw)
  To: Chih-Min Chao
  Cc: Frank Chang, Alistair Francis, Richard Henderson,
	qemu-devel@nongnu.org Developers, open list:RISC-V

[-- Attachment #1: Type: text/plain, Size: 3921 bytes --]



On 2020/8/6 16:42, Chih-Min Chao wrote:
>
>
>
> On Thu, Aug 6, 2020 at 3:05 PM LIU Zhiwei <zhiwei_liu@c-sky.com 
> <mailto:zhiwei_liu@c-sky.com>> wrote:
>
>
>
>     On 2020/8/6 14:09, Chih-Min Chao wrote:
>>     On Fri, Jul 24, 2020 at 2:06 PM LIU Zhiwei <zhiwei_liu@c-sky.com
>>     <mailto:zhiwei_liu@c-sky.com>> wrote:
>>
>>
>>
>>         On 2020/7/24 11:55, Richard Henderson wrote:
>>         > On 7/23/20 7:35 PM, LIU Zhiwei wrote:
>>         >>
>>         >> On 2020/7/24 8:28, Richard Henderson wrote:
>>         >>> Make sure that all results from single-precision scalar
>>         helpers
>>         >>> are properly nan-boxed to 64-bits.
>>         >>>
>>         >>> Signed-off-by: Richard Henderson
>>         <richard.henderson@linaro.org
>>         <mailto:richard.henderson@linaro.org>>
>>         >>> ---
>>         >>>    target/riscv/internals.h  |  5 +++++
>>         >>>    target/riscv/fpu_helper.c | 42
>>         +++++++++++++++++++++------------------
>>         >>>    2 files changed, 28 insertions(+), 19 deletions(-)
>>         >>>
>>         >>> diff --git a/target/riscv/internals.h
>>         b/target/riscv/internals.h
>>         >>> index 37d33820ad..9f4ba7d617 100644
>>         >>> --- a/target/riscv/internals.h
>>         >>> +++ b/target/riscv/internals.h
>>         >>> @@ -38,4 +38,9 @@ target_ulong fclass_d(uint64_t frs1);
>>         >>>    #define SEW32 2
>>         >>>    #define SEW64 3
>>         >>>    +static inline uint64_t nanbox_s(float32 f)
>>         >>> +{
>>         >>> +    return f | MAKE_64BIT_MASK(32, 32);
>>         >>> +}
>>         >>> +
>>         >> If define it here,  we can also define a more general 
>>         function with flen.
>>         >>
>>         >> +static inline uint64_t nanbox_s(float32 f, uint32_t flen)
>>         >> +{
>>         >> +    return f | MAKE_64BIT_MASK(flen, 64 - flen);
>>         >> +}
>>         >> +
>>         >>
>>         >> So we can reuse it in fp16 or bf16 scalar instruction and
>>         in vector instructions.
>>         > While we could do that, we will not encounter all possible
>>         lengths.  In the
>>         > cover letter, I mentioned defining a second function,
>>         >
>>         > static inline uint64_t nanbox_h(float16 f)
>>         > {
>>         >     return f | MAKE_64BIT_MASK(16, 48);
>>         > }
>>         >
>>         > Having two separate functions will, I believe, be easier to
>>         use in practice.
>>         >
>>         Get  it. Thanks.
>>
>>         Zhiwei
>>         >
>>         > r~
>>
>>
>>
>>     That is what has been implemented in spike.  It fills up the
>>     Nan-Box when value is stored back internal structure and
>>     unbox the value with difference floating type
>>     (half/single/double/quad).
>     Hi Chih-Min,
>
>     Has half-precision been a part of RVV? Or do you know the ISA
>     abbreviation of half-precision?
>
>     Thanks very much.
>
>     Best Regards,
>     Zhiwei
>>
>>     By the way,  I prefer to keeping the suffix to tell
>>     different floating type rather than pass arbitrary
>>     since each floating type belong to each extension.
>>
>>     Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com
>>     <mailto:chihmin.chao@sifive.com>>
>
>
> Hi  ZhiWei,
>
> It is still under branch 
> https://github.com/riscv/riscv-isa-manual/tree/zfh and I am not sure 
> about the working group progress.
> I have an implementation based on this draft and will send it as RFC 
> patch next week.
Hi Chih-Min,

Thanks for your information.

As Krste once said, since we don't have RV16, a separate FP16 extension
wouldn't make sense.  Obviously, that has changed. :-P

I have also implemented a version of FP16, the "obvious set including
existing FP instructions with the format field set to 'half' (fmt=10)".

If you want to send your patch, I will not send mine again. :-)


Zhiwei
>
> Thanks
> Chih-Min


[-- Attachment #2: Type: text/html, Size: 9406 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
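
For reference, "fmt=10" refers to the two-bit fmt field of the OP-FP
encodings (instruction bits 26:25): the existing F/D/Q extensions use 00
for single, 01 for double and 11 for quad, which leaves 10 for half in the
Zfh draft.  A small illustrative sketch of those values (not decodetree
syntax from this series):

    /* fmt field (insn[26:25]) of the OP-FP major opcode */
    enum {
        FMT_S = 0,  /* 00: 32-bit single   (RVF) */
        FMT_D = 1,  /* 01: 64-bit double   (RVD) */
        FMT_H = 2,  /* 10: 16-bit half     (Zfh draft, the "fmt=10" above) */
        FMT_Q = 3,  /* 11: 128-bit quad    (RVQ) */
    };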

* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-07-24  0:28   ` Richard Henderson
@ 2020-08-07 20:24     ` Chih-Min Chao
  -1 siblings, 0 replies; 62+ messages in thread
From: Chih-Min Chao @ 2020-08-07 20:24 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers, liuzhiwei

[-- Attachment #1: Type: text/plain, Size: 5652 bytes --]

On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson <
richard.henderson@linaro.org> wrote:

> If a 32-bit input is not properly nanboxed, then the input is replaced
> with the default qnan.  The only inline expansion is for the sign-changing
> set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/riscv/insn_trans/trans_rvf.inc.c | 71 +++++++++++++++++++------
>  target/riscv/translate.c                | 18 +++++++
>  2 files changed, 73 insertions(+), 16 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
> b/target/riscv/insn_trans/trans_rvf.inc.c
> index 264d3139f1..f9a9e0643a 100644
> --- a/target/riscv/insn_trans/trans_rvf.inc.c
> +++ b/target/riscv/insn_trans/trans_rvf.inc.c
> @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx,
> arg_fsgnj_s *a)
>  {
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
>      if (a->rs1 == a->rs2) { /* FMOV */
> -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>      } else { /* FSGNJ */
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
> cpu_fpr[a->rs1],
> -                            0, 31);
> +        TCGv_i64 rs1 = tcg_temp_new_i64();
> +        TCGv_i64 rs2 = tcg_temp_new_i64();
> +
> +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /* This formulation retains the nanboxing of rs2. */
> +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
> +        tcg_temp_free_i64(rs1);
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>      mark_fs_dirty(ctx);
>      return true;
>  }
>
>  static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>  {
> +    TCGv_i64 rs1, rs2, mask;
> +
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>      if (a->rs1 == a->rs2) { /* FNEG */
> -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
> +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31, 1));
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
> -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1], 0, 31);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Replace bit 31 in rs1 with inverse in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_andc_i64(rs2, mask, rs2);
>

should be
              tcg_gen_not_i64(rs2, rs2);         // the original forgot to invert rs2
              tcg_gen_andc_i64(rs2, rs2, mask);  // mask needs to be inverted to keep only the sign bit

 Chih-Min Chao

> +        tcg_gen_and_i64(rs1, mask, rs1);
> +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(mask);
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>      mark_fs_dirty(ctx);
>      return true;
>  }
>
>  static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>  {
> +    TCGv_i64 rs1, rs2;
> +
>      REQUIRE_FPU;
>      REQUIRE_EXT(ctx, RVF);
> +
> +    rs1 = tcg_temp_new_i64();
> +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
> +
>      if (a->rs1 == a->rs2) { /* FABS */
> -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], ~INT32_MIN);
> +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1, ~MAKE_64BIT_MASK(31, 1));
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
> -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
> -        tcg_temp_free_i64(t0);
> +        rs2 = tcg_temp_new_i64();
> +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
> +
> +        /*
> +         * Xor bit 31 in rs1 with that in rs2.
> +         * This formulation retains the nanboxing of rs1.
> +         */
> +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
> +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
> +
> +        tcg_temp_free_i64(rs2);
>      }
> -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
> +    tcg_temp_free_i64(rs1);
> +
>      mark_fs_dirty(ctx);
>      return true;
>  }
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 12a746da97..bf35182776 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>      tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>  }
>
> +/*
> + * A narrow n-bit operation, where n < FLEN, checks that input operands
> + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
> + * If so, the least-significant bits of the input are used, otherwise the
> + * input value is treated as an n-bit canonical NaN (v2.2 section 9.2).
> + *
> + * Here, the result is always nan-boxed, even the canonical nan.
> + */
> +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
> +{
> +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
> +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
> +
> +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
> +    tcg_temp_free_i64(t_max);
> +    tcg_temp_free_i64(t_nan);
> +}
> +
>  static void generate_exception(DisasContext *ctx, int excp)
>  {
>      tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
> --
> 2.25.1
>
>
>

[-- Attachment #2: Type: text/html, Size: 7222 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
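
Spelled out, the corrected sequence wants rs2 to end up holding only the
inverted sign bit, so that the final OR really does replace bit 31 of the
masked rs1.  A sketch of the suggestion above in context (illustrative
only; whatever finally gets merged may differ):

        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
        tcg_gen_not_i64(rs2, rs2);         /* rs2 = ~rs2                            */
        tcg_gen_andc_i64(rs2, rs2, mask);  /* rs2 = ~rs2 & ~mask: inverted sign bit */
        /* (equivalently, a single tcg_gen_nor_i64(rs2, rs2, mask))                 */
        tcg_gen_and_i64(rs1, mask, rs1);   /* clear bit 31, keep rs1's nanboxing    */
        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);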

* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-08-07 20:24     ` Chih-Min Chao
@ 2020-08-08 14:18       ` LIU Zhiwei
  -1 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-08-08 14:18 UTC (permalink / raw)
  To: Chih-Min Chao, Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers

[-- Attachment #1: Type: text/plain, Size: 7001 bytes --]



On 2020/8/8 4:24, Chih-Min Chao wrote:
> On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson 
> <richard.henderson@linaro.org <mailto:richard.henderson@linaro.org>> 
> wrote:
>
>     If a 32-bit input is not properly nanboxed, then the input is replaced
>     with the default qnan.  The only inline expansion is for the
>     sign-changing
>     set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>
>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org
>     <mailto:richard.henderson@linaro.org>>
>     ---
>      target/riscv/insn_trans/trans_rvf.inc.c | 71
>     +++++++++++++++++++------
>      target/riscv/translate.c                | 18 +++++++
>      2 files changed, 73 insertions(+), 16 deletions(-)
>
>     diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
>     b/target/riscv/insn_trans/trans_rvf.inc.c
>     index 264d3139f1..f9a9e0643a 100644
>     --- a/target/riscv/insn_trans/trans_rvf.inc.c
>     +++ b/target/riscv/insn_trans/trans_rvf.inc.c
>     @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx,
>     arg_fsgnj_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>          if (a->rs1 == a->rs2) { /* FMOV */
>     -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>     +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>          } else { /* FSGNJ */
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
>     cpu_fpr[a->rs1],
>     -                            0, 31);
>     +        TCGv_i64 rs1 = tcg_temp_new_i64();
>     +        TCGv_i64 rs2 = tcg_temp_new_i64();
>     +
>     +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /* This formulation retains the nanboxing of rs2. */
>     +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
>     +        tcg_temp_free_i64(rs1);
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>          mark_fs_dirty(ctx);
>          return true;
>      }
>
>      static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>      {
>     +    TCGv_i64 rs1, rs2, mask;
>     +
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    rs1 = tcg_temp_new_i64();
>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +
>          if (a->rs1 == a->rs2) { /* FNEG */
>     -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
>     +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31,
>     1));
>          } else {
>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1],
>     0, 31);
>     -        tcg_temp_free_i64(t0);
>     +        rs2 = tcg_temp_new_i64();
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /*
>     +         * Replace bit 31 in rs1 with inverse in rs2.
>     +         * This formulation retains the nanboxing of rs1.
>     +         */
>     +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
>     +        tcg_gen_andc_i64(rs2, mask, rs2);
>
>
> should be
>               tcg_gen_not_i64(rs2, rs2);         // forget to inverse rs2
>               tcg_gen_andc_i64(rs2, rs2, mask);  //mask needs to be 
> inverted to get only sign
Hi Chih-Min,

Thanks for pointing it out. It is indeed a bug. However, I think it should be

tcg_gen_andc_i64(rs2, rs2, mask);  // keep only bit 31 of rs2
tcg_gen_not_i64(rs2, rs2);  // invert rs2


Best Regards,
Zhiwei
>
>  Chih-Min Chao
>
>     +        tcg_gen_and_i64(rs1, mask, rs1);
>     +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
>     +
>     +        tcg_temp_free_i64(mask);
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>     +    tcg_temp_free_i64(rs1);
>     +
>          mark_fs_dirty(ctx);
>          return true;
>      }
>
>      static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>      {
>     +    TCGv_i64 rs1, rs2;
>     +
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    rs1 = tcg_temp_new_i64();
>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +
>          if (a->rs1 == a->rs2) { /* FABS */
>     -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1],
>     ~INT32_MIN);
>     +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1,
>     ~MAKE_64BIT_MASK(31, 1));
>          } else {
>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
>     -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>     -        tcg_temp_free_i64(t0);
>     +        rs2 = tcg_temp_new_i64();
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /*
>     +         * Xor bit 31 in rs1 with that in rs2.
>     +         * This formulation retains the nanboxing of rs1.
>     +         */
>     +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
>     +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
>     +
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>     +    tcg_temp_free_i64(rs1);
>     +
>          mark_fs_dirty(ctx);
>          return true;
>      }
>     diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>     index 12a746da97..bf35182776 100644
>     --- a/target/riscv/translate.c
>     +++ b/target/riscv/translate.c
>     @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out,
>     TCGv_i64 in)
>          tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>      }
>
>     +/*
>     + * A narrow n-bit operation, where n < FLEN, checks that input
>     operands
>     + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
>     + * If so, the least-significant bits of the input are used,
>     otherwise the
>     + * input value is treated as an n-bit canonical NaN (v2.2 section
>     9.2).
>     + *
>     + * Here, the result is always nan-boxed, even the canonical nan.
>     + */
>     +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>     +{
>     +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
>     +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
>     +
>     +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
>     +    tcg_temp_free_i64(t_max);
>     +    tcg_temp_free_i64(t_nan);
>     +}
>     +
>      static void generate_exception(DisasContext *ctx, int excp)
>      {
>          tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
>     -- 
>     2.25.1
>
>


[-- Attachment #2: Type: text/html, Size: 10635 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread
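
This kind of bit fiddling is easy to sanity-check on the host before
touching the TCG code.  A throwaway test along these lines (purely
illustrative, using one hand-picked nan-boxed input pair) compares a
candidate sequence against the architectural FSGNJN.S result, i.e. rs1
with bit 31 replaced by the inverse of rs2's bit 31:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t rs1  = 0xffffffff3f800000ull;  /* nan-boxed  1.0f */
        uint64_t rs2  = 0xffffffffbf800000ull;  /* nan-boxed -1.0f */
        uint64_t mask = ~(UINT64_C(1) << 31);   /* ~MAKE_64BIT_MASK(31, 1) */

        /* architectural result: boxed rs1 with bit 31 = ~rs2[31] */
        uint64_t expect = (rs1 & mask) | (~rs2 & ~mask);

        /* candidate sequence under test: not, then andc against mask */
        uint64_t t = ~rs2;
        t &= ~mask;
        uint64_t got = (rs1 & mask) | t;

        assert(got == expect);
        assert(got == 0xffffffff3f800000ull);   /* fsgnjn.s of 1.0, -1.0 is +1.0, boxed */
        return 0;
    }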

* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
@ 2020-08-08 14:18       ` LIU Zhiwei
  0 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-08-08 14:18 UTC (permalink / raw)
  To: Chih-Min Chao, Richard Henderson
  Cc: qemu-devel@nongnu.org Developers, Frank Chang, Alistair Francis,
	open list:RISC-V

[-- Attachment #1: Type: text/plain, Size: 7001 bytes --]



On 2020/8/8 4:24, Chih-Min Chao wrote:
> On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson
> <richard.henderson@linaro.org> wrote:
>
>     If a 32-bit input is not properly nanboxed, then the input is replaced
>     with the default qnan.  The only inline expansion is for the
>     sign-changing
>     set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>
>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>     ---
>      target/riscv/insn_trans/trans_rvf.inc.c | 71
>     +++++++++++++++++++------
>      target/riscv/translate.c                | 18 +++++++
>      2 files changed, 73 insertions(+), 16 deletions(-)
>
>     diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
>     b/target/riscv/insn_trans/trans_rvf.inc.c
>     index 264d3139f1..f9a9e0643a 100644
>     --- a/target/riscv/insn_trans/trans_rvf.inc.c
>     +++ b/target/riscv/insn_trans/trans_rvf.inc.c
>     @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext *ctx,
>     arg_fsgnj_s *a)
>      {
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>          if (a->rs1 == a->rs2) { /* FMOV */
>     -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>     +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>          } else { /* FSGNJ */
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
>     cpu_fpr[a->rs1],
>     -                            0, 31);
>     +        TCGv_i64 rs1 = tcg_temp_new_i64();
>     +        TCGv_i64 rs2 = tcg_temp_new_i64();
>     +
>     +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /* This formulation retains the nanboxing of rs2. */
>     +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
>     +        tcg_temp_free_i64(rs1);
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>          mark_fs_dirty(ctx);
>          return true;
>      }
>
>      static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>      {
>     +    TCGv_i64 rs1, rs2, mask;
>     +
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    rs1 = tcg_temp_new_i64();
>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +
>          if (a->rs1 == a->rs2) { /* FNEG */
>     -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], INT32_MIN);
>     +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1, MAKE_64BIT_MASK(31,
>     1));
>          } else {
>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1],
>     0, 31);
>     -        tcg_temp_free_i64(t0);
>     +        rs2 = tcg_temp_new_i64();
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /*
>     +         * Replace bit 31 in rs1 with inverse in rs2.
>     +         * This formulation retains the nanboxing of rs1.
>     +         */
>     +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
>     +        tcg_gen_andc_i64(rs2, mask, rs2);
>
>
> should be
>               tcg_gen_not_i64(rs2, rs2);         // the inversion of rs2 is missing
>               tcg_gen_andc_i64(rs2, rs2, mask);  // mask must be inverted so that only the sign bit is kept
Hi Chih-Min,

Thanks for pointing it out. It is indeed a bug. However, I think it should be

tcg_gen_andc_i64(rs2, rs2, mask);  // keep only bit 31 of rs2

tcg_gen_not_i64(rs2, rs2);  // invert rs2


Best Regards,
Zhiwei
>
>  Chih-Min Chao
>
>     +        tcg_gen_and_i64(rs1, mask, rs1);
>     +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
>     +
>     +        tcg_temp_free_i64(mask);
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>     +    tcg_temp_free_i64(rs1);
>     +
>          mark_fs_dirty(ctx);
>          return true;
>      }
>
>      static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>      {
>     +    TCGv_i64 rs1, rs2;
>     +
>          REQUIRE_FPU;
>          REQUIRE_EXT(ctx, RVF);
>     +
>     +    rs1 = tcg_temp_new_i64();
>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>     +
>          if (a->rs1 == a->rs2) { /* FABS */
>     -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1],
>     ~INT32_MIN);
>     +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1,
>     ~MAKE_64BIT_MASK(31, 1));
>          } else {
>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>     -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
>     -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>     -        tcg_temp_free_i64(t0);
>     +        rs2 = tcg_temp_new_i64();
>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>     +
>     +        /*
>     +         * Xor bit 31 in rs1 with that in rs2.
>     +         * This formulation retains the nanboxing of rs1.
>     +         */
>     +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
>     +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
>     +
>     +        tcg_temp_free_i64(rs2);
>          }
>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>     +    tcg_temp_free_i64(rs1);
>     +
>          mark_fs_dirty(ctx);
>          return true;
>      }
>     diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>     index 12a746da97..bf35182776 100644
>     --- a/target/riscv/translate.c
>     +++ b/target/riscv/translate.c
>     @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out,
>     TCGv_i64 in)
>          tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>      }
>
>     +/*
>     + * A narrow n-bit operation, where n < FLEN, checks that input
>     operands
>     + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
>     + * If so, the least-significant bits of the input are used,
>     otherwise the
>     + * input value is treated as an n-bit canonical NaN (v2.2 section
>     9.2).
>     + *
>     + * Here, the result is always nan-boxed, even the canonical nan.
>     + */
>     +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>     +{
>     +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
>     +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
>     +
>     +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
>     +    tcg_temp_free_i64(t_max);
>     +    tcg_temp_free_i64(t_nan);
>     +}
>     +
>      static void generate_exception(DisasContext *ctx, int excp)
>      {
>          tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
>     -- 
>     2.25.1
>
>




* Re: [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c
  2020-08-08 14:18       ` LIU Zhiwei
@ 2020-08-08 23:06         ` LIU Zhiwei
  0 siblings, 0 replies; 62+ messages in thread
From: LIU Zhiwei @ 2020-08-08 23:06 UTC (permalink / raw)
  To: Chih-Min Chao, Richard Henderson
  Cc: Frank Chang, Alistair Francis, open list:RISC-V,
	qemu-devel@nongnu.org Developers




On 2020/8/8 22:18, LIU Zhiwei wrote:
>
>
> On 2020/8/8 4:24, Chih-Min Chao wrote:
>> On Fri, Jul 24, 2020 at 8:28 AM Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>
>>     If a 32-bit input is not properly nanboxed, then the input is
>>     replaced
>>     with the default qnan.  The only inline expansion is for the
>>     sign-changing
>>     set of instructions: FSGNJ.S, FSGNJX.S, FSGNJN.S.
>>
>>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>     ---
>>      target/riscv/insn_trans/trans_rvf.inc.c | 71
>>     +++++++++++++++++++------
>>      target/riscv/translate.c                | 18 +++++++
>>      2 files changed, 73 insertions(+), 16 deletions(-)
>>
>>     diff --git a/target/riscv/insn_trans/trans_rvf.inc.c
>>     b/target/riscv/insn_trans/trans_rvf.inc.c
>>     index 264d3139f1..f9a9e0643a 100644
>>     --- a/target/riscv/insn_trans/trans_rvf.inc.c
>>     +++ b/target/riscv/insn_trans/trans_rvf.inc.c
>>     @@ -161,47 +161,86 @@ static bool trans_fsgnj_s(DisasContext
>>     *ctx, arg_fsgnj_s *a)
>>      {
>>          REQUIRE_FPU;
>>          REQUIRE_EXT(ctx, RVF);
>>     +
>>          if (a->rs1 == a->rs2) { /* FMOV */
>>     -        tcg_gen_mov_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>>     +        gen_check_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rs1]);
>>          } else { /* FSGNJ */
>>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], cpu_fpr[a->rs2],
>>     cpu_fpr[a->rs1],
>>     -                            0, 31);
>>     +        TCGv_i64 rs1 = tcg_temp_new_i64();
>>     +        TCGv_i64 rs2 = tcg_temp_new_i64();
>>     +
>>     +        gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>>     +
>>     +        /* This formulation retains the nanboxing of rs2. */
>>     +        tcg_gen_deposit_i64(cpu_fpr[a->rd], rs2, rs1, 0, 31);
>>     +        tcg_temp_free_i64(rs1);
>>     +        tcg_temp_free_i64(rs2);
>>          }
>>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>>          mark_fs_dirty(ctx);
>>          return true;
>>      }
>>
>>      static bool trans_fsgnjn_s(DisasContext *ctx, arg_fsgnjn_s *a)
>>      {
>>     +    TCGv_i64 rs1, rs2, mask;
>>     +
>>          REQUIRE_FPU;
>>          REQUIRE_EXT(ctx, RVF);
>>     +
>>     +    rs1 = tcg_temp_new_i64();
>>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>>     +
>>          if (a->rs1 == a->rs2) { /* FNEG */
>>     -        tcg_gen_xori_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1],
>>     INT32_MIN);
>>     +        tcg_gen_xori_i64(cpu_fpr[a->rd], rs1,
>>     MAKE_64BIT_MASK(31, 1));
>>          } else {
>>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>>     -        tcg_gen_not_i64(t0, cpu_fpr[a->rs2]);
>>     -        tcg_gen_deposit_i64(cpu_fpr[a->rd], t0, cpu_fpr[a->rs1],
>>     0, 31);
>>     -        tcg_temp_free_i64(t0);
>>     +        rs2 = tcg_temp_new_i64();
>>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>>     +
>>     +        /*
>>     +         * Replace bit 31 in rs1 with inverse in rs2.
>>     +         * This formulation retains the nanboxing of rs1.
>>     +         */
>>     +        mask = tcg_const_i64(~MAKE_64BIT_MASK(31, 1));
>>     +        tcg_gen_andc_i64(rs2, mask, rs2);
>>
>>
>> should be
>>               tcg_gen_not_i64(rs2, rs2);         // the inversion of rs2 is missing
>>               tcg_gen_andc_i64(rs2, rs2, mask);  // mask must be inverted so that only the sign bit is kept
> Hi Chih-Min,
>
> Thanks for pointing it out. It is indeed a bug. However, I think it
> should be
>
> tcg_gen_andc_i64(rs2, rs2, mask);  // keep only bit 31 of rs2
> tcg_gen_not_i64(rs2, rs2);  // invert rs2
>
Hi Chih-Min,

Sorry, your code is right after all: in my ordering the final inversion also flips bits 0-30, so they would all end up set in the result.

Zhiwei
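
To make the difference concrete, here is a minimal, self-contained C model
of the two orderings (plain C rather than TCG; the function names and test
values below are only for illustration):

#include <stdint.h>
#include <stdio.h>

#define BIT31  ((uint64_t)1 << 31)
#define MASK   (~BIT31)            /* mask = ~MAKE_64BIT_MASK(31, 1) */

/* Corrected ordering: invert rs2 first, then keep only bit 31. */
static uint64_t fsgnjn_fixed(uint64_t rs1, uint64_t rs2)
{
    rs2 = ~rs2;            /* tcg_gen_not_i64(rs2, rs2)        */
    rs2 = rs2 & ~MASK;     /* tcg_gen_andc_i64(rs2, rs2, mask) */
    rs1 = MASK & rs1;      /* tcg_gen_and_i64(rs1, mask, rs1)  */
    return rs1 | rs2;      /* tcg_gen_or_i64(rd, rs1, rs2)     */
}

/* Ordering suggested above: isolate bit 31 first, then invert. */
static uint64_t fsgnjn_other(uint64_t rs1, uint64_t rs2)
{
    rs2 = rs2 & ~MASK;     /* only bit 31 of rs2 survives       */
    rs2 = ~rs2;            /* ...but this also sets bits 0-30   */
    rs1 = MASK & rs1;
    return rs1 | rs2;      /* bits 0-30 of the result are now 1 */
}

int main(void)
{
    uint64_t rs1 = 0xffffffff3f800000ull;   /* nanboxed +1.0f */
    uint64_t rs2 = 0xffffffffbf800000ull;   /* nanboxed -1.0f */

    printf("fixed: 0x%016llx\n", (unsigned long long)fsgnjn_fixed(rs1, rs2));
    printf("other: 0x%016llx\n", (unsigned long long)fsgnjn_other(rs1, rs2));
    return 0;
}

With these inputs the corrected ordering prints 0xffffffff3f800000 (nanboxed
+1.0f, i.e. rs1 with the inverted sign of rs2), while the other prints
0xffffffff7fffffff, i.e. bits 0-30 have been corrupted.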
> Best Regards,
> Zhiwei
>>
>>  Chih-Min Chao
>>
>>     +        tcg_gen_and_i64(rs1, mask, rs1);
>>     +        tcg_gen_or_i64(cpu_fpr[a->rd], rs1, rs2);
>>     +
>>     +        tcg_temp_free_i64(mask);
>>     +        tcg_temp_free_i64(rs2);
>>          }
>>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>>     +    tcg_temp_free_i64(rs1);
>>     +
>>          mark_fs_dirty(ctx);
>>          return true;
>>      }
>>
>>      static bool trans_fsgnjx_s(DisasContext *ctx, arg_fsgnjx_s *a)
>>      {
>>     +    TCGv_i64 rs1, rs2;
>>     +
>>          REQUIRE_FPU;
>>          REQUIRE_EXT(ctx, RVF);
>>     +
>>     +    rs1 = tcg_temp_new_i64();
>>     +    gen_check_nanbox_s(rs1, cpu_fpr[a->rs1]);
>>     +
>>          if (a->rs1 == a->rs2) { /* FABS */
>>     -        tcg_gen_andi_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1],
>>     ~INT32_MIN);
>>     +        tcg_gen_andi_i64(cpu_fpr[a->rd], rs1,
>>     ~MAKE_64BIT_MASK(31, 1));
>>          } else {
>>     -        TCGv_i64 t0 = tcg_temp_new_i64();
>>     -        tcg_gen_andi_i64(t0, cpu_fpr[a->rs2], INT32_MIN);
>>     -        tcg_gen_xor_i64(cpu_fpr[a->rd], cpu_fpr[a->rs1], t0);
>>     -        tcg_temp_free_i64(t0);
>>     +        rs2 = tcg_temp_new_i64();
>>     +        gen_check_nanbox_s(rs2, cpu_fpr[a->rs2]);
>>     +
>>     +        /*
>>     +         * Xor bit 31 in rs1 with that in rs2.
>>     +         * This formulation retains the nanboxing of rs1.
>>     +         */
>>     +        tcg_gen_andi_i64(rs2, rs2, MAKE_64BIT_MASK(31, 1));
>>     +        tcg_gen_xor_i64(cpu_fpr[a->rd], rs1, rs2);
>>     +
>>     +        tcg_temp_free_i64(rs2);
>>          }
>>     -    gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
>>     +    tcg_temp_free_i64(rs1);
>>     +
>>          mark_fs_dirty(ctx);
>>          return true;
>>      }
>>     diff --git a/target/riscv/translate.c b/target/riscv/translate.c
>>     index 12a746da97..bf35182776 100644
>>     --- a/target/riscv/translate.c
>>     +++ b/target/riscv/translate.c
>>     @@ -101,6 +101,24 @@ static void gen_nanbox_s(TCGv_i64 out,
>>     TCGv_i64 in)
>>          tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32));
>>      }
>>
>>     +/*
>>     + * A narrow n-bit operation, where n < FLEN, checks that input
>>     operands
>>     + * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1.
>>     + * If so, the least-significant bits of the input are used,
>>     otherwise the
>>     + * input value is treated as an n-bit canonical NaN (v2.2
>>     section 9.2).
>>     + *
>>     + * Here, the result is always nan-boxed, even the canonical nan.
>>     + */
>>     +static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
>>     +{
>>     +    TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
>>     +    TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
>>     +
>>     +    tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
>>     +    tcg_temp_free_i64(t_max);
>>     +    tcg_temp_free_i64(t_nan);
>>     +}
>>     +
>>      static void generate_exception(DisasContext *ctx, int excp)
>>      {
>>          tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
>>     -- 
>>     2.25.1
>>
>>
>





end of thread, other threads:[~2020-08-08 23:06 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-24  0:28 [PATCH v2 0/7] target/riscv: NaN-boxing for multiple precision Richard Henderson
2020-07-24  0:28 ` Richard Henderson
2020-07-24  0:28 ` [PATCH v2 1/7] target/riscv: Generate nanboxed results from fp helpers Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-07-24  2:35   ` LIU Zhiwei
2020-07-24  2:35     ` LIU Zhiwei
2020-07-24  3:55     ` Richard Henderson
2020-07-24  3:55       ` Richard Henderson
2020-07-24  6:05       ` LIU Zhiwei
2020-07-24  6:05         ` LIU Zhiwei
2020-08-06  6:09         ` Chih-Min Chao
2020-08-06  6:09           ` Chih-Min Chao
2020-08-06  7:05           ` LIU Zhiwei
2020-08-06  7:05             ` LIU Zhiwei
2020-08-06  8:42             ` Chih-Min Chao
2020-08-06  8:42               ` Chih-Min Chao
2020-08-06 10:02               ` LIU Zhiwei
2020-08-06 10:02                 ` LIU Zhiwei
2020-07-24  0:28 ` [PATCH v2 2/7] target/riscv: Generalize gen_nanbox_fpr to gen_nanbox_s Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-07-24  2:39   ` LIU Zhiwei
2020-07-24  2:39     ` LIU Zhiwei
2020-08-06  6:24   ` Chih-Min Chao
2020-08-06  6:24     ` Chih-Min Chao
2020-07-24  0:28 ` [PATCH v2 3/7] target/riscv: Generate nanboxed results from trans_rvf.inc.c Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-07-24  2:41   ` LIU Zhiwei
2020-07-24  2:41     ` LIU Zhiwei
2020-08-06  6:24   ` Chih-Min Chao
2020-08-06  6:24     ` Chih-Min Chao
2020-07-24  0:28 ` [PATCH v2 4/7] target/riscv: Check nanboxed inputs to fp helpers Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-07-24  2:47   ` LIU Zhiwei
2020-07-24  2:47     ` LIU Zhiwei
2020-07-24  3:59     ` Richard Henderson
2020-07-24  3:59       ` Richard Henderson
2020-08-06  6:26   ` Chih-Min Chao
2020-08-06  6:26     ` Chih-Min Chao
2020-07-24  0:28 ` [PATCH v2 5/7] target/riscv: Check nanboxed inputs in trans_rvf.inc.c Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-07-24  6:04   ` LIU Zhiwei
2020-07-24  6:04     ` LIU Zhiwei
2020-08-06  6:27   ` Chih-Min Chao
2020-08-06  6:27     ` Chih-Min Chao
2020-08-07 20:24   ` Chih-Min Chao
2020-08-07 20:24     ` Chih-Min Chao
2020-08-08 14:18     ` LIU Zhiwei
2020-08-08 14:18       ` LIU Zhiwei
2020-08-08 23:06       ` LIU Zhiwei
2020-08-08 23:06         ` LIU Zhiwei
2020-07-24  0:28 ` [PATCH v2 6/7] target/riscv: Clean up fmv.w.x Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-08-06  6:28   ` Chih-Min Chao
2020-08-06  6:28     ` Chih-Min Chao
2020-07-24  0:28 ` [PATCH v2 7/7] target/riscv: check before allocating TCG temps Richard Henderson
2020-07-24  0:28   ` Richard Henderson
2020-08-06  6:28   ` Chih-Min Chao
2020-08-06  6:28     ` Chih-Min Chao
2020-07-24  2:31 ` [PATCH v2 0/7] target/riscv: NaN-boxing for multiple precision LIU Zhiwei
2020-07-24  2:31   ` LIU Zhiwei
2020-07-27 23:37 ` Alistair Francis
2020-07-27 23:37   ` Alistair Francis
