All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
@ 2018-04-25  1:22 Richard Henderson
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16 Richard Henderson
                   ` (12 more replies)
  0 siblings, 13 replies; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

When running the gcc testsuite with current aarch64-linux-user,
the testsuite detects the presence of the fp16 extension and
enables lots of extra tests for builtins.

Quite a few of these new tests fail because we missed implementing
some instructions.  We really should go back and verify that nothing
else is missing from this (rather large) extension.

In addition, it tests some edge conditions on data that show flaws
in the way we were performing integer<->fp conversion; particularly
with respect to scaled conversion.


r~

PS: FWIW, this was written against my tgt-arm-sve-9 tree, since I
was trying to test sve as generated by gcc.  I don't *think* there
are any dependencies on any of the sve patches, but I didn't check.

PPS: There are two more failures that might be qemu fp16 failures,
but those are SIGSEGV.  This patch set cures all of the SIGILL and
(subsequent) SIGABRT type failures within the testsuite.


Richard Henderson (9):
  target/arm: Implement vector shifted SCVF/UCVF for fp16
  target/arm: Implement vector shifted FCVT for fp16
  target/arm: Fix float16 to/from int16
  target/arm: Clear SVE high bits for FMOV
  target/arm: Implement FMOV (general) for fp16
  target/arm: Implement FCVT (scalar,integer) for fp16
  target/arm: Implement FCVT (scalar,fixed-point) for fp16
  target/arm: Implement FP data-processing (2 source) for fp16
  target/arm: Implement FP data-processing (3 source) for fp16

 target/arm/helper.h        |   6 +
 target/arm/helper.c        |  87 ++++++++++-
 target/arm/translate-a64.c | 371 +++++++++++++++++++++++++++++++++++++--------
 3 files changed, 399 insertions(+), 65 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-04-27 16:04   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT " Richard Henderson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

While we have some of the scalar paths for *CVF for fp16,
we failed to decode the fp16 version of these instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b47319d437..c92e052686 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -7077,13 +7077,26 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
                                          int immh, int immb, int opcode,
                                          int rn, int rd)
 {
-    bool is_double = extract32(immh, 3, 1);
-    int size = is_double ? MO_64 : MO_32;
-    int elements;
+    int size, elements, fracbits;
     int immhb = immh << 3 | immb;
-    int fracbits = (is_double ? 128 : 64) - immhb;
 
-    if (!extract32(immh, 2, 2)) {
+    if (immh & 8) {
+        size = MO_64;
+        if (!is_scalar && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+    } else if (immh & 4) {
+        size = MO_32;
+    } else if (immh & 2) {
+        size = MO_16;
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+    } else {
+        /* immh == 0 would be a failure of the decode logic */
+        g_assert(immh == 1);
         unallocated_encoding(s);
         return;
     }
@@ -7091,20 +7104,14 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
     if (is_scalar) {
         elements = 1;
     } else {
-        elements = is_double ? 2 : is_q ? 4 : 2;
-        if (is_double && !is_q) {
-            unallocated_encoding(s);
-            return;
-        }
+        elements = 8 << is_q >> size;
     }
+    fracbits = (16 << size) - immhb;
 
     if (!fp_access_check(s)) {
         return;
     }
 
-    /* immh == 0 would be a failure of the decode logic */
-    g_assert(immh);
-
     handle_simd_intfp_conv(s, rd, rn, elements, !is_u, fracbits, size);
 }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16 Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-04-30 15:55   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16 Richard Henderson
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

While we have some of the scalar paths for FCVT for fp16,
we failed to decode the fp16 version of these instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c92e052686..e2d11998bd 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -7120,19 +7120,28 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
                                          bool is_q, bool is_u,
                                          int immh, int immb, int rn, int rd)
 {
-    bool is_double = extract32(immh, 3, 1);
     int immhb = immh << 3 | immb;
-    int fracbits = (is_double ? 128 : 64) - immhb;
-    int pass;
+    int pass, size, fracbits;
     TCGv_ptr tcg_fpstatus;
     TCGv_i32 tcg_rmode, tcg_shift;
 
-    if (!extract32(immh, 2, 2)) {
-        unallocated_encoding(s);
-        return;
-    }
-
-    if (!is_scalar && !is_q && is_double) {
+    if (immh & 0x8) {
+        size = MO_64;
+        if (!is_scalar && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+    } else if (immh & 0x4) {
+        size = MO_32;
+    } else if (immh & 0x2) {
+        size = MO_16;
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+    } else {
+        /* Should have split out AdvSIMD modified immediate earlier.  */
+        assert(immh == 1);
         unallocated_encoding(s);
         return;
     }
@@ -7144,11 +7153,12 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
     assert(!(is_scalar && is_q));
 
     tcg_rmode = tcg_const_i32(arm_rmode_to_sf(FPROUNDING_ZERO));
-    tcg_fpstatus = get_fpstatus_ptr(false);
+    tcg_fpstatus = get_fpstatus_ptr(size == MO_16);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
+    fracbits = (16 << size) - immhb;
     tcg_shift = tcg_const_i32(fracbits);
 
-    if (is_double) {
+    if (size == 3) {
         int maxpass = is_scalar ? 1 : 2;
 
         for (pass = 0; pass < maxpass; pass++) {
@@ -7165,20 +7175,37 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
         }
         clear_vec_high(s, is_q, rd);
     } else {
-        int maxpass = is_scalar ? 1 : is_q ? 4 : 2;
+        void (*fn)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+        int maxpass = is_scalar ? 1 : (8 << is_q >> size);
+
+        switch (size) {
+        case MO_16:
+            if (is_u) {
+                fn = gen_helper_vfp_toulh;
+            } else {
+                fn = gen_helper_vfp_toslh;
+            }
+            break;
+        case MO_32:
+            if (is_u) {
+                fn = gen_helper_vfp_touls;
+            } else {
+                fn = gen_helper_vfp_tosls;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
         for (pass = 0; pass < maxpass; pass++) {
             TCGv_i32 tcg_op = tcg_temp_new_i32();
 
-            read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
-            if (is_u) {
-                gen_helper_vfp_touls(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
-            } else {
-                gen_helper_vfp_tosls(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
-            }
+            read_vec_element_i32(s, tcg_op, rn, pass, size);
+            fn(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
             if (is_scalar) {
                 write_fp_sreg(s, rd, tcg_op);
             } else {
-                write_vec_element_i32(s, tcg_op, rd, pass, MO_32);
+                write_vec_element_i32(s, tcg_op, rd, pass, size);
             }
             tcg_temp_free_i32(tcg_op);
         }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16 Richard Henderson
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT " Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-05-01 10:10   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV Richard Henderson
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

The instruction "ucvtf v0.4h, v04h, #2", with input 0x8000u,
overflows the intermediate float16 to infinity before we have a
chance to scale the output.  Use float64 as the intermediate type
so that no input argument (uint32_t in this case) can overflow
or round before scaling.  Given the declared argument, the signed
int32_t function has the same problem.

When converting from float16 to integer, using u/int32_t instead
of u/int16_t means that the bounding is incorrect.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  4 ++--
 target/arm/helper.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++--
 target/arm/translate-a64.c |  4 ++--
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index b3ae394b4f..eafd5d746b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -149,8 +149,8 @@ DEF_HELPER_3(vfp_toshd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tosld_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
-DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
-DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index ea09510599..743f34bd0a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11409,11 +11409,60 @@ VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
 VFP_CONV_FIX(uh, s, 32, 32, uint16)
 VFP_CONV_FIX(ul, s, 32, 32, uint32)
 VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
-VFP_CONV_FIX_A64(sl, h, 16, 32, int32)
-VFP_CONV_FIX_A64(ul, h, 16, 32, uint32)
+
 #undef VFP_CONV_FIX
 #undef VFP_CONV_FIX_FLOAT
 #undef VFP_CONV_FLOAT_FIX_ROUND
+#undef VFP_CONV_FIX_A64
+
+/* Conversion to/from f16 can overflow to infinity before/after scaling.
+ * Therefore we convert to f64 (which does not round), scale,
+ * and then convert f64 to f16 (which may round).
+ */
+
+static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
+{
+    return float64_to_float16(float64_scalbn(f, -shift, fpst), true, fpst);
+}
+
+float16 HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(int32_to_float64(x, fpst), shift, fpst);
+}
+
+float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
+}
+
+static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
+{
+    if (unlikely(float16_is_any_nan(f))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    } else {
+        int old_exc_flags = get_float_exception_flags(fpst);
+        float64 ret;
+
+        ret = float16_to_float64(f, true, fpst);
+        ret = float64_scalbn(ret, shift, fpst);
+        old_exc_flags |= get_float_exception_flags(fpst)
+            & float_flag_input_denormal;
+        set_float_exception_flags(old_exc_flags, fpst);
+
+        return ret;
+    }
+}
+
+uint32_t HELPER(vfp_toshh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int16(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
+}
 
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e2d11998bd..b27892d971 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -7181,9 +7181,9 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
         switch (size) {
         case MO_16:
             if (is_u) {
-                fn = gen_helper_vfp_toulh;
+                fn = gen_helper_vfp_touhh;
             } else {
-                fn = gen_helper_vfp_toslh;
+                fn = gen_helper_vfp_toshh;
             }
             break;
         case MO_32:
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (2 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16 Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-05-01 10:44   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16 Richard Henderson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

Use write_fp_dreg and clear_vec_high to zero the bits
that need zeroing for these cases.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b27892d971..f2241d8174 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5356,31 +5356,24 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
 
     if (itof) {
         TCGv_i64 tcg_rn = cpu_reg(s, rn);
+        TCGv_i64 tmp;
 
         switch (type) {
         case 0:
-        {
             /* 32 bit */
-            TCGv_i64 tmp = tcg_temp_new_i64();
+            tmp = tcg_temp_new_i64();
             tcg_gen_ext32u_i64(tmp, tcg_rn);
-            tcg_gen_st_i64(tmp, cpu_env, fp_reg_offset(s, rd, MO_64));
-            tcg_gen_movi_i64(tmp, 0);
-            tcg_gen_st_i64(tmp, cpu_env, fp_reg_hi_offset(s, rd));
+            write_fp_dreg(s, rd, tmp);
             tcg_temp_free_i64(tmp);
             break;
-        }
         case 1:
-        {
             /* 64 bit */
-            TCGv_i64 tmp = tcg_const_i64(0);
-            tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_offset(s, rd, MO_64));
-            tcg_gen_st_i64(tmp, cpu_env, fp_reg_hi_offset(s, rd));
-            tcg_temp_free_i64(tmp);
+            write_fp_dreg(s, rd, tcg_rn);
             break;
-        }
         case 2:
             /* 64 bit to top half. */
             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
+            clear_vec_high(s, true, rd);
             break;
         }
     } else {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (3 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-04-25  1:31   ` Philippe Mathieu-Daudé
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) " Richard Henderson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

Adding the fp16 moves to/from general registers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index f2241d8174..36bb5f6f08 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5375,6 +5375,15 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
             clear_vec_high(s, true, rd);
             break;
+        case 3:
+            /* 16 bit */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_ext16u_i64(tmp, tcg_rn);
+            write_fp_dreg(s, rd, tmp);
+            tcg_temp_free_i64(tmp);
+            break;
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_rd = cpu_reg(s, rd);
@@ -5392,6 +5401,12 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             /* 64 bits from top half */
             tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
             break;
+        case 3:
+            /* 16 bit */
+            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 }
@@ -5431,10 +5446,15 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         case 0xa: /* 64 bit */
         case 0xd: /* 64 bit to top half of quad */
             break;
+        case 0x6: /* 16-bit */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
         default:
             /* all other sf/type/rmode combinations are invalid */
             unallocated_encoding(s);
-            break;
+            return;
         }
 
         if (!fp_access_check(s)) {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (4 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16 Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-05-01 10:55   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  6 +++
 target/arm/helper.c        | 38 +++++++++++++++++-
 target/arm/translate-a64.c | 96 ++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index eafd5d746b..f494b10f1b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -151,6 +151,10 @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
@@ -177,6 +181,8 @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
 DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 743f34bd0a..dbc10b454a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11416,8 +11416,12 @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 #undef VFP_CONV_FIX_A64
 
 /* Conversion to/from f16 can overflow to infinity before/after scaling.
- * Therefore we convert to f64 (which does not round), scale,
- * and then convert f64 to f16 (which may round).
+ * Therefore we convert to f64, scale, and then convert f64 to f16; or
+ * vice versa for conversion to integer.
+ *
+ * For 16- and 32-bit integers, the conversion to f64 never rounds.
+ * For 64-bit integers, any integer that would cause rounding will also
+ * overflow to f16 infinity, so there is no double rounding problem.
  */
 
 static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
@@ -11435,6 +11439,16 @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
     return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
 }
 
+float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
+}
+
+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
+}
+
 static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
 {
     if (unlikely(float16_is_any_nan(f))) {
@@ -11464,6 +11478,26 @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
     return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
 }
 
+uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 36bb5f6f08..4f6317aa0f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5186,11 +5186,11 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                            bool itof, int rmode, int scale, int sf, int type)
 {
     bool is_signed = !(opcode & 1);
-    bool is_double = type;
     TCGv_ptr tcg_fpstatus;
-    TCGv_i32 tcg_shift;
+    TCGv_i32 tcg_shift, tcg_single;
+    TCGv_i64 tcg_double;
 
-    tcg_fpstatus = get_fpstatus_ptr(false);
+    tcg_fpstatus = get_fpstatus_ptr(type == 3);
 
     tcg_shift = tcg_const_i32(64 - scale);
 
@@ -5208,8 +5208,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             tcg_int = tcg_extend;
         }
 
-        if (is_double) {
-            TCGv_i64 tcg_double = tcg_temp_new_i64();
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = tcg_temp_new_i64();
             if (is_signed) {
                 gen_helper_vfp_sqtod(tcg_double, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -5219,8 +5220,10 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_dreg(s, rd, tcg_double);
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = tcg_temp_new_i32();
+            break;
+
+        case 0: /* float32 */
+            tcg_single = tcg_temp_new_i32();
             if (is_signed) {
                 gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -5230,6 +5233,23 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_sreg(s, rd, tcg_single);
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = tcg_temp_new_i32();
+            if (is_signed) {
+                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            } else {
+                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            }
+            write_fp_sreg(s, rd, tcg_single);
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -5246,8 +5266,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 
-        if (is_double) {
-            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = read_fp_dreg(s, rn);
             if (is_signed) {
                 if (!sf) {
                     gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -5265,9 +5286,14 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                          tcg_shift, tcg_fpstatus);
                 }
             }
+            if (!sf) {
+                tcg_gen_ext32u_i64(tcg_int, tcg_int);
+            }
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
+            break;
+
+        case 0: /* float32 */
+            tcg_single = read_fp_sreg(s, rn);
             if (sf) {
                 if (is_signed) {
                     gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -5289,14 +5315,39 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                 tcg_temp_free_i32(tcg_dest);
             }
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = read_fp_sreg(s, rn);
+            if (sf) {
+                if (is_signed) {
+                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_touqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+            } else {
+                TCGv_i32 tcg_dest = tcg_temp_new_i32();
+                if (is_signed) {
+                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
+                tcg_temp_free_i32(tcg_dest);
+            }
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
         tcg_temp_free_i32(tcg_rmode);
-
-        if (!sf) {
-            tcg_gen_ext32u_i64(tcg_int, tcg_int);
-        }
     }
 
     tcg_temp_free_ptr(tcg_fpstatus);
@@ -5465,7 +5516,20 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         /* actual FP conversions */
         bool itof = extract32(opcode, 1, 1);
 
-        if (type > 1 || (rmode != 0 && opcode > 1)) {
+        if (rmode != 0 && opcode > 1) {
+            unallocated_encoding(s);
+            return;
+        }
+        switch (type) {
+        case 0: /* float32 */
+        case 1: /* float64 */
+            break;
+        case 3: /* float16 */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
+        default:
             unallocated_encoding(s);
             return;
         }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (5 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) " Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-05-01 10:57   ` Alex Bennée
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) " Richard Henderson
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 4f6317aa0f..794ede7222 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5372,8 +5372,7 @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
     bool sf = extract32(insn, 31, 1);
     bool itof;
 
-    if (sbit || (type > 1)
-        || (!sf && scale < 32)) {
+    if (sbit || (!sf && scale < 32)) {
         unallocated_encoding(s);
         return;
     }
@@ -5392,6 +5391,20 @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
         return;
     }
 
+    switch (type) {
+    case 0: /* float32 */
+    case 1: /* float64 */
+        break;
+    case 3: /* float16 */
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+
     if (!fp_access_check(s)) {
         return;
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (6 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
@ 2018-04-25  1:22 ` Richard Henderson
  2018-05-01 11:13   ` Alex Bennée
  2018-04-25  1:23 ` [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 " Richard Henderson
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

We missed all of the scalar fp16 binary operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 794ede7222..11b90b7eb0 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -532,6 +532,14 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
+    return v;
+}
+
 /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
  * If SVE is not enabled, then there are only 128 bits in the vector.
  */
@@ -4968,6 +4976,61 @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (2 source) - half precision */
+static void handle_fp_2src_half(DisasContext *s, int opcode,
+                                int rd, int rn, int rm)
+{
+    TCGv_i32 tcg_op1;
+    TCGv_i32 tcg_op2;
+    TCGv_i32 tcg_res;
+    TCGv_ptr fpst;
+
+    tcg_res = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(true);
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+
+    switch (opcode) {
+    case 0x0: /* FMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x1: /* FDIV */
+        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x2: /* FADD */
+        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x3: /* FSUB */
+        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x4: /* FMAX */
+        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x5: /* FMIN */
+        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x6: /* FMAXNM */
+        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x7: /* FMINNM */
+        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x8: /* FNMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (2 source)
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -5000,6 +5063,16 @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
         }
         handle_fp_2src_double(s, opcode, rd, rn, rm);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_2src_half(s, opcode, rd, rn, rm);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 source) for fp16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (7 preceding siblings ...)
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) " Richard Henderson
@ 2018-04-25  1:23 ` Richard Henderson
  2018-05-01 11:21   ` Alex Bennée
  2018-04-25  1:35 ` [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 no-reply
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  1:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

We missed all of the scalar fp16 fma operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 11b90b7eb0..0cb1fc4d67 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5154,6 +5154,44 @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (3 source) - half precision */
+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
+                                int rd, int rn, int rm, int ra)
+{
+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
+    TCGv_i32 tcg_res = tcg_temp_new_i32();
+    TCGv_ptr fpst = get_fpstatus_ptr(true);
+
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+    tcg_op3 = read_fp_hreg(s, ra);
+
+    /* These are fused multiply-add, and must be done as one
+     * floating point operation with no rounding between the
+     * multiplication and addition steps.
+     * NB that doing the negations here as separate steps is
+     * correct : an input NaN should come out with its sign bit
+     * flipped if it is a negated-input.
+     */
+    if (o1 == true) {
+        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
+    }
+
+    if (o0 != o1) {
+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
+    }
+
+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_op3);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (3 source)
  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
  * +---+---+---+-----------+------+----+------+----+------+------+------+
@@ -5183,6 +5221,16 @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
         }
         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16 Richard Henderson
@ 2018-04-25  1:31   ` Philippe Mathieu-Daudé
  2018-04-25  8:40     ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-04-25  1:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell, alex.bennee

Hi Richard,

On 04/24/2018 10:22 PM, Richard Henderson wrote:
> Adding the fp16 moves to/from general registers.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index f2241d8174..36bb5f6f08 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5375,6 +5375,15 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
>              tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
>              clear_vec_high(s, true, rd);
>              break;
> +        case 3:
> +            /* 16 bit */
> +            tmp = tcg_temp_new_i64();
> +            tcg_gen_ext16u_i64(tmp, tcg_rn);
> +            write_fp_dreg(s, rd, tmp);
> +            tcg_temp_free_i64(tmp);
> +            break;
> +        default:
> +            g_assert_not_reached();
>          }
>      } else {
>          TCGv_i64 tcg_rd = cpu_reg(s, rd);
> @@ -5392,6 +5401,12 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
>              /* 64 bits from top half */
>              tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
>              break;
> +        case 3:
> +            /* 16 bit */
> +            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
> +            break;
> +        default:
> +            g_assert_not_reached();
>          }
>      }
>  }
> @@ -5431,10 +5446,15 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>          case 0xa: /* 64 bit */
>          case 0xd: /* 64 bit to top half of quad */
>              break;
> +        case 0x6: /* 16-bit */
> +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +                break;
> +            }
> +            /* fallthru */
>          default:
>              /* all other sf/type/rmode combinations are invalid */
>              unallocated_encoding(s);
> -            break;
> +            return;

Agreed with this change, however shouldn't this be in a separate patch?

>          }
>  
>          if (!fp_access_check(s)) {
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (8 preceding siblings ...)
  2018-04-25  1:23 ` [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 " Richard Henderson
@ 2018-04-25  1:35 ` no-reply
  2018-04-25  9:14 ` Alex Bennée
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 34+ messages in thread
From: no-reply @ 2018-04-25  1:35 UTC (permalink / raw)
  To: richard.henderson; +Cc: famz, qemu-devel, peter.maydell, alex.bennee

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180425012300.14698-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/1513160272-15921-1-git-send-email-christian.ehrhardt@canonical.com -> patchew/1513160272-15921-1-git-send-email-christian.ehrhardt@canonical.com
 t [tag update]            patchew/20180424152405.10304-1-alex.bennee@linaro.org -> patchew/20180424152405.10304-1-alex.bennee@linaro.org
 t [tag update]            patchew/20180424160329.8089-1-alex.bennee@linaro.org -> patchew/20180424160329.8089-1-alex.bennee@linaro.org
 t [tag update]            patchew/20180424214550.32549-1-lersek@redhat.com -> patchew/20180424214550.32549-1-lersek@redhat.com
 t [tag update]            patchew/20180424222103.19946-1-f4bug@amsat.org -> patchew/20180424222103.19946-1-f4bug@amsat.org
 * [new tag]               patchew/20180425012300.14698-1-richard.henderson@linaro.org -> patchew/20180425012300.14698-1-richard.henderson@linaro.org
Switched to a new branch 'test'
84efc5c3b1 target/arm: Implement FP data-processing (3 source) for fp16
10574670d9 target/arm: Implement FP data-processing (2 source) for fp16
2d4e187869 target/arm: Implement FCVT (scalar, fixed-point) for fp16
ec350065e4 target/arm: Implement FCVT (scalar, integer) for fp16
d157aaa589 target/arm: Implement FMOV (general) for fp16
7eeb78179f target/arm: Clear SVE high bits for FMOV
a7fed5d8a6 target/arm: Fix float16 to/from int16
72239133a0 target/arm: Implement vector shifted FCVT for fp16
5cb4ff2996 target/arm: Implement vector shifted SCVF/UCVF for fp16

=== OUTPUT BEGIN ===
Checking PATCH 1/9: target/arm: Implement vector shifted SCVF/UCVF for fp16...
Checking PATCH 2/9: target/arm: Implement vector shifted FCVT for fp16...
Checking PATCH 3/9: target/arm: Fix float16 to/from int16...
ERROR: spaces required around that '*' (ctx:WxV)
#40: FILE: target/arm/helper.c:11423:
+static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
                                                                     ^

total: 1 errors, 0 warnings, 83 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/9: target/arm: Clear SVE high bits for FMOV...
Checking PATCH 5/9: target/arm: Implement FMOV (general) for fp16...
Checking PATCH 6/9: target/arm: Implement FCVT (scalar, integer) for fp16...
Checking PATCH 7/9: target/arm: Implement FCVT (scalar, fixed-point) for fp16...
Checking PATCH 8/9: target/arm: Implement FP data-processing (2 source) for fp16...
Checking PATCH 9/9: target/arm: Implement FP data-processing (3 source) for fp16...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16
  2018-04-25  1:31   ` Philippe Mathieu-Daudé
@ 2018-04-25  8:40     ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2018-04-25  8:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: peter.maydell, alex.bennee

On 04/24/2018 03:31 PM, Philippe Mathieu-Daudé wrote:
>> @@ -5431,10 +5446,15 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>>          case 0xa: /* 64 bit */
>>          case 0xd: /* 64 bit to top half of quad */
>>              break;
>> +        case 0x6: /* 16-bit */
>> +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
>> +                break;
>> +            }
>> +            /* fallthru */
>>          default:
>>              /* all other sf/type/rmode combinations are invalid */
>>              unallocated_encoding(s);
>> -            break;
>> +            return;
> 
> Agreed with this change, however shouldn't this be in a separate patch?

Why?


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (9 preceding siblings ...)
  2018-04-25  1:35 ` [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 no-reply
@ 2018-04-25  9:14 ` Alex Bennée
  2018-04-27 17:22 ` Alex Bennée
  2018-05-01 15:47 ` Alex Bennée
  12 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-04-25  9:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> When running the gcc testsuite with current aarch64-linux-user,
> the testsuite detects the presence of the fp16 extension and
> enables lots of extra tests for builtins.
>
> Quite a few of these new tests fail because we missed implementing
> some instructions.  We really should go back and verify that nothing
> else is missing from this (rather large) extension.

Ouch. I thought I'd got them all from parsing the ASL. Obviously not.

> In addition, it tests some edge conditions on data that show flaws
> in the way we were performing integer<->fp conversion; particularly
> with respect to scaled conversion.
>
>
> r~
>
> PS: FWIW, this was written against my tgt-arm-sve-9 tree, since I
> was trying to test sve as generated by gcc.  I don't *think* there
> are any dependencies on any of the sve patches, but I didn't check.
>
> PPS: There are two more failures that might be qemu fp16 failures,
> but those are SIGSEGV.  This patch set cures all of the SIGILL and
> (subsequent) SIGABRT type failures within the testsuite.
>
>
> Richard Henderson (9):
>   target/arm: Implement vector shifted SCVF/UCVF for fp16
>   target/arm: Implement vector shifted FCVT for fp16
>   target/arm: Fix float16 to/from int16
>   target/arm: Clear SVE high bits for FMOV
>   target/arm: Implement FMOV (general) for fp16
>   target/arm: Implement FCVT (scalar,integer) for fp16
>   target/arm: Implement FCVT (scalar,fixed-point) for fp16
>   target/arm: Implement FP data-processing (2 source) for fp16
>   target/arm: Implement FP data-processing (3 source) for fp16
>
>  target/arm/helper.h        |   6 +
>  target/arm/helper.c        |  87 ++++++++++-
>  target/arm/translate-a64.c | 371 +++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 399 insertions(+), 65 deletions(-)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16 Richard Henderson
@ 2018-04-27 16:04   ` Alex Bennée
  2018-04-29 14:44     ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2018-04-27 16:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> While we have some of the scalar paths for *CVF for fp16,
> we failed to decode the fp16 version of these instructions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 33 ++++++++++++++++++++-------------
>  1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index b47319d437..c92e052686 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -7077,13 +7077,26 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
>                                           int immh, int immb, int opcode,
>                                           int rn, int rd)
>  {
> -    bool is_double = extract32(immh, 3, 1);
> -    int size = is_double ? MO_64 : MO_32;
> -    int elements;
> +    int size, elements, fracbits;
>      int immhb = immh << 3 | immb;
> -    int fracbits = (is_double ? 128 : 64) - immhb;
>
> -    if (!extract32(immh, 2, 2)) {
> +    if (immh & 8) {
> +        size = MO_64;
> +        if (!is_scalar && !is_q) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +    } else if (immh & 4) {
> +        size = MO_32;
> +    } else if (immh & 2) {
> +        size = MO_16;
> +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +    } else {
> +        /* immh == 0 would be a failure of the decode logic */
> +        g_assert(immh == 1);
>          unallocated_encoding(s);
>          return;
>      }
> @@ -7091,20 +7104,14 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
>      if (is_scalar) {
>          elements = 1;
>      } else {
> -        elements = is_double ? 2 : is_q ? 4 : 2;
> -        if (is_double && !is_q) {
> -            unallocated_encoding(s);
> -            return;
> -        }
> +        elements = 8 << is_q >> size;

That is a brain exercise for operator precedence. Would:

     elements = (is_q ? 16 : 8) >> size;

be clearer? Personally I'd have probably done it long hand in each size
leg above, e.g:

    size = MO_16;
    elements = is_scalar ? 1 : (is_q ? 8 : 4);

>      }
> +    fracbits = (16 << size) - immhb;

The ship has already sailed on this but I'm wishing we had a
mosize_to_bits() helper function to be explicit about this
transformation.

>
>      if (!fp_access_check(s)) {
>          return;
>      }
>
> -    /* immh == 0 would be a failure of the decode logic */
> -    g_assert(immh);
> -
>      handle_simd_intfp_conv(s, rd, rn, elements, !is_u, fracbits, size);
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (10 preceding siblings ...)
  2018-04-25  9:14 ` Alex Bennée
@ 2018-04-27 17:22 ` Alex Bennée
  2018-04-27 18:55   ` Alex Bennée
  2018-05-11  2:17   ` Richard Henderson
  2018-05-01 15:47 ` Alex Bennée
  12 siblings, 2 replies; 34+ messages in thread
From: Alex Bennée @ 2018-04-27 17:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> When running the gcc testsuite with current aarch64-linux-user,
> the testsuite detects the presence of the fp16 extension and
> enables lots of extra tests for builtins.
>
> Quite a few of these new tests fail because we missed implementing
> some instructions.  We really should go back and verify that nothing
> else is missing from this (rather large) extension.

So this set of instructions is generated from any ASL description that
contains "half":

# Input file for risugen defining AArch64 instructions
.mode arm.aarch64
FADDP_asisdpair_only_H A64_V 0101111000110000110110 Rn:5  Rd:5
FRECPS_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 001111 Rn:5  Rd:5
FMAXNMP_asisdpair_only_H A64_V 0101111000110000110010 Rn:5  Rd:5
SCVTF_asimdmiscfp16_R A64_V 0 Q:1 00111001111001110110 Rn:5  Rd:5
FRSQRTE_asisdmiscfp16_R A64_V 0111111011111001110110 Rn:5  Rd:5
FCMGT_asisdmiscfp16_FZ A64_V 0101111011111000110010 Rn:5  Rd:5
FMUL_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 000111 Rn:5  Rd:5
FSUB_asimdsamefp16_only A64_V 0 Q:1 001110110 Rm:5 000101 Rn:5  Rd:5
FMAX_H_floatdp2 A64_V 00011110111 Rm:5 010010 Rn:5  Rd:5
FADDP_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 000101 Rn:5  Rd:5
FMAXNMP_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 000001 Rn:5  Rd:5
FMUL_asisdelem_RH_H A64_V 0101111100 L:1  M:1  Rm:4 1001 H:1 0 Rn:5  Rd:5
FCMGT_asimdsamefp16_only A64_V 0 Q:1 101110110 Rm:5 001001 Rn:5  Rd:5
FCVTAS_asisdmiscfp16_R A64_V 0101111001111001110010 Rn:5  Rd:5
FADD_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 000101 Rn:5  Rd:5
FCSEL_H_floatsel A64_V 00011110111 Rm:5  cond:4 11 Rn:5  Rd:5
FMAXNM_H_floatdp2 A64_V 00011110111 Rm:5 011010 Rn:5  Rd:5
FMAXNM_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 000001 Rn:5  Rd:5
FMINNMP_asisdpair_only_H A64_V 0101111010110000110010 Rn:5  Rd:5
FNMUL_H_floatdp2 A64_V 00011110111 Rm:5 100010 Rn:5  Rd:5
FCMPE_H_floatcmp A64_V 00011110111 Rm:5 001000 Rn:5 10000
FMUL_asimdelem_RH_H A64_V 0 Q:1 00111100 L:1  M:1  Rm:4 1001 H:1 0 Rn:5  Rd:5
FRINTN_asimdmiscfp16_R A64_V 0 Q:1 00111001111001100010 Rn:5  Rd:5
FCCMP_H_floatccmp A64_V 00011110111 Rm:5  cond:4 01 Rn:5 0 nzcv:4
FMADD_H_floatdp3 A64_V 00011111110 Rm:5 0 Ra:5  Rn:5  Rd:5
FCVTZS_asisdmiscfp16_R A64_V 0101111011111001101110 Rn:5  Rd:5
FCVTMS_asisdmiscfp16_R A64_V 0101111001111001101110 Rn:5  Rd:5
FMINP_asimdsamefp16_only A64_V 0 Q:1 101110110 Rm:5 001101 Rn:5  Rd:5
FRECPE_asimdmiscfp16_R A64_V 0 Q:1 00111011111001110110 Rn:5  Rd:5
FCMGE_asisdmiscfp16_FZ A64_V 0111111011111000110010 Rn:5  Rd:5
FNEG_asimdmiscfp16_R A64_V 0 Q:1 10111011111000111110 Rn:5  Rd:5
FCMGE_asisdsamefp16_only A64_V 01111110010 Rm:5 001001 Rn:5  Rd:5
FMULX_asisdelem_RH_H A64_V 0111111100 L:1  M:1  Rm:4 1001 H:1 0 Rn:5  Rd:5
SCVTF_asisdmiscfp16_R A64_V 0101111001111001110110 Rn:5  Rd:5
FMAXNMV_asimdall_only_H A64_V 0 Q:1 00111000110000110010 Rn:5  Rd:5
FMLA_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 000011 Rn:5  Rd:5
FACGT_asisdsamefp16_only A64_V 01111110110 Rm:5 001011 Rn:5  Rd:5
FRINTP_H_floatdp1 A64_V 0001111011100100110000 Rn:5  Rd:5
FRINTZ_asimdmiscfp16_R A64_V 0 Q:1 00111011111001100110 Rn:5  Rd:5
FRINTI_asimdmiscfp16_R A64_V 0 Q:1 10111011111001100110 Rn:5  Rd:5
FDIV_H_floatdp2 A64_V 00011110111 Rm:5 000110 Rn:5  Rd:5
FCVTZS_asimdmiscfp16_R A64_V 0 Q:1 00111011111001101110 Rn:5  Rd:5
FRINTP_asimdmiscfp16_R A64_V 0 Q:1 00111011111001100010 Rn:5  Rd:5
FCCMPE_H_floatccmp A64_V 00011110111 Rm:5  cond:4 01 Rn:5 1 nzcv:4
FCMGT_asisdsamefp16_only A64_V 01111110110 Rm:5 001001 Rn:5  Rd:5
FRINTM_asimdmiscfp16_R A64_V 0 Q:1 00111001111001100110 Rn:5  Rd:5
FMAXP_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 001101 Rn:5  Rd:5
FABD_asisdsamefp16_only A64_V 01111110110 Rm:5 000101 Rn:5  Rd:5
FCMGE_asimdmiscfp16_FZ A64_V 0 Q:1 10111011111000110010 Rn:5  Rd:5
FCMEQ_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 001001 Rn:5  Rd:5
FMINNM_asimdsamefp16_only A64_V 0 Q:1 001110110 Rm:5 000001 Rn:5  Rd:5
FNMADD_H_floatdp3 A64_V 00011111111 Rm:5 0 Ra:5  Rn:5  Rd:5
FRINTI_H_floatdp1 A64_V 0001111011100111110000 Rn:5  Rd:5
FMINNM_H_floatdp2 A64_V 00011110111 Rm:5 011110 Rn:5  Rd:5
FCMLT_asimdmiscfp16_FZ A64_V 0 Q:1 00111011111000111010 Rn:5  Rd:5
FCVTAU_asisdmiscfp16_R A64_V 0111111001111001110010 Rn:5  Rd:5
FCVTNU_asisdmiscfp16_R A64_V 0111111001111001101010 Rn:5  Rd:5
FCVTZU_asisdmiscfp16_R A64_V 0111111011111001101110 Rn:5  Rd:5
FABS_asimdmiscfp16_R A64_V 0 Q:1 00111011111000111110 Rn:5  Rd:5
FMLS_asisdelem_RH_H A64_V 0101111100 L:1  M:1  Rm:4 0101 H:1 0 Rn:5  Rd:5
FCMLT_asisdmiscfp16_FZ A64_V 0101111011111000111010 Rn:5  Rd:5
FMAXP_asisdpair_only_H A64_V 0101111000110000111110 Rn:5  Rd:5
FCVTPS_asisdmiscfp16_R A64_V 0101111011111001101010 Rn:5  Rd:5
FRECPE_asisdmiscfp16_R A64_V 0101111011111001110110 Rn:5  Rd:5
FMINNMV_asimdall_only_H A64_V 0 Q:1 00111010110000110010 Rn:5  Rd:5
FRSQRTE_asimdmiscfp16_R A64_V 0 Q:1 10111011111001110110 Rn:5  Rd:5
FMUL_H_floatdp2 A64_V 00011110111 Rm:5 000010 Rn:5  Rd:5
FMLS_asimdsamefp16_only A64_V 0 Q:1 001110110 Rm:5 000011 Rn:5  Rd:5
FDIV_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 001111 Rn:5  Rd:5
FSUB_H_floatdp2 A64_V 00011110111 Rm:5 001110 Rn:5  Rd:5
FMLS_asimdelem_RH_H A64_V 0 Q:1 00111100 L:1  M:1  Rm:4 0101 H:1 0 Rn:5  Rd:5
FMLA_asimdelem_RH_H A64_V 0 Q:1 00111100 L:1  M:1  Rm:4 0001 H:1 0 Rn:5  Rd:5
FCVTPU_asimdmiscfp16_R A64_V 0 Q:1 10111011111001101010 Rn:5  Rd:5
FCMLE_asimdmiscfp16_FZ A64_V 0 Q:1 10111011111000110110 Rn:5  Rd:5
FMLA_asisdelem_RH_H A64_V 0101111100 L:1  M:1  Rm:4 0001 H:1 0 Rn:5  Rd:5
FCVTMS_asimdmiscfp16_R A64_V 0 Q:1 00111001111001101110 Rn:5  Rd:5
FRINTM_H_floatdp1 A64_V 0001111011100101010000 Rn:5  Rd:5
FCVTMU_asimdmiscfp16_R A64_V 0 Q:1 10111001111001101110 Rn:5  Rd:5
FCMGE_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 001001 Rn:5  Rd:5
FRSQRTS_asisdsamefp16_only A64_V 01011110110 Rm:5 001111 Rn:5  Rd:5
FCMEQ_asisdsamefp16_only A64_V 01011110010 Rm:5 001001 Rn:5  Rd:5
FRINTZ_H_floatdp1 A64_V 0001111011100101110000 Rn:5  Rd:5
FCVTNS_asimdmiscfp16_R A64_V 0 Q:1 00111001111001101010 Rn:5  Rd:5
FRSQRTS_asimdsamefp16_only A64_V 0 Q:1 001110110 Rm:5 001111 Rn:5  Rd:5
FRINTX_H_floatdp1 A64_V 0001111011100111010000 Rn:5  Rd:5
FRINTA_H_floatdp1 A64_V 0001111011100110010000 Rn:5  Rd:5
FABD_asimdsamefp16_only A64_V 0 Q:1 101110110 Rm:5 000101 Rn:5  Rd:5
FCVTPU_asisdmiscfp16_R A64_V 0111111011111001101010 Rn:5  Rd:5
FACGE_asisdsamefp16_only A64_V 01111110010 Rm:5 001011 Rn:5  Rd:5
FACGT_asimdsamefp16_only A64_V 0 Q:1 101110110 Rm:5 001011 Rn:5  Rd:5
FCMEQ_asisdmiscfp16_FZ A64_V 0101111011111000110110 Rn:5  Rd:5
FCVTAS_asimdmiscfp16_R A64_V 0 Q:1 00111001111001110010 Rn:5  Rd:5
FMULX_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 000111 Rn:5  Rd:5
FMSUB_H_floatdp3 A64_V 00011111110 Rm:5 1 Ra:5  Rn:5  Rd:5
FCVTMU_asisdmiscfp16_R A64_V 0111111001111001101110 Rn:5  Rd:5
FMIN_H_floatdp2 A64_V 00011110111 Rm:5 010110 Rn:5  Rd:5
FSQRT_asimdmiscfp16_R A64_V 0 Q:1 10111011111001111110 Rn:5  Rd:5
FRINTX_asimdmiscfp16_R A64_V 0 Q:1 10111001111001100110 Rn:5  Rd:5
UCVTF_asimdmiscfp16_R A64_V 0 Q:1 10111001111001110110 Rn:5  Rd:5
FNMSUB_H_floatdp3 A64_V 00011111111 Rm:5 1 Ra:5  Rn:5  Rd:5
FCVTNU_asimdmiscfp16_R A64_V 0 Q:1 10111001111001101010 Rn:5  Rd:5
FMINNMP_asimdsamefp16_only A64_V 0 Q:1 101110110 Rm:5 000001 Rn:5  Rd:5
FRECPX_asisdmiscfp16_R A64_V 0101111011111001111110 Rn:5  Rd:5
FMULX_asisdsamefp16_only A64_V 01011110010 Rm:5 000111 Rn:5  Rd:5
FABS_H_floatdp1 A64_V 0001111011100000110000 Rn:5  Rd:5
FMAX_asimdsamefp16_only A64_V 0 Q:1 001110010 Rm:5 001101 Rn:5  Rd:5
FADD_H_floatdp2 A64_V 00011110111 Rm:5 001010 Rn:5  Rd:5
FRINTN_H_floatdp1 A64_V 0001111011100100010000 Rn:5  Rd:5
UCVTF_asisdmiscfp16_R A64_V 0111111001111001110110 Rn:5  Rd:5
FACGE_asimdsamefp16_only A64_V 0 Q:1 101110010 Rm:5 001011 Rn:5  Rd:5
FCMP_H_floatcmp A64_V 00011110111 Rm:5 001000 Rn:5 00000
FSQRT_H_floatdp1 A64_V 0001111011100001110000 Rn:5  Rd:5
FMAXV_asimdall_only_H A64_V 0 Q:1 00111000110000111110 Rn:5  Rd:5
FCVTPS_asimdmiscfp16_R A64_V 0 Q:1 00111011111001101010 Rn:5  Rd:5
FCMLE_asisdmiscfp16_FZ A64_V 0111111011111000110110 Rn:5  Rd:5
FMINV_asimdall_only_H A64_V 0 Q:1 00111010110000111110 Rn:5  Rd:5
FMINP_asisdpair_only_H A64_V 0101111010110000111110 Rn:5  Rd:5
FCMGT_asimdmiscfp16_FZ A64_V 0 Q:1 00111011111000110010 Rn:5  Rd:5
FCMP_HZ_floatcmp A64_V 00011110111 Rm:5 001000 Rn:5 01000
FRECPS_asisdsamefp16_only A64_V 01011110010 Rm:5 001111 Rn:5  Rd:5
FMOV_H_floatimm A64_V 00011110111 imm8:8 10000000 Rd:5
FCVTNS_asisdmiscfp16_R A64_V 0101111001111001101010 Rn:5  Rd:5
FMULX_asimdelem_RH_H A64_V 0 Q:1 10111100 L:1  M:1  Rm:4 1001 H:1 0 Rn:5  Rd:5
FRINTA_asimdmiscfp16_R A64_V 0 Q:1 10111001111001100010 Rn:5  Rd:5
FCVTAU_asimdmiscfp16_R A64_V 0 Q:1 10111001111001110010 Rn:5  Rd:5
FMIN_asimdsamefp16_only A64_V 0 Q:1 001110110 Rm:5 001101 Rn:5  Rd:5
FCMEQ_asimdmiscfp16_FZ A64_V 0 Q:1 00111011111000110110 Rn:5  Rd:5
FCMPE_HZ_floatcmp A64_V 00011110111 Rm:5 001000 Rn:5 11000
FCVTZU_asimdmiscfp16_R A64_V 0 Q:1 10111011111001101110 Rn:5  Rd:5
FNEG_H_floatdp1 A64_V 0001111011100001010000 Rn:5  Rd:5
FMOV_H_floatdp1 A64_V 0001111011100000010000 Rn:5  Rd:5

And the generated test cases and traces:

  http://people.linaro.org/~alex.bennee/testcases/arm64.risu/testcases.armv8.2_half.tar.xz

Currently I'm seeing failures on:

Failed 9 tests:
testcases.armv8.2_half/insn_FCCMP_H_floatccmp__INC.risu.bin
testcases.armv8.2_half/insn_FCCMPE_H_floatccmp__INC.risu.bin
testcases.armv8.2_half/insn_FCMP_H_floatcmp__INC.risu.bin
testcases.armv8.2_half/insn_FCMP_HZ_floatcmp__INC.risu.bin
testcases.armv8.2_half/insn_FCMPE_H_floatcmp__INC.risu.bin
testcases.armv8.2_half/insn_FCMPE_HZ_floatcmp__INC.risu.bin
testcases.armv8.2_half/insn_FCSEL_H_floatsel__INC.risu.bin
testcases.armv8.2_half/insn_FMOV_H_floatimm__INC.risu.bin
testcases.armv8.2_half/insn_FSQRT_H_floatdp1__INC.risu.bin

but I haven't checked to see if that is just instructions the FVP has in
full SVE mode that aren't in the just FP16 fixes branch I was testing
against.


>
> In addition, it tests some edge conditions on data that show flaws
> in the way we were performing integer<->fp conversion; particularly
> with respect to scaled conversion.
>
>
> r~
>
> PS: FWIW, this was written against my tgt-arm-sve-9 tree, since I
> was trying to test sve as generated by gcc.  I don't *think* there
> are any dependencies on any of the sve patches, but I didn't check.
>
> PPS: There are two more failures that might be qemu fp16 failures,
> but those are SIGSEGV.  This patch set cures all of the SIGILL and
> (subsequent) SIGABRT type failures within the testsuite.
>
>
> Richard Henderson (9):
>   target/arm: Implement vector shifted SCVF/UCVF for fp16
>   target/arm: Implement vector shifted FCVT for fp16
>   target/arm: Fix float16 to/from int16
>   target/arm: Clear SVE high bits for FMOV
>   target/arm: Implement FMOV (general) for fp16
>   target/arm: Implement FCVT (scalar,integer) for fp16
>   target/arm: Implement FCVT (scalar,fixed-point) for fp16
>   target/arm: Implement FP data-processing (2 source) for fp16
>   target/arm: Implement FP data-processing (3 source) for fp16
>
>  target/arm/helper.h        |   6 +
>  target/arm/helper.c        |  87 ++++++++++-
>  target/arm/translate-a64.c | 371 +++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 399 insertions(+), 65 deletions(-)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-27 17:22 ` Alex Bennée
@ 2018-04-27 18:55   ` Alex Bennée
  2018-04-27 19:50     ` Alex Bennée
  2018-05-11  2:17   ` Richard Henderson
  1 sibling, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2018-04-27 18:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Alex Bennée <alex.bennee@linaro.org> writes:

> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> When running the gcc testsuite with current aarch64-linux-user,
>> the testsuite detects the presence of the fp16 extension and
>> enables lots of extra tests for builtins.
>>
>> Quite a few of these new tests fail because we missed implementing
>> some instructions.  We really should go back and verify that nothing
>> else is missing from this (rather large) extension.
>
> So this set of instructions is generated from any ASL description that
> contains "half":
<snip>
>
> Failed 9 tests:
> testcases.armv8.2_half/insn_FCCMP_H_floatccmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCCMPE_H_floatccmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCMP_H_floatcmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCMP_HZ_floatcmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCMPE_H_floatcmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCMPE_HZ_floatcmp__INC.risu.bin
> testcases.armv8.2_half/insn_FCSEL_H_floatsel__INC.risu.bin

Well that looks like a whole class of compares we are missing. I'll get
to work on that.

> testcases.armv8.2_half/insn_FMOV_H_floatimm__INC.risu.bin
> testcases.armv8.2_half/insn_FSQRT_H_floatdp1__INC.risu.bin
>
> but I haven't checked to see if that is just instructions the FVP has in
> full SVE mode that aren't in the just FP16 fixes branch I was testing
> against.
>
>
>>
>> In addition, it tests some edge conditions on data that show flaws
>> in the way we were performing integer<->fp conversion; particularly
>> with respect to scaled conversion.
>>
>>
>> r~
>>
>> PS: FWIW, this was written against my tgt-arm-sve-9 tree, since I
>> was trying to test sve as generated by gcc.  I don't *think* there
>> are any dependencies on any of the sve patches, but I didn't check.
>>
>> PPS: There are two more failures that might be qemu fp16 failures,
>> but those are SIGSEGV.  This patch set cures all of the SIGILL and
>> (subsequent) SIGABRT type failures within the testsuite.
>>
>>
>> Richard Henderson (9):
>>   target/arm: Implement vector shifted SCVF/UCVF for fp16
>>   target/arm: Implement vector shifted FCVT for fp16
>>   target/arm: Fix float16 to/from int16
>>   target/arm: Clear SVE high bits for FMOV
>>   target/arm: Implement FMOV (general) for fp16
>>   target/arm: Implement FCVT (scalar,integer) for fp16
>>   target/arm: Implement FCVT (scalar,fixed-point) for fp16
>>   target/arm: Implement FP data-processing (2 source) for fp16
>>   target/arm: Implement FP data-processing (3 source) for fp16
>>
>>  target/arm/helper.h        |   6 +
>>  target/arm/helper.c        |  87 ++++++++++-
>>  target/arm/translate-a64.c | 371 +++++++++++++++++++++++++++++++++++++--------
>>  3 files changed, 399 insertions(+), 65 deletions(-)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-27 18:55   ` Alex Bennée
@ 2018-04-27 19:50     ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-04-27 19:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
<snip>

If you take my patches from:

  https://github.com/stsquad/qemu/tree/review/rth-fp16-fixes

443a7c3d38 arm/translate-a64: fix-up FMOV FP16 immediate
be4430e9c9 arm/translate-a64: add FP16 FCSEL
7badf508ab arm/translate-a64: add FP16 FCMP operations

>> testcases.armv8.2_half/insn_FSQRT_H_floatdp1__INC.risu.bi

This is the only failure I have left. As it is approaching 2100 on a
Friday night I leave that for you to look at as you'll be up before me
on Monday ;-)

--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16
  2018-04-27 16:04   ` Alex Bennée
@ 2018-04-29 14:44     ` Richard Henderson
  2018-04-29 15:27       ` Peter Maydell
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-04-29 14:44 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 04/27/2018 11:04 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> While we have some of the scalar paths for *CVF for fp16,
>> we failed to decode the fp16 version of these instructions.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-a64.c | 33 ++++++++++++++++++++-------------
>>  1 file changed, 20 insertions(+), 13 deletions(-)
>>
>> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
>> index b47319d437..c92e052686 100644
>> --- a/target/arm/translate-a64.c
>> +++ b/target/arm/translate-a64.c
>> @@ -7077,13 +7077,26 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
>>                                           int immh, int immb, int opcode,
>>                                           int rn, int rd)
>>  {
>> -    bool is_double = extract32(immh, 3, 1);
>> -    int size = is_double ? MO_64 : MO_32;
>> -    int elements;
>> +    int size, elements, fracbits;
>>      int immhb = immh << 3 | immb;
>> -    int fracbits = (is_double ? 128 : 64) - immhb;
>>
>> -    if (!extract32(immh, 2, 2)) {
>> +    if (immh & 8) {
>> +        size = MO_64;
>> +        if (!is_scalar && !is_q) {
>> +            unallocated_encoding(s);
>> +            return;
>> +        }
>> +    } else if (immh & 4) {
>> +        size = MO_32;
>> +    } else if (immh & 2) {
>> +        size = MO_16;
>> +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
>> +            unallocated_encoding(s);
>> +            return;
>> +        }
>> +    } else {
>> +        /* immh == 0 would be a failure of the decode logic */
>> +        g_assert(immh == 1);
>>          unallocated_encoding(s);
>>          return;
>>      }
>> @@ -7091,20 +7104,14 @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar,
>>      if (is_scalar) {
>>          elements = 1;
>>      } else {
>> -        elements = is_double ? 2 : is_q ? 4 : 2;
>> -        if (is_double && !is_q) {
>> -            unallocated_encoding(s);
>> -            return;
>> -        }
>> +        elements = 8 << is_q >> size;
> 
> That is a brain exercise for operator precedence. Would:
> 
>      elements = (is_q ? 16 : 8) >> size;
> 
> be clearer?

I don't think so myself.  I thought the double conditional harder to follow.

>> +    fracbits = (16 << size) - immhb;
> 
> The ship has already sailed on this but I'm wishing we had a
> mosize_to_bits() helper function to be explicit about this
> transformation.

Yeah, that might have been a good thing...


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16
  2018-04-29 14:44     ` Richard Henderson
@ 2018-04-29 15:27       ` Peter Maydell
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Maydell @ 2018-04-29 15:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Alex Bennée, QEMU Developers

On 29 April 2018 at 15:44, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 04/27/2018 11:04 AM, Alex Bennée wrote:
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>> +        elements = 8 << is_q >> size;
>>
>> That is a brain exercise for operator precedence. Would:
>>
>>      elements = (is_q ? 16 : 8) >> size;
>>
>> be clearer?
>
> I don't think so myself.

Can we at least put in the brackets if we're going to use
the << >> expression?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT " Richard Henderson
@ 2018-04-30 15:55   ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-04-30 15:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> While we have some of the scalar paths for FCVT for fp16,
> we failed to decode the fp16 version of these instructions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++--------------
>  1 file changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index c92e052686..e2d11998bd 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -7120,19 +7120,28 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
>                                           bool is_q, bool is_u,
>                                           int immh, int immb, int rn, int rd)
>  {
> -    bool is_double = extract32(immh, 3, 1);
>      int immhb = immh << 3 | immb;
> -    int fracbits = (is_double ? 128 : 64) - immhb;
> -    int pass;
> +    int pass, size, fracbits;
>      TCGv_ptr tcg_fpstatus;
>      TCGv_i32 tcg_rmode, tcg_shift;
>
> -    if (!extract32(immh, 2, 2)) {
> -        unallocated_encoding(s);
> -        return;
> -    }
> -
> -    if (!is_scalar && !is_q && is_double) {
> +    if (immh & 0x8) {
> +        size = MO_64;
> +        if (!is_scalar && !is_q) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +    } else if (immh & 0x4) {
> +        size = MO_32;
> +    } else if (immh & 0x2) {
> +        size = MO_16;
> +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +    } else {
> +        /* Should have split out AdvSIMD modified immediate earlier.  */
> +        assert(immh == 1);
>          unallocated_encoding(s);
>          return;
>      }
> @@ -7144,11 +7153,12 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
>      assert(!(is_scalar && is_q));
>
>      tcg_rmode = tcg_const_i32(arm_rmode_to_sf(FPROUNDING_ZERO));
> -    tcg_fpstatus = get_fpstatus_ptr(false);
> +    tcg_fpstatus = get_fpstatus_ptr(size == MO_16);
>      gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
> +    fracbits = (16 << size) - immhb;
>      tcg_shift = tcg_const_i32(fracbits);
>
> -    if (is_double) {
> +    if (size == 3) {
>          int maxpass = is_scalar ? 1 : 2;
>
>          for (pass = 0; pass < maxpass; pass++) {
> @@ -7165,20 +7175,37 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
>          }
>          clear_vec_high(s, is_q, rd);
>      } else {
> -        int maxpass = is_scalar ? 1 : is_q ? 4 : 2;
> +        void (*fn)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
> +        int maxpass = is_scalar ? 1 : (8 << is_q >> size);

brackets

> +
> +        switch (size) {
> +        case MO_16:
> +            if (is_u) {
> +                fn = gen_helper_vfp_toulh;
> +            } else {
> +                fn = gen_helper_vfp_toslh;
> +            }
> +            break;
> +        case MO_32:
> +            if (is_u) {
> +                fn = gen_helper_vfp_touls;
> +            } else {
> +                fn = gen_helper_vfp_tosls;
> +            }
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +
>          for (pass = 0; pass < maxpass; pass++) {
>              TCGv_i32 tcg_op = tcg_temp_new_i32();
>
> -            read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
> -            if (is_u) {
> -                gen_helper_vfp_touls(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
> -            } else {
> -                gen_helper_vfp_tosls(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
> -            }
> +            read_vec_element_i32(s, tcg_op, rn, pass, size);
> +            fn(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
>              if (is_scalar) {
>                  write_fp_sreg(s, rd, tcg_op);
>              } else {
> -                write_vec_element_i32(s, tcg_op, rd, pass, MO_32);
> +                write_vec_element_i32(s, tcg_op, rd, pass, size);
>              }
>              tcg_temp_free_i32(tcg_op);
>          }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16 Richard Henderson
@ 2018-05-01 10:10   ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 10:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> The instruction "ucvtf v0.4h, v04h, #2", with input 0x8000u,
> overflows the intermediate float16 to infinity before we have a
> chance to scale the output.  Use float64 as the intermediate type
> so that no input argument (uint32_t in this case) can overflow
> or round before scaling.  Given the declared argument, the signed
> int32_t function has the same problem.
>
> When converting from float16 to integer, using u/int32_t instead
> of u/int16_t means that the bounding is incorrect.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/helper.h        |  4 ++--
>  target/arm/helper.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++--
>  target/arm/translate-a64.c |  4 ++--
>  3 files changed, 55 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index b3ae394b4f..eafd5d746b 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -149,8 +149,8 @@ DEF_HELPER_3(vfp_toshd_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_tosld_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
> -DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
> -DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
>  DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index ea09510599..743f34bd0a 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11409,11 +11409,60 @@ VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
>  VFP_CONV_FIX(uh, s, 32, 32, uint16)
>  VFP_CONV_FIX(ul, s, 32, 32, uint32)
>  VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
> -VFP_CONV_FIX_A64(sl, h, 16, 32, int32)
> -VFP_CONV_FIX_A64(ul, h, 16, 32, uint32)
> +
>  #undef VFP_CONV_FIX
>  #undef VFP_CONV_FIX_FLOAT
>  #undef VFP_CONV_FLOAT_FIX_ROUND
> +#undef VFP_CONV_FIX_A64
> +
> +/* Conversion to/from f16 can overflow to infinity before/after scaling.
> + * Therefore we convert to f64 (which does not round), scale,
> + * and then convert f64 to f16 (which may round).
> + */
> +
> +static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
> +{
> +    return float64_to_float16(float64_scalbn(f, -shift, fpst), true, fpst);
> +}
> +
> +float16 HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(int32_to_float64(x, fpst), shift, fpst);
> +}
> +
> +float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
> +}
> +
> +static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
> +{
> +    if (unlikely(float16_is_any_nan(f))) {
> +        float_raise(float_flag_invalid, fpst);
> +        return 0;
> +    } else {
> +        int old_exc_flags = get_float_exception_flags(fpst);
> +        float64 ret;
> +
> +        ret = float16_to_float64(f, true, fpst);
> +        ret = float64_scalbn(ret, shift, fpst);
> +        old_exc_flags |= get_float_exception_flags(fpst)
> +            & float_flag_input_denormal;
> +        set_float_exception_flags(old_exc_flags, fpst);
> +
> +        return ret;
> +    }
> +}
> +
> +uint32_t HELPER(vfp_toshh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_int16(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
> +}
>
>  /* Set the current fp rounding mode and return the old one.
>   * The argument is a softfloat float_round_ value.
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index e2d11998bd..b27892d971 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -7181,9 +7181,9 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
>          switch (size) {
>          case MO_16:
>              if (is_u) {
> -                fn = gen_helper_vfp_toulh;
> +                fn = gen_helper_vfp_touhh;
>              } else {
> -                fn = gen_helper_vfp_toslh;
> +                fn = gen_helper_vfp_toshh;
>              }
>              break;
>          case MO_32:


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV Richard Henderson
@ 2018-05-01 10:44   ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 10:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> Use write_fp_dreg and clear_vec_high to zero the bits
> that need zeroing for these cases.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index b27892d971..f2241d8174 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5356,31 +5356,24 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
>
>      if (itof) {
>          TCGv_i64 tcg_rn = cpu_reg(s, rn);
> +        TCGv_i64 tmp;
>
>          switch (type) {
>          case 0:
> -        {
>              /* 32 bit */
> -            TCGv_i64 tmp = tcg_temp_new_i64();
> +            tmp = tcg_temp_new_i64();
>              tcg_gen_ext32u_i64(tmp, tcg_rn);
> -            tcg_gen_st_i64(tmp, cpu_env, fp_reg_offset(s, rd, MO_64));
> -            tcg_gen_movi_i64(tmp, 0);
> -            tcg_gen_st_i64(tmp, cpu_env, fp_reg_hi_offset(s, rd));
> +            write_fp_dreg(s, rd, tmp);
>              tcg_temp_free_i64(tmp);
>              break;
> -        }
>          case 1:
> -        {
>              /* 64 bit */
> -            TCGv_i64 tmp = tcg_const_i64(0);
> -            tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_offset(s, rd, MO_64));
> -            tcg_gen_st_i64(tmp, cpu_env, fp_reg_hi_offset(s, rd));
> -            tcg_temp_free_i64(tmp);
> +            write_fp_dreg(s, rd, tcg_rn);
>              break;
> -        }
>          case 2:
>              /* 64 bit to top half. */
>              tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
> +            clear_vec_high(s, true, rd);

I was going to suggest maybe a write_fp_dreg_hi() helper here but there
are only a couple of cases so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>              break;
>          }
>      } else {


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) " Richard Henderson
@ 2018-05-01 10:55   ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 10:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/helper.h        |  6 +++
>  target/arm/helper.c        | 38 +++++++++++++++++-
>  target/arm/translate-a64.c | 96 ++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 122 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index eafd5d746b..f494b10f1b 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -151,6 +151,10 @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
>  DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
> +DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
>  DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
> @@ -177,6 +181,8 @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
>  DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
>  DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
>  DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
> +DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
> +DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
>
>  DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
>  DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 743f34bd0a..dbc10b454a 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11416,8 +11416,12 @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
>  #undef VFP_CONV_FIX_A64
>
>  /* Conversion to/from f16 can overflow to infinity before/after scaling.
> - * Therefore we convert to f64 (which does not round), scale,
> - * and then convert f64 to f16 (which may round).
> + * Therefore we convert to f64, scale, and then convert f64 to f16; or
> + * vice versa for conversion to integer.
> + *
> + * For 16- and 32-bit integers, the conversion to f64 never rounds.
> + * For 64-bit integers, any integer that would cause rounding will also
> + * overflow to f16 infinity, so there is no double rounding problem.
>   */
>
>  static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
> @@ -11435,6 +11439,16 @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
>      return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
>  }
>
> +float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
> +}
> +
> +float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
> +}
> +
>  static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
>  {
>      if (unlikely(float16_is_any_nan(f))) {
> @@ -11464,6 +11478,26 @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
>      return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
>  }
>
> +uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
>  /* Set the current fp rounding mode and return the old one.
>   * The argument is a softfloat float_round_ value.
>   */
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 36bb5f6f08..4f6317aa0f 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5186,11 +5186,11 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                             bool itof, int rmode, int scale, int sf, int type)
>  {
>      bool is_signed = !(opcode & 1);
> -    bool is_double = type;
>      TCGv_ptr tcg_fpstatus;
> -    TCGv_i32 tcg_shift;
> +    TCGv_i32 tcg_shift, tcg_single;
> +    TCGv_i64 tcg_double;
>
> -    tcg_fpstatus = get_fpstatus_ptr(false);
> +    tcg_fpstatus = get_fpstatus_ptr(type == 3);
>
>      tcg_shift = tcg_const_i32(64 - scale);
>
> @@ -5208,8 +5208,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              tcg_int = tcg_extend;
>          }
>
> -        if (is_double) {
> -            TCGv_i64 tcg_double = tcg_temp_new_i64();
> +        switch (type) {
> +        case 1: /* float64 */
> +            tcg_double = tcg_temp_new_i64();
>              if (is_signed) {
>                  gen_helper_vfp_sqtod(tcg_double, tcg_int,
>                                       tcg_shift, tcg_fpstatus);
> @@ -5219,8 +5220,10 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              }
>              write_fp_dreg(s, rd, tcg_double);
>              tcg_temp_free_i64(tcg_double);
> -        } else {
> -            TCGv_i32 tcg_single = tcg_temp_new_i32();
> +            break;
> +
> +        case 0: /* float32 */
> +            tcg_single = tcg_temp_new_i32();
>              if (is_signed) {
>                  gen_helper_vfp_sqtos(tcg_single, tcg_int,
>                                       tcg_shift, tcg_fpstatus);
> @@ -5230,6 +5233,23 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              }
>              write_fp_sreg(s, rd, tcg_single);
>              tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        case 3: /* float16 */
> +            tcg_single = tcg_temp_new_i32();
> +            if (is_signed) {
> +                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
> +                                     tcg_shift, tcg_fpstatus);
> +            } else {
> +                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
> +                                     tcg_shift, tcg_fpstatus);
> +            }
> +            write_fp_sreg(s, rd, tcg_single);
> +            tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
>          }
>      } else {
>          TCGv_i64 tcg_int = cpu_reg(s, rd);
> @@ -5246,8 +5266,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>
>          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
>
> -        if (is_double) {
> -            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
> +        switch (type) {
> +        case 1: /* float64 */
> +            tcg_double = read_fp_dreg(s, rn);
>              if (is_signed) {
>                  if (!sf) {
>                      gen_helper_vfp_tosld(tcg_int, tcg_double,
> @@ -5265,9 +5286,14 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                                           tcg_shift, tcg_fpstatus);
>                  }
>              }
> +            if (!sf) {
> +                tcg_gen_ext32u_i64(tcg_int, tcg_int);
> +            }
>              tcg_temp_free_i64(tcg_double);
> -        } else {
> -            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
> +            break;
> +
> +        case 0: /* float32 */
> +            tcg_single = read_fp_sreg(s, rn);
>              if (sf) {
>                  if (is_signed) {
>                      gen_helper_vfp_tosqs(tcg_int, tcg_single,
> @@ -5289,14 +5315,39 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                  tcg_temp_free_i32(tcg_dest);
>              }
>              tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        case 3: /* float16 */
> +            tcg_single = read_fp_sreg(s, rn);
> +            if (sf) {
> +                if (is_signed) {
> +                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                } else {
> +                    gen_helper_vfp_touqh(tcg_int, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                }
> +            } else {
> +                TCGv_i32 tcg_dest = tcg_temp_new_i32();
> +                if (is_signed) {
> +                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                } else {
> +                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                }
> +                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
> +                tcg_temp_free_i32(tcg_dest);
> +            }
> +            tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
>          }
>
>          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
>          tcg_temp_free_i32(tcg_rmode);
> -
> -        if (!sf) {
> -            tcg_gen_ext32u_i64(tcg_int, tcg_int);
> -        }
>      }
>
>      tcg_temp_free_ptr(tcg_fpstatus);
> @@ -5465,7 +5516,20 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>          /* actual FP conversions */
>          bool itof = extract32(opcode, 1, 1);
>
> -        if (type > 1 || (rmode != 0 && opcode > 1)) {
> +        if (rmode != 0 && opcode > 1) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +        switch (type) {
> +        case 0: /* float32 */
> +        case 1: /* float64 */
> +            break;
> +        case 3: /* float16 */
> +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +                break;
> +            }
> +            /* fallthru */
> +        default:
>              unallocated_encoding(s);
>              return;
>          }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
@ 2018-05-01 10:57   ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 10:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 4f6317aa0f..794ede7222 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5372,8 +5372,7 @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
>      bool sf = extract32(insn, 31, 1);
>      bool itof;
>
> -    if (sbit || (type > 1)
> -        || (!sf && scale < 32)) {
> +    if (sbit || (!sf && scale < 32)) {
>          unallocated_encoding(s);
>          return;
>      }
> @@ -5392,6 +5391,20 @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
>          return;
>      }
>
> +    switch (type) {
> +    case 0: /* float32 */
> +    case 1: /* float64 */
> +        break;
> +    case 3: /* float16 */
> +        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +            break;
> +        }
> +        /* fallthru */
> +    default:
> +        unallocated_encoding(s);
> +        return;
> +    }
> +
>      if (!fp_access_check(s)) {
>          return;
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) for fp16
  2018-04-25  1:22 ` [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) " Richard Henderson
@ 2018-05-01 11:13   ` Alex Bennée
  2018-05-02 18:28     ` Richard Henderson
  2018-05-02 18:47     ` Richard Henderson
  0 siblings, 2 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 11:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> We missed all of the scalar fp16 binary operations.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 794ede7222..11b90b7eb0 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -532,6 +532,14 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
>      return v;
>  }
>
> +static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
> +{
> +    TCGv_i32 v = tcg_temp_new_i32();
> +
> +    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
> +    return v;
> +}
> +
>  /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
>   * If SVE is not enabled, then there are only 128 bits in the vector.
>   */
> @@ -4968,6 +4976,61 @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
>      tcg_temp_free_i64(tcg_res);
>  }
>
> +/* Floating-point data-processing (2 source) - half precision */
> +static void handle_fp_2src_half(DisasContext *s, int opcode,
> +                                int rd, int rn, int rm)
> +{
> +    TCGv_i32 tcg_op1;
> +    TCGv_i32 tcg_op2;
> +    TCGv_i32 tcg_res;
> +    TCGv_ptr fpst;
> +
> +    tcg_res = tcg_temp_new_i32();
> +    fpst = get_fpstatus_ptr(true);
> +    tcg_op1 = read_fp_hreg(s, rn);
> +    tcg_op2 = read_fp_hreg(s, rm);
> +
> +    switch (opcode) {
> +    case 0x0: /* FMUL */
> +        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x1: /* FDIV */
> +        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x2: /* FADD */
> +        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x3: /* FSUB */
> +        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x4: /* FMAX */
> +        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x5: /* FMIN */
> +        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x6: /* FMAXNM */
> +        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x7: /* FMINNM */
> +        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        break;
> +    case 0x8: /* FNMUL */
> +        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
> +        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +
> +    write_fp_sreg(s, rd, tcg_res);

If we are going to the trouble of adding a read_fp_hreg() we might as
well do the same for the write case. Then we can convert the various:

  read_vec_element_i32(s, tcg_vm, rm, 0, MO_16);

that we used before.

> +
> +    tcg_temp_free_ptr(fpst);
> +    tcg_temp_free_i32(tcg_op1);
> +    tcg_temp_free_i32(tcg_op2);
> +    tcg_temp_free_i32(tcg_res);
> +}
> +
>  /* Floating point data-processing (2 source)
>   *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
>   * +---+---+---+-----------+------+---+------+--------+-----+------+------+
> @@ -5000,6 +5063,16 @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
>          }
>          handle_fp_2src_double(s, opcode, rd, rn, rm);
>          break;
> +    case 3:
> +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +        if (!fp_access_check(s)) {
> +            return;
> +        }
> +        handle_fp_2src_half(s, opcode, rd, rn, rm);
> +        break;
>      default:
>          unallocated_encoding(s);
>      }

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 source) for fp16
  2018-04-25  1:23 ` [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 " Richard Henderson
@ 2018-05-01 11:21   ` Alex Bennée
  2018-05-02 18:49     ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 11:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> We missed all of the scalar fp16 fma operations.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 11b90b7eb0..0cb1fc4d67 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5154,6 +5154,44 @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
>      tcg_temp_free_i64(tcg_res);
>  }
>
> +/* Floating-point data-processing (3 source) - half precision */
> +static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
> +                                int rd, int rn, int rm, int ra)
> +{
> +    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
> +    TCGv_i32 tcg_res = tcg_temp_new_i32();
> +    TCGv_ptr fpst = get_fpstatus_ptr(true);
> +
> +    tcg_op1 = read_fp_hreg(s, rn);
> +    tcg_op2 = read_fp_hreg(s, rm);
> +    tcg_op3 = read_fp_hreg(s, ra);
> +
> +    /* These are fused multiply-add, and must be done as one
> +     * floating point operation with no rounding between the
> +     * multiplication and addition steps.

I got confused first time reading this as we cover F[N]M[ADD|SUB].
Perhaps that is better enumerated at the top of the function?

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> +     * NB that doing the negations here as separate steps is
> +     * correct : an input NaN should come out with its sign bit
> +     * flipped if it is a negated-input.
> +     */
> +    if (o1 == true) {
> +        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
> +    }
> +
> +    if (o0 != o1) {
> +        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
> +    }
> +
> +    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
> +
> +    write_fp_sreg(s, rd, tcg_res);
> +
> +    tcg_temp_free_ptr(fpst);
> +    tcg_temp_free_i32(tcg_op1);
> +    tcg_temp_free_i32(tcg_op2);
> +    tcg_temp_free_i32(tcg_op3);
> +    tcg_temp_free_i32(tcg_res);
> +}
> +
>  /* Floating point data-processing (3 source)
>   *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
>   * +---+---+---+-----------+------+----+------+----+------+------+------+
> @@ -5183,6 +5221,16 @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
>          }
>          handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
>          break;
> +    case 3:
> +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +        if (!fp_access_check(s)) {
> +            return;
> +        }
> +        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
> +        break;
>      default:
>          unallocated_encoding(s);
>      }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (11 preceding siblings ...)
  2018-04-27 17:22 ` Alex Bennée
@ 2018-05-01 15:47 ` Alex Bennée
  2018-05-01 18:35   ` Richard Henderson
  12 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2018-05-01 15:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> When running the gcc testsuite with current aarch64-linux-user,
> the testsuite detects the presence of the fp16 extension and
> enables lots of extra tests for builtins.
>
> Quite a few of these new tests fail because we missed implementing
> some instructions.  We really should go back and verify that nothing
> else is missing from this (rather large) extension.
>
> In addition, it tests some edge conditions on data that show flaws
> in the way we were performing integer<->fp conversion; particularly
> with respect to scaled conversion.

I've finished reviewing this patch set.

> PS: FWIW, this was written against my tgt-arm-sve-9 tree, since I
> was trying to test sve as generated by gcc.  I don't *think* there
> are any dependencies on any of the sve patches, but I didn't check.

They aren't.

>
> PPS: There are two more failures that might be qemu fp16 failures,
> but those are SIGSEGV.  This patch set cures all of the SIGILL and
> (subsequent) SIGABRT type failures within the testsuite.

I've also pushed the fix to sqrt_f16 (passing cpu_env rather than fpst).
Are you happy rolling my fixes into your v2?

>
>
> Richard Henderson (9):
>   target/arm: Implement vector shifted SCVF/UCVF for fp16
>   target/arm: Implement vector shifted FCVT for fp16
>   target/arm: Fix float16 to/from int16
>   target/arm: Clear SVE high bits for FMOV
>   target/arm: Implement FMOV (general) for fp16
>   target/arm: Implement FCVT (scalar,integer) for fp16
>   target/arm: Implement FCVT (scalar,fixed-point) for fp16
>   target/arm: Implement FP data-processing (2 source) for fp16
>   target/arm: Implement FP data-processing (3 source) for fp16
>
>  target/arm/helper.h        |   6 +
>  target/arm/helper.c        |  87 ++++++++++-
>  target/arm/translate-a64.c | 371 +++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 399 insertions(+), 65 deletions(-)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-05-01 15:47 ` Alex Bennée
@ 2018-05-01 18:35   ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2018-05-01 18:35 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 05/01/2018 08:47 AM, Alex Bennée wrote:
> I've also pushed the fix to sqrt_f16 (passing cpu_env rather than fpst).

Oops.

> Are you happy rolling my fixes into your v2?

Yes, I can do that.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) for fp16
  2018-05-01 11:13   ` Alex Bennée
@ 2018-05-02 18:28     ` Richard Henderson
  2018-05-02 18:47     ` Richard Henderson
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2018-05-02 18:28 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 05/01/2018 04:13 AM, Alex Bennée wrote:
> If we are going to the trouble of adding a read_fp_hreg() we might as
> well do the same for the write case. Then we can convert the various:
> 
>   read_vec_element_i32(s, tcg_vm, rm, 0, MO_16);
> 
> that we used before.

Ok, I've split these new functions into two new patches.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) for fp16
  2018-05-01 11:13   ` Alex Bennée
  2018-05-02 18:28     ` Richard Henderson
@ 2018-05-02 18:47     ` Richard Henderson
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2018-05-02 18:47 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 05/01/2018 04:13 AM, Alex Bennée wrote:
> If we are going to the trouble of adding a read_fp_hreg() we might as
> well do the same for the write case.

The write case is much harder.

Almost all of the time we have a helper function that has already zero-extended
the result from 16 to 32-bits, and therefore write_fp_sreg is perfectly fine.
In more than a few cases, this writeback is shared by both single and half
precision paths and so adjusting the writeback is not appropriate.

There are in fact zero instances at this point in the tree that do require
extra zero-extension.  So I'm going to leave this alone.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 source) for fp16
  2018-05-01 11:21   ` Alex Bennée
@ 2018-05-02 18:49     ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2018-05-02 18:49 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 05/01/2018 04:21 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> We missed all of the scalar fp16 fma operations.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 48 insertions(+)
>>
>> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
>> index 11b90b7eb0..0cb1fc4d67 100644
>> --- a/target/arm/translate-a64.c
>> +++ b/target/arm/translate-a64.c
>> @@ -5154,6 +5154,44 @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
>>      tcg_temp_free_i64(tcg_res);
>>  }
>>
>> +/* Floating-point data-processing (3 source) - half precision */
>> +static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
>> +                                int rd, int rn, int rm, int ra)
>> +{
>> +    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
>> +    TCGv_i32 tcg_res = tcg_temp_new_i32();
>> +    TCGv_ptr fpst = get_fpstatus_ptr(true);
>> +
>> +    tcg_op1 = read_fp_hreg(s, rn);
>> +    tcg_op2 = read_fp_hreg(s, rm);
>> +    tcg_op3 = read_fp_hreg(s, ra);
>> +
>> +    /* These are fused multiply-add, and must be done as one
>> +     * floating point operation with no rounding between the
>> +     * multiplication and addition steps.
> 
> I got confused first time reading this as we cover F[N]M[ADD|SUB].
> Perhaps that is better enumerated at the top of the function?

*shrug* It's an exact copy of the beginnings of the single and double
functions.  Is it really that confusing?


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-04-27 17:22 ` Alex Bennée
  2018-04-27 18:55   ` Alex Bennée
@ 2018-05-11  2:17   ` Richard Henderson
  2018-05-11 21:13     ` Alex Bennée
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2018-05-11  2:17 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell

On 04/27/2018 10:22 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> When running the gcc testsuite with current aarch64-linux-user,
>> the testsuite detects the presence of the fp16 extension and
>> enables lots of extra tests for builtins.
>>
>> Quite a few of these new tests fail because we missed implementing
>> some instructions.  We really should go back and verify that nothing
>> else is missing from this (rather large) extension.
> 
> So this set of instructions is generated from any ASL description that
> contains "half":

This still isn't all of them.  At least the insns from fmov_float_gen.html are
missing.  The four insns could be handled with

FMOV_H_general  A64_V sf:1 00 11110 11 100 11 o:1 000000 Rn:5 Rd:5

Peter had pointed out that I didn't handle sf=1 in fmov during review.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-05-11  2:17   ` Richard Henderson
@ 2018-05-11 21:13     ` Alex Bennée
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2018-05-11 21:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> On 04/27/2018 10:22 AM, Alex Bennée wrote:
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> When running the gcc testsuite with current aarch64-linux-user,
>>> the testsuite detects the presence of the fp16 extension and
>>> enables lots of extra tests for builtins.
>>>
>>> Quite a few of these new tests fail because we missed implementing
>>> some instructions.  We really should go back and verify that nothing
>>> else is missing from this (rather large) extension.
>>
>> So this set of instructions is generated from any ASL description that
>> contains "half":
>
> This still isn't all of them.  At least the insns from fmov_float_gen.html are
> missing.  The four insns could be handled with
>
> FMOV_H_general  A64_V sf:1 00 11110 11 100 11 o:1 000000 Rn:5 Rd:5
>
> Peter had pointed out that I didn't handle sf=1 in fmov during review.

It describes itself as:

   <docvars>
       <docvar key="convert-type" value="32-to-half" />
       <docvar key="instr-class" value="float" />
       <docvar key="isa" value="A64" />
       <docvar key="mnemonic" value="FMOV" />
   </docvars>

I found I had to make the --desc matching a lot more liberal to catch
stuff from the Aarch32 XML, so now:

22:11:47 [alex@zen:~/l/q/risu.git] add-asl-support-v2(+1/-1) + ./contrib/armasl2risu.py --only-desc "half" xml/ISA_v83A_A64_xml_00bet6/*.xml | grep FMOV
FMOV_32H_float2int A64_V 0001111011100110000000 Rn:5  Rd:5
FMOV_H32_float2int A64_V 0001111011100111000000 Rn:5  Rd:5
FMOV_asimdimm_H_h A64_V 0 Q:1 00111100000 a:1  b:1  c:1 111111 d:1  e:1  f:1  g:1  h:1  Rd:5
FMOV_H64_float2int A64_V 1001111011100111000000 Rn:5  Rd:5
FMOV_64H_float2int A64_V 1001111011100110000000 Rn:5  Rd:5
FMOV_H_floatimm A64_V 00011110111 imm8:8 10000000 Rd:5
FMOV_H_floatdp1 A64_V 0001111011100000010000 Rn:5  Rd:5

Whereas I had:

22:11:58 [alex@zen:~/l/q/risu.git] add-asl-support-v2(+1/-1) + grep FMOV all_v82_half.risu
FMOV_H_floatimm A64_V 00011110111 imm8:8 10000000 Rd:5
FMOV_H_floatdp1 A64_V 0001111011100000010000 Rn:5  Rd:5

I'll regenerate a set.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-05-11 21:14 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25  1:22 [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
2018-04-25  1:22 ` [Qemu-devel] [PATCH 1/9] target/arm: Implement vector shifted SCVF/UCVF for fp16 Richard Henderson
2018-04-27 16:04   ` Alex Bennée
2018-04-29 14:44     ` Richard Henderson
2018-04-29 15:27       ` Peter Maydell
2018-04-25  1:22 ` [Qemu-devel] [PATCH 2/9] target/arm: Implement vector shifted FCVT " Richard Henderson
2018-04-30 15:55   ` Alex Bennée
2018-04-25  1:22 ` [Qemu-devel] [PATCH 3/9] target/arm: Fix float16 to/from int16 Richard Henderson
2018-05-01 10:10   ` Alex Bennée
2018-04-25  1:22 ` [Qemu-devel] [PATCH 4/9] target/arm: Clear SVE high bits for FMOV Richard Henderson
2018-05-01 10:44   ` Alex Bennée
2018-04-25  1:22 ` [Qemu-devel] [PATCH 5/9] target/arm: Implement FMOV (general) for fp16 Richard Henderson
2018-04-25  1:31   ` Philippe Mathieu-Daudé
2018-04-25  8:40     ` Richard Henderson
2018-04-25  1:22 ` [Qemu-devel] [PATCH 6/9] target/arm: Implement FCVT (scalar, integer) " Richard Henderson
2018-05-01 10:55   ` Alex Bennée
2018-04-25  1:22 ` [Qemu-devel] [PATCH 7/9] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
2018-05-01 10:57   ` Alex Bennée
2018-04-25  1:22 ` [Qemu-devel] [PATCH 8/9] target/arm: Implement FP data-processing (2 source) " Richard Henderson
2018-05-01 11:13   ` Alex Bennée
2018-05-02 18:28     ` Richard Henderson
2018-05-02 18:47     ` Richard Henderson
2018-04-25  1:23 ` [Qemu-devel] [PATCH 9/9] target/arm: Implement FP data-processing (3 " Richard Henderson
2018-05-01 11:21   ` Alex Bennée
2018-05-02 18:49     ` Richard Henderson
2018-04-25  1:35 ` [Qemu-devel] [PATCH 0/9] target/arm: Fixups for ARM_FEATURE_V8_FP16 no-reply
2018-04-25  9:14 ` Alex Bennée
2018-04-27 17:22 ` Alex Bennée
2018-04-27 18:55   ` Alex Bennée
2018-04-27 19:50     ` Alex Bennée
2018-05-11  2:17   ` Richard Henderson
2018-05-11 21:13     ` Alex Bennée
2018-05-01 15:47 ` Alex Bennée
2018-05-01 18:35   ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.