[Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16


* [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16
@ 2018-05-12  0:32 Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16 Richard Henderson
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

Changes since v3:
  * Fixup rebase vs target-arm.next.  One of the middle
    patches had conflicts resolved incorrectly, so the
    patch set was non-bisectable.

Changes since v2:
  * Rebased vs target-arm.next.
  * Merged Peter's review.
  * Split out return fix as a separate patch.

Changes since v1:
  * Rebased vs master instead of tgt-arm-sve-9.
  * Alex did some additional digging through the ARM xhtml
    and came up with some additional missing instructions.
  * Everything cc'd to qemu-stable.


r~


Alex Bennée (4):
  target/arm: Implement FCMP for fp16
  target/arm: Implement FCSEL for fp16
  target/arm: Implement FMOV (immediate) for fp16
  target/arm: Fix sqrt_f16 exception raising

Richard Henderson (7):
  target/arm: Implement FMOV (general) for fp16
  target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
  target/arm: Implement FCVT (scalar, integer) for fp16
  target/arm: Implement FCVT (scalar, fixed-point) for fp16
  target/arm: Introduce and use read_fp_hreg
  target/arm: Implement FP data-processing (2 source) for fp16
  target/arm: Implement FP data-processing (3 source) for fp16

 target/arm/helper-a64.h    |   2 +
 target/arm/helper.h        |   6 +
 target/arm/helper-a64.c    |  10 +
 target/arm/helper.c        |  38 +++-
 target/arm/translate-a64.c | 421 +++++++++++++++++++++++++++++++------
 5 files changed, 413 insertions(+), 64 deletions(-)

-- 
2.17.0


* [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-15 10:37   ` Alex Bennée
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv Richard Henderson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

Add the fp16 moves to/from general registers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 4d1b220cc6..5b8cf75e9f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5700,6 +5700,15 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
             clear_vec_high(s, true, rd);
             break;
+        case 3:
+            /* 16 bit */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_ext16u_i64(tmp, tcg_rn);
+            write_fp_dreg(s, rd, tmp);
+            tcg_temp_free_i64(tmp);
+            break;
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_rd = cpu_reg(s, rd);
@@ -5717,6 +5726,12 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             /* 64 bits from top half */
             tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
             break;
+        case 3:
+            /* 16 bit */
+            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 }
@@ -5756,6 +5771,12 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         case 0xa: /* 64 bit */
         case 0xd: /* 64 bit to top half of quad */
             break;
+        case 0x6: /* 16-bit float, 32-bit int */
+        case 0xe: /* 16-bit float, 64-bit int */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
         default:
             /* all other sf/type/rmode combinations are invalid */
             unallocated_encoding(s);
-- 
2.17.0
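
As a reading aid for the two new "case 3" arms in handle_fmov, here is a
minimal sketch in plain C of the data movement they implement: only the low
16 bits are transferred and everything above them is zeroed. The function
names are hypothetical and the registers are modelled as plain integers;
none of this is a QEMU API.

```c
#include <stdint.h>

/* Hypothetical model, not QEMU code: registers are plain integers. */

/* General register -> H register: keep bits [15:0], zero the rest
 * (mirrors tcg_gen_ext16u_i64 followed by write_fp_dreg). */
static uint64_t model_fmov_gpr_to_h(uint64_t gpr)
{
    return gpr & 0xffffu;
}

/* H register -> general register: a zero-extending 16-bit load
 * (mirrors tcg_gen_ld16u_i64). */
static uint64_t model_fmov_h_to_gpr(uint64_t vreg_low64)
{
    return vreg_low64 & 0xffffu;
}
```

Both directions discard whatever sits above bit 15, which is why the itof
path can reuse write_fp_dreg after the 16-bit zero extension.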


* [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16 Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-15 10:37   ` Alex Bennée
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16 Richard Henderson
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee

No sense in emitting code after the exception.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5b8cf75e9f..11d8c07943 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5780,7 +5780,7 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         default:
             /* all other sf/type/rmode combinations are invalid */
             unallocated_encoding(s);
-            break;
+            return;
         }
 
         if (!fp_access_check(s)) {
-- 
2.17.0
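
To illustrate what the one-line change buys, here is a toy model with
hypothetical names that counts how many "ops" a decoder would emit for one
instruction. It is only a sketch of the control flow, not of the real TCG
code generation.

```c
#include <stdbool.h>

/* Toy model with hypothetical names: returns the number of "ops"
 * emitted while decoding one instruction. */
static int toy_ops_emitted(int type, bool early_return)
{
    int ops = 0;

    switch (type) {
    case 0:
    case 1:
        break;                 /* valid encoding */
    default:
        ops++;                 /* stands in for unallocated_encoding() */
        if (early_return) {
            return ops;        /* behaviour with "return" */
        }
        break;                 /* behaviour with "break" */
    }

    ops++;                     /* stands in for the normal code generation */
    return ops;
}
```

With "break", an invalid encoding still falls through to the normal code
generation, so it emits one op more than it needs to; with "return" only
the exception is generated.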


* [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16 Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-13  7:21   ` Alex Bennée
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 04/11] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  6 +++
 target/arm/helper.c        | 38 ++++++++++++++-
 target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
 3 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 1969b37f2d..ce89968b2d 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -151,6 +151,10 @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
@@ -177,6 +181,8 @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
 DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 817f9d81a0..c6fd7f9479 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11427,8 +11427,12 @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 #undef VFP_CONV_FIX_A64
 
 /* Conversion to/from f16 can overflow to infinity before/after scaling.
- * Therefore we convert to f64 (which does not round), scale,
- * and then convert f64 to f16 (which may round).
+ * Therefore we convert to f64, scale, and then convert f64 to f16; or
+ * vice versa for conversion to integer.
+ *
+ * For 16- and 32-bit integers, the conversion to f64 never rounds.
+ * For 64-bit integers, any integer that would cause rounding will also
+ * overflow to f16 infinity, so there is no double rounding problem.
  */
 
 static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
@@ -11446,6 +11450,16 @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
     return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
 }
 
+float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
+}
+
+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
+}
+
 static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
 {
     if (unlikely(float16_is_any_nan(f))) {
@@ -11475,6 +11489,26 @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
     return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
 }
 
+uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 11d8c07943..93fb15d185 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5511,11 +5511,11 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                            bool itof, int rmode, int scale, int sf, int type)
 {
     bool is_signed = !(opcode & 1);
-    bool is_double = type;
     TCGv_ptr tcg_fpstatus;
-    TCGv_i32 tcg_shift;
+    TCGv_i32 tcg_shift, tcg_single;
+    TCGv_i64 tcg_double;
 
-    tcg_fpstatus = get_fpstatus_ptr(false);
+    tcg_fpstatus = get_fpstatus_ptr(type == 3);
 
     tcg_shift = tcg_const_i32(64 - scale);
 
@@ -5533,8 +5533,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             tcg_int = tcg_extend;
         }
 
-        if (is_double) {
-            TCGv_i64 tcg_double = tcg_temp_new_i64();
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = tcg_temp_new_i64();
             if (is_signed) {
                 gen_helper_vfp_sqtod(tcg_double, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -5544,8 +5545,10 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_dreg(s, rd, tcg_double);
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = tcg_temp_new_i32();
+            break;
+
+        case 0: /* float32 */
+            tcg_single = tcg_temp_new_i32();
             if (is_signed) {
                 gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -5555,6 +5558,23 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_sreg(s, rd, tcg_single);
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = tcg_temp_new_i32();
+            if (is_signed) {
+                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            } else {
+                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            }
+            write_fp_sreg(s, rd, tcg_single);
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -5571,8 +5591,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 
-        if (is_double) {
-            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = read_fp_dreg(s, rn);
             if (is_signed) {
                 if (!sf) {
                     gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -5590,9 +5611,14 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                          tcg_shift, tcg_fpstatus);
                 }
             }
+            if (!sf) {
+                tcg_gen_ext32u_i64(tcg_int, tcg_int);
+            }
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
+            break;
+
+        case 0: /* float32 */
+            tcg_single = read_fp_sreg(s, rn);
             if (sf) {
                 if (is_signed) {
                     gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -5614,14 +5640,39 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                 tcg_temp_free_i32(tcg_dest);
             }
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = read_fp_sreg(s, rn);
+            if (sf) {
+                if (is_signed) {
+                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_touqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+            } else {
+                TCGv_i32 tcg_dest = tcg_temp_new_i32();
+                if (is_signed) {
+                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
+                tcg_temp_free_i32(tcg_dest);
+            }
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
         tcg_temp_free_i32(tcg_rmode);
-
-        if (!sf) {
-            tcg_gen_ext32u_i64(tcg_int, tcg_int);
-        }
     }
 
     tcg_temp_free_ptr(tcg_fpstatus);
@@ -5791,7 +5842,20 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         /* actual FP conversions */
         bool itof = extract32(opcode, 1, 1);
 
-        if (type > 1 || (rmode != 0 && opcode > 1)) {
+        if (rmode != 0 && opcode > 1) {
+            unallocated_encoding(s);
+            return;
+        }
+        switch (type) {
+        case 0: /* float32 */
+        case 1: /* float64 */
+            break;
+        case 3: /* float16 */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
+        default:
             unallocated_encoding(s);
             return;
         }
-- 
2.17.0
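
The double-rounding argument in the updated helper.c comment can be checked
with a little integer arithmetic. The sketch below assumes a shift of zero
and round-to-nearest-even, and uses hypothetical helper names; it is not
softfloat code. It relies on two facts: a float64 has a 53-bit significand,
and the largest finite float16 is 65504, so under round-to-nearest-even any
magnitude of 65520 or more rounds to infinity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bits from the highest set bit down to the lowest set bit. */
static int significant_bits(uint64_t x)
{
    int n = 0;

    if (x == 0) {
        return 0;
    }
    while ((x & 1) == 0) {
        x >>= 1;               /* drop trailing zeros */
    }
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* True if x converts to float64 without rounding (53-bit significand). */
static bool u64_exact_in_f64(uint64_t x)
{
    return significant_bits(x) <= 53;
}

/* True if x rounds to float16 infinity under round-to-nearest-even:
 * 65520 is the midpoint between 65504 and the unrepresentable 65536. */
static bool u64_overflows_f16(uint64_t x)
{
    return x >= 65520;
}
```

The smallest integer that is not exact in float64 is 2^53 + 1, which is far
above 65520, so every input that rounds in the first step already overflows
in the second.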


* [Qemu-devel] [PATCH v4 04/11] target/arm: Implement FCVT (scalar, fixed-point) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (2 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16 Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg Richard Henderson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 93fb15d185..d0ed125442 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5697,8 +5697,21 @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
     bool sf = extract32(insn, 31, 1);
     bool itof;
 
-    if (sbit || (type > 1)
-        || (!sf && scale < 32)) {
+    if (sbit || (!sf && scale < 32)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0: /* float32 */
+    case 1: /* float64 */
+        break;
+    case 3: /* float16 */
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
-- 
2.17.0
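
The type check added here (and in disas_fp_int_conv in the previous patch)
follows one pattern: types 0 and 1 are always valid, type 3 is valid only
when the FP16 feature is present, and everything else is unallocated. A
minimal sketch of that decision as a pure function, with a hypothetical
name and a plain bool standing in for
arm_dc_feature(s, ARM_FEATURE_V8_FP16):

```c
#include <stdbool.h>

/* Hypothetical helper: true if the FP "type" field is an allocated
 * encoding, given whether half-precision arithmetic is available. */
static bool fp_type_allowed(int type, bool have_fp16)
{
    switch (type) {
    case 0: /* float32 */
    case 1: /* float64 */
        return true;
    case 3: /* float16 */
        return have_fp16;
    default:
        return false;
    }
}
```

The patch spells this out as a switch with a fallthrough into the default
arm so that the unallocated_encoding call appears only once.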


* [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (3 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 04/11] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-15 10:39   ` Alex Bennée
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 06/11] target/arm: Implement FP data-processing (2 source) for fp16 Richard Henderson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d0ed125442..78f12daaf6 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -615,6 +615,14 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
+    return v;
+}
+
 /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
  * If SVE is not enabled, then there are only 128 bits in the vector.
  */
@@ -4881,11 +4889,9 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
 static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
 {
     TCGv_ptr fpst = NULL;
-    TCGv_i32 tcg_op = tcg_temp_new_i32();
+    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
     TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
     switch (opcode) {
     case 0x0: /* FMOV */
         tcg_gen_mov_i32(tcg_res, tcg_op);
@@ -7784,13 +7790,10 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
         tcg_temp_free_i64(tcg_op2);
         tcg_temp_free_i64(tcg_res);
     } else {
-        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
-        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
+        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
         TCGv_i64 tcg_res = tcg_temp_new_i64();
 
-        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
         gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
         gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
 
@@ -8331,13 +8334,10 @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
 
     fpst = get_fpstatus_ptr(true);
 
-    tcg_op1 = tcg_temp_new_i32();
-    tcg_op2 = tcg_temp_new_i32();
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
     tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
     switch (fpopcode) {
     case 0x03: /* FMULX */
         gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -12235,11 +12235,9 @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
     }
 
     if (is_scalar) {
-        TCGv_i32 tcg_op = tcg_temp_new_i32();
+        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
         TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
         switch (fpop) {
         case 0x1a: /* FCVTNS */
         case 0x1b: /* FCVTMS */
-- 
2.17.0


* [Qemu-devel] [PATCH v4 06/11] target/arm: Implement FP data-processing (2 source) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (4 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 07/11] target/arm: Implement FP data-processing (3 " Richard Henderson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

We missed all of the scalar fp16 binary operations.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 78f12daaf6..66607668ce 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5299,6 +5299,61 @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (2 source) - half precision */
+static void handle_fp_2src_half(DisasContext *s, int opcode,
+                                int rd, int rn, int rm)
+{
+    TCGv_i32 tcg_op1;
+    TCGv_i32 tcg_op2;
+    TCGv_i32 tcg_res;
+    TCGv_ptr fpst;
+
+    tcg_res = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(true);
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+
+    switch (opcode) {
+    case 0x0: /* FMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x1: /* FDIV */
+        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x2: /* FADD */
+        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x3: /* FSUB */
+        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x4: /* FMAX */
+        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x5: /* FMIN */
+        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x6: /* FMAXNM */
+        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x7: /* FMINNM */
+        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x8: /* FNMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (2 source)
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -5331,6 +5386,16 @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
         }
         handle_fp_2src_double(s, opcode, rd, rn, rm);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_2src_half(s, opcode, rd, rn, rm);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0


* [Qemu-devel] [PATCH v4 07/11] target/arm: Implement FP data-processing (3 source) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (5 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 06/11] target/arm: Implement FP data-processing (2 source) for fp16 Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 08/11] target/arm: Implement FCMP " Richard Henderson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

We missed all of the scalar fp16 fma operations.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 66607668ce..a79c09eda2 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5477,6 +5477,44 @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (3 source) - half precision */
+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
+                                int rd, int rn, int rm, int ra)
+{
+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
+    TCGv_i32 tcg_res = tcg_temp_new_i32();
+    TCGv_ptr fpst = get_fpstatus_ptr(true);
+
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+    tcg_op3 = read_fp_hreg(s, ra);
+
+    /* These are fused multiply-add, and must be done as one
+     * floating point operation with no rounding between the
+     * multiplication and addition steps.
+     * NB that doing the negations here as separate steps is
+     * correct : an input NaN should come out with its sign bit
+     * flipped if it is a negated-input.
+     */
+    if (o1 == true) {
+        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
+    }
+
+    if (o0 != o1) {
+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
+    }
+
+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_op3);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (3 source)
  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
  * +---+---+---+-----------+------+----+------+----+------+------+------+
@@ -5506,6 +5544,16 @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
         }
         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0
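
The o0/o1 handling in handle_fp_3src_half maps onto the four scalar fused
multiply-add forms. The sketch below uses a hypothetical struct and
function name and operates on raw binary16 bit patterns. It shows which
operands get their sign bit flipped before the single fused operation,
assuming the usual A64 meaning of the two bits (FMADD, FMSUB, FNMADD,
FNMSUB).

```c
#include <stdbool.h>
#include <stdint.h>

#define F16_SIGN 0x8000u

/* Hypothetical container: op1 = Rn, op2 = Rm, op3 = Ra as raw f16 bits. */
struct fma_inputs {
    uint16_t op1, op2, op3;
};

/* o1=0 o0=0  FMADD :  Ra + Rn*Rm   (nothing flipped)
 * o1=0 o0=1  FMSUB :  Ra - Rn*Rm   (Rn flipped)
 * o1=1 o0=0  FNMADD: -Ra - Rn*Rm   (Ra and Rn flipped)
 * o1=1 o0=1  FNMSUB: -Ra + Rn*Rm   (Ra flipped)
 */
static struct fma_inputs fp16_fma_apply_negations(struct fma_inputs in,
                                                  bool o0, bool o1)
{
    if (o1) {
        in.op3 = (uint16_t)(in.op3 ^ F16_SIGN);
    }
    if (o0 != o1) {
        in.op1 = (uint16_t)(in.op1 ^ F16_SIGN);
    }
    return in;
}
```

Flipping the sign bit with XOR rather than negating arithmetically is what
lets an input NaN come out with its sign inverted, as the comment in the
patch requires.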


* [Qemu-devel] [PATCH v4 08/11] target/arm: Implement FCMP for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (6 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 07/11] target/arm: Implement FP data-processing (3 " Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 09/11] target/arm: Implement FCSEL " Richard Henderson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[rth: Diagnose lack of FP16 before fp_access_check]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-a64.h    |  2 +
 target/arm/helper-a64.c    | 10 +++++
 target/arm/translate-a64.c | 88 ++++++++++++++++++++++++++++++--------
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index b8028ac98c..9d3a907049 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -19,6 +19,8 @@
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
+DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
+DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 549ed3513e..4f8034c513 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -85,6 +85,16 @@ static inline uint32_t float_rel_to_flags(int res)
     return flags;
 }
 
+uint64_t HELPER(vfp_cmph_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare_quiet(x, y, fp_status));
+}
+
+uint64_t HELPER(vfp_cmpeh_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare(x, y, fp_status));
+}
+
 uint64_t HELPER(vfp_cmps_a64)(float32 x, float32 y, void *fp_status)
 {
     return float_rel_to_flags(float32_compare_quiet(x, y, fp_status));
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a79c09eda2..c078a54fa5 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4712,14 +4712,14 @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
     }
 }
 
-static void handle_fp_compare(DisasContext *s, bool is_double,
+static void handle_fp_compare(DisasContext *s, int size,
                               unsigned int rn, unsigned int rm,
                               bool cmp_with_zero, bool signal_all_nans)
 {
     TCGv_i64 tcg_flags = tcg_temp_new_i64();
-    TCGv_ptr fpst = get_fpstatus_ptr(false);
+    TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
 
-    if (is_double) {
+    if (size == MO_64) {
         TCGv_i64 tcg_vn, tcg_vm;
 
         tcg_vn = read_fp_dreg(s, rn);
@@ -4736,19 +4736,35 @@ static void handle_fp_compare(DisasContext *s, bool is_double,
         tcg_temp_free_i64(tcg_vn);
         tcg_temp_free_i64(tcg_vm);
     } else {
-        TCGv_i32 tcg_vn, tcg_vm;
+        TCGv_i32 tcg_vn = tcg_temp_new_i32();
+        TCGv_i32 tcg_vm = tcg_temp_new_i32();
 
-        tcg_vn = read_fp_sreg(s, rn);
+        read_vec_element_i32(s, tcg_vn, rn, 0, size);
         if (cmp_with_zero) {
-            tcg_vm = tcg_const_i32(0);
+            tcg_gen_movi_i32(tcg_vm, 0);
         } else {
-            tcg_vm = read_fp_sreg(s, rm);
+            read_vec_element_i32(s, tcg_vm, rm, 0, size);
         }
-        if (signal_all_nans) {
-            gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
-        } else {
-            gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+
+        switch (size) {
+        case MO_32:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        case MO_16:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        default:
+            g_assert_not_reached();
         }
+
         tcg_temp_free_i32(tcg_vn);
         tcg_temp_free_i32(tcg_vm);
     }
@@ -4769,16 +4785,35 @@ static void handle_fp_compare(DisasContext *s, bool is_double,
 static void disas_fp_compare(DisasContext *s, uint32_t insn)
 {
     unsigned int mos, type, rm, op, rn, opc, op2r;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     op = extract32(insn, 14, 2);
     rn = extract32(insn, 5, 5);
     opc = extract32(insn, 3, 2);
     op2r = extract32(insn, 0, 3);
 
-    if (mos || op || op2r || type > 1) {
+    if (mos || op || op2r) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -4787,7 +4822,7 @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
         return;
     }
 
-    handle_fp_compare(s, type, rn, rm, opc & 1, opc & 2);
+    handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2);
 }
 
 /* Floating point conditional compare
@@ -4801,16 +4836,35 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, op, nzcv;
     TCGv_i64 tcg_flags;
     TCGLabel *label_continue = NULL;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     op = extract32(insn, 4, 1);
     nzcv = extract32(insn, 0, 4);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -4831,7 +4885,7 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
         gen_set_label(label_match);
     }
 
-    handle_fp_compare(s, type, rn, rm, false, op);
+    handle_fp_compare(s, size, rn, rm, false, op);
 
     if (cond < 0x0e) {
         gen_set_label(label_continue);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
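
As a rough illustration of what the vfp_cmph_a64/vfp_cmpeh_a64 helpers in
the patch above hand back, the sketch below maps a comparison result onto
the NZCV encoding that FCMP produces. This is not QEMU code:
nzcv_from_compare is a hypothetical name, it uses a plain C float instead
of softfloat's float16, and it ignores the quiet versus signalling
distinction, which only decides whether Invalid Operation is raised for a
quiet NaN and does not change the flags.

```c
#include <math.h>
#include <stdint.h>

/* NZCV packed into a 4-bit value, N being the most significant bit. */
enum { FLAG_N = 8, FLAG_Z = 4, FLAG_C = 2, FLAG_V = 1 };

/* Hypothetical helper: flags for "compare a with b" as FCMP defines them. */
static uint32_t nzcv_from_compare(float a, float b)
{
    if (isnan(a) || isnan(b)) {
        return FLAG_C | FLAG_V;     /* unordered */
    }
    if (a == b) {
        return FLAG_Z | FLAG_C;     /* equal */
    }
    if (a < b) {
        return FLAG_N;              /* less than */
    }
    return FLAG_C;                  /* greater than */
}
```

The same four outcomes apply whatever the operand width is, which is why
the patch only needs to pick a different helper per size.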

* [Qemu-devel] [PATCH v4 09/11] target/arm: Implement FCSEL for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (7 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 08/11] target/arm: Implement FCMP " Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 10/11] target/arm: Implement FMOV (immediate) " Richard Henderson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[rth: Fix erroneous check vs type]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c078a54fa5..9dacb583ae 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4903,15 +4903,34 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, rd;
     TCGv_i64 t_true, t_false, t_zero;
     DisasCompare64 c;
+    TCGMemOp sz;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     rd = extract32(insn, 0, 5);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -4920,11 +4939,11 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* Zero extend sreg inputs to 64 bits now.  */
+    /* Zero extend sreg & hreg inputs to 64 bits now.  */
     t_true = tcg_temp_new_i64();
     t_false = tcg_temp_new_i64();
-    read_vec_element(s, t_true, rn, 0, type ? MO_64 : MO_32);
-    read_vec_element(s, t_false, rm, 0, type ? MO_64 : MO_32);
+    read_vec_element(s, t_true, rn, 0, sz);
+    read_vec_element(s, t_false, rm, 0, sz);
 
     a64_test_cc(&c, cond);
     t_zero = tcg_const_i64(0);
@@ -4933,7 +4952,7 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(t_false);
     a64_free_cc(&c);
 
-    /* Note that sregs write back zeros to the high bits,
+    /* Note that sregs & hregs write back zeros to the high bits,
        and we've already done the zero-extension.  */
     write_fp_dreg(s, rd, t_true);
     tcg_temp_free_i64(t_true);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
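
The decode pattern shared by the FCMP, FCSEL and FMOV (immediate) patches,
and the zero-extending select in disas_fp_csel, can be sketched in plain C
roughly as follows. All names here (decode_fp_type, zero_extend, fcsel) are
hypothetical stand-ins, widths are given in bits instead of TCGMemOp
values, and the have_fp16 flag stands in for the
arm_dc_feature(s, ARM_FEATURE_V8_FP16) check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decode of the 2-bit "type" field into an operand width.
 * Returns false for an unallocated encoding, which includes type == 3
 * when the CPU lacks the fp16 extension.
 */
static bool decode_fp_type(unsigned type, bool have_fp16, unsigned *bits)
{
    switch (type) {
    case 0:
        *bits = 32;
        return true;
    case 1:
        *bits = 64;
        return true;
    case 3:
        *bits = 16;
        return have_fp16;
    default:
        return false;
    }
}

/* Keep only the low "bits" bits, i.e. zero-extend the element to 64 bits. */
static uint64_t zero_extend(uint64_t reg, unsigned bits)
{
    return bits >= 64 ? reg : reg & ((UINT64_C(1) << bits) - 1);
}

/* FCSEL-style select: the chosen element with its high bits cleared. */
static uint64_t fcsel(uint64_t vn, uint64_t vm, unsigned bits, bool cond)
{
    return zero_extend(cond ? vn : vm, bits);
}
```

Because the inputs are already zero-extended, writing the selected value
back as a full 64-bit register clears the high bits for the 16-bit and
32-bit cases, as the comment in the patch notes.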

* [Qemu-devel] [PATCH v4 10/11] target/arm: Implement FMOV (immediate) for fp16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (8 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 09/11] target/arm: Implement FCSEL " Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 11/11] target/arm: Fix sqrt_f16 exception raising Richard Henderson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

From: Alex Bennée <alex.bennee@linaro.org>

All the hard work is already done by vfp_expand_imm, we just need to
make sure we pick up the correct size.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 9dacb583ae..35997969b4 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5674,11 +5674,25 @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int imm8 = extract32(insn, 13, 8);
-    int is_double = extract32(insn, 22, 2);
+    int type = extract32(insn, 22, 2);
     uint64_t imm;
     TCGv_i64 tcg_res;
+    TCGMemOp sz;
 
-    if (is_double > 1) {
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -5687,7 +5701,7 @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    imm = vfp_expand_imm(MO_32 + is_double, imm8);
+    imm = vfp_expand_imm(sz, imm8);
 
     tcg_res = tcg_const_i64(imm);
     write_fp_dreg(s, rd, tcg_res);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
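
For readers wondering what "the hard work" done for the 16-bit case looks
like, here is a minimal sketch of expanding the 8-bit FMOV immediate into a
half-precision bit pattern. It follows the architectural VFPExpandImm rule
as I understand it and is not a copy of QEMU's vfp_expand_imm;
expand_imm8_fp16 is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: expand an 8-bit immediate "abcdefgh" into a
 * half-precision bit pattern:
 *   sign     = a
 *   exponent = NOT(b) : b : b : c : d            (5 bits)
 *   fraction = e f g h followed by six zero bits (10 bits)
 */
static uint16_t expand_imm8_fp16(uint8_t imm8)
{
    uint16_t sign = (imm8 >> 7) & 1;
    uint16_t b = (imm8 >> 6) & 1;
    uint16_t cd = (imm8 >> 4) & 3;
    uint16_t frac = imm8 & 0xf;
    uint16_t exp = (uint16_t)(((b ^ 1) << 4) | (b << 3) | (b << 2) | cd);

    return (uint16_t)((sign << 15) | (exp << 10) | (frac << 6));
}
```

For example, imm8 = 0x70 expands to 0x3c00, the half-precision encoding of
1.0, and imm8 = 0x00 expands to 0x4000, which is 2.0.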

* [Qemu-devel] [PATCH v4 11/11] target/arm: Fix sqrt_f16 exception raising
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (9 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 10/11] target/arm: Implement FMOV (immediate) " Richard Henderson
@ 2018-05-12  0:32 ` Richard Henderson
  2018-05-13  7:22 ` [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Alex Bennée
  2018-05-14 15:16 ` Peter Maydell
  12 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2018-05-12  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, alex.bennee, qemu-stable

From: Alex Bennée <alex.bennee@linaro.org>

We are meant to explicitly pass fpst, not cpu_env.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 35997969b4..a0b0c43d12 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4976,7 +4976,8 @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
         tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
         break;
     case 0x3: /* FSQRT */
-        gen_helper_sqrt_f16(tcg_res, tcg_op, cpu_env);
+        fpst = get_fpstatus_ptr(true);
+        gen_helper_sqrt_f16(tcg_res, tcg_op, fpst);
         break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16 Richard Henderson
@ 2018-05-13  7:21   ` Alex Bennée
  2018-05-14 15:01     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Bennée @ 2018-05-13  7:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Cc: qemu-stable@nongnu.org
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Hmm oddly this fails to apply:

Applying: target/arm: Implement FCVT (scalar,integer) for fp16
Using index info to reconstruct a base tree...
M	target/arm/helper.c
M	target/arm/helper.h
M	target/arm/translate-a64.c
Falling back to patching base and 3-way merge...
Auto-merging target/arm/translate-a64.c
Auto-merging target/arm/helper.h
CONFLICT (content): Merge conflict in target/arm/helper.h
Auto-merging target/arm/helper.c
CONFLICT (content): Merge conflict in target/arm/helper.c
error: Failed to merge in the changes.
Patch failed at 0001 target/arm: Implement FCVT (scalar,integer) for fp16
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Which is odd considering not much has changed there recently.

> ---
>  target/arm/helper.h        |  6 +++
>  target/arm/helper.c        | 38 ++++++++++++++-
>  target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
>  3 files changed, 122 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 1969b37f2d..ce89968b2d 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -151,6 +151,10 @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
>  DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
>  DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
> +DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
> +DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
>  DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
>  DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
> @@ -177,6 +181,8 @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
>  DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
>  DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
>  DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
> +DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
> +DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
>
>  DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
>  DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 817f9d81a0..c6fd7f9479 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11427,8 +11427,12 @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
>  #undef VFP_CONV_FIX_A64
>
>  /* Conversion to/from f16 can overflow to infinity before/after scaling.
> - * Therefore we convert to f64 (which does not round), scale,
> - * and then convert f64 to f16 (which may round).
> + * Therefore we convert to f64, scale, and then convert f64 to f16; or
> + * vice versa for conversion to integer.
> + *
> + * For 16- and 32-bit integers, the conversion to f64 never rounds.
> + * For 64-bit integers, any integer that would cause rounding will also
> + * overflow to f16 infinity, so there is no double rounding problem.
>   */
>
>  static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
> @@ -11446,6 +11450,16 @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
>      return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
>  }
>
> +float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
> +}
> +
> +float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
> +{
> +    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
> +}
> +
>  static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
>  {
>      if (unlikely(float16_is_any_nan(f))) {
> @@ -11475,6 +11489,26 @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
>      return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
>  }
>
> +uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
> +uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
> +{
> +    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
> +}
> +
>  /* Set the current fp rounding mode and return the old one.
>   * The argument is a softfloat float_round_ value.
>   */
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 11d8c07943..93fb15d185 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5511,11 +5511,11 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                             bool itof, int rmode, int scale, int sf, int type)
>  {
>      bool is_signed = !(opcode & 1);
> -    bool is_double = type;
>      TCGv_ptr tcg_fpstatus;
> -    TCGv_i32 tcg_shift;
> +    TCGv_i32 tcg_shift, tcg_single;
> +    TCGv_i64 tcg_double;
>
> -    tcg_fpstatus = get_fpstatus_ptr(false);
> +    tcg_fpstatus = get_fpstatus_ptr(type == 3);
>
>      tcg_shift = tcg_const_i32(64 - scale);
>
> @@ -5533,8 +5533,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              tcg_int = tcg_extend;
>          }
>
> -        if (is_double) {
> -            TCGv_i64 tcg_double = tcg_temp_new_i64();
> +        switch (type) {
> +        case 1: /* float64 */
> +            tcg_double = tcg_temp_new_i64();
>              if (is_signed) {
>                  gen_helper_vfp_sqtod(tcg_double, tcg_int,
>                                       tcg_shift, tcg_fpstatus);
> @@ -5544,8 +5545,10 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              }
>              write_fp_dreg(s, rd, tcg_double);
>              tcg_temp_free_i64(tcg_double);
> -        } else {
> -            TCGv_i32 tcg_single = tcg_temp_new_i32();
> +            break;
> +
> +        case 0: /* float32 */
> +            tcg_single = tcg_temp_new_i32();
>              if (is_signed) {
>                  gen_helper_vfp_sqtos(tcg_single, tcg_int,
>                                       tcg_shift, tcg_fpstatus);
> @@ -5555,6 +5558,23 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>              }
>              write_fp_sreg(s, rd, tcg_single);
>              tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        case 3: /* float16 */
> +            tcg_single = tcg_temp_new_i32();
> +            if (is_signed) {
> +                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
> +                                     tcg_shift, tcg_fpstatus);
> +            } else {
> +                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
> +                                     tcg_shift, tcg_fpstatus);
> +            }
> +            write_fp_sreg(s, rd, tcg_single);
> +            tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
>          }
>      } else {
>          TCGv_i64 tcg_int = cpu_reg(s, rd);
> @@ -5571,8 +5591,9 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>
>          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
>
> -        if (is_double) {
> -            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
> +        switch (type) {
> +        case 1: /* float64 */
> +            tcg_double = read_fp_dreg(s, rn);
>              if (is_signed) {
>                  if (!sf) {
>                      gen_helper_vfp_tosld(tcg_int, tcg_double,
> @@ -5590,9 +5611,14 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                                           tcg_shift, tcg_fpstatus);
>                  }
>              }
> +            if (!sf) {
> +                tcg_gen_ext32u_i64(tcg_int, tcg_int);
> +            }
>              tcg_temp_free_i64(tcg_double);
> -        } else {
> -            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
> +            break;
> +
> +        case 0: /* float32 */
> +            tcg_single = read_fp_sreg(s, rn);
>              if (sf) {
>                  if (is_signed) {
>                      gen_helper_vfp_tosqs(tcg_int, tcg_single,
> @@ -5614,14 +5640,39 @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
>                  tcg_temp_free_i32(tcg_dest);
>              }
>              tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        case 3: /* float16 */
> +            tcg_single = read_fp_sreg(s, rn);
> +            if (sf) {
> +                if (is_signed) {
> +                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                } else {
> +                    gen_helper_vfp_touqh(tcg_int, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                }
> +            } else {
> +                TCGv_i32 tcg_dest = tcg_temp_new_i32();
> +                if (is_signed) {
> +                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                } else {
> +                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
> +                                         tcg_shift, tcg_fpstatus);
> +                }
> +                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
> +                tcg_temp_free_i32(tcg_dest);
> +            }
> +            tcg_temp_free_i32(tcg_single);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
>          }
>
>          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
>          tcg_temp_free_i32(tcg_rmode);
> -
> -        if (!sf) {
> -            tcg_gen_ext32u_i64(tcg_int, tcg_int);
> -        }
>      }
>
>      tcg_temp_free_ptr(tcg_fpstatus);
> @@ -5791,7 +5842,20 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>          /* actual FP conversions */
>          bool itof = extract32(opcode, 1, 1);
>
> -        if (type > 1 || (rmode != 0 && opcode > 1)) {
> +        if (rmode != 0 && opcode > 1) {
> +            unallocated_encoding(s);
> +            return;
> +        }
> +        switch (type) {
> +        case 0: /* float32 */
> +        case 1: /* float64 */
> +            break;
> +        case 3: /* float16 */
> +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +                break;
> +            }
> +            /* fallthru */
> +        default:
>              unallocated_encoding(s);
>              return;
>          }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 21+ messages in thread
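
The rounding argument in the helper.c comment quoted above can be checked
with a small sketch. It only covers the unscaled integer case and makes no
claim about fixed-point scaling. The names needs_rounding_in_f64 and
beyond_fp16_max are hypothetical, and 65504 is the largest finite
half-precision value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest finite half-precision value. */
#define FP16_MAX 65504u

/*
 * True if x has more than 53 significant bits, i.e. converting it to an
 * IEEE double (53-bit significand) would have to round.
 */
static bool needs_rounding_in_f64(uint64_t x)
{
    unsigned sig_bits = 0;

    if (x == 0) {
        return false;
    }
    while ((x & 1) == 0) {      /* drop trailing zero bits */
        x >>= 1;
    }
    while (x != 0) {            /* count the remaining bits */
        sig_bits++;
        x >>= 1;
    }
    return sig_bits > 53;
}

/* True if x lies beyond the largest finite fp16 value. */
static bool beyond_fp16_max(uint64_t x)
{
    return x > FP16_MAX;
}
```

Any 32-bit integer fits in 53 bits, so the first conversion is exact. A
64-bit integer that needs rounding is at least 2^53 + 1, far beyond
FP16_MAX, so the final fp16 result is an overflow either way.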

* Re: [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (10 preceding siblings ...)
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 11/11] target/arm: Fix sqrt_f16 exception raising Richard Henderson
@ 2018-05-13  7:22 ` Alex Bennée
  2018-05-14 15:16 ` Peter Maydell
  12 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2018-05-13  7:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> Changes since v3:
>   * Fixup rebase vs target-arm.next.  One of the middle
>     patches had conflicts resolved incorrectly, so the
>     patch set was non-bisectable.

I've tested with the new RISU set:

 http://people.linaro.org/~alex.bennee/testcases/arm64.risu/testcases_aarch64_all_half.tar.xz

without any LD/ST opcodes on your rth/tgt-arm-fp16 branch

Tested-by: Alex Bennée <alex.bennee@linaro.org>

>
> Changes since v2:
>   * Rebased vs target-arm.next.
>   * Merged Peter's review.
>   * Split out return fix as a separate patch.
>
> Changes since v1:
>   * Rebased vs master instead of tgt-arm-sve-9.
>   * Alex did some additional digging through the ARM xhtml
>     and came up with some additional missing instructions.
>   * Everything cc'd to qemu-stable.
>
>
> r~
>
>
> Alex Bennée (4):
>   target/arm: Implement FCMP for fp16
>   target/arm: Implement FCSEL for fp16
>   target/arm: Implement FMOV (immediate) for fp16
>   target/arm: Fix sqrt_f16 exception raising
>
> Richard Henderson (7):
>   target/arm: Implement FMOV (general) for fp16
>   target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
>   target/arm: Implement FCVT (scalar,integer) for fp16
>   target/arm: Implement FCVT (scalar,fixed-point) for fp16
>   target/arm: Introduce and use read_fp_hreg
>   target/arm: Implement FP data-processing (2 source) for fp16
>   target/arm: Implement FP data-processing (3 source) for fp16
>
>  target/arm/helper-a64.h    |   2 +
>  target/arm/helper.h        |   6 +
>  target/arm/helper-a64.c    |  10 +
>  target/arm/helper.c        |  38 +++-
>  target/arm/translate-a64.c | 421 +++++++++++++++++++++++++++++++------
>  5 files changed, 413 insertions(+), 64 deletions(-)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-05-13  7:21   ` Alex Bennée
@ 2018-05-14 15:01     ` Richard Henderson
  2018-05-14 15:52       ` Alex Bennée
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2018-05-14 15:01 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, peter.maydell, qemu-stable

On 05/13/2018 12:21 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Cc: qemu-stable@nongnu.org
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> Hmm oddly this fails to apply:

Did you try vs master or target-arm.next (as mentioned in the cover)?


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16
  2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
                   ` (11 preceding siblings ...)
  2018-05-13  7:22 ` [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Alex Bennée
@ 2018-05-14 15:16 ` Peter Maydell
  12 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2018-05-14 15:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Alex Bennée

On 12 May 2018 at 01:32, Richard Henderson <richard.henderson@linaro.org> wrote:
> Changes since v3:
>   * Fixup rebase vs target-arm.next.  One of the middle
>     patches had conflicts resolved incorrectly, so the
>     patch set was non-bisectable.
>
> Changes since v2:
>   * Rebased vs target-arm.next.
>   * Merged Peter's review.
>   * Split out return fix as a separate patch.
>
> Changes since v1:
>   * Rebased vs master instead of tgt-arm-sve-9.
>   * Alex did some additional digging through the ARM xhtml
>     and came up with some additional missing instructions.
>   * Everything cc'd to qemu-stable.
>
>
> r~
>

Applied all to target-arm.next, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-05-14 15:01     ` Richard Henderson
@ 2018-05-14 15:52       ` Alex Bennée
  2018-05-15 10:42         ` Alex Bennée
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Bennée @ 2018-05-14 15:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> On 05/13/2018 12:21 AM, Alex Bennée wrote:
>>
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> Cc: qemu-stable@nongnu.org
>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>
>> Hmm oddly this fails to apply:
>
> Did you try vs master or target-arm.next (as mentioned in the cover)?

Apologies, I missed that.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16 Richard Henderson
@ 2018-05-15 10:37   ` Alex Bennée
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2018-05-15 10:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Adding the fp16 moves to/from general registers.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 4d1b220cc6..5b8cf75e9f 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5700,6 +5700,15 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
>              tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
>              clear_vec_high(s, true, rd);
>              break;
> +        case 3:
> +            /* 16 bit */
> +            tmp = tcg_temp_new_i64();
> +            tcg_gen_ext16u_i64(tmp, tcg_rn);
> +            write_fp_dreg(s, rd, tmp);
> +            tcg_temp_free_i64(tmp);
> +            break;
> +        default:
> +            g_assert_not_reached();
>          }
>      } else {
>          TCGv_i64 tcg_rd = cpu_reg(s, rd);
> @@ -5717,6 +5726,12 @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
>              /* 64 bits from top half */
>              tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
>              break;
> +        case 3:
> +            /* 16 bit */
> +            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
> +            break;
> +        default:
> +            g_assert_not_reached();
>          }
>      }
>  }
> @@ -5756,6 +5771,12 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>          case 0xa: /* 64 bit */
>          case 0xd: /* 64 bit to top half of quad */
>              break;
> +        case 0x6: /* 16-bit float, 32-bit int */
> +        case 0xe: /* 16-bit float, 64-bit int */
> +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
> +                break;
> +            }
> +            /* fallthru */
>          default:
>              /* all other sf/type/rmode combinations are invalid */
>              unallocated_encoding(s);


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv Richard Henderson
@ 2018-05-15 10:37   ` Alex Bennée
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2018-05-15 10:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> No sense in emitting code after the exception.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 5b8cf75e9f..11d8c07943 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -5780,7 +5780,7 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
>          default:
>              /* all other sf/type/rmode combinations are invalid */
>              unallocated_encoding(s);
> -            break;
> +            return;
>          }
>
>          if (!fp_access_check(s)) {


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg
  2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg Richard Henderson
@ 2018-05-15 10:39   ` Alex Bennée
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2018-05-15 10:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-stable


Richard Henderson <richard.henderson@linaro.org> writes:

> Cc: qemu-stable@nongnu.org
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index d0ed125442..78f12daaf6 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -615,6 +615,14 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
>      return v;
>  }
>
> +static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
> +{
> +    TCGv_i32 v = tcg_temp_new_i32();
> +
> +    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
> +    return v;
> +}
> +
>  /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
>   * If SVE is not enabled, then there are only 128 bits in the vector.
>   */
> @@ -4881,11 +4889,9 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
>  static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
>  {
>      TCGv_ptr fpst = NULL;
> -    TCGv_i32 tcg_op = tcg_temp_new_i32();
> +    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
>      TCGv_i32 tcg_res = tcg_temp_new_i32();
>
> -    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
> -
>      switch (opcode) {
>      case 0x0: /* FMOV */
>          tcg_gen_mov_i32(tcg_res, tcg_op);
> @@ -7784,13 +7790,10 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
>          tcg_temp_free_i64(tcg_op2);
>          tcg_temp_free_i64(tcg_res);
>      } else {
> -        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
> -        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
> +        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
> +        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
>          TCGv_i64 tcg_res = tcg_temp_new_i64();
>
> -        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
> -        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
> -
>          gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
>          gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
>
> @@ -8331,13 +8334,10 @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
>
>      fpst = get_fpstatus_ptr(true);
>
> -    tcg_op1 = tcg_temp_new_i32();
> -    tcg_op2 = tcg_temp_new_i32();
> +    tcg_op1 = read_fp_hreg(s, rn);
> +    tcg_op2 = read_fp_hreg(s, rm);
>      tcg_res = tcg_temp_new_i32();
>
> -    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
> -    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
> -
>      switch (fpopcode) {
>      case 0x03: /* FMULX */
>          gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
> @@ -12235,11 +12235,9 @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
>      }
>
>      if (is_scalar) {
> -        TCGv_i32 tcg_op = tcg_temp_new_i32();
> +        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
>          TCGv_i32 tcg_res = tcg_temp_new_i32();
>
> -        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
> -
>          switch (fpop) {
>          case 0x1a: /* FCVTNS */
>          case 0x1b: /* FCVTMS */

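As a rough model of what the new helper returns, the zero-extending
16-bit load can be written in plain C. This is a sketch under stated
assumptions: CpuState, VecReg and read_h are hypothetical names, and the
lowest element is assumed to live in the low bits of the first 64-bit
word, which sidesteps the host-endian offset handling that
fp_reg_offset() takes care of in the real code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file: 32 vector registers of 128 bits each. */
typedef struct {
    uint64_t d[2];
} VecReg;

typedef struct {
    VecReg vregs[32];
} CpuState;

/*
 * Return the lowest 16-bit element of a vector register, zero-extended
 * to 32 bits, as a single operation instead of "allocate a temporary,
 * then read element 0 into it".
 */
static uint32_t read_h(const CpuState *env, int reg)
{
    assert(env != NULL);
    assert(reg >= 0 && reg < 32);
    return (uint32_t)(env->vregs[reg].d[0] & 0xffff);
}
```

The upper 16 bits of the result are always clear, so callers can hand
the value straight to a half-precision helper without masking it first.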

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16
  2018-05-14 15:52       ` Alex Bennée
@ 2018-05-15 10:42         ` Alex Bennée
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2018-05-15 10:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-stable


Alex Bennée <alex.bennee@linaro.org> writes:

> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 05/13/2018 12:21 AM, Alex Bennée wrote:
>>>
>>> Richard Henderson <richard.henderson@linaro.org> writes:
>>>
>>>> Cc: qemu-stable@nongnu.org
>>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>>
>>> Hmm oddly this fails to apply:
>>
>> Did you try vs master or target-arm.next (as mentioned in the cover)?
>
> Apologies, I missed that.

Fortunately time heals all things and it now applies cleanly to master
as the prerequisites have gone in.

Hopefully Peter's scripts will pick up the remaining tags?

Anyway, now tested directly on the master branch as well:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


end of thread, other threads:[~2018-05-15 10:42 UTC | newest]

Thread overview: 21+ messages
2018-05-12  0:32 [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 01/11] target/arm: Implement FMOV (general) for fp16 Richard Henderson
2018-05-15 10:37   ` Alex Bennée
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 02/11] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv Richard Henderson
2018-05-15 10:37   ` Alex Bennée
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 03/11] target/arm: Implement FCVT (scalar, integer) for fp16 Richard Henderson
2018-05-13  7:21   ` Alex Bennée
2018-05-14 15:01     ` Richard Henderson
2018-05-14 15:52       ` Alex Bennée
2018-05-15 10:42         ` Alex Bennée
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 04/11] target/arm: Implement FCVT (scalar, fixed-point) " Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 05/11] target/arm: Introduce and use read_fp_hreg Richard Henderson
2018-05-15 10:39   ` Alex Bennée
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 06/11] target/arm: Implement FP data-processing (2 source) for fp16 Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 07/11] target/arm: Implement FP data-processing (3 " Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 08/11] target/arm: Implement FCMP " Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 09/11] target/arm: Implement FCSEL " Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 10/11] target/arm: Implement FMOV (immediate) " Richard Henderson
2018-05-12  0:32 ` [Qemu-devel] [PATCH v4 11/11] target/arm: Fix sqrt_f16 exception raising Richard Henderson
2018-05-13  7:22 ` [Qemu-devel] [PATCH v4 00/11] target/arm: Fixups for ARM_FEATURE_V8_FP16 Alex Bennée
2018-05-14 15:16 ` Peter Maydell
