[Qemu-devel] [PATCH v3 1/4] target-tilegx: Add floating point shared functions


* [Qemu-devel] [PATCH v3 1/4] target-tilegx: Add floating point shared functions
       [not found] <56698865.8050901@emindsoft.com.cn>
@ 2015-12-10 14:13 ` Chen Gang
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation Chen Gang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Chen Gang @ 2015-12-10 14:13 UTC
  To: rth, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


They are used by the fsingle and fdouble helpers.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
---
 target-tilegx/helper-fshared.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 target-tilegx/helper-fshared.c

diff --git a/target-tilegx/helper-fshared.c b/target-tilegx/helper-fshared.c
new file mode 100644
index 0000000..d669f58
--- /dev/null
+++ b/target-tilegx/helper-fshared.c
@@ -0,0 +1,53 @@
+/*
+ *  TILE-Gx virtual floating point shared functions
+ *
+ *  Copyright (c) 2015 Chen Gang
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+static inline uint64_t create_fsfd_flag_un(void)
+{
+    return 1 << 25;
+}
+
+static inline uint64_t create_fsfd_flag_lt(void)
+{
+    return 1 << 26;
+}
+
+static inline uint64_t create_fsfd_flag_le(void)
+{
+    return 1 << 27;
+}
+
+static inline uint64_t create_fsfd_flag_gt(void)
+{
+    return 1 << 28;
+}
+
+static inline uint64_t create_fsfd_flag_ge(void)
+{
+    return 1 << 29;
+}
+
+static inline uint64_t create_fsfd_flag_eq(void)
+{
+    return 1 << 30;
+}
+
+static inline uint64_t create_fsfd_flag_ne(void)
+{
+    return 1ULL << 31;
+}
-- 
1.9.3
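The shared helpers above only define the bit positions of the comparison flags (bits 25 to 31). As a hedged illustration of how those bits combine, the sketch below mirrors the `ana_bits()` helpers added later in the series, but substitutes native host `double` comparisons for softfloat's `float64_eq()`, `float64_lt()`, `float64_le()` and `float64_unordered()`. The `FSFD_FLAG_*` macros and `fsfd_compare_flags()` are hypothetical names, not part of the patch.

```c
#include <math.h>
#include <stdint.h>

/* Bit positions copied from the create_fsfd_flag_*() helpers. */
#define FSFD_FLAG_UN (1ULL << 25)
#define FSFD_FLAG_LT (1ULL << 26)
#define FSFD_FLAG_LE (1ULL << 27)
#define FSFD_FLAG_GT (1ULL << 28)
#define FSFD_FLAG_GE (1ULL << 29)
#define FSFD_FLAG_EQ (1ULL << 30)
#define FSFD_FLAG_NE (1ULL << 31)

/*
 * Hypothetical host-side model of ana_bits(): native double comparisons
 * stand in for softfloat.  Every ordered comparison involving NaN is
 * false, so a NaN operand yields only NE and UN.
 */
static uint64_t fsfd_compare_flags(double a, double b)
{
    uint64_t flags = (a == b) ? FSFD_FLAG_EQ : FSFD_FLAG_NE;

    if (a < b) {
        flags |= FSFD_FLAG_LT;
    }
    if (a <= b) {
        flags |= FSFD_FLAG_LE;
    }
    if (b < a) {
        flags |= FSFD_FLAG_GT;
    }
    if (b <= a) {
        flags |= FSFD_FLAG_GE;
    }
    if (isnan(a) || isnan(b)) {
        flags |= FSFD_FLAG_UN;
    }
    return flags;
}
```

For example, `fsfd_compare_flags(1.0, 2.0)` sets NE, LT and LE, i.e. 0x8c000000.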


* [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
       [not found] <56698865.8050901@emindsoft.com.cn>
  2015-12-10 14:13 ` [Qemu-devel] [PATCH v3 1/4] target-tilegx: Add floating point shared functions Chen Gang
@ 2015-12-10 14:15 ` Chen Gang
  2015-12-10 17:15   ` Richard Henderson
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double " Chen Gang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-10 14:15 UTC
  To: rth, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


It passes the gcc testsuite.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
---
 target-tilegx/helper-fsingle.c | 212 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)
 create mode 100644 target-tilegx/helper-fsingle.c

diff --git a/target-tilegx/helper-fsingle.c b/target-tilegx/helper-fsingle.c
new file mode 100644
index 0000000..a33837e
--- /dev/null
+++ b/target-tilegx/helper-fsingle.c
@@ -0,0 +1,212 @@
+/*
+ * QEMU TILE-Gx helpers
+ *
+ *  Copyright (c) 2015 Chen Gang
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * <http://www.gnu.org/licenses/lgpl-2.1.html>
+ */
+
+#include "cpu.h"
+#include "qemu-common.h"
+#include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+
+#include "helper-fshared.c"
+
+/*
+ * FSingle instruction implementation:
+ *
+ * fsingle_add1         ; calc srca + srcb.
+ *                      ; convert float_32 result to TileGXFPSFmt.
+ *                      ; move TileGXFPSFmt result to dest.
+ *
+ * fsingle_sub1         ; calc srca - srcb.
+ *                      ; convert float_32 result to TileGXFPSFmt.
+ *                      ; move TileGXFPSFmt result to dest.
+ *
+ * fsingle_addsub2      ; nop.
+ *
+ * fsingle_mul1         ; calc srca * srcb.
+ *                      ; convert float_32 result to TileGXFPSFmt.
+ *                      ; move TileGXFPSFmt result to dest.
+ *
+ * fsingle_mul2         ; move srca to dest.
+ *
+ * fsingle_pack1        ; nop.
+ *
+ * fsingle_pack2        ; treat srca as TileGXFPSFmt result.
+ *                      ; convert TileGXFPSFmt result to float_32 value.
+ *                      ; move float_32 value to dest.
+ */
+
+#define TILEGX_F_CALC_CVT   0     /* convert int to fsingle */
+#define TILEGX_F_CALC_NCVT  1     /* No conversion */
+
+static uint32_t get_f32_exp(float32 f)
+{
+    return extract32(float32_val(f), 23, 8);
+}
+
+static void set_f32_exp(float32 *f, uint32_t exp)
+{
+    *f = make_float32(deposit32(float32_val(*f), 23, 8, exp));
+}
+
+static uint32_t get_f32_man(float32 f)
+{
+    return float32_val(f) & 0x7fffff;
+}
+
+static float32 create_f32_man(uint32_t man)
+{
+    return make_float32(man & 0x7fffff);
+}
+
+static inline uint32_t get_fsingle_exp(uint64_t n)
+{
+    return n & 0xff;
+}
+
+static inline uint64_t create_fsingle_exp(uint32_t exp)
+{
+    return exp & 0xff;
+}
+
+static inline uint32_t get_fsingle_sign(uint64_t n)
+{
+    return extract64(n, 10, 1);
+}
+
+static inline void set_fsingle_sign(uint64_t *n)
+{
+    *n = deposit64(*n, 10, 1, 1);
+}
+
+static inline unsigned int get_fsingle_calc(uint64_t n)
+{
+    return extract64(n, 11, 1);
+}
+
+static inline void set_fsingle_calc(uint64_t *n, uint32_t calc)
+{
+    *n = deposit64(*n, 11, 1, calc);
+}
+
+static inline unsigned int get_fsingle_man(uint64_t n)
+{
+    return n >> 32;
+}
+
+static inline uint64_t create_fsingle_man(uint32_t man)
+{
+    return (uint64_t)man << 32;
+}
+
+static uint64_t float32_to_sfmt(float32 f)
+{
+    uint64_t sfmt = 0;
+
+    if (float32_is_neg(f)) {
+        set_fsingle_sign(&sfmt);
+    }
+    sfmt |= create_fsingle_exp(get_f32_exp(f));
+    sfmt |= create_fsingle_man((get_f32_man(f) << 8) | (1U << 31));
+
+    return sfmt;
+}
+
+static float32 sfmt_to_float32(uint64_t sfmt, float_status *fp_status)
+{
+    float32 f;
+    uint32_t sign = get_fsingle_sign(sfmt);
+    uint32_t man = get_fsingle_man(sfmt);
+
+    if (get_fsingle_calc(sfmt) == TILEGX_F_CALC_CVT) {
+        if (sign) {
+            return int32_to_float32(0 - man, fp_status);
+        } else {
+            return uint32_to_float32(man, fp_status);
+        }
+    } else {
+        f = float32_set_sign(float32_zero, sign);
+        f |= create_f32_man(man >> 8);
+        set_f32_exp(&f, get_fsingle_exp(sfmt));
+    }
+
+    return f;
+}
+
+uint64_t helper_fsingle_pack2(CPUTLGState *env, uint64_t srca)
+{
+    return float32_val(sfmt_to_float32(srca, &env->fp_status));
+}
+
+static void ana_bits(float_status *fp_status,
+                     float32 fsrca, float32 fsrcb, uint64_t *sfmt)
+{
+    if (float32_eq(fsrca, fsrcb, fp_status)) {
+        *sfmt |= create_fsfd_flag_eq();
+    } else {
+        *sfmt |= create_fsfd_flag_ne();
+    }
+
+    if (float32_lt(fsrca, fsrcb, fp_status)) {
+        *sfmt |= create_fsfd_flag_lt();
+    }
+    if (float32_le(fsrca, fsrcb, fp_status)) {
+        *sfmt |= create_fsfd_flag_le();
+    }
+
+    if (float32_lt(fsrcb, fsrca, fp_status)) {
+        *sfmt |= create_fsfd_flag_gt();
+    }
+    if (float32_le(fsrcb, fsrca, fp_status)) {
+        *sfmt |= create_fsfd_flag_ge();
+    }
+
+    if (float32_unordered(fsrca, fsrcb, fp_status)) {
+        *sfmt |= create_fsfd_flag_un();
+    }
+}
+
+static uint64_t main_calc(float_status *fp_status,
+                          float32 fsrca, float32 fsrcb,
+                          float32 (*calc)(float32, float32, float_status *))
+{
+    uint64_t sfmt = float32_to_sfmt(calc(fsrca, fsrcb, fp_status));
+
+    ana_bits(fp_status, fsrca, fsrcb, &sfmt);
+
+    set_fsingle_calc(&sfmt, TILEGX_F_CALC_NCVT);
+    return sfmt;
+}
+
+uint64_t helper_fsingle_add1(CPUTLGState *env, uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float32(srca), make_float32(srcb), float32_add);
+}
+
+uint64_t helper_fsingle_sub1(CPUTLGState *env, uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float32(srca), make_float32(srcb), float32_sub);
+}
+
+uint64_t helper_fsingle_mul1(CPUTLGState *env, uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float32(srca), make_float32(srcb), float32_mul);
+}
-- 
1.9.3
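The NCVT path of `float32_to_sfmt()` and `sfmt_to_float32()` implies this TileGXFPSFmt layout: biased exponent in bits 0 to 7, sign in bit 10, the calc bit in bit 11, and a 32-bit mantissa with an explicit leading one in bits 32 to 63. A minimal host-side sketch of that round trip, assuming normal finite inputs and using `memcpy` bit access instead of softfloat (all names here are hypothetical), could look like:

```c
#include <stdint.h>
#include <string.h>

#define SFMT_SIGN_BIT 10   /* as read by get_fsingle_sign() */
#define SFMT_CALC_BIT 11   /* set means TILEGX_F_CALC_NCVT */

static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Mirrors float32_to_sfmt() plus the NCVT bit that main_calc() sets. */
static uint64_t sketch_pack_sfmt(float f)
{
    uint32_t u = f32_bits(f);
    /* 23-bit fraction moved to bits 8..30, explicit leading one at bit 31 */
    uint32_t man = ((u & 0x7fffffu) << 8) | (1u << 31);
    uint64_t sfmt = (u >> 23) & 0xff;          /* biased exponent, bits 0..7 */

    if (u >> 31) {
        sfmt |= 1ULL << SFMT_SIGN_BIT;
    }
    sfmt |= 1ULL << SFMT_CALC_BIT;
    sfmt |= (uint64_t)man << 32;               /* mantissa, bits 32..63 */
    return sfmt;
}

/* Mirrors the NCVT branch of sfmt_to_float32(). */
static float sketch_unpack_sfmt(uint64_t sfmt)
{
    uint32_t sign = (sfmt >> SFMT_SIGN_BIT) & 1;
    uint32_t exp = sfmt & 0xff;
    uint32_t man = (uint32_t)(sfmt >> 32);

    /* drop the explicit leading one again, as create_f32_man() does */
    return bits_f32((sign << 31) | (exp << 23) | ((man >> 8) & 0x7fffffu));
}
```

The comparison flags that `ana_bits()` ORs into bits 25 to 31 do not overlap any of these fields, so they do not disturb what `sfmt_to_float32()` reads back.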


* [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
       [not found] <56698865.8050901@emindsoft.com.cn>
  2015-12-10 14:13 ` [Qemu-devel] [PATCH v3 1/4] target-tilegx: Add floating point shared functions Chen Gang
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation Chen Gang
@ 2015-12-10 14:15 ` Chen Gang
  2015-12-10 21:17   ` Richard Henderson
  2015-12-10 14:16 ` [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation Chen Gang
  2015-12-10 14:26 ` [Qemu-devel] [PATCH v3 0/4] target-tilegx: Implement floating point instructions Chen Gang
  4 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-10 14:15 UTC
  To: rth, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


It passes the gcc testsuite.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
---
 target-tilegx/helper-fdouble.c | 400 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)
 create mode 100644 target-tilegx/helper-fdouble.c

diff --git a/target-tilegx/helper-fdouble.c b/target-tilegx/helper-fdouble.c
new file mode 100644
index 0000000..3b824f7
--- /dev/null
+++ b/target-tilegx/helper-fdouble.c
@@ -0,0 +1,400 @@
+/*
+ * QEMU TILE-Gx helpers
+ *
+ *  Copyright (c) 2015 Chen Gang
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * <http://www.gnu.org/licenses/lgpl-2.1.html>
+ */
+
+#include "cpu.h"
+#include "qemu-common.h"
+#include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+
+#include "helper-fshared.c"
+
+/*
+ * FDouble instruction implementation:
+ *
+ * fdouble_unpack_min   ; srca and srcb are float_64 values.
+ *                      ; get the min absolute value's mantissa.
+ *                      ; move "mantissa >> (exp_max - exp_min)" to dest.
+ *
+ * fdouble_unpack_max   ; srca and srcb are float_64 values.
+ *                      ; get the max absolute value's mantissa.
+ *                      ; move mantissa to dest.
+ *
+ * fdouble_add_flags    ; srca and srcb are float_64 values.
+ *                      ; calc exp (exp_max), sign, and comp bits for flags.
+ *                      ; set the addsub bit in flags and move flags to dest.
+ *
+ * fdouble_sub_flags    ; srca and srcb are float_64 values.
+ *                      ; calc exp (exp_max), sign, and comp bits for flags.
+ *                      ; set the addsub bit in flags and move flags to dest.
+ *
+ * fdouble_addsub:      ; dest, srca (max, min mantissa), and srcb (flags).
+ *                      ; "dest +/- srca" depending on the add/sub bit of flags.
+ *                      ; move result mantissa to dest.
+ *
+ * fdouble_mul_flags:   ; srca and srcb are float_64 values.
+ *                      ; calc sign (xor), exp (min + max), and comp bits.
+ *                      ; combine sign, exp, and comp bits into flags in dest.
+ *
+ * fdouble_pack1        ; move srcb (flags) to dest.
+ *
+ * fdouble_pack2        ; srca, srcb (high, low mantissa), and dest (flags).
+ *                      ; normalize and pack result from srca, srcb, and dest.
+ *                      ; move result to dest.
+ */
+
+#define TILEGX_F_EXP_DZERO  0x3ff /* Zero exp for double 11-bits */
+#define TILEGX_F_EXP_DMAX   0x7fe /* max exp for double 11-bits */
+#define TILEGX_F_EXP_DUF    0x1000 /* underflow exp bit for double */
+
+#define TILEGX_F_MAN_HBIT   (1ULL << 59)
+
+#define TILEGX_F_CALC_ADD   1     /* Perform absolute add operation */
+#define TILEGX_F_CALC_SUB   2     /* Perform absolute sub operation */
+#define TILEGX_F_CALC_MUL   3     /* Perform absolute mul operation */
+
+static uint32_t get_f64_exp(float64 d)
+{
+    return extract64(float64_val(d), 52, 11);
+}
+
+static void set_f64_exp(float64 *d, uint32_t exp)
+{
+    *d = make_float64(deposit64(float64_val(*d), 52, 11, exp));
+}
+
+static uint64_t get_f64_man(float64 d)
+{
+    return extract64(float64_val(d), 0, 52);
+}
+
+static uint64_t fr_to_man(float64 d)
+{
+    uint64_t val = get_f64_man(d) << 7;
+
+    if (get_f64_exp(d)) {
+        val |= TILEGX_F_MAN_HBIT;
+    }
+
+    return val;
+}
+
+static uint64_t get_fdouble_man(uint64_t n)
+{
+    return extract64(n, 0, 60);
+}
+
+static void set_fdouble_man(uint64_t *n, uint64_t man)
+{
+    *n = deposit64(*n, 0, 60, man);
+}
+
+static uint64_t get_fdouble_man_of(uint64_t n)
+{
+    return extract64(n, 60, 1);
+}
+
+static void clear_fdouble_man_of(uint64_t *n)
+{
+    *n = deposit64(*n, 60, 1, 0);
+}
+
+static uint32_t get_fdouble_nan(uint64_t n)
+{
+    return extract64(n, 24, 1);
+}
+
+static void set_fdouble_nan(uint64_t *n)
+{
+    *n = deposit64(*n, 24, 1, 1);
+}
+
+static uint32_t get_fdouble_inf(uint64_t n)
+{
+    return extract64(n, 23, 1);
+}
+
+static void set_fdouble_inf(uint64_t *n)
+{
+    *n = deposit64(*n, 23, 1, 1);
+}
+
+static uint32_t get_fdouble_calc(uint64_t n)
+{
+    return extract32(n, 21, 2);
+}
+
+static void set_fdouble_calc(uint64_t *n, uint32_t calc)
+{
+    *n = deposit64(*n, 21, 2, calc);
+}
+
+static uint32_t get_fdouble_sign(uint64_t n)
+{
+    return extract64(n, 20, 1);
+}
+
+static void set_fdouble_sign(uint64_t *n)
+{
+    *n = deposit64(*n, 20, 1, 1);
+}
+
+static uint32_t get_fdouble_vexp(uint64_t n)
+{
+    return extract32(n, 7, 13);
+}
+
+static void set_fdouble_vexp(uint64_t *n, uint32_t vexp)
+{
+    *n = deposit64(*n, 7, 13, vexp);
+}
+
+uint64_t helper_fdouble_unpack_min(CPUTLGState *env,
+                                   uint64_t srca, uint64_t srcb)
+{
+    uint64_t v = 0;
+    uint32_t expa = get_f64_exp(srca);
+    uint32_t expb = get_f64_exp(srcb);
+
+    if (float64_is_any_nan(srca) || float64_is_any_nan(srcb)
+        || float64_is_infinity(srca) || float64_is_infinity(srcb)) {
+        return 0;
+    } else if (expa > expb) {
+        if (expa - expb < 64) {
+            set_fdouble_man(&v, fr_to_man(srcb) >> (expa - expb));
+        } else {
+            return 0;
+        }
+    } else if (expa < expb) {
+        if (expb - expa < 64) {
+            set_fdouble_man(&v, fr_to_man(srca) >> (expb - expa));
+        } else {
+            return 0;
+        }
+    } else if (get_f64_man(srca) > get_f64_man(srcb)) {
+        set_fdouble_man(&v, fr_to_man(srcb));
+    } else {
+        set_fdouble_man(&v, fr_to_man(srca));
+    }
+
+    return v;
+}
+
+uint64_t helper_fdouble_unpack_max(CPUTLGState *env,
+                                   uint64_t srca, uint64_t srcb)
+{
+    uint64_t v = 0;
+    uint32_t expa = get_f64_exp(srca);
+    uint32_t expb = get_f64_exp(srcb);
+
+    if (float64_is_any_nan(srca) || float64_is_any_nan(srcb)
+        || float64_is_infinity(srca) || float64_is_infinity(srcb)) {
+        return 0;
+    } else if (expa > expb) {
+        set_fdouble_man(&v, fr_to_man(srca));
+    } else if (expa < expb) {
+        set_fdouble_man(&v, fr_to_man(srcb));
+    } else if (get_f64_man(srca) > get_f64_man(srcb)) {
+        set_fdouble_man(&v, fr_to_man(srca));
+    } else {
+        set_fdouble_man(&v, fr_to_man(srcb));
+    }
+
+    return v;
+}
+
+uint64_t helper_fdouble_addsub(CPUTLGState *env,
+                               uint64_t dest, uint64_t srca, uint64_t srcb)
+{
+    if (get_fdouble_calc(srcb) == TILEGX_F_CALC_ADD) {
+        return dest + srca; /* maybe set addsub overflow bit */
+    } else {
+        return dest - srca;
+    }
+}
+
+/* absolute-add/mul may cause add/mul carry or overflow */
+static bool proc_oflow(uint64_t *flags, uint64_t *v, uint64_t *srcb)
+{
+    if (get_fdouble_man_of(*v)) {
+        set_fdouble_vexp(flags, get_fdouble_vexp(*flags) + 1);
+        *srcb >>= 1;
+        *srcb |= *v << 63;
+        *v >>= 1;
+        clear_fdouble_man_of(v);
+    }
+    return get_fdouble_vexp(*flags) > TILEGX_F_EXP_DMAX;
+}
+
+uint64_t helper_fdouble_pack2(CPUTLGState *env, uint64_t flags /* dest */,
+                              uint64_t srca, uint64_t srcb)
+{
+    uint64_t v = srca;
+    float64 d = float64_set_sign(float64_zero, get_fdouble_sign(flags));
+
+    /*
+     * fdouble_add_flags, fdouble_sub_flags, or fdouble_mul_flags has
+     * already updated fp_status, so there is no need to process it again.
+     */
+
+    if (get_fdouble_nan(flags)) {
+        return float64_val(float64_default_nan);
+    } else if (get_fdouble_inf(flags)) {
+        return float64_val(d |= float64_infinity);
+    }
+
+    /* absolute-mul needs a left shift of 4 + 1 bits to match the real mantissa */
+    if (get_fdouble_calc(flags) == TILEGX_F_CALC_MUL) {
+        v <<= 5;
+        v |= srcb >> 59;
+        srcb <<= 5;
+    }
+
+    /* check underflow first */
+    if (get_fdouble_vexp(flags) & TILEGX_F_EXP_DUF) {
+        return float64_val(d);
+    }
+
+    if (proc_oflow(&flags, &v, &srcb)) {
+        return float64_val(d |= float64_infinity);
+    }
+
+    while (!(get_fdouble_man(v) & TILEGX_F_MAN_HBIT)
+           && (get_fdouble_man(v) | srcb)) {
+        set_fdouble_vexp(&flags, get_fdouble_vexp(flags) - 1);
+        set_fdouble_man(&v, get_fdouble_man(v) << 1);
+        set_fdouble_man(&v, get_fdouble_man(v) | (srcb >> 63));
+        srcb <<= 1;
+    }
+
+    /* check underflow again after normalization */
+    if ((get_fdouble_vexp(flags) & TILEGX_F_EXP_DUF) || !get_fdouble_man(v)) {
+        return float64_val(d);
+    }
+
+    if (get_fdouble_sign(flags)) {
+        d = int64_to_float64(0 - get_fdouble_man(v), &env->fp_status);
+    } else {
+        d = uint64_to_float64(get_fdouble_man(v), &env->fp_status);
+    }
+
+    if (get_f64_exp(d) == 59 + TILEGX_F_EXP_DZERO) {
+        set_f64_exp(&d, get_fdouble_vexp(flags));
+    } else {                            /* for carry and overflow again */
+        set_f64_exp(&d, get_fdouble_vexp(flags) + 1);
+        if (get_f64_exp(d) == TILEGX_F_EXP_DMAX) {
+            d = float64_infinity;
+        }
+    }
+
+    d = float64_set_sign(d, get_fdouble_sign(flags));
+
+    return float64_val(d);
+}
+
+static void ana_bits(float_status *fp_status,
+                     float64 fsrca, float64 fsrcb, uint64_t *dfmt)
+{
+    if (float64_eq(fsrca, fsrcb, fp_status)) {
+        *dfmt |= create_fsfd_flag_eq();
+    } else {
+        *dfmt |= create_fsfd_flag_ne();
+    }
+
+    if (float64_lt(fsrca, fsrcb, fp_status)) {
+        *dfmt |= create_fsfd_flag_lt();
+    }
+    if (float64_le(fsrca, fsrcb, fp_status)) {
+        *dfmt |= create_fsfd_flag_le();
+    }
+
+    if (float64_lt(fsrcb, fsrca, fp_status)) {
+        *dfmt |= create_fsfd_flag_gt();
+    }
+    if (float64_le(fsrcb, fsrca, fp_status)) {
+        *dfmt |= create_fsfd_flag_ge();
+    }
+
+    if (float64_unordered(fsrca, fsrcb, fp_status)) {
+        *dfmt |= create_fsfd_flag_un();
+    }
+}
+
+static uint64_t main_calc(float_status *fp_status,
+                          float64 fsrca, float64 fsrcb,
+                          float64 (*calc)(float64, float64, float_status *))
+{
+    float64 d;
+    uint64_t flags = 0;
+    uint32_t expa = get_f64_exp(fsrca);
+    uint32_t expb = get_f64_exp(fsrcb);
+
+    ana_bits(fp_status, fsrca, fsrcb, &flags);
+
+    d = calc(fsrca, fsrcb, fp_status); /* also check exceptions */
+    if (float64_is_neg(d)) {
+        set_fdouble_sign(&flags);
+    }
+
+    if (float64_is_any_nan(d)) {
+        set_fdouble_nan(&flags);
+    } else if (float64_is_infinity(d)) {
+        set_fdouble_inf(&flags);
+    } else if (calc == float64_add) {
+        set_fdouble_vexp(&flags, (expa > expb) ? expa : expb);
+        set_fdouble_calc(&flags,
+                         (float64_is_neg(fsrca) == float64_is_neg(fsrcb))
+                             ? TILEGX_F_CALC_ADD : TILEGX_F_CALC_SUB);
+
+    } else if (calc == float64_sub) {
+        set_fdouble_vexp(&flags, (expa > expb) ? expa : expb);
+        set_fdouble_calc(&flags,
+                         (float64_is_neg(fsrca) != float64_is_neg(fsrcb))
+                             ? TILEGX_F_CALC_ADD : TILEGX_F_CALC_SUB);
+
+    } else {
+        set_fdouble_vexp(&flags, (int64_t)(expa - TILEGX_F_EXP_DZERO)
+                                 + (int64_t)(expb - TILEGX_F_EXP_DZERO)
+                                 + TILEGX_F_EXP_DZERO);
+        set_fdouble_calc(&flags, TILEGX_F_CALC_MUL);
+    }
+
+    return flags;
+}
+
+uint64_t helper_fdouble_add_flags(CPUTLGState *env,
+                                  uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float64(srca), make_float64(srcb), float64_add);
+}
+
+uint64_t helper_fdouble_sub_flags(CPUTLGState *env,
+                                  uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float64(srca), make_float64(srcb), float64_sub);
+}
+
+uint64_t helper_fdouble_mul_flags(CPUTLGState *env,
+                                  uint64_t srca, uint64_t srcb)
+{
+    return main_calc(&env->fp_status,
+                     make_float64(srca), make_float64(srcb), float64_mul);
+}
-- 
1.9.3
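To make the `fdouble_*` sequence documented in the comment at the top of helper-fdouble.c more concrete, here is a hedged sketch of the addition path for two positive, normal, finite doubles: `fdouble_add_flags` supplies exp_max, `fdouble_unpack_max`/`fdouble_unpack_min` align the 60-bit mantissas (hidden bit at bit 59, as in `fr_to_man()`), `fdouble_addsub` adds them, and `fdouble_pack2` rebases the exponent. Signs, the SUB path, NaN/infinity, underflow and overflow are deliberately left out, and every identifier is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAN_HBIT (1ULL << 59)   /* TILEGX_F_MAN_HBIT */
#define EXP_BIAS 1023           /* TILEGX_F_EXP_DZERO */

static uint64_t f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof(u));
    return u;
}

static double bits_f64(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof(d));
    return d;
}

static uint32_t f64_exp(double d)
{
    return (uint32_t)(f64_bits(d) >> 52) & 0x7ff;
}

/* Mirrors fr_to_man(): 52-bit fraction shifted left by 7, with the
 * hidden bit made explicit at bit 59 for normal numbers. */
static uint64_t sketch_fr_to_man(double d)
{
    uint64_t val = (f64_bits(d) & ((1ULL << 52) - 1)) << 7;

    if (f64_exp(d)) {
        val |= MAN_HBIT;
    }
    return val;
}

/* Hypothetical model of add_flags + unpack_max/unpack_min + addsub (ADD)
 * + pack2 for two positive, normal, finite doubles whose sum does not
 * overflow.  Bits shifted out of the smaller mantissa are dropped, as in
 * fdouble_unpack_min. */
static double sketch_fdouble_add(double a, double b)
{
    double big = (a >= b) ? a : b;
    double small = (a >= b) ? b : a;
    uint32_t vexp = f64_exp(big);                   /* exp_max (add_flags) */
    uint32_t diff = vexp - f64_exp(small);
    uint64_t max_man = sketch_fr_to_man(big);       /* fdouble_unpack_max */
    uint64_t min_man = diff < 64 ? sketch_fr_to_man(small) >> diff : 0;
    uint64_t sum = max_man + min_man;               /* fdouble_addsub */

    /* pack2-like step: convert the integer mantissa (rounding to 53 bits),
     * then rebase its exponent so that bit 59 corresponds to vexp; a carry
     * into bit 60 therefore yields vexp + 1 automatically. */
    uint64_t u = f64_bits((double)sum);
    uint64_t e = ((u >> 52) & 0x7ff) - (EXP_BIAS + 59) + vexp;

    u = (u & ~(0x7ffULL << 52)) | (e << 52);
    return bits_f64(u);
}
```

For instance, `sketch_fdouble_add(3.0, 5.0)` carries into bit 60 and returns 8.0. Because bits below the 7 guard positions are truncated, results can differ from IEEE round-to-nearest in the last place when the exponents differ by more than 7.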


* [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation
       [not found] <56698865.8050901@emindsoft.com.cn>
                   ` (2 preceding siblings ...)
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double " Chen Gang
@ 2015-12-10 14:16 ` Chen Gang
  2015-12-10 21:37   ` Richard Henderson
  2015-12-10 14:26 ` [Qemu-devel] [PATCH v3 0/4] target-tilegx: Implement floating point instructions Chen Gang
  4 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-10 14:16 UTC
  To: rth, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


It builds normally and passes the gcc testsuite.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
---
 target-tilegx/Makefile.objs |  3 +-
 target-tilegx/cpu.h         |  2 ++
 target-tilegx/helper.h      | 12 ++++++++
 target-tilegx/translate.c   | 68 +++++++++++++++++++++++++++++++++++++++------
 4 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/target-tilegx/Makefile.objs b/target-tilegx/Makefile.objs
index 0db778f..136ad60 100644
--- a/target-tilegx/Makefile.objs
+++ b/target-tilegx/Makefile.objs
@@ -1 +1,2 @@
-obj-y += cpu.o translate.o helper.o simd_helper.o
+obj-y += cpu.o translate.o helper.o simd_helper.o \
+		helper-fsingle.o helper-fdouble.o
diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
index 03df107..445a606 100644
--- a/target-tilegx/cpu.h
+++ b/target-tilegx/cpu.h
@@ -88,6 +88,8 @@ typedef struct CPUTLGState {
     uint64_t spregs[TILEGX_SPR_COUNT]; /* Special used registers by outside */
     uint64_t pc;                       /* Current pc */
 
+    float_status fp_status;            /* floating point status */
+
 #if defined(CONFIG_USER_ONLY)
     uint64_t excaddr;                  /* exception address */
     uint64_t atomic_srca;              /* Arguments to atomic "exceptions" */
diff --git a/target-tilegx/helper.h b/target-tilegx/helper.h
index 9281d0f..b785bf2 100644
--- a/target-tilegx/helper.h
+++ b/target-tilegx/helper.h
@@ -24,3 +24,15 @@ DEF_HELPER_FLAGS_2(v1shrs, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(v2shl, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(v2shru, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(v2shrs, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+
+DEF_HELPER_3(fsingle_add1, i64, env, i64, i64)
+DEF_HELPER_3(fsingle_sub1, i64, env, i64, i64)
+DEF_HELPER_3(fsingle_mul1, i64, env, i64, i64)
+DEF_HELPER_2(fsingle_pack2, i64, env, i64)
+DEF_HELPER_3(fdouble_unpack_min, i64, env, i64, i64)
+DEF_HELPER_3(fdouble_unpack_max, i64, env, i64, i64)
+DEF_HELPER_3(fdouble_add_flags, i64, env, i64, i64)
+DEF_HELPER_3(fdouble_sub_flags, i64, env, i64, i64)
+DEF_HELPER_4(fdouble_addsub, i64, env, i64, i64, i64)
+DEF_HELPER_3(fdouble_mul_flags, i64, env, i64, i64)
+DEF_HELPER_4(fdouble_pack2, i64, env, i64, i64, i64)
diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
index 354f25a..5c2a98d 100644
--- a/target-tilegx/translate.c
+++ b/target-tilegx/translate.c
@@ -597,6 +597,11 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
         }
         qemu_log_mask(CPU_LOG_TB_IN_ASM, "%s %s", mnemonic, reg_names[srca]);
         return ret;
+
+    case OE_RR_X0(FSINGLE_PACK1):
+    case OE_RR_Y0(FSINGLE_PACK1):
+        mnemonic = "fsingle_pack1";
+        goto done2;
     }
 
     tdest = dest_gr(dc, dest);
@@ -613,9 +618,6 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
         gen_helper_cnttz(tdest, tsrca);
         mnemonic = "cnttz";
         break;
-    case OE_RR_X0(FSINGLE_PACK1):
-    case OE_RR_Y0(FSINGLE_PACK1):
-        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
     case OE_RR_X1(LD1S):
         memop = MO_SB;
         mnemonic = "ld1s"; /* prefetch_l1_fault */
@@ -734,6 +736,7 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
         return TILEGX_EXCP_OPCODE_UNKNOWN;
     }
 
+done2:
     qemu_log_mask(CPU_LOG_TB_IN_ASM, "%s %s, %s", mnemonic,
                   reg_names[dest], reg_names[srca]);
     return ret;
@@ -742,13 +745,21 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
 static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
                                unsigned dest, unsigned srca, unsigned srcb)
 {
-    TCGv tdest = dest_gr(dc, dest);
-    TCGv tsrca = load_gr(dc, srca);
-    TCGv tsrcb = load_gr(dc, srcb);
+    TCGv tdest, tsrca, tsrcb;
     TCGv t0;
     const char *mnemonic;
 
     switch (opext) {
+    case OE_RRR(FSINGLE_ADDSUB2, 0, X0):
+        mnemonic = "fsingle_addsub2";
+        goto done2;
+    }
+
+    tdest = dest_gr(dc, dest);
+    tsrca = load_gr(dc, srca);
+    tsrcb = load_gr(dc, srcb);
+
+    switch (opext) {
     case OE_RRR(ADDXSC, 0, X0):
     case OE_RRR(ADDXSC, 0, X1):
         gen_saturate_op(tdest, tsrca, tsrcb, tcg_gen_add_tl);
@@ -906,14 +917,39 @@ static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
         mnemonic = "exch";
         break;
     case OE_RRR(FDOUBLE_ADDSUB, 0, X0):
+        gen_helper_fdouble_addsub(tdest, cpu_env,
+                                  load_gr(dc, dest), tsrca, tsrcb);
+        mnemonic = "fdouble_addsub";
+        break;
     case OE_RRR(FDOUBLE_ADD_FLAGS, 0, X0):
+        gen_helper_fdouble_add_flags(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fdouble_add_flags";
+        break;
     case OE_RRR(FDOUBLE_MUL_FLAGS, 0, X0):
+        gen_helper_fdouble_mul_flags(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fdouble_mul_flags";
+        break;
     case OE_RRR(FDOUBLE_PACK1, 0, X0):
+        tcg_gen_mov_i64(tdest, tsrcb);
+        mnemonic = "fdouble_pack1";
+        break;
     case OE_RRR(FDOUBLE_PACK2, 0, X0):
+        gen_helper_fdouble_pack2(tdest, cpu_env,
+                                 load_gr(dc, dest), tsrca, tsrcb);
+        mnemonic = "fdouble_pack2";
+        break;
     case OE_RRR(FDOUBLE_SUB_FLAGS, 0, X0):
+        gen_helper_fdouble_sub_flags(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fdouble_sub_flags";
+        break;
     case OE_RRR(FDOUBLE_UNPACK_MAX, 0, X0):
+        gen_helper_fdouble_unpack_max(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fdouble_unpack_max";
+        break;
     case OE_RRR(FDOUBLE_UNPACK_MIN, 0, X0):
-        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
+        gen_helper_fdouble_unpack_min(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fdouble_unpack_min";
+        break;
     case OE_RRR(FETCHADD4, 0, X1):
         gen_atomic_excp(dc, dest, tdest, tsrca, tsrcb,
                         TILEGX_EXCP_OPCODE_FETCHADD4);
@@ -955,12 +991,25 @@ static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
         mnemonic = "fetchor";
         break;
     case OE_RRR(FSINGLE_ADD1, 0, X0):
-    case OE_RRR(FSINGLE_ADDSUB2, 0, X0):
+        gen_helper_fsingle_add1(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fsingle_add1";
+        break;
     case OE_RRR(FSINGLE_MUL1, 0, X0):
+        gen_helper_fsingle_mul1(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fsingle_mul1";
+        break;
     case OE_RRR(FSINGLE_MUL2, 0, X0):
+        tcg_gen_mov_i64(tdest, tsrca);
+        mnemonic = "fsingle_mul2";
+        break;
     case OE_RRR(FSINGLE_PACK2, 0, X0):
+        gen_helper_fsingle_pack2(tdest, cpu_env, tsrca);
+        mnemonic = "fsingle_pack2";
+        break;
     case OE_RRR(FSINGLE_SUB1, 0, X0):
-        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
+        gen_helper_fsingle_sub1(tdest, cpu_env, tsrca, tsrcb);
+        mnemonic = "fsingle_sub1";
+        break;
     case OE_RRR(MNZ, 0, X0):
     case OE_RRR(MNZ, 0, X1):
     case OE_RRR(MNZ, 4, Y0):
@@ -1464,6 +1513,7 @@ static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
         return TILEGX_EXCP_OPCODE_UNKNOWN;
     }
 
+done2:
     qemu_log_mask(CPU_LOG_TB_IN_ASM, "%s %s, %s, %s", mnemonic,
                   reg_names[dest], reg_names[srca], reg_names[srcb]);
     return TILEGX_EXCP_NONE;
-- 
1.9.3
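This patch wires `fdouble_mul_flags` into the decoder. For that helper, `main_calc()` in helper-fdouble.c adds the unbiased exponents and stores the re-biased sum in the 13-bit vexp field, where bit 12 (`TILEGX_F_EXP_DUF`) doubles as an underflow marker because negative sums wrap modulo 2^13. A small sketch of that arithmetic, with hypothetical names and raw biased exponents as inputs:

```c
#include <stdint.h>

#define EXP_DZERO 0x3ff    /* TILEGX_F_EXP_DZERO: exponent bias */
#define EXP_DMAX  0x7fe    /* TILEGX_F_EXP_DMAX: largest finite exponent */
#define EXP_DUF   0x1000   /* TILEGX_F_EXP_DUF: bit 12 of the vexp field */

/* Re-biased product exponent, truncated to 13 bits the way
 * set_fdouble_vexp()'s deposit64(..., 7, 13, ...) truncates it. */
static uint32_t sketch_mul_vexp(uint32_t expa, uint32_t expb)
{
    int32_t sum = ((int32_t)expa - EXP_DZERO) + ((int32_t)expb - EXP_DZERO)
                  + EXP_DZERO;

    return (uint32_t)sum & 0x1fff;   /* negative sums wrap and set bit 12 */
}

/* The check fdouble_pack2 performs first. */
static int sketch_vexp_underflows(uint32_t vexp)
{
    return (vexp & EXP_DUF) != 0;
}

/* The proc_oflow() check, applied only when there is no underflow. */
static int sketch_vexp_overflows(uint32_t vexp)
{
    return !sketch_vexp_underflows(vexp) && vexp > EXP_DMAX;
}
```

For example, exponents 1024 (2.0) and 1025 (4.0) give 1026 (8.0), while two exponents of 1 wrap to 7171 with bit 12 set. The sketch stops before `fdouble_pack2`, which additionally shifts a product's mantissa left by 5 bits and may bump the exponent by one on carry.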


* Re: [Qemu-devel] [PATCH v3 0/4] target-tilegx: Implement floating point instructions
       [not found] <56698865.8050901@emindsoft.com.cn>
                   ` (3 preceding siblings ...)
  2015-12-10 14:16 ` [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation Chen Gang
@ 2015-12-10 14:26 ` Chen Gang
  4 siblings, 0 replies; 20+ messages in thread
From: Chen Gang @ 2015-12-10 14:26 UTC
  To: rth, Peter Maydell, Chris Metcalf
  Cc: chenwei, qemu-devel, Chen Gang, Chen Gang

Hello all:

After consulting with my company, I have been permitted to use my company
email to send patches to the open source community.

My company supports what I have done for the open source community, but
for now I still use my personal email address in Signed-off-by (I make and
send patches mainly in my free time).

Thanks to my company for the support. :-)


Thanks.

On 12/10/15 22:12, Chen Gang wrote:
> 
> These patches are the full floating point implementation, replacing
> the original temporary one.
> 
> It builds, and passes the gcc testsuite.
> 
> Chen Gang (4):
>   target-tilegx: Add floating point shared functions
>   target-tilegx: Add single floating point implementation
>   target-tilegx: Add double floating point implementation
>   target-tilegx: Integrate floating pointer implementation
> 
>  target-tilegx/Makefile.objs    |   3 +-
>  target-tilegx/cpu.h            |   2 +
>  target-tilegx/helper-fdouble.c | 400 +++++++++++++++++++++++++++++++++++++++++
>  target-tilegx/helper-fshared.c |  53 ++++++
>  target-tilegx/helper-fsingle.c | 212 ++++++++++++++++++++++
>  target-tilegx/helper.h         |  12 ++
>  target-tilegx/translate.c      |  68 ++++++-
>  7 files changed, 740 insertions(+), 10 deletions(-)
>  create mode 100644 target-tilegx/helper-fdouble.c
>  create mode 100644 target-tilegx/helper-fshared.c
>  create mode 100644 target-tilegx/helper-fsingle.c
> 

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation Chen Gang
@ 2015-12-10 17:15   ` Richard Henderson
  2015-12-10 20:18     ` Richard Henderson
  2015-12-10 22:14     ` Chen Gang
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Henderson @ 2015-12-10 17:15 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/10/2015 06:15 AM, Chen Gang wrote:
> +#define TILEGX_F_CALC_CVT   0     /* convert int to fsingle */
> +#define TILEGX_F_CALC_NCVT  1     /* Not convertion */
> +
> +static uint32_t get_f32_exp(float32 f)
> +{
> +    return extract32(float32_val(f), 23, 8);
> +}
> +
> +static void set_f32_exp(float32 *f, uint32_t exp)
> +{
> +    *f = make_float32(deposit32(float32_val(*f), 23, 8, exp));
> +}

Why take a pointer instead of returning the new value?

> +static inline uint32_t get_fsingle_sign(uint64_t n)
> +{
> +    return test_bit(10, &n);
> +}
> +
> +static inline void set_fsingle_sign(uint64_t *n)
> +{
> +    set_bit(10, n);
> +}

Why are you using test_bit and set_bit here, rather than continuing to use
deposit and extract?
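A minimal sketch of what both suggestions could look like: the setters return the updated word instead of writing through a pointer, and the one-bit sign field uses the same extract/deposit style as the multi-bit fields. QEMU's own deposit32()/extract64()/deposit64() helpers (qemu/bitops.h) are replaced here by local stand-ins so the block compiles on its own; the `_sketch` and `_v` names are hypothetical, while bit 10 for the sign and bits [30:23] for the exponent are taken from the patch quoted above.

```c
#include <stdint.h>

/* Local stand-ins for QEMU's deposit32()/extract64()/deposit64(). */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t field)
{
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ull >> (64 - length));
}

static uint64_t deposit64_sketch(uint64_t value, int start, int length,
                                 uint64_t field)
{
    uint64_t mask = (~0ull >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Return the updated float32 bits instead of writing through a pointer. */
static uint32_t set_f32_exp_v(uint32_t f, uint32_t exp)
{
    return deposit32_sketch(f, 23, 8, exp);
}

/* The one-bit sign field uses the same extract/deposit style. */
static uint32_t get_fsingle_sign_v(uint64_t n)
{
    return (uint32_t)extract64_sketch(n, 10, 1);
}

static uint64_t set_fsingle_sign_v(uint64_t n)
{
    return deposit64_sketch(n, 10, 1, 1);
}
```

A caller would then write `f = set_f32_exp_v(f, exp);` and `sfmt = set_fsingle_sign_v(sfmt);`, which keeps the data flow explicit at the call site.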

> +static float32 sfmt_to_float32(uint64_t sfmt, float_status *fp_status)
> +{
> +    float32 f;
> +    uint32_t sign = get_fsingle_sign(sfmt);
> +    uint32_t man = get_fsingle_man(sfmt);
> +
> +    if (get_fsingle_calc(sfmt) == TILEGX_F_CALC_CVT) {
> +        if (sign) {
> +            return int32_to_float32(0 - man, fp_status);
> +        } else {
> +            return uint32_to_float32(man, fp_status);
> +        }
> +    } else {
> +        f = float32_set_sign(float32_zero, sign);
> +        f |= create_f32_man(man >> 8);
> +        set_f32_exp(&f, get_fsingle_exp(sfmt));
> +    }

I'm not especially keen on this calc bit.  I'd much rather that we always pack
and round properly.

In particular, if gcc decided to optimize fractional fixed-point types, it
would do something very similar to the current floatsisf2 code sequence, except
that it wouldn't use 0x9e as the exponent; it would use something smaller, so
that some number of low bits of the mantissa would be below the radix point.

Therefore, I think that fsingle_pack2 should do the following: Take the
(sign,exp,man) tuple and slot them into a double -- recall that a single only
has 23 bits in its mantissa, and this temp format has 32 -- then convert the
double to a single.  Pre-rounded single results from fsingle_* will be
unchanged, while integer data that gcc has constructed will be properly rounded.

E.g.

  uint32_t sign = get_fsingle_sign(sfmt);
  uint32_t exp = get_fsingle_exp(sfmt);
  uint32_t man = get_fsingle_man(sfmt);
  uint64_t d;

  /* Adjust the exponent for double precision, preserving Inf/NaN.  */
  if (exp == 0xff) {
    exp = 0x7ff;
  } else {
    exp += 1023 - 127;
  }

  d = (uint64_t)sign << 63;
  d = deposit64(d, 53, 11, exp);
  d = deposit64(d, 21, 32, man);
  return float64_to_float32(d, fp_status);

Note that this does require float32_to_sfmt to store the mantissa
left-justified. That is, not in bits [54-32] as you're doing now, but in bits
[63-41].
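The same idea can be sketched in portable C, with host double arithmetic standing in for softfloat's float64_to_float32() purely for illustration. Assumptions (not from the patch): `man` is read as an unsigned integer scaled by 2^(exp - 127 - 31), so bit 31 has weight 1.0 when `exp` is 127 and the leading one is explicit; `exp` carries the single-precision bias of 127; Inf/NaN (exp == 0xff) and results too large for a float are out of scope. Because the mantissa is scaled numerically rather than deposited into IEEE bit fields, no implicit bit is added, and the double-to-float conversion performs the only rounding. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiply x by 2^e with exact power-of-two steps (avoids needing libm). */
static double scale_pow2_sketch(double x, int e)
{
    while (e > 0) {
        x *= 2.0;
        e--;
    }
    while (e < 0) {
        x *= 0.5;
        e++;
    }
    return x;
}

/*
 * Hypothetical pack step: (sign, exp, man) -> float.
 * man is read as an unsigned integer scaled by 2^(exp - 127 - 31), so
 * bit 31 weighs 1.0 when exp == 127 and bit 0 weighs 1.0 when exp == 0x9e.
 * Inf/NaN (exp == 0xff) and float overflow are not handled here.
 */
static float pack_sfmt_sketch(uint32_t sign, int exp, uint32_t man)
{
    /* Exact: a double holds any 32-bit integer, and for exp in [0, 0xfe]
     * the scaled value stays within double's normal range. */
    double d = scale_pow2_sketch((double)man, exp - 127 - 31);

    /* The only rounding step: double -> float, which rounds to nearest-even
     * under the default rounding mode. */
    return (float)(sign ? -d : d);
}
```

With exp = 0x9e the scale factor is 2^0, so an integer placed right-justified in `man` (as in the floatsisf2 sequence mentioned above) comes out as that integer, rounded to nearest-even when it needs more than 24 significant bits; a smaller exponent puts low mantissa bits below the radix point, which is the fixed-point case.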

> +static void ana_bits(float_status *fp_status,
> +                     float32 fsrca, float32 fsrcb, uint64_t *sfmt)

Is "ana" supposed to be short for "analyze"?

> +{
> +    if (float32_eq(fsrca, fsrcb, fp_status)) {
> +        *sfmt |= create_fsfd_flag_eq();
> +    } else {
> +        *sfmt |= create_fsfd_flag_ne();
> +    }
> +
> +    if (float32_lt(fsrca, fsrcb, fp_status)) {
> +        *sfmt |= create_fsfd_flag_lt();
> +    }
> +    if (float32_le(fsrca, fsrcb, fp_status)) {
> +        *sfmt |= create_fsfd_flag_le();
> +    }
> +
> +    if (float32_lt(fsrcb, fsrca, fp_status)) {
> +        *sfmt |= create_fsfd_flag_gt();
> +    }
> +    if (float32_le(fsrcb, fsrca, fp_status)) {
> +        *sfmt |= create_fsfd_flag_ge();
> +    }
> +
> +    if (float32_unordered(fsrca, fsrcb, fp_status)) {
> +        *sfmt |= create_fsfd_flag_un();
> +    }
> +}

Again, I think it's better to return the new sfmt value than modify a pointer.
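A minimal sketch of a value-returning variant, using host float comparisons in place of softfloat's float32_eq()/float32_lt()/float32_le()/float32_unordered() so it compiles on its own. The flag bit positions are an assumption taken from the TileGXFPDFmtF layout posted later in this thread (unordered at bit 25, matching create_fsfd_flag_un(), through neq at bit 31); the FLAG_* and function names are hypothetical. The C99 isless()/islessequal()/isgreater()/isgreaterequal()/isunordered() macros are quiet comparisons, which does not matter if floating-point exception flags are discarded anyway.

```c
#include <math.h>
#include <stdint.h>

/* Assumed flag positions (see the TileGXFPDFmtF layout in this thread). */
#define FLAG_UN (1ull << 25)
#define FLAG_LT (1ull << 26)
#define FLAG_LE (1ull << 27)
#define FLAG_GT (1ull << 28)
#define FLAG_GE (1ull << 29)
#define FLAG_EQ (1ull << 30)
#define FLAG_NE (1ull << 31)

/* Return the comparison flags instead of ORing them through a pointer. */
static uint64_t analyze_cmp_sketch(float a, float b)
{
    uint64_t flags = (a == b) ? FLAG_EQ : FLAG_NE;  /* NaN compares unequal */

    if (isless(a, b)) {
        flags |= FLAG_LT;
    }
    if (islessequal(a, b)) {
        flags |= FLAG_LE;
    }
    if (isgreater(a, b)) {
        flags |= FLAG_GT;
    }
    if (isgreaterequal(a, b)) {
        flags |= FLAG_GE;
    }
    if (isunordered(a, b)) {
        flags |= FLAG_UN;
    }
    return flags;
}
```

The caller then does `sfmt |= analyze_cmp_sketch(a, b);`, and an unordered pair sets both NE and UN, as ana_bits() does.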


r~

* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 17:15   ` Richard Henderson
@ 2015-12-10 20:18     ` Richard Henderson
  2015-12-10 22:15       ` Chen Gang
  2015-12-10 22:14     ` Chen Gang
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-12-10 20:18 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/10/2015 09:15 AM, Richard Henderson wrote:
>   d = (uint64_t)sign << 63;
>   d = deposit64(d, 53, 11, exp);
>   d = deposit64(d, 21, 32, man);
>   return float64_to_float32(d, fp_status);

Hmm.  Actually, this incorrectly adds the implicit bit.  We'd actually need to
steal portions of softfloat.c to do this properly.  Which still isn't that
difficult.


r~

* Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
  2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double " Chen Gang
@ 2015-12-10 21:17   ` Richard Henderson
  2015-12-11 23:38     ` Chen Gang
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-12-10 21:17 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/10/2015 06:15 AM, Chen Gang wrote:
> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
...
> +static uint64_t fr_to_man(float64 d)
> +{
> +    uint64_t val = get_f64_man(d) << 7;
> +
> +    if (get_f64_exp(d)) {
> +        val |= TILEGX_F_MAN_HBIT;
> +    }
> +
> +    return val;
> +}

One presumes that "HBIT" is the ieee implicit one bit.
A better name or better comments would help there.

Do we know for sure that "7" is the correct number of guard bits?  From the gcc
implementation of floatsidf, I might guess that the correct number is "4".

> +static uint32_t get_fdouble_vexp(uint64_t n)
> +{
> +    return extract32(n, 7, 13);
> +}

What's a "vexp"?

> +uint64_t helper_fdouble_unpack_min(CPUTLGState *env,
> +                                   uint64_t srca, uint64_t srcb)
> +{
> +    uint64_t v = 0;
> +    uint32_t expa = get_f64_exp(srca);
> +    uint32_t expb = get_f64_exp(srcb);
> +
> +    if (float64_is_any_nan(srca) || float64_is_any_nan(srcb)
> +        || float64_is_infinity(srca) || float64_is_infinity(srcb)) {
> +        return 0;
> +    } else if (expa > expb) {
> +        if (expa - expb < 64) {
> +            set_fdouble_man(&v, fr_to_man(srcb) >> (expa - expb));
> +        } else {
> +            return 0;
> +        }
> +    } else if (expa < expb) {
> +        if (expb - expa < 64) {
> +            set_fdouble_man(&v, fr_to_man(srca) >> (expb - expa));

I very sincerely doubt that a simple right-shift is correct.  In order to
obtain proper rounding for real computation, a sticky bit is required.  That
is, set bit 0 if any bits are shifted out.  See the implementation of
shift64RightJamming in fpu/softfloat-macros.h.
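A minimal sketch of a jamming right shift, modeled on the idea behind softfloat's shift64RightJamming() rather than copied from it (the function name is hypothetical): any 1 bits shifted out are ORed into bit 0, so later rounding still knows the discarded part was nonzero.

```c
#include <stdint.h>

/*
 * Shift a right by count; if any 1 bits are shifted out, set bit 0
 * (the "sticky" bit) of the result.
 */
static uint64_t shift_right_jam_sketch(uint64_t a, unsigned count)
{
    if (count == 0) {
        return a;
    } else if (count < 64) {
        /* a << (64 - count) keeps exactly the bits that fall off the end. */
        return (a >> count) | ((a << (64 - count)) != 0);
    } else {
        /* Everything is shifted out: only the sticky bit survives. */
        return a != 0;
    }
}
```

In helper_fdouble_unpack_min() the plain `fr_to_man(srcb) >> (expa - expb)` would then become `shift_right_jam_sketch(fr_to_man(srcb), expa - expb)`, and very large shift counts collapse to just the sticky bit instead of 0.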

> +uint64_t helper_fdouble_addsub(CPUTLGState *env,
> +                               uint64_t dest, uint64_t srca, uint64_t srcb)
> +{
> +    if (get_fdouble_calc(srcb) == TILEGX_F_CALC_ADD) {
> +        return dest + srca; /* maybe set addsub overflow bit */

Definitely not.  That would be part of packing.

> +/* absolute-add/mul may cause add/mul carry or overflow */
> +static bool proc_oflow(uint64_t *flags, uint64_t *v, uint64_t *srcb)
> +{
> +    if (get_fdouble_man_of(*v)) {
> +        set_fdouble_vexp(flags, get_fdouble_vexp(*flags) + 1);
> +        *srcb >>= 1;
> +        *srcb |= *v << 63;
> +        *v >>= 1;
> +        clear_fdouble_man_of(v);
> +    }
> +    return get_fdouble_vexp(*flags) > TILEGX_F_EXP_DMAX;
> +}
> +
> +uint64_t helper_fdouble_pack2(CPUTLGState *env, uint64_t flags /* dest */,
> +                              uint64_t srca, uint64_t srcb)
> +{
> +    uint64_t v = srca;
> +    float64 d = float64_set_sign(float64_zero, get_fdouble_sign(flags));
> +
> +    /*
> +     * fdouble_add_flags, fdouble_sub_flags, or fdouble_mul_flags have
> +     * processed exceptions. So need not process fp_status, again.
> +     */

No need to process fp_status at all, actually.  TILE-Gx (and TILEPro) do not
support exception flags, so everything we do with fp_status is discarded.

Indeed, we should probably not store fp_status in env at all, but create it on
the stack in any function that actually needs one.

> +
> +    if (get_fdouble_nan(flags)) {
> +        return float64_val(float64_default_nan);
> +    } else if (get_fdouble_inf(flags)) {
> +        return float64_val(d |= float64_infinity);

s/|=/|/

> +    /* absolute-mul needs left shift 4 + 1 bytes to match the real mantissa */
> +    if (get_fdouble_calc(flags) == TILEGX_F_CALC_MUL) {
> +        v <<= 5;
> +        v |= srcb >> 59;
> +        srcb <<= 5;
> +    }

As with single, I don't like this calc thing.  We can infer what's required
from first principles.

We're given two words containing mantissa, and a "flags" word containing sign,
exponent, and other flags.  For add, sub, and floatsidf, the compiler passes us
0 as the low word; for mul the compiler passes us the result of a 64x64->128
bit multiply.

The first step would be to normalize the 128-bit value so that the highest bit
set is TILEGX_F_MAN_HBIT in the high word, adjusting the exponent in the
process.  Fold the low word into the sticky bit of the high word (high |= (low
!= 0)) for rounding purposes.
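A minimal sketch of that first step, assuming the leading one belongs at bit 59 (TILEGX_F_MAN_HBIT in the patch) and that `*exp` moves by one per bit of shift; the names are hypothetical, and the caller is expected to treat a zero result as a zero value.

```c
#include <stdint.h>

/* Assumed position of the explicit leading one (TILEGX_F_MAN_HBIT). */
#define HBIT_SKETCH (1ULL << 59)

/*
 * Normalize the 128-bit mantissa hi:lo so that its highest set bit is
 * HBIT_SKETCH in the high word, adjusting *exp by one per bit shifted,
 * then fold the low word into bit 0 of the result as a sticky bit.
 */
static uint64_t normalize128_sketch(uint64_t hi, uint64_t lo, int *exp)
{
    if (hi == 0 && lo == 0) {
        return 0;                      /* zero: nothing to normalize */
    }
    /* Too wide (e.g. after a carry): shift right, keeping lost bits sticky. */
    while (hi >= (HBIT_SKETCH << 1)) {
        lo = (lo >> 1) | (lo & 1) | (hi << 63);
        hi >>= 1;
        (*exp)++;
    }
    /* Too narrow (e.g. after subtraction or a multiply): shift left. */
    while (!(hi & HBIT_SKETCH)) {
        hi = (hi << 1) | (lo >> 63);
        lo <<= 1;
        (*exp)--;
    }
    return hi | (lo != 0);
}
```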

The second step would be to round and pack, similar to roundAndPackFloat64,
except that your HBIT is at a different place than softfloat.c.
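A minimal sketch of the second step, assuming the input is already normalized with its leading one at bit 59 and the exponent already carries the double-precision bias. With the leading one at bit 59 instead of 52 there are 7 extra bits below the final mantissa, and the rounding mode is fixed to round-to-nearest-even. Overflow to infinity, underflow and subnormals are deliberately left out, so this does much less than roundAndPackFloat64(); the name is hypothetical.

```c
#include <stdint.h>

/*
 * Round a normalized mantissa (leading one at bit 59) to 53 bits with
 * round-to-nearest-even and pack it into IEEE double bits.
 * exp is already biased; overflow, underflow and subnormals are ignored.
 */
static uint64_t round_pack_sketch(int sign, int exp, uint64_t man)
{
    const unsigned extra = 59 - 52;            /* 7 bits below the result */
    const uint64_t half = 1ULL << (extra - 1);
    uint64_t rem = man & ((1ULL << extra) - 1);

    man >>= extra;                             /* leading one now at bit 52 */
    if (rem > half || (rem == half && (man & 1))) {
        man++;
        if (man == (1ULL << 53)) {             /* all ones rounded up: carry */
            man >>= 1;
            exp++;
        }
    }
    return ((uint64_t)(sign != 0) << 63) | ((uint64_t)exp << 52)
           | (man & ((1ULL << 52) - 1));       /* drop the implicit one */
}
```

The carry check handles a mantissa of all ones rounding up to the next power of two, which bumps the exponent by one.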

> +    d = calc(fsrca, fsrcb, fp_status); /* also check exceptions */

There are no exceptions to check.

r~

* Re: [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation
  2015-12-10 14:16 ` [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation Chen Gang
@ 2015-12-10 21:37   ` Richard Henderson
  2015-12-10 21:44     ` Chen Gang
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-12-10 21:37 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/10/2015 06:16 AM, Chen Gang wrote:
> 
> It builds normally, and passes the gcc testsuite.
> 
> Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
> ---
>  target-tilegx/Makefile.objs |  3 +-
>  target-tilegx/cpu.h         |  2 ++
>  target-tilegx/helper.h      | 12 ++++++++
>  target-tilegx/translate.c   | 68 +++++++++++++++++++++++++++++++++++++++------
>  4 files changed, 75 insertions(+), 10 deletions(-)
> 
> diff --git a/target-tilegx/Makefile.objs b/target-tilegx/Makefile.objs
> index 0db778f..136ad60 100644
> --- a/target-tilegx/Makefile.objs
> +++ b/target-tilegx/Makefile.objs
> @@ -1 +1,2 @@
> -obj-y += cpu.o translate.o helper.o simd_helper.o
> +obj-y += cpu.o translate.o helper.o simd_helper.o \
> +		helper-fsingle.o helper-fdouble.o
> diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
> index 03df107..445a606 100644
> --- a/target-tilegx/cpu.h
> +++ b/target-tilegx/cpu.h
> @@ -88,6 +88,8 @@ typedef struct CPUTLGState {
>      uint64_t spregs[TILEGX_SPR_COUNT]; /* Special used registers by outside */
>      uint64_t pc;                       /* Current pc */
>  
> +    float_status fp_status;            /* floating point status */

As mentioned elsewhere, this is pointless.

> +    case OE_RR_X0(FSINGLE_PACK1):
> +    case OE_RR_Y0(FSINGLE_PACK1):
> +        mnemonic = "fsingle_pack1";
> +        goto done2;

This could use a comment that we're "copying" dest to dest.

> @@ -742,13 +745,21 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
>  static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
>                                 unsigned dest, unsigned srca, unsigned srcb)
>  {
> -    TCGv tdest = dest_gr(dc, dest);
> -    TCGv tsrca = load_gr(dc, srca);
> -    TCGv tsrcb = load_gr(dc, srcb);
> +    TCGv tdest, tsrca, tsrcb;
>      TCGv t0;
>      const char *mnemonic;
>  
>      switch (opext) {
> +    case OE_RRR(FSINGLE_ADDSUB2, 0, X0):
> +        mnemonic = "fsingle_addsub2";
> +        goto done2;
> +    }

Likewise.


r~

* Re: [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation
  2015-12-10 21:37   ` Richard Henderson
@ 2015-12-10 21:44     ` Chen Gang
  0 siblings, 0 replies; 20+ messages in thread
From: Chen Gang @ 2015-12-10 21:44 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/11/15 05:37, Richard Henderson wrote:
> On 12/10/2015 06:16 AM, Chen Gang wrote:

[...]

>>
>> diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
>> index 03df107..445a606 100644
>> --- a/target-tilegx/cpu.h
>> +++ b/target-tilegx/cpu.h
>> @@ -88,6 +88,8 @@ typedef struct CPUTLGState {
>>      uint64_t spregs[TILEGX_SPR_COUNT]; /* Special used registers by outside */
>>      uint64_t pc;                       /* Current pc */
>>  
>> +    float_status fp_status;            /* floating point status */
> 
> As mentioned elsewhere, this is pointless.
>

OK, thanks.
 
>> +    case OE_RR_X0(FSINGLE_PACK1):
>> +    case OE_RR_Y0(FSINGLE_PACK1):
>> +        mnemonic = "fsingle_pack1";
>> +        goto done2;
> 
> This could use a comment that we're "copying" dest to dest.
> 

OK, thanks.

>> @@ -742,13 +745,21 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
>>  static TileExcp gen_rrr_opcode(DisasContext *dc, unsigned opext,
>>                                 unsigned dest, unsigned srca, unsigned srcb)
>>  {
>> -    TCGv tdest = dest_gr(dc, dest);
>> -    TCGv tsrca = load_gr(dc, srca);
>> -    TCGv tsrcb = load_gr(dc, srcb);
>> +    TCGv tdest, tsrca, tsrcb;
>>      TCGv t0;
>>      const char *mnemonic;
>>  
>>      switch (opext) {
>> +    case OE_RRR(FSINGLE_ADDSUB2, 0, X0):
>> +        mnemonic = "fsingle_addsub2";
>> +        goto done2;
>> +    }
> 
> Likewise.
> 

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 17:15   ` Richard Henderson
  2015-12-10 20:18     ` Richard Henderson
@ 2015-12-10 22:14     ` Chen Gang
  2015-12-20 15:30       ` Chen Gang
  1 sibling, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-10 22:14 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/11/15 01:15, Richard Henderson wrote:
> On 12/10/2015 06:15 AM, Chen Gang wrote:
>> +#define TILEGX_F_CALC_CVT   0     /* convert int to fsingle */
>> +#define TILEGX_F_CALC_NCVT  1     /* Not convertion */
>> +
>> +static uint32_t get_f32_exp(float32 f)
>> +{
>> +    return extract32(float32_val(f), 23, 8);
>> +}
>> +
>> +static void set_f32_exp(float32 *f, uint32_t exp)
>> +{
>> +    *f = make_float32(deposit32(float32_val(*f), 23, 8, exp));
>> +}
> 
> Why take a pointer instead of returning the new value?
>

I originally followed the declarations of the set_* functions in
"include/fpu/softfloat.h".

 
>> +static inline uint32_t get_fsingle_sign(uint64_t n)
>> +{
>> +    return test_bit(10, &n);
>> +}
>> +
>> +static inline void set_fsingle_sign(uint64_t *n)
>> +{
>> +    set_bit(10, n);
>> +}
> 
> Why are you using test_bit and set_bit here, rather than continuing to use
> deposit and extract?
> 

It only tests and sets a single bit, so test_bit/set_bit are
simpler and clearer than deposit/extract.


>> +static float32 sfmt_to_float32(uint64_t sfmt, float_status *fp_status)
>> +{
>> +    float32 f;
>> +    uint32_t sign = get_fsingle_sign(sfmt);
>> +    uint32_t man = get_fsingle_man(sfmt);
>> +
>> +    if (get_fsingle_calc(sfmt) == TILEGX_F_CALC_CVT) {
>> +        if (sign) {
>> +            return int32_to_float32(0 - man, fp_status);
>> +        } else {
>> +            return uint32_to_float32(man, fp_status);
>> +        }
>> +    } else {
>> +        f = float32_set_sign(float32_zero, sign);
>> +        f |= create_f32_man(man >> 8);
>> +        set_f32_exp(&f, get_fsingle_exp(sfmt));
>> +    }
> 
> I'm not especially keen on this calc bit.  I'd much rather that we always pack
> and round properly.
>

OK.
 
> In particular, if gcc decided to optimize fractional fixed-point types, it
> would do something very similar to the current floatsisf2 code sequence, except
> that it wouldn't use 0x9e as the exponent; it would use something smaller, so
> that some number of low bits of the mantissa would be below the radix point.
> 

Oh, really.

> Therefore, I think that fsingle_pack2 should do the following: Take the
> (sign,exp,man) tuple and slot them into a double -- recall that a single only
> has 23 bits in its mantissa, and this temp format has 32 -- then convert the
> double to a single.  Pre-rounded single results from fsingle_* will be
> unchanged, while integer data that gcc has constructed will be properly rounded.
> 
> E.g.
> 
>   uint32_t sign = get_fsingle_sign(sfmt);
>   uint32_t exp = get_fsingle_exp(sfmt);
>   uint32_t man = get_fsingle_man(sfmt);
>   uint64_t d;
> 
>   /* Adjust the exponent for double precision, preserving Inf/NaN.  */
>   if (exp == 0xff) {
>     exp = 0x7ff;
>   } else {
>     exp += 1023 - 127;
>   }
> 
>   d = (uint64_t)sign << 63;
>   d = deposit64(d, 53, 11, exp);
>   d = deposit64(d, 21, 32, man);
>   return float64_to_float32(d, fp_status);
> 
> Note that this does require float32_to_sfmt to store the mantissa
> left-justified. That is, not in bits [54-32] as you're doing now, but in bits
> [63-41].
> 

It looks like a good idea to me! :-)


>> +static void ana_bits(float_status *fp_status,
>> +                     float32 fsrca, float32 fsrcb, uint64_t *sfmt)
> 
> Is "ana" supposed to be short for "analyze"?
>

Yes.
 
>> +{
>> +    if (float32_eq(fsrca, fsrcb, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_eq();
>> +    } else {
>> +        *sfmt |= create_fsfd_flag_ne();
>> +    }
>> +
>> +    if (float32_lt(fsrca, fsrcb, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_lt();
>> +    }
>> +    if (float32_le(fsrca, fsrcb, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_le();
>> +    }
>> +
>> +    if (float32_lt(fsrcb, fsrca, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_gt();
>> +    }
>> +    if (float32_le(fsrcb, fsrca, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_ge();
>> +    }
>> +
>> +    if (float32_unordered(fsrca, fsrcb, fp_status)) {
>> +        *sfmt |= create_fsfd_flag_un();
>> +    }
>> +}
> 
> Again, I think it's better to return the new sfmt value than modify a pointer.
> 

Oh, I guess we can inline ana_bits() into main_calc(), since both are
short, simple functions and ana_bits() is only called by main_calc().

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 20:18     ` Richard Henderson
@ 2015-12-10 22:15       ` Chen Gang
  2015-12-22 22:29         ` Chen Gang
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-10 22:15 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/11/15 04:18, Richard Henderson wrote:
> On 12/10/2015 09:15 AM, Richard Henderson wrote:
>>   d = (uint64_t)sign << 63;
>>   d = deposit64(d, 53, 11, exp);
>>   d = deposit64(d, 21, 32, man);
>>   return float64_to_float32(d, fp_status);
> 
> Hmm.  Actually, this incorrectly adds the implicit bit.  We'd actually need to
> steal portions of softfloat.c to do this properly.  Which still isn't that
> difficult.
> 

Yes, thanks.

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
  2015-12-10 21:17   ` Richard Henderson
@ 2015-12-11 23:38     ` Chen Gang
  2015-12-12  0:41       ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-11 23:38 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/11/15 05:17, Richard Henderson wrote:
> On 12/10/2015 06:15 AM, Chen Gang wrote:
>> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
> ...
>> +static uint64_t fr_to_man(float64 d)
>> +{
>> +    uint64_t val = get_f64_man(d) << 7;
>> +
>> +    if (get_f64_exp(d)) {
>> +        val |= TILEGX_F_MAN_HBIT;
>> +    }
>> +
>> +    return val;
>> +}
> 
> One presumes that "HBIT" is the ieee implicit one bit.
> A better name or better comments would help there.
> 

OK, thanks. After thinking about it again, I guess the real hardware does
not use HBIT internally (it uses the full 64 bits as the mantissa, without HBIT).

But what I have done is still OK (59 bits + 1 HBIT as the mantissa), since
59 bits are enough for a double mantissa (52 bits). It makes overflow
processing easier, but the mul operation has to be handled specially.

If we have to match what the real hardware does, I shall rewrite the
related mantissa code (I guess we do need to match the real hardware).


> Do we know for sure that "7" is the correct number of guard bits?  From the gcc
> implementation of floatsidf, I might guess that the correct number is "4".
> 

According to floatsidf it seems to be "4", but after I expanded the
bits, I guess it is "7".

/*
 * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
 *
 *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
 *
 *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
 *
 *    0   0   0   0   0   1   1   0   1   1   1    => 0x37(55)
 *
 *    0   1   1   1   1   1   1   1   1   1   1    => 0x3ff
 *
 */

I guess I need to restore this comment in helper_fdouble.c.


>> +static uint32_t get_fdouble_vexp(uint64_t n)
>> +{
>> +    return extract32(n, 7, 13);
>> +}
> 
> What's a "vexp"?
> 

It is exp + overflow bit + underflow bit. We can use vexp directly for
internal calculation, and then check uv and ov for the result. I guess the
real hardware does it this way.

The full description of the format is:

typedef union TileGXFPDFmtF {

    struct {
        uint64_t unknown0 : 7;    /* unknown */

        uint64_t vexp : 13;      /* vexp = exp | ov | uv */
#if 0 /* it is only the explanation for vexp above */
        uint64_t exp : 11;        /* exp, 0x21b << 1: 55 + TILEGX_F_EXP_DZERO */
        uint64_t ov : 1;          /* overflow for mul, low priority */
        uint64_t uv : 1;          /* underflow for mul, high priority */
#endif

        uint64_t sign : 1;        /* Sign bit for the total value */

        uint64_t calc: 2;         /* absolute add, sub, or mul */
        uint64_t inf: 1;          /* infinity */
        uint64_t nan: 1;          /* nan */

        /* Come from TILE-Gx ISA document, Table 7-2 for floating point */
        uint64_t unordered : 1;   /* The two are unordered */
        uint64_t lt : 1;          /* 1st is less than 2nd */
        uint64_t le : 1;          /* 1st is less than or equal to 2nd */
        uint64_t gt : 1;          /* 1st is greater than 2nd */
        uint64_t ge : 1;          /* 1st is greater than or equal to 2nd */
        uint64_t eq : 1;          /* The two operands are equal */
        uint64_t neq : 1;         /* The two operands are not equal */

        uint64_t unknown1 : 32;   /* unknown */
    } fmt;
    uint64_t ll;                  /* whole-word access */
} TileGXFPDFmtF;
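A minimal sketch that reads the flags word using the layout above, assuming the compiler allocates bit-fields from the least significant bit (as GCC does on common little-endian hosts): vexp is bits [19:7], with exp in bits [17:7], ov at bit 18 and uv at bit 19. The helper names are hypothetical. It also lets one check the arithmetic from the diagram: bits [17:7] of 0x21b00 are 0x436, and 0x436 - 0x37 = 0x3ff.

```c
#include <stdint.h>

/* Extract len bits of n starting at bit start. */
static uint32_t fd_field(uint64_t n, unsigned start, unsigned len)
{
    return (uint32_t)((n >> start) & ((1ULL << len) - 1));
}

/* vexp = exp | ov | uv, bits [19:7] */
static uint32_t fd_vexp(uint64_t n) { return fd_field(n, 7, 13); }
/* exp, bits [17:7] */
static uint32_t fd_exp(uint64_t n)  { return fd_field(n, 7, 11); }
/* ov (overflow), bit 18 */
static uint32_t fd_ov(uint64_t n)   { return fd_field(n, 18, 1); }
/* uv (underflow), bit 19 */
static uint32_t fd_uv(uint64_t n)   { return fd_field(n, 19, 1); }
```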


>> +uint64_t helper_fdouble_unpack_min(CPUTLGState *env,
>> +                                   uint64_t srca, uint64_t srcb)
>> +{
>> +    uint64_t v = 0;
>> +    uint32_t expa = get_f64_exp(srca);
>> +    uint32_t expb = get_f64_exp(srcb);
>> +
>> +    if (float64_is_any_nan(srca) || float64_is_any_nan(srcb)
>> +        || float64_is_infinity(srca) || float64_is_infinity(srcb)) {
>> +        return 0;
>> +    } else if (expa > expb) {
>> +        if (expa - expb < 64) {
>> +            set_fdouble_man(&v, fr_to_man(srcb) >> (expa - expb));
>> +        } else {
>> +            return 0;
>> +        }
>> +    } else if (expa < expb) {
>> +        if (expb - expa < 64) {
>> +            set_fdouble_man(&v, fr_to_man(srca) >> (expb - expa));
> 
> I very sincerely doubt that a simple right-shift is correct.  In order to
> obtain proper rounding for real computation, a sticky bit is required.  That
> is, set bit 0 if any bits are shifted out.  See the implementation of
> shift64RightJamming in fpu/softfloat-macros.h.
> 

Oh, really, thanks.


>> +uint64_t helper_fdouble_addsub(CPUTLGState *env,
>> +                               uint64_t dest, uint64_t srca, uint64_t srcb)
>> +{
>> +    if (get_fdouble_calc(srcb) == TILEGX_F_CALC_ADD) {
>> +        return dest + srca; /* maybe set addsub overflow bit */
> 
> Definitely not.  That would be part of packing.
> 

If we need to match what the real hardware does, the related
implementation above is incorrect.

As for my current implementation (I guess it should be correct):

typedef union TileGXFPDFmtV {
    struct {
        uint64_t mantissa : 60;   /* mantissa */
        uint64_t overflow : 1;    /* carry/overflow bit for absolute add/mul */
        uint64_t unknown1 : 3;    /* unknown */
    } fmt;
    uint64_t ll;                  /* whole-word access */
} TileGXFPDFmtV;


In helper_fdouble_addsub(), both dest and srca are unpacked, so each fits
within 60 bits. The result of a single absolute add therefore fits within 61
bits, so using the 61st bit (bit 60) as the overflow bit is enough.
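A minimal sketch of that overflow handling, assuming both unpacked mantissas are below 2^60 so the sum fits in 61 bits; the names are hypothetical. Unlike proc_oflow() in the patch, which moves the shifted-out bit into srcb, this version keeps it as a sticky bit in bit 0, following the jamming approach suggested in review.

```c
#include <stdint.h>

#define MAN_BITS_SKETCH 60   /* unpacked mantissa width, per TileGXFPDFmtV */

/*
 * Add two unpacked mantissas (each below 2^60). The sum is below 2^61,
 * so a carry can only land in bit 60, the overflow bit.
 */
static uint64_t add_mantissas_sketch(uint64_t a, uint64_t b, int *exp)
{
    uint64_t sum = a + b;

    if (sum >> MAN_BITS_SKETCH) {
        /* Renormalize: shift right once, keeping the lost bit sticky. */
        sum = (sum >> 1) | (sum & 1);
        (*exp)++;
    }
    return sum;
}
```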


>> +/* absolute-add/mul may cause add/mul carry or overflow */
>> +static bool proc_oflow(uint64_t *flags, uint64_t *v, uint64_t *srcb)
>> +{
>> +    if (get_fdouble_man_of(*v)) {
>> +        set_fdouble_vexp(flags, get_fdouble_vexp(*flags) + 1);
>> +        *srcb >>= 1;
>> +        *srcb |= *v << 63;
>> +        *v >>= 1;
>> +        clear_fdouble_man_of(v);
>> +    }
>> +    return get_fdouble_vexp(*flags) > TILEGX_F_EXP_DMAX;
>> +}
>> +
>> +uint64_t helper_fdouble_pack2(CPUTLGState *env, uint64_t flags /* dest */,
>> +                              uint64_t srca, uint64_t srcb)
>> +{
>> +    uint64_t v = srca;
>> +    float64 d = float64_set_sign(float64_zero, get_fdouble_sign(flags));
>> +
>> +    /*
>> +     * fdouble_add_flags, fdouble_sub_flags, or fdouble_mul_flags have
>> +     * processed exceptions. So need not process fp_status, again.
>> +     */
> 
> No need to process fp_status at all, actually.  TILE-Gx (and TILEPro) do not
> support exception flags, so everything we do with fp_status is discarded.
> 
> Indeed, we should probably not store fp_status in env at all, but create it on
> the stack in any function that actually needs one.
> 

OK, thanks.

>> +
>> +    if (get_fdouble_nan(flags)) {
>> +        return float64_val(float64_default_nan);
>> +    } else if (get_fdouble_inf(flags)) {
>> +        return float64_val(d |= float64_infinity);
> 
> s/|=/|/
> 

OK, thanks.

>> +    /* absolute-mul needs left shift 4 + 1 bytes to match the real mantissa */
>> +    if (get_fdouble_calc(flags) == TILEGX_F_CALC_MUL) {
>> +        v <<= 5;
>> +        v |= srcb >> 59;
>> +        srcb <<= 5;
>> +    }
> 
> As with single, I don't like this calc thing.  We can infer what's required
> from first principles.
> 
> We're given two words containing mantissa, and a "flags" word containing sign,
> exponent, and other flags.  For add, sub, and floatsidf, the compiler passes us
> 0 as the low word; for mul the compiler passes us the result of a 64x64->128
> bit multiply.
> 

OK, thanks. It looks like we have to try to match what the hardware
does.

> The first step would be to normalize the 128-bit value so that the highest bit
> set is TILEGX_F_MAN_HBIT in the high word, adjusting the exponent in the
> process.  Fold the low word into the sticky bit of the high word (high |= (low
> != 0)) for rounding purposes.
> 

OK, thanks. My original implementation did not consider the sticky
bit.

> The second step would be to round and pack, similar to roundAndPackFloat64,
> except that your HBIT is at a different place than softfloat.c.
> 

It sounds good (and I did originally consider it). If we had an
exported common function for it, that would be really good.

At present, I use (u)int64_to_float64(), and then adjust exp afterwards.


>> +    d = calc(fsrca, fsrcb, fp_status); /* also check exceptions */
> 
> There are no exceptions to check.
>

OK, thanks. 

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
  2015-12-11 23:38     ` Chen Gang
@ 2015-12-12  0:41       ` Richard Henderson
  2015-12-12  2:45         ` Chen Gang
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-12-12  0:41 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/11/2015 03:38 PM, Chen Gang wrote:
>
> On 12/11/15 05:17, Richard Henderson wrote:
>> On 12/10/2015 06:15 AM, Chen Gang wrote:
>>> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
>> ...
>>> +static uint64_t fr_to_man(float64 d)
>>> +{
>>> +    uint64_t val = get_f64_man(d) << 7;
>>> +
>>> +    if (get_f64_exp(d)) {
>>> +        val |= TILEGX_F_MAN_HBIT;
>>> +    }
>>> +
>>> +    return val;
>>> +}
>>
>> One presumes that "HBIT" is the ieee implicit one bit.
>> A better name or better comments would help there.
>>
>
> OK, thanks. After thinking about it again, I guess the real hardware does
> not use HBIT internally (it uses the full 64 bits as the mantissa, without HBIT).

It must do.  Otherwise the arithmetic doesn't work out.

> But what I have done is still OK (59 bits + 1 HBIT as the mantissa), since
> 59 bits are enough for a double mantissa (52 bits). It makes overflow
> processing easier, but the mul operation has to be handled specially.

What you have works.  But the mul operation isn't as special as you make it
out to be (aside from requiring at least 104 bits of intermediate), in that
when one implements what the hardware does, subtraction may also require
significant normalization.

> According to floatsidf it seems to be "4", but after I expanded the
> bits, I guess it is "7".
>
> /*
>   * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
>   *
>   *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
>   *
>   *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
>   *
>   *    0   0   0   0   0   1   1   0   1   1   1    => 0x37(55)
>   *
>   *    0   1   1   1   1   1   1   1   1   1   1    => 0x3ff
>   *
>   */

That's the exponent within the flags temporary.  It has nothing to do with the 
position of the extracted mantissa.

FWIW, the minimum shift would be 3, in order to properly implement rounding; if 
the hardware uses a shift of 4, that's fine too.
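The sketch below shows why three extra low bits are the minimum. It implements IEEE round-to-nearest-even using a guard bit, a round bit and a sticky bit. The name `round_grs` is hypothetical. It assumes the caller has already ORed every bit below the round bit into the sticky bit (bit 0).

```c
#include <stdint.h>

/*
 * Hypothetical helper: 'man' carries 3 extra low bits below the final
 * result's least significant bit: guard (bit 2), round (bit 1) and
 * sticky (bit 0, the OR of everything shifted out further down).
 * Returns the mantissa rounded to nearest, ties to even.
 */
static uint64_t round_grs(uint64_t man)
{
    uint64_t keep = man >> 3;   /* bits that survive */
    uint64_t rest = man & 7;    /* guard/round/sticky */

    if (rest > 4 || (rest == 4 && (keep & 1))) {
        keep++;                 /* above half, or exactly half with odd LSB */
    }
    return keep;                /* caller must renormalize if this carried out */
}
```

Without the sticky bit, a value just above one half could not be told apart from an exact tie.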

What I would love to know is if the shift present in floatsidf is not really 
required; equally valid to adjust 0x21b00 by 4.  Meaning normalization would do 
a proper job with the entire given mantissa.  This would require better 
documentation, or access to hardware to verify.

>>> +uint64_t helper_fdouble_addsub(CPUTLGState
> And for my current implementation (I guess, it should be correct):
>
> typedef union TileGXFPDFmtV {
>      struct {
>          uint64_t mantissa : 60;   /* mantissa */
>          uint64_t overflow : 1;    /* carry/overflow bit for absolute add/mul */
>          uint64_t unknown1 : 3;    /* unknown */

I personally like to call all 4 of the top bits overflow.  But I have no idea 
what the real hardware actually does.

> In helper_fdouble_addsub(), both dest and srca are unpacked, so they are
> within 60 bits. So one time absolute add are within 61 bits, so let bit
> 61 as overflow bit is enough.

True.  But if all 4 top bits are considered overflow, then one could implement 
floatdidf fairly easily.  But I suspect that real hw doesn't work that way, or 
it would have already been done.
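The sketch below illustrates the single-overflow-bit argument. It assumes two unpacked operands that each fit in the 60-bit mantissa field. Their sum then fits in 61 bits, so bit 60 (the 61st bit) is enough to flag the carry. One right shift renormalizes the sum, with the shifted-out bit kept as a sticky bit. The names `TILEGX_F_MAN_OVF` and `add_man` are hypothetical and do not come from the patch. Nothing here claims to describe what the hardware does with the remaining top bits.

```c
#include <stdint.h>

#define TILEGX_F_MAN_HBIT  (1ULL << 59)   /* implicit one bit */
#define TILEGX_F_MAN_OVF   (1ULL << 60)   /* hypothetical name for the carry bit */

/*
 * Hypothetical helper: add two mantissas (each below 2^60) that share an
 * exponent.  On a carry into bit 60 the sum is shifted right once and the
 * exponent bumped; the lost low bit is ORed back in as a sticky bit.
 */
static uint64_t add_man(uint64_t a, uint64_t b, int *exp)
{
    uint64_t sum = a + b;          /* < 2^61, so it cannot wrap a uint64_t */

    if (sum & TILEGX_F_MAN_OVF) {
        sum = (sum >> 1) | (sum & 1);
        (*exp)++;
    }
    return sum;
}
```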


r~


* Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
  2015-12-12  0:41       ` Richard Henderson
@ 2015-12-12  2:45         ` Chen Gang
  0 siblings, 0 replies; 20+ messages in thread
From: Chen Gang @ 2015-12-12  2:45 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/12/15 08:41, Richard Henderson wrote:
> On 12/11/2015 03:38 PM, Chen Gang wrote:
>>
>> On 12/11/15 05:17, Richard Henderson wrote:
>>> On 12/10/2015 06:15 AM, Chen Gang wrote:
>>>> +#define TILEGX_F_MAN_HBIT   (1ULL << 59)
>>> ...
>>>> +static uint64_t fr_to_man(float64 d)
>>>> +{
>>>> +    uint64_t val = get_f64_man(d) << 7;
>>>> +
>>>> +    if (get_f64_exp(d)) {
>>>> +        val |= TILEGX_F_MAN_HBIT;
>>>> +    }
>>>> +
>>>> +    return val;
>>>> +}
>>>
>>> One presumes that "HBIT" is the ieee implicit one bit.
>>> A better name or better comments would help there.
>>>
>>
>> OK, thanks. And after think of again, I guess, the real hardware does
>> not use HBIT internally (use the full 64 bits as mantissa without HBIT).
> 
> It must do.  Otherwise the arithmetic doesn't work out.
> 

Oh, yes. Then we have to keep my original implementation (60 bits for the
mantissa, 4 bits for other uses).

>> But what I have done is still OK (use 59 bits + 1 HBIT as mantissa), for
>> 59 bits are enough for double mantissa (52 bits). It makes the overflow
>> processing easier, but has to process mul operation specially.
> 
> What you have works.  But the mul operation isn't as special as you make it out -- aside from requiring at least 104 bits as intermediate -- in that when one implements what the hardware does, subtraction also may require significant normalization.
> 

I think you misunderstood what I said (my English is not very good).

For mul, it needs at least (104 - 1) bits. At present we have 120 bits
for it (in fact, our mul generates a 119-bit result), so that is enough.
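Below is a portable sketch of the wide product. It assumes both operands are 60-bit mantissas with the implicit bit at bit 59. Each operand is split into 32-bit halves. QEMU's own mulu64() helper in include/qemu/host-utils.h serves this purpose; the function below is a hypothetical stand-in. Each operand lies in [2^59, 2^60), so the product lies in [2^118, 2^120) and needs 119 or 120 bits.

```c
#include <stdint.h>

/* Hypothetical 64x64 -> 128-bit unsigned multiply using 32-bit limbs. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;          /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;          /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;          /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;          /* contributes to bits  64..127 */

    /* Sum the middle 32-bit column; its carry goes into the high word. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

With both operands normalized, the top bit of `hi` lands at bit 54 or 55. After renormalizing, only the top 60 bits would be kept as the mantissa. The rest would feed the guard and sticky bits.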


>> According to floatsidf, it seems "4", but after I expanded the bits, I
>> guess, it is "7".
>>
>> /*
>>   * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
>>   *
>>   *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
>>   *
>>   *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
>>   *
>>   *    0   0   0   0   0   1   1   0   1   1   1    => 0x37(55)
>>   *
>>   *    0   1   1   1   1   1   1   1   1   1   1    => 0x3ff
>>   *
>>   */
> 
> That's the exponent within the flags temporary.  It has nothing to do with the position of the extracted mantissa.
> 

0x37(55) + 4 (guard bits) + 1 (HBIT) = 60 bits.

So, if the above is correct, the mantissa is 60 bits (including HBIT),
bit 18 of the flags is for overflow, bit 19 for underflow, and bit 20 must
be for the sign.

> FWIW, the minimum shift would be 3, in order to properly implement rounding; if the hardware uses a shift of 4, that's fine too.
> 

Then I guess it uses 4 guard bits.

> What I would love to know is if the shift present in floatsidf is not really required; equally valid to adjust 0x21b00 by 4.  Meaning normalization would do a proper job with the entire given mantissa.  This would require better documentation, or access to hardware to verify.
> 

My guess is that before calling any fdouble insn, we can use the low 4 bits
as part of the mantissa (e.g. to compute mul), but once an fdouble insn is
called, we cannot use the lower 4 guard bits, so floatsidf has to shift left
by 4 bits.
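A small host-side sketch can check this arithmetic. It assumes, from this thread and not from hardware documentation, a 60-bit mantissa whose implicit bit is bit 59. A value is then man * 2^(exp - 0x3ff - 59). With exp = 0x436 (= 0x3ff + 55), an integer shifted left by the 4 guard bits represents itself. The function `pack_double` is hypothetical. It normalizes the mantissa, then drops the 7 bits below the IEEE fraction by truncation, with no rounding. It also ignores exponent overflow and subnormals, and assumes the host double is IEEE binary64.

```c
#include <stdint.h>
#include <string.h>

#define TILEGX_F_MAN_HBIT (1ULL << 59)

/*
 * Hypothetical: build an IEEE double from (sign, biased exp, 60-bit man).
 * Assumes man has no bits above bit 59 and that exp stays in 1..0x7fe
 * after normalization.
 */
static double pack_double(int sign, int exp, uint64_t man)
{
    uint64_t bits;
    double d;

    if (man == 0) {
        bits = (uint64_t)sign << 63;            /* signed zero */
    } else {
        while (!(man & TILEGX_F_MAN_HBIT)) {   /* normalize to bit 59 */
            man <<= 1;
            exp--;
        }
        /* Bits 58..7 become the 52-bit fraction; bits 6..0 are truncated. */
        bits = ((uint64_t)sign << 63)
             | ((uint64_t)(exp & 0x7ff) << 52)
             | ((man >> 7) & ((1ULL << 52) - 1));
    }
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

For example, `pack_double(0, 0x436, 3 << 4)` normalizes by 54 shifts to exponent 0x400 and yields 3.0. This matches a floatsidf that shifts the integer left by 4.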

>>>> +uint64_t helper_fdouble_addsub(CPUTLGState
>> And for my current implementation (I guess, it should be correct):
>>
>> typedef union TileGXFPDFmtV {
>>      struct {
>>          uint64_t mantissa : 60;   /* mantissa */
>>          uint64_t overflow : 1;    /* carry/overflow bit for absolute add/mul */
>>          uint64_t unknown1 : 3;    /* unknown */
> 
> I personally like to call all 4 of the top bits overflow.  But I have no idea what the real hardware actually does.
> 
>> In helper_fdouble_addsub(), both dest and srca are unpacked, so they are
>> within 60 bits. So one time absolute add are within 61 bits, so let bit
>> 61 as overflow bit is enough.
> 
> True.  But if all 4 top bits are considered overflow, then one could implement floatdidf fairly easily.  But I suspect that real hw doesn't work that way, or it would have already been done.
> 

So I only assumed that bit 60 is for overflow; the high 3 bits are unknown.

In my opinion, if one overflow bit is enough, the hardware will keep the
other bits for other uses (or reserve them for the future).


Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 22:14     ` Chen Gang
@ 2015-12-20 15:30       ` Chen Gang
  2015-12-21 15:01         ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-20 15:30 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


After trying it, I think the approach below is incorrect: float64_to_float32()
assumes the input 'd' is already a standard (packed) float64 value, but in
fact it is not (e.g. the input from floatsisf2).

And we still have to check TILEGX_F_CALC_CVT, because they really are two
different formats: TILEGX_F_CALC_CVT has no HBIT, but TILEGX_F_CALC_NCVT
has HBIT (which we need to process specially).

In my opinion, an approach like helper_fdouble_pack2 (the double
implementation) works for the TILEGX_F_CALC_NCVT format, too:

 - Shift left to get HBIT and adjust vexp accordingly (use vexp instead
   of exp to handle overflow cases, as the double implementation does).

 - Use (u)int32_to_float32 for the mantissa.

 - Then process the exponent again.
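This plan could be prototyped on the host before it is written with softfloat. The sketch below uses host C arithmetic in place of softfloat's (u)int32_to_float32() and the separate exponent fix-up. It assumes a hypothetical single temp format of (sign, biased exp, 32-bit mantissa) with the implicit bit at bit 31, so that value = man * 2^(exp - 127 - 31). This is consistent with the 0x9e (= 127 + 31) exponent used by floatsisf2, mentioned in the quoted text below. A 32-bit integer, and every power of two in this range, is exact in a double. The only rounding is therefore the final double-to-float conversion, which uses the host's rounding mode (round-to-nearest-even by default, assuming IEEE/Annex F semantics). The function names are hypothetical. Inf/NaN (exp == 0xff) are not handled.

```c
#include <stdint.h>

/* Exact 2^k in double for the small |k| used here (avoids needing libm). */
static double pow2(int k)
{
    double r = 1.0;

    for (; k > 0; k--) {
        r *= 2.0;
    }
    for (; k < 0; k++) {
        r *= 0.5;
    }
    return r;
}

/*
 * Hypothetical: round (sign, exp, man) to a float.  man need not be
 * normalized (no HBIT required); value = man * 2^(exp - 127 - 31).
 */
static float pack_single(int sign, int exp, uint32_t man)
{
    double v = (double)man * pow2(exp - 127 - 31);  /* exact in double */
    float f = (float)v;                             /* the single rounding step */

    return sign ? -f : f;
}
```

Because normalization and rounding happen in one exact-then-round step, it makes no difference whether the input came from an fsingle_* result or from the unnormalized integer that floatsisf2 constructs.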


Thanks.

On 12/11/15 06:14, Chen Gang wrote:
>> In particular, if gcc decided to optimize fractional fixed-point types, it
>> > would do something very similar to the current floatsisf2 code sequence, except
>> > that it wouldn't use 0x9e as the exponent; it would use something smaller, so
>> > that some number of low bits of the mantissa would be below the radix point.
>> > 
> Oh, really.
> 
>> > Therefore, I think that fsingle_pack2 should do the following: Take the
>> > (sign,exp,man) tuple and slot them into a double -- recall that a single only
>> > has 23 bits in its mantissa, and this temp format has 32 -- then convert the
>> > double to a single.  Pre-rounded single results from fsingle_* will be
>> > unchanged, while integer data that gcc has constructed will be properly rounded.
>> > 
>> > E.g.
>> > 
>> >   uint32_t sign = get_fsingle_sign(sfmt);
>> >   uint32_t exp = get_fsingle_exp(sfmt);
>> >   uint32_t man = get_fsingle_man(sfmt);
>> >   uint64_t d;
>> > 
>> >   /* Adjust the exponent for double precision, preserving Inf/NaN.  */
>> >   if (exp == 0xff) {
>> >     exp = 0x7ff;
>> >   } else {
>> >     exp += 1023 - 127;
>> >   }
>> > 
>> >   d = (uint64_t)sign << 63;
>> >   d = deposit64(d, 53, 11, exp);
>> >   d = deposit64(d, 21, 32, man);
>> >   return float64_to_float32(d, fp_status);
>> > 
>> > Note that this does require float32_to_sfmt to store the mantissa
>> > left-justified. That is, not in bits [54-32] as you're doing now, but in bits
>> > [63-41].
>> > 
> For me, it is a good idea! :-)
> 
> 

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-20 15:30       ` Chen Gang
@ 2015-12-21 15:01         ` Richard Henderson
  2015-12-21 18:54           ` Chen Gang
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-12-21 15:01 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/20/2015 07:30 AM, Chen Gang wrote:
> And we have to still check TILEGX_F_CALC_CVT, for they are really two
> different format: TILEGX_F_CALC_CVT has no HBIT, but TILEGX_F_CALC_NCVT
> has HBIT (which we need process it specially).

They both do, in that you re-normalize to produce that HBIT.
That's the whole point.


r~


* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-21 15:01         ` Richard Henderson
@ 2015-12-21 18:54           ` Chen Gang
  2015-12-22  2:00             ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Gang @ 2015-12-21 18:54 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/21/15 23:01, Richard Henderson wrote:
> On 12/20/2015 07:30 AM, Chen Gang wrote:
>> And we have to still check TILEGX_F_CALC_CVT, for they are really two
>> different format: TILEGX_F_CALC_CVT has no HBIT, but TILEGX_F_CALC_NCVT
>> has HBIT (which we need process it specially).
> 
> The both do, in that you re-normalize to produce that HBIT.
> That's the whole point.
> 

Oh, yes.

But after all, we want to normalize the float value in fsingle_pack2,
so we cannot use float64_to_float32(), which assumes the input is already
normalized (if the input were already normalized, we could return directly).

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-21 18:54           ` Chen Gang
@ 2015-12-22  2:00             ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-12-22  2:00 UTC (permalink / raw)
  To: Chen Gang, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel

On 12/21/2015 10:54 AM, Chen Gang wrote:
>> The both do, in that you re-normalize to produce that HBIT.
>> That's the whole point.
>>
>
> Oh, yes.
>
> But all together, we want to normalize the float value in fsingle_pack2,
> so we can not use float64_to_float32()...

Of course not.  I told you that you couldn't.


r~


* Re: [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation
  2015-12-10 22:15       ` Chen Gang
@ 2015-12-22 22:29         ` Chen Gang
  0 siblings, 0 replies; 20+ messages in thread
From: Chen Gang @ 2015-12-22 22:29 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell, Chris Metcalf; +Cc: chenwei, qemu-devel


On 12/11/15 06:15, Chen Gang wrote:
> 
> On 12/11/15 04:18, Richard Henderson wrote:
>> On 12/10/2015 09:15 AM, Richard Henderson wrote:
>>>   d = (uint64_t)sign << 63;
>>>   d = deposit64(d, 53, 11, exp);
>>>   d = deposit64(d, 21, 32, man);
>>>   return float64_to_float32(d, fp_status);
>>
>> Hmm.  Actually, this incorrectly adds the implicit bit.  We'd actually need to
>> steal portions of softfloat.c to do this properly.  Which still isn't that
>> difficult.
>>

Oh, sorry, I misunderstood this reply.

> 
> Yes, thanks.
> 

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


end of thread, other threads:[~2015-12-22 22:26 UTC | newest]

Thread overview: 20+ messages
     [not found] <56698865.8050901@emindsoft.com.cn>
2015-12-10 14:13 ` [Qemu-devel] [PATCH v3 1/4] target-tilegx: Add floating point shared functions Chen Gang
2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 2/4] target-tilegx: Add single floating point implementation Chen Gang
2015-12-10 17:15   ` Richard Henderson
2015-12-10 20:18     ` Richard Henderson
2015-12-10 22:15       ` Chen Gang
2015-12-22 22:29         ` Chen Gang
2015-12-10 22:14     ` Chen Gang
2015-12-20 15:30       ` Chen Gang
2015-12-21 15:01         ` Richard Henderson
2015-12-21 18:54           ` Chen Gang
2015-12-22  2:00             ` Richard Henderson
2015-12-10 14:15 ` [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double " Chen Gang
2015-12-10 21:17   ` Richard Henderson
2015-12-11 23:38     ` Chen Gang
2015-12-12  0:41       ` Richard Henderson
2015-12-12  2:45         ` Chen Gang
2015-12-10 14:16 ` [Qemu-devel] [PATCH v3 4/4] target-tilegx: Integrate floating pointer implementation Chen Gang
2015-12-10 21:37   ` Richard Henderson
2015-12-10 21:44     ` Chen Gang
2015-12-10 14:26 ` [Qemu-devel] [PATCH v3 0/4] target-tilegx: Implement floating point instructions Chen Gang
