* [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements.
@ 2010-05-10 22:23 Richard Henderson
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-10 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

The first patch is required to fix TCGv_i32/_i64 type errors.

The second patch fixes some mistakes I noticed with ADDX carry generation.

The third patch improves code generation for some common cases.  With
Aurelien's TCG optimization patches we get nearly optimal code, and
it isn't half bad even with the TCG optimizer as-is.



r~



Richard Henderson (3):
  target-sparc: Fix compilation with --enable-debug.
  target-sparc: Simplify ICC generation; fix ADDX carry generation.
  target-sparc: Inline some generation of carry for ADDX/SUBX.

 target-sparc/op_helper.c |  106 ++++++++++++-------
 target-sparc/translate.c |  268 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 263 insertions(+), 111 deletions(-)


* [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug.
  2010-05-10 22:23 [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements Richard Henderson
@ 2010-05-10 22:23 ` Richard Henderson
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation Richard Henderson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-10 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Return a target_ulong from compute_C_icc to match the width of the users.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/helper.h    |    2 +-
 target-sparc/op_helper.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 6f103e7..04c1306 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
 #undef VIS_HELPER
 #undef VIS_CMPHELPER
 DEF_HELPER_0(compute_psr, void);
-DEF_HELPER_0(compute_C_icc, i32);
+DEF_HELPER_0(compute_C_icc, tl);
 
 #include "def-helper.h"
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index fcfd3f3..09449c5 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -1282,7 +1282,7 @@ void helper_compute_psr(void)
     CC_OP = CC_OP_FLAGS;
 }
 
-uint32_t helper_compute_C_icc(void)
+target_ulong helper_compute_C_icc(void)
 {
     uint32_t ret;
 
-- 
1.7.0.1


* [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-10 22:23 [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements Richard Henderson
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
@ 2010-05-10 22:23 ` Richard Henderson
  2010-05-11 19:14   ` [Qemu-devel] " Blue Swirl
  2010-05-11 21:31   ` [Qemu-devel] " Artyom Tarasenko
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
  2010-05-12 18:04 ` [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2 Richard Henderson
  3 siblings, 2 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-10 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Use int32 types instead of target_ulong when computing ICC.  This
simplifies the generated code for 32-bit host and 64-bit guest.
Use the same simplified expressions for ICC as were already used
for XCC in carry flag generation.
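
As a quick standalone sanity check of that simplified expression (purely
illustrative, independent of the CC_* plumbing in this patch), the unsigned
comparison of the truncated result against one operand agrees with the
bit-31 formula from the V8 manual for a plain two-operand add:

    #include <assert.h>
    #include <stdint.h>

    /* carry out of (s1 + s2) via the V8-manual bit formula */
    static int carry_by_bits(uint32_t dst, uint32_t s1, uint32_t s2)
    {
        return ((s1 & s2) | (~dst & (s1 | s2))) >> 31;
    }

    /* carry out of (s1 + s2) via the simplified comparison: dst < s1 */
    static int carry_by_compare(uint32_t dst, uint32_t s1)
    {
        return dst < s1;
    }

    int main(void)
    {
        static const uint32_t v[] = {
            0, 1, 0x7fffffff, 0x80000000, 0xfffffffe, 0xffffffff
        };
        for (int i = 0; i < 6; i++) {
            for (int j = 0; j < 6; j++) {
                uint32_t s1 = v[i], s2 = v[j], dst = s1 + s2;
                int want = (int)(((uint64_t)s1 + s2) >> 32);  /* ground truth */
                assert(carry_by_bits(dst, s1, s2) == want);
                assert(carry_by_compare(dst, s1) == want);
            }
        }
        return 0;
    }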

ADDX ICC carry generation was using the same routines as ADD ICC,
which is incorrect if the input carry bit produces the output carry.
Use the algorithms already in place for ADDX XCC carry generation.
Similarly for SUBX.
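
The two-step carry that the XCC routines accumulate rests on the identity
carry(a + b + cin) == carry(a + cin) | carry((a + cin) + b).  A minimal,
purely illustrative check of that identity (exhaustive at 8 bits so it runs
instantly; it says nothing about how CC_DST/CC_SRC/CC_SRC2 are reconstructed):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        for (uint32_t a = 0; a < 256; a++) {
            for (uint32_t b = 0; b < 256; b++) {
                for (uint32_t cin = 0; cin <= 1; cin++) {
                    uint32_t t = (a + cin) & 0xff;   /* intermediate sum */
                    uint32_t d = (t + b) & 0xff;     /* final truncated result */
                    int c1 = t < a;                  /* carry from a + cin */
                    int c2 = d < t;                  /* carry from t + b */
                    int want = (a + b + cin) >> 8;   /* true carry out */
                    assert((c1 | c2) == want);
                }
            }
        }
        return 0;
    }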

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/op_helper.c |  106 ++++++++++++++++++++++++++++++----------------
 1 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index 09449c5..c36bc54 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -896,13 +896,13 @@ static uint32_t compute_C_flags(void)
     return env->psr & PSR_CARRY;
 }
 
-static inline uint32_t get_NZ_icc(target_ulong dst)
+static inline uint32_t get_NZ_icc(int32_t dst)
 {
     uint32_t ret = 0;
 
-    if (!(dst & 0xffffffffULL))
+    if (dst == 0)
         ret |= PSR_ZERO;
-    if ((int32_t) (dst & 0xffffffffULL) < 0)
+    if (dst < 0)
         ret |= PSR_NEG;
     return ret;
 }
@@ -918,13 +918,13 @@ static uint32_t compute_C_flags_xcc(void)
     return env->xcc & PSR_CARRY;
 }
 
-static inline uint32_t get_NZ_xcc(target_ulong dst)
+static inline uint32_t get_NZ_xcc(target_long dst)
 {
     uint32_t ret = 0;
 
     if (!dst)
         ret |= PSR_ZERO;
-    if ((int64_t)dst < 0)
+    if (dst < 0)
         ret |= PSR_NEG;
     return ret;
 }
@@ -953,25 +953,21 @@ static uint32_t compute_C_div(void)
     return 0;
 }
 
-/* carry = (src1[31] & src2[31]) | ( ~dst[31] & (src1[31] | src2[31])) */
-static inline uint32_t get_C_add_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_C_add_icc(uint32_t dst, uint32_t src1)
 {
     uint32_t ret = 0;
 
-    if (((src1 & (1ULL << 31)) & (src2 & (1ULL << 31)))
-        | ((~(dst & (1ULL << 31)))
-           & ((src1 & (1ULL << 31)) | (src2 & (1ULL << 31)))))
+    if (dst < src1)
         ret |= PSR_CARRY;
     return ret;
 }
 
-static inline uint32_t get_V_add_icc(target_ulong dst, target_ulong src1,
-                                         target_ulong src2)
+static inline uint32_t get_V_add_icc(uint32_t dst, uint32_t src1,
+                                     uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 31))
+    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1U << 31))
         ret |= PSR_OVF;
     return ret;
 }
@@ -1017,14 +1013,14 @@ static uint32_t compute_all_add(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 
 static uint32_t compute_C_add(void)
 {
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_add_icc(CC_DST, CC_SRC);
 }
 
 #ifdef TARGET_SPARC64
@@ -1049,6 +1045,26 @@ static uint32_t compute_C_addx_xcc(void)
 }
 #endif
 
+static uint32_t compute_all_addx(void)
+{
+    uint32_t ret;
+
+    ret = get_NZ_icc(CC_DST);
+    ret |= get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
+    ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return ret;
+}
+
+static uint32_t compute_C_addx(void)
+{
+    uint32_t ret;
+
+    ret = get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
+    return ret;
+}
+
 static inline uint32_t get_V_tag_icc(target_ulong src1, target_ulong src2)
 {
     uint32_t ret = 0;
@@ -1063,7 +1079,7 @@ static uint32_t compute_all_tadd(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
     return ret;
@@ -1071,7 +1087,7 @@ static uint32_t compute_all_tadd(void)
 
 static uint32_t compute_C_tadd(void)
 {
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_add_icc(CC_DST, CC_SRC);
 }
 
 static uint32_t compute_all_taddtv(void)
@@ -1079,34 +1095,30 @@ static uint32_t compute_all_taddtv(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     return ret;
 }
 
 static uint32_t compute_C_taddtv(void)
 {
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_add_icc(CC_DST, CC_SRC);
 }
 
-/* carry = (~src1[31] & src2[31]) | ( dst[31]  & (~src1[31] | src2[31])) */
-static inline uint32_t get_C_sub_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_C_sub_icc(uint32_t src1, uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((~(src1 & (1ULL << 31))) & (src2 & (1ULL << 31)))
-        | ((dst & (1ULL << 31)) & (( ~(src1 & (1ULL << 31)))
-                                   | (src2 & (1ULL << 31)))))
+    if (src1 < src2)
         ret |= PSR_CARRY;
     return ret;
 }
 
-static inline uint32_t get_V_sub_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_V_sub_icc(uint32_t dst, uint32_t src1,
+                                     uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 31))
+    if (((src1 ^ src2) & (src1 ^ dst)) & (1U << 31))
         ret |= PSR_OVF;
     return ret;
 }
@@ -1153,14 +1165,14 @@ static uint32_t compute_all_sub(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 
 static uint32_t compute_C_sub(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_sub_icc(CC_SRC, CC_SRC2);
 }
 
 #ifdef TARGET_SPARC64
@@ -1185,12 +1197,32 @@ static uint32_t compute_C_subx_xcc(void)
 }
 #endif
 
+static uint32_t compute_all_subx(void)
+{
+    uint32_t ret;
+
+    ret = get_NZ_icc(CC_DST);
+    ret |= get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
+    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
+    ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    return ret;
+}
+
+static uint32_t compute_C_subx(void)
+{
+    uint32_t ret;
+
+    ret = get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
+    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
+    return ret;
+}
+
 static uint32_t compute_all_tsub(void)
 {
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
     return ret;
@@ -1198,7 +1230,7 @@ static uint32_t compute_all_tsub(void)
 
 static uint32_t compute_C_tsub(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_sub_icc(CC_SRC, CC_SRC2);
 }
 
 static uint32_t compute_all_tsubtv(void)
@@ -1206,13 +1238,13 @@ static uint32_t compute_all_tsubtv(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
     return ret;
 }
 
 static uint32_t compute_C_tsubtv(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_sub_icc(CC_SRC, CC_SRC2);
 }
 
 static uint32_t compute_all_logic(void)
@@ -1242,11 +1274,11 @@ static const CCTable icc_table[CC_OP_NB] = {
     [CC_OP_FLAGS] = { compute_all_flags, compute_C_flags },
     [CC_OP_DIV] = { compute_all_div, compute_C_div },
     [CC_OP_ADD] = { compute_all_add, compute_C_add },
-    [CC_OP_ADDX] = { compute_all_add, compute_C_add },
+    [CC_OP_ADDX] = { compute_all_addx, compute_C_addx },
     [CC_OP_TADD] = { compute_all_tadd, compute_C_tadd },
     [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_taddtv },
     [CC_OP_SUB] = { compute_all_sub, compute_C_sub },
-    [CC_OP_SUBX] = { compute_all_sub, compute_C_sub },
+    [CC_OP_SUBX] = { compute_all_subx, compute_C_subx },
     [CC_OP_TSUB] = { compute_all_tsub, compute_C_tsub },
     [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_tsubtv },
     [CC_OP_LOGIC] = { compute_all_logic, compute_C_logic },
-- 
1.7.0.1


* [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX.
  2010-05-10 22:23 [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements Richard Henderson
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation Richard Henderson
@ 2010-05-10 22:23 ` Richard Henderson
  2010-05-11 19:28   ` [Qemu-devel] " Blue Swirl
  2010-05-12 18:04 ` [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2 Richard Henderson
  3 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2010-05-10 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Computing carry is trivial for some inputs.  By avoiding an
external function call, we generate near-optimal code for
the common cases of add+addx (double-word arithmetic) and
cmp+addx (a setcc pattern).
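
As a rough sketch of what the inlined path computes for the add+addx case
(hypothetical helper names, assuming the preceding addcc left behind the
truncated sum and its first operand, as cpu_cc_dst/cpu_cc_src do; this only
mirrors the shape of the setcond-based expansion, not the actual TCG ops):

    #include <stdint.h>

    static uint32_t carry_from_previous_addcc(uint32_t cc_dst, uint32_t cc_src)
    {
        /* gen_add32_carry32: a previous add carried iff dst < src.  */
        return cc_dst < cc_src;
    }

    /* Double-word add, "addcc lo; addx hi", with no helper call.  */
    static void add64_via_addx(uint32_t x_lo, uint32_t x_hi,
                               uint32_t y_lo, uint32_t y_hi,
                               uint32_t *r_lo, uint32_t *r_hi)
    {
        uint32_t lo = x_lo + y_lo;                          /* addcc */
        uint32_t c  = carry_from_previous_addcc(lo, x_lo);  /* inlined carry */
        *r_lo = lo;
        *r_hi = x_hi + y_hi + c;                            /* addx */
    }

For the cmp+addx setcc pattern the comparison would instead come from
gen_sub32_carry32 (src1 < src2 after the subcc).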

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/helper.h    |    2 +-
 target-sparc/op_helper.c |    2 +-
 target-sparc/translate.c |  268 +++++++++++++++++++++++++++++++++-------------
 3 files changed, 196 insertions(+), 76 deletions(-)

diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 04c1306..6f103e7 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
 #undef VIS_HELPER
 #undef VIS_CMPHELPER
 DEF_HELPER_0(compute_psr, void);
-DEF_HELPER_0(compute_C_icc, tl);
+DEF_HELPER_0(compute_C_icc, i32);
 
 #include "def-helper.h"
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index c36bc54..3d6177b 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -1314,7 +1314,7 @@ void helper_compute_psr(void)
     CC_OP = CC_OP_FLAGS;
 }
 
-target_ulong helper_compute_C_icc(void)
+uint32_t helper_compute_C_icc(void)
 {
     uint32_t ret;
 
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index ea7c71b..06f0f34 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -332,24 +332,130 @@ static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
     tcg_gen_mov_tl(dst, cpu_cc_dst);
 }
 
-static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
+static TCGv_i32 gen_add32_carry32(void)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_movi_tl(cpu_cc_src2, src2);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
+
+    /* Carry is computed from a previous add: (dst < src)  */
+#if TARGET_LONG_BITS == 64
+    cc_src1_32 = tcg_temp_new_i32();
+    cc_src2_32 = tcg_temp_new_i32();
+    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_dst);
+    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src);
+#else
+    cc_src1_32 = cpu_cc_dst;
+    cc_src2_32 = cpu_cc_src;
+#endif
+
+    carry_32 = tcg_temp_new_i32();
+    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
+
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free_i32(cc_src1_32);
+    tcg_temp_free_i32(cc_src2_32);
+#endif
+
+    return carry_32;
 }
 
-static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
+static TCGv_i32 gen_sub32_carry32(void)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_mov_tl(cpu_cc_src2, src2);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
+
+    /* Carry is computed from a previous borrow: (src1 < src2)  */
+#if TARGET_LONG_BITS == 64
+    cc_src1_32 = tcg_temp_new_i32();
+    cc_src2_32 = tcg_temp_new_i32();
+    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_src);
+    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src2);
+#else
+    cc_src1_32 = cpu_cc_src;
+    cc_src2_32 = cpu_cc_src2;
+#endif
+
+    carry_32 = tcg_temp_new_i32();
+    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
+
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free_i32(cc_src1_32);
+    tcg_temp_free_i32(cc_src2_32);
+#endif
+
+    return carry_32;
+}
+
+static void gen_op_addx_int(DisasContext *dc, TCGv dst, TCGv src1,
+                            TCGv src2, int update_cc)
+{
+    TCGv_i32 carry_32;
+    TCGv carry;
+
+    switch (dc->cc_op) {
+    case CC_OP_DIV:
+    case CC_OP_LOGIC:
+        /* Carry is known to be zero.  Fall back to plain ADD.  */
+        if (update_cc) {
+            gen_op_add_cc(dst, src1, src2);
+        } else {
+            tcg_gen_add_tl(dst, src1, src2);
+        }
+        return;
+
+    case CC_OP_ADD:
+    case CC_OP_TADD:
+    case CC_OP_TADDTV:
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+        {
+            /* For 32-bit hosts, we can re-use the host's hardware carry
+               generation by using an ADD2 opcode.  We discard the low
+               part of the output.  Ideally we'd combine this operation
+               with the add that generated the carry in the first place.  */
+            TCGv dst_low = tcg_temp_new();
+            tcg_gen_op6_i32(INDEX_op_add2_i32, dst_low, dst, 
+                            cpu_cc_src, src1, cpu_cc_src2, src2);
+            tcg_temp_free(dst_low);
+            goto add_done;
+        }
+#endif
+        carry_32 = gen_add32_carry32();
+        break;
+
+    case CC_OP_SUB:
+    case CC_OP_TSUB:
+    case CC_OP_TSUBTV:
+        carry_32 = gen_sub32_carry32();
+        break;
+
+    default:
+        /* We need external help to produce the carry.  */
+        carry_32 = tcg_temp_new_i32();
+        gen_helper_compute_C_icc(carry_32);
+        break;
+    }
+
+#if TARGET_LONG_BITS == 64
+    carry = tcg_temp_new();
+    tcg_gen_extu_i32_i64(carry, carry_32);
+#else
+    carry = carry_32;
+#endif
+
+    tcg_gen_add_tl(dst, src1, src2);
+    tcg_gen_add_tl(dst, dst, carry);
+
+    tcg_temp_free_i32(carry_32);
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free(carry);
+#endif
+
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+ add_done:
+#endif
+    if (update_cc) {
+        tcg_gen_mov_tl(cpu_cc_src, src1);
+        tcg_gen_mov_tl(cpu_cc_src2, src2);
+        tcg_gen_mov_tl(cpu_cc_dst, dst);
+    }
 }
 
 static inline void gen_op_tadd_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -415,24 +521,78 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
     tcg_gen_mov_tl(dst, cpu_cc_dst);
 }
 
-static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
+static void gen_op_subx_int(DisasContext *dc, TCGv dst, TCGv src1,
+                            TCGv src2, int update_cc)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_movi_tl(cpu_cc_src2, src2);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
-}
+    TCGv_i32 carry_32;
+    TCGv carry;
 
-static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
-{
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_mov_tl(cpu_cc_src2, src2);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    switch (dc->cc_op) {
+    case CC_OP_DIV:
+    case CC_OP_LOGIC:
+        /* Carry is known to be zero.  Fall back to plain SUB.  */
+        if (update_cc) {
+            gen_op_sub_cc(dst, src1, src2);
+        } else {
+            tcg_gen_sub_tl(dst, src1, src2);
+        }
+        return;
+
+    case CC_OP_ADD:
+    case CC_OP_TADD:
+    case CC_OP_TADDTV:
+        carry_32 = gen_add32_carry32();
+        break;
+
+    case CC_OP_SUB:
+    case CC_OP_TSUB:
+    case CC_OP_TSUBTV:
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+        {
+            /* For 32-bit hosts, we can re-use the host's hardware carry
+               generation by using a SUB2 opcode.  We discard the low
+               part of the output.  Ideally we'd combine this operation
+               with the add that generated the carry in the first place.  */
+            TCGv dst_low = tcg_temp_new();
+            tcg_gen_op6_i32(INDEX_op_sub2_i32, dst_low, dst, 
+                            cpu_cc_src, src1, cpu_cc_src2, src2);
+            tcg_temp_free(dst_low);
+            goto sub_done;
+        }
+#endif
+        carry_32 = gen_sub32_carry32();
+        break;
+
+    default:
+        /* We need external help to produce the carry.  */
+        carry_32 = tcg_temp_new_i32();
+        gen_helper_compute_C_icc(carry_32);
+        break;
+    }
+
+#if TARGET_LONG_BITS == 64
+    carry = tcg_temp_new();
+    tcg_gen_extu_i32_i64(carry, carry_32);
+#else
+    carry = carry_32;
+#endif
+
+    tcg_gen_sub_tl(dst, src1, src2);
+    tcg_gen_sub_tl(dst, dst, carry);
+
+    tcg_temp_free_i32(carry_32);
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free(carry);
+#endif
+
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+ sub_done:
+#endif
+    if (update_cc) {
+        tcg_gen_mov_tl(cpu_cc_src, src1);
+        tcg_gen_mov_tl(cpu_cc_src2, src2);
+        tcg_gen_mov_tl(cpu_cc_dst, dst);
+    }
 }
 
 static inline void gen_op_tsub_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -2950,28 +3110,8 @@ static void disas_sparc_insn(DisasContext * dc)
                         }
                         break;
                     case 0x8: /* addx, V9 addc */
-                        if (IS_IMM) {
-                            simm = GET_FIELDs(insn, 19, 31);
-                            if (xop & 0x10) {
-                                gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
-                                dc->cc_op = CC_OP_ADDX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
-                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        } else {
-                            if (xop & 0x10) {
-                                gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
-                                dc->cc_op = CC_OP_ADDX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
-                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        }
+                        gen_op_addx_int(dc, cpu_dst, cpu_src1, cpu_src2,
+                                        (xop & 0x10));
                         break;
 #ifdef TARGET_SPARC64
                     case 0x9: /* V9 mulx */
@@ -3002,28 +3142,8 @@ static void disas_sparc_insn(DisasContext * dc)
                         }
                         break;
                     case 0xc: /* subx, V9 subc */
-                        if (IS_IMM) {
-                            simm = GET_FIELDs(insn, 19, 31);
-                            if (xop & 0x10) {
-                                gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
-                                dc->cc_op = CC_OP_SUBX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
-                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        } else {
-                            if (xop & 0x10) {
-                                gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
-                                dc->cc_op = CC_OP_SUBX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
-                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        }
+                        gen_op_subx_int(dc, cpu_dst, cpu_src1, cpu_src2,
+                                        (xop & 0x10));
                         break;
 #ifdef TARGET_SPARC64
                     case 0xd: /* V9 udivx */
-- 
1.7.0.1


* [Qemu-devel] Re: [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation Richard Henderson
@ 2010-05-11 19:14   ` Blue Swirl
  2010-05-11 21:31   ` [Qemu-devel] " Artyom Tarasenko
  1 sibling, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2010-05-11 19:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 5/11/10, Richard Henderson <rth@twiddle.net> wrote:
> Use int32 types instead of target_ulong when computing ICC.  This
>  simplifies the generated code for 32-bit host and 64-bit guest.
>  Use the same simplified expressions for ICC as were already used
>  for XCC in carry flag generation.
>
>  ADDX ICC carry generation was using the same routines as ADD ICC,
>  which is incorrect if the input carry bit produces the output carry.
>  Use the algorithms already in place for ADDX XCC carry generation.
>  Similarly for SUBX.
>
>  Signed-off-by: Richard Henderson <rth@twiddle.net>
>  ---
>   target-sparc/op_helper.c |  106 ++++++++++++++++++++++++++++++----------------
>   1 files changed, 69 insertions(+), 37 deletions(-)
>
>  diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  index 09449c5..c36bc54 100644
>  --- a/target-sparc/op_helper.c
>  +++ b/target-sparc/op_helper.c
>  @@ -896,13 +896,13 @@ static uint32_t compute_C_flags(void)
>      return env->psr & PSR_CARRY;
>   }
>
>  -static inline uint32_t get_NZ_icc(target_ulong dst)
>  +static inline uint32_t get_NZ_icc(int32_t dst)
>   {
>      uint32_t ret = 0;
>
>  -    if (!(dst & 0xffffffffULL))
>  +    if (dst == 0)

Could you add the braces to fix CODING_STYLE while at it?

>          ret |= PSR_ZERO;
>  -    if ((int32_t) (dst & 0xffffffffULL) < 0)
>  +    if (dst < 0)
>          ret |= PSR_NEG;
>      return ret;
>   }
>  @@ -918,13 +918,13 @@ static uint32_t compute_C_flags_xcc(void)
>      return env->xcc & PSR_CARRY;
>   }
>
>  -static inline uint32_t get_NZ_xcc(target_ulong dst)
>  +static inline uint32_t get_NZ_xcc(target_long dst)
>   {
>      uint32_t ret = 0;
>
>      if (!dst)
>          ret |= PSR_ZERO;
>  -    if ((int64_t)dst < 0)
>  +    if (dst < 0)
>          ret |= PSR_NEG;
>      return ret;
>   }
>  @@ -953,25 +953,21 @@ static uint32_t compute_C_div(void)
>      return 0;
>   }
>
>  -/* carry = (src1[31] & src2[31]) | ( ~dst[31] & (src1[31] | src2[31])) */
>  -static inline uint32_t get_C_add_icc(target_ulong dst, target_ulong src1,
>  -                                     target_ulong src2)
>  +static inline uint32_t get_C_add_icc(uint32_t dst, uint32_t src1)
>   {
>      uint32_t ret = 0;
>
>  -    if (((src1 & (1ULL << 31)) & (src2 & (1ULL << 31)))
>  -        | ((~(dst & (1ULL << 31)))
>  -           & ((src1 & (1ULL << 31)) | (src2 & (1ULL << 31)))))
>  +    if (dst < src1)
>          ret |= PSR_CARRY;
>      return ret;
>   }
>
>  -static inline uint32_t get_V_add_icc(target_ulong dst, target_ulong src1,
>  -                                         target_ulong src2)
>  +static inline uint32_t get_V_add_icc(uint32_t dst, uint32_t src1,
>  +                                     uint32_t src2)
>   {
>      uint32_t ret = 0;
>
>  -    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 31))
>  +    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1U << 31))
>          ret |= PSR_OVF;
>      return ret;
>   }
>  @@ -1017,14 +1013,14 @@ static uint32_t compute_all_add(void)
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>      ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>      return ret;
>   }
>
>   static uint32_t compute_C_add(void)
>   {
>  -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_add_icc(CC_DST, CC_SRC);
>   }
>
>   #ifdef TARGET_SPARC64
>  @@ -1049,6 +1045,26 @@ static uint32_t compute_C_addx_xcc(void)
>   }
>   #endif
>
>  +static uint32_t compute_all_addx(void)
>  +{
>  +    uint32_t ret;
>  +
>  +    ret = get_NZ_icc(CC_DST);
>  +    ret |= get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
>  +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>  +    ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return ret;
>  +}
>  +
>  +static uint32_t compute_C_addx(void)
>  +{
>  +    uint32_t ret;
>  +
>  +    ret = get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
>  +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>  +    return ret;
>  +}
>  +
>   static inline uint32_t get_V_tag_icc(target_ulong src1, target_ulong src2)
>   {
>      uint32_t ret = 0;
>  @@ -1063,7 +1079,7 @@ static uint32_t compute_all_tadd(void)
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>      ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>      ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>      return ret;
>  @@ -1071,7 +1087,7 @@ static uint32_t compute_all_tadd(void)
>
>   static uint32_t compute_C_tadd(void)
>   {
>  -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_add_icc(CC_DST, CC_SRC);
>   }
>
>   static uint32_t compute_all_taddtv(void)
>  @@ -1079,34 +1095,30 @@ static uint32_t compute_all_taddtv(void)
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>      return ret;
>   }
>
>   static uint32_t compute_C_taddtv(void)
>   {
>  -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_add_icc(CC_DST, CC_SRC);
>   }
>
>  -/* carry = (~src1[31] & src2[31]) | ( dst[31]  & (~src1[31] | src2[31])) */
>  -static inline uint32_t get_C_sub_icc(target_ulong dst, target_ulong src1,
>  -                                     target_ulong src2)
>  +static inline uint32_t get_C_sub_icc(uint32_t src1, uint32_t src2)
>   {
>      uint32_t ret = 0;
>
>  -    if (((~(src1 & (1ULL << 31))) & (src2 & (1ULL << 31)))
>  -        | ((dst & (1ULL << 31)) & (( ~(src1 & (1ULL << 31)))
>  -                                   | (src2 & (1ULL << 31)))))
>  +    if (src1 < src2)
>          ret |= PSR_CARRY;
>      return ret;
>   }
>
>  -static inline uint32_t get_V_sub_icc(target_ulong dst, target_ulong src1,
>  -                                     target_ulong src2)
>  +static inline uint32_t get_V_sub_icc(uint32_t dst, uint32_t src1,
>  +                                     uint32_t src2)
>   {
>      uint32_t ret = 0;
>
>  -    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 31))
>  +    if (((src1 ^ src2) & (src1 ^ dst)) & (1U << 31))
>          ret |= PSR_OVF;
>      return ret;
>   }
>  @@ -1153,14 +1165,14 @@ static uint32_t compute_all_sub(void)
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>      ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>      return ret;
>   }
>
>   static uint32_t compute_C_sub(void)
>   {
>  -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>   }
>
>   #ifdef TARGET_SPARC64
>  @@ -1185,12 +1197,32 @@ static uint32_t compute_C_subx_xcc(void)
>   }
>   #endif
>
>  +static uint32_t compute_all_subx(void)
>  +{
>  +    uint32_t ret;
>  +
>  +    ret = get_NZ_icc(CC_DST);
>  +    ret |= get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
>  +    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
>  +    ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return ret;
>  +}
>  +
>  +static uint32_t compute_C_subx(void)
>  +{
>  +    uint32_t ret;
>  +
>  +    ret = get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
>  +    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
>  +    return ret;
>  +}
>  +
>   static uint32_t compute_all_tsub(void)
>   {
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>      ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>      ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>      return ret;
>  @@ -1198,7 +1230,7 @@ static uint32_t compute_all_tsub(void)
>
>   static uint32_t compute_C_tsub(void)
>   {
>  -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>   }
>
>   static uint32_t compute_all_tsubtv(void)
>  @@ -1206,13 +1238,13 @@ static uint32_t compute_all_tsubtv(void)
>      uint32_t ret;
>
>      ret = get_NZ_icc(CC_DST);
>  -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>      return ret;
>   }
>
>   static uint32_t compute_C_tsubtv(void)
>   {
>  -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>  +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>   }
>
>   static uint32_t compute_all_logic(void)
>  @@ -1242,11 +1274,11 @@ static const CCTable icc_table[CC_OP_NB] = {
>      [CC_OP_FLAGS] = { compute_all_flags, compute_C_flags },
>      [CC_OP_DIV] = { compute_all_div, compute_C_div },
>      [CC_OP_ADD] = { compute_all_add, compute_C_add },
>  -    [CC_OP_ADDX] = { compute_all_add, compute_C_add },
>  +    [CC_OP_ADDX] = { compute_all_addx, compute_C_addx },
>      [CC_OP_TADD] = { compute_all_tadd, compute_C_tadd },
>      [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_taddtv },
>      [CC_OP_SUB] = { compute_all_sub, compute_C_sub },
>  -    [CC_OP_SUBX] = { compute_all_sub, compute_C_sub },
>  +    [CC_OP_SUBX] = { compute_all_subx, compute_C_subx },
>      [CC_OP_TSUB] = { compute_all_tsub, compute_C_tsub },
>      [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_tsubtv },
>      [CC_OP_LOGIC] = { compute_all_logic, compute_C_logic },
>
> --
>  1.7.0.1
>
>


* [Qemu-devel] Re: [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX.
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
@ 2010-05-11 19:28   ` Blue Swirl
  2010-05-12 14:48     ` Richard Henderson
  0 siblings, 1 reply; 18+ messages in thread
From: Blue Swirl @ 2010-05-11 19:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 5/11/10, Richard Henderson <rth@twiddle.net> wrote:
> Computing carry is trivial for some inputs.  By avoiding an
>  external function call, we generate near-optimal code for
>  the common cases of add+addx (double-word arithmetic) and
>  cmp+addx (a setcc pattern).
>
>  Signed-off-by: Richard Henderson <rth@twiddle.net>
>  ---
>   target-sparc/helper.h    |    2 +-
>   target-sparc/op_helper.c |    2 +-
>   target-sparc/translate.c |  268 +++++++++++++++++++++++++++++++++-------------
>   3 files changed, 196 insertions(+), 76 deletions(-)
>
>  diff --git a/target-sparc/helper.h b/target-sparc/helper.h
>  index 04c1306..6f103e7 100644
>  --- a/target-sparc/helper.h
>  +++ b/target-sparc/helper.h
>  @@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
>   #undef VIS_HELPER
>   #undef VIS_CMPHELPER
>   DEF_HELPER_0(compute_psr, void);
>  -DEF_HELPER_0(compute_C_icc, tl);
>  +DEF_HELPER_0(compute_C_icc, i32);
>
>   #include "def-helper.h"
>  diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
>  index c36bc54..3d6177b 100644
>  --- a/target-sparc/op_helper.c
>  +++ b/target-sparc/op_helper.c
>  @@ -1314,7 +1314,7 @@ void helper_compute_psr(void)
>      CC_OP = CC_OP_FLAGS;
>   }
>
>  -target_ulong helper_compute_C_icc(void)
>  +uint32_t helper_compute_C_icc(void)
>   {
>      uint32_t ret;
>
>  diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>  index ea7c71b..06f0f34 100644
>  --- a/target-sparc/translate.c
>  +++ b/target-sparc/translate.c
>  @@ -332,24 +332,130 @@ static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
>      tcg_gen_mov_tl(dst, cpu_cc_dst);
>   }
>
>  -static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
>  +static TCGv_i32 gen_add32_carry32(void)
>   {
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>  -    tcg_gen_mov_tl(cpu_cc_src, src1);
>  -    tcg_gen_movi_tl(cpu_cc_src2, src2);
>  -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>  -    tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
>  -    tcg_gen_mov_tl(dst, cpu_cc_dst);
>  +    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
>  +
>  +    /* Carry is computed from a previous add: (dst < src)  */
>  +#if TARGET_LONG_BITS == 64
>  +    cc_src1_32 = tcg_temp_new_i32();
>  +    cc_src2_32 = tcg_temp_new_i32();
>  +    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_dst);
>  +    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src);
>  +#else
>  +    cc_src1_32 = cpu_cc_dst;
>  +    cc_src2_32 = cpu_cc_src;
>  +#endif
>  +
>  +    carry_32 = tcg_temp_new_i32();
>  +    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
>  +
>  +#if TARGET_LONG_BITS == 64
>  +    tcg_temp_free_i32(cc_src1_32);
>  +    tcg_temp_free_i32(cc_src2_32);
>  +#endif
>  +
>  +    return carry_32;
>   }
>
>  -static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
>  +static TCGv_i32 gen_sub32_carry32(void)
>   {
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>  -    tcg_gen_mov_tl(cpu_cc_src, src1);
>  -    tcg_gen_mov_tl(cpu_cc_src2, src2);
>  -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>  -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
>  -    tcg_gen_mov_tl(dst, cpu_cc_dst);
>  +    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
>  +
>  +    /* Carry is computed from a previous borrow: (src1 < src2)  */
>  +#if TARGET_LONG_BITS == 64
>  +    cc_src1_32 = tcg_temp_new_i32();
>  +    cc_src2_32 = tcg_temp_new_i32();
>  +    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_src);
>  +    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src2);
>  +#else
>  +    cc_src1_32 = cpu_cc_src;
>  +    cc_src2_32 = cpu_cc_src2;
>  +#endif
>  +
>  +    carry_32 = tcg_temp_new_i32();
>  +    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
>  +
>  +#if TARGET_LONG_BITS == 64
>  +    tcg_temp_free_i32(cc_src1_32);
>  +    tcg_temp_free_i32(cc_src2_32);
>  +#endif
>  +
>  +    return carry_32;
>  +}
>  +
>  +static void gen_op_addx_int(DisasContext *dc, TCGv dst, TCGv src1,
>  +                            TCGv src2, int update_cc)
>  +{
>  +    TCGv_i32 carry_32;
>  +    TCGv carry;
>  +
>  +    switch (dc->cc_op) {
>  +    case CC_OP_DIV:
>  +    case CC_OP_LOGIC:
>  +        /* Carry is known to be zero.  Fall back to plain ADD.  */
>  +        if (update_cc) {
>  +            gen_op_add_cc(dst, src1, src2);
>  +        } else {
>  +            tcg_gen_add_tl(dst, src1, src2);
>  +        }
>  +        return;
>  +
>  +    case CC_OP_ADD:
>  +    case CC_OP_TADD:
>  +    case CC_OP_TADDTV:
>  +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
>  +        {
>  +            /* For 32-bit hosts, we can re-use the host's hardware carry
>  +               generation by using an ADD2 opcode.  We discard the low
>  +               part of the output.  Ideally we'd combine this operation
>  +               with the add that generated the carry in the first place.  */
>  +            TCGv dst_low = tcg_temp_new();
>  +            tcg_gen_op6_i32(INDEX_op_add2_i32, dst_low, dst,
>  +                            cpu_cc_src, src1, cpu_cc_src2, src2);

Awesome idea!

>  +            tcg_temp_free(dst_low);
>  +            goto add_done;
>  +        }
>  +#endif
>  +        carry_32 = gen_add32_carry32();
>  +        break;
>  +
>  +    case CC_OP_SUB:
>  +    case CC_OP_TSUB:
>  +    case CC_OP_TSUBTV:
>  +        carry_32 = gen_sub32_carry32();
>  +        break;
>  +
>  +    default:
>  +        /* We need external help to produce the carry.  */
>  +        carry_32 = tcg_temp_new_i32();
>  +        gen_helper_compute_C_icc(carry_32);
>  +        break;
>  +    }
>  +
>  +#if TARGET_LONG_BITS == 64
>  +    carry = tcg_temp_new();
>  +    tcg_gen_extu_i32_i64(carry, carry_32);
>  +#else
>  +    carry = carry_32;
>  +#endif
>  +
>  +    tcg_gen_add_tl(dst, src1, src2);
>  +    tcg_gen_add_tl(dst, dst, carry);
>  +
>  +    tcg_temp_free_i32(carry_32);
>  +#if TARGET_LONG_BITS == 64
>  +    tcg_temp_free(carry);
>  +#endif
>  +
>  +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
>  + add_done:
>  +#endif
>  +    if (update_cc) {
>  +        tcg_gen_mov_tl(cpu_cc_src, src1);
>  +        tcg_gen_mov_tl(cpu_cc_src2, src2);
>  +        tcg_gen_mov_tl(cpu_cc_dst, dst);
>  +    }
>   }
>
>   static inline void gen_op_tadd_cc(TCGv dst, TCGv src1, TCGv src2)
>  @@ -415,24 +521,78 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
>      tcg_gen_mov_tl(dst, cpu_cc_dst);
>   }
>
>  -static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
>  +static void gen_op_subx_int(DisasContext *dc, TCGv dst, TCGv src1,
>  +                            TCGv src2, int update_cc)
>   {
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>  -    tcg_gen_mov_tl(cpu_cc_src, src1);
>  -    tcg_gen_movi_tl(cpu_cc_src2, src2);
>  -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>  -    tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
>  -    tcg_gen_mov_tl(dst, cpu_cc_dst);
>  -}
>  +    TCGv_i32 carry_32;
>  +    TCGv carry;
>
>  -static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
>  -{
>  -    gen_helper_compute_C_icc(cpu_tmp0);
>  -    tcg_gen_mov_tl(cpu_cc_src, src1);
>  -    tcg_gen_mov_tl(cpu_cc_src2, src2);
>  -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
>  -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
>  -    tcg_gen_mov_tl(dst, cpu_cc_dst);
>  +    switch (dc->cc_op) {
>  +    case CC_OP_DIV:
>  +    case CC_OP_LOGIC:
>  +        /* Carry is known to be zero.  Fall back to plain SUB.  */
>  +        if (update_cc) {
>  +            gen_op_sub_cc(dst, src1, src2);
>  +        } else {
>  +            tcg_gen_sub_tl(dst, src1, src2);
>  +        }
>  +        return;
>  +
>  +    case CC_OP_ADD:
>  +    case CC_OP_TADD:
>  +    case CC_OP_TADDTV:
>  +        carry_32 = gen_add32_carry32();
>  +        break;
>  +
>  +    case CC_OP_SUB:
>  +    case CC_OP_TSUB:
>  +    case CC_OP_TSUBTV:
>  +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
>  +        {
>  +            /* For 32-bit hosts, we can re-use the host's hardware carry
>  +               generation by using a SUB2 opcode.  We discard the low
>  +               part of the output.  Ideally we'd combine this operation
>  +               with the add that generated the carry in the first place.  */
>  +            TCGv dst_low = tcg_temp_new();
>  +            tcg_gen_op6_i32(INDEX_op_sub2_i32, dst_low, dst,
>  +                            cpu_cc_src, src1, cpu_cc_src2, src2);
>  +            tcg_temp_free(dst_low);
>  +            goto sub_done;
>  +        }
>  +#endif
>  +        carry_32 = gen_sub32_carry32();
>  +        break;
>  +
>  +    default:
>  +        /* We need external help to produce the carry.  */
>  +        carry_32 = tcg_temp_new_i32();
>  +        gen_helper_compute_C_icc(carry_32);
>  +        break;
>  +    }
>  +
>  +#if TARGET_LONG_BITS == 64
>  +    carry = tcg_temp_new();
>  +    tcg_gen_extu_i32_i64(carry, carry_32);
>  +#else
>  +    carry = carry_32;
>  +#endif
>  +
>  +    tcg_gen_sub_tl(dst, src1, src2);
>  +    tcg_gen_sub_tl(dst, dst, carry);
>  +
>  +    tcg_temp_free_i32(carry_32);
>  +#if TARGET_LONG_BITS == 64
>  +    tcg_temp_free(carry);
>  +#endif
>  +
>  +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
>  + sub_done:
>  +#endif
>  +    if (update_cc) {
>  +        tcg_gen_mov_tl(cpu_cc_src, src1);
>  +        tcg_gen_mov_tl(cpu_cc_src2, src2);
>  +        tcg_gen_mov_tl(cpu_cc_dst, dst);
>  +    }
>   }
>
>   static inline void gen_op_tsub_cc(TCGv dst, TCGv src1, TCGv src2)
>  @@ -2950,28 +3110,8 @@ static void disas_sparc_insn(DisasContext * dc)
>                          }
>                          break;
>                      case 0x8: /* addx, V9 addc */
>  -                        if (IS_IMM) {
>  -                            simm = GET_FIELDs(insn, 19, 31);
>  -                            if (xop & 0x10) {
>  -                                gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
>  -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
>  -                                dc->cc_op = CC_OP_ADDX;

The new code doesn't update dc->cc_op; shouldn't that happen when the
condition codes are changed? For example, 'addx' in the sequence
'addcc; addxcc; addx' needs the C flag from the second addxcc,
not from the first addcc.

If we don't need to update it, this was the only use of CC_OP_ADDX, so do
we still need it?

>  -                            } else {
>  -                                gen_helper_compute_C_icc(cpu_tmp0);
>  -                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
>  -                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
>  -                            }
>  -                        } else {
>  -                            if (xop & 0x10) {
>  -                                gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
>  -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
>  -                                dc->cc_op = CC_OP_ADDX;
>  -                            } else {
>  -                                gen_helper_compute_C_icc(cpu_tmp0);
>  -                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
>  -                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
>  -                            }
>  -                        }
>  +                        gen_op_addx_int(dc, cpu_dst, cpu_src1, cpu_src2,
>  +                                        (xop & 0x10));
>                          break;
>   #ifdef TARGET_SPARC64
>                      case 0x9: /* V9 mulx */
>  @@ -3002,28 +3142,8 @@ static void disas_sparc_insn(DisasContext * dc)
>                          }
>                          break;
>                      case 0xc: /* subx, V9 subc */
>  -                        if (IS_IMM) {
>  -                            simm = GET_FIELDs(insn, 19, 31);
>  -                            if (xop & 0x10) {
>  -                                gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
>  -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
>  -                                dc->cc_op = CC_OP_SUBX;
>  -                            } else {
>  -                                gen_helper_compute_C_icc(cpu_tmp0);
>  -                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
>  -                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
>  -                            }
>  -                        } else {
>  -                            if (xop & 0x10) {
>  -                                gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
>  -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
>  -                                dc->cc_op = CC_OP_SUBX;
>  -                            } else {
>  -                                gen_helper_compute_C_icc(cpu_tmp0);
>  -                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
>  -                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
>  -                            }
>  -                        }
>  +                        gen_op_subx_int(dc, cpu_dst, cpu_src1, cpu_src2,
>  +                                        (xop & 0x10));
>                          break;
>   #ifdef TARGET_SPARC64
>                      case 0xd: /* V9 udivx */
>
> --
>  1.7.0.1
>
>


* Re: [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation Richard Henderson
  2010-05-11 19:14   ` [Qemu-devel] " Blue Swirl
@ 2010-05-11 21:31   ` Artyom Tarasenko
  2010-05-12 14:56     ` Richard Henderson
  2010-05-12 15:04     ` Richard Henderson
  1 sibling, 2 replies; 18+ messages in thread
From: Artyom Tarasenko @ 2010-05-11 21:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: blauwirbel, qemu-devel

2010/5/11 Richard Henderson <rth@twiddle.net>:
> Use int32 types instead of target_ulong when computing ICC.  This
> simplifies the generated code for 32-bit host and 64-bit guest.
> Use the same simplified expressions for ICC as were already used
> for XCC in carry flag generation.
>
> ADDX ICC carry generation was using the same routines as ADD ICC,
> which is incorrect if the input carry bit produces the output carry.
> Use the algorithms already in place for ADDX XCC carry generation.
> Similarly for SUBX.

Nack. It looks like you reverted carry generation to the previous
(broken) behavior.
The current implementation is indeed not optimal performance-wise,
but it is [more] precise.

This patch breaks Solaris boot at a very early stage, as it breaks subx.

Can you give a reference to the specification you are implementing?

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-sparc/op_helper.c |  106 ++++++++++++++++++++++++++++++----------------
>  1 files changed, 69 insertions(+), 37 deletions(-)
>
> diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> index 09449c5..c36bc54 100644
> --- a/target-sparc/op_helper.c
> +++ b/target-sparc/op_helper.c
> @@ -896,13 +896,13 @@ static uint32_t compute_C_flags(void)
>     return env->psr & PSR_CARRY;
>  }
>
> -static inline uint32_t get_NZ_icc(target_ulong dst)
> +static inline uint32_t get_NZ_icc(int32_t dst)
>  {
>     uint32_t ret = 0;
>
> -    if (!(dst & 0xffffffffULL))
> +    if (dst == 0)
>         ret |= PSR_ZERO;
> -    if ((int32_t) (dst & 0xffffffffULL) < 0)
> +    if (dst < 0)
>         ret |= PSR_NEG;
>     return ret;
>  }
> @@ -918,13 +918,13 @@ static uint32_t compute_C_flags_xcc(void)
>     return env->xcc & PSR_CARRY;
>  }
>
> -static inline uint32_t get_NZ_xcc(target_ulong dst)
> +static inline uint32_t get_NZ_xcc(target_long dst)
>  {
>     uint32_t ret = 0;
>
>     if (!dst)
>         ret |= PSR_ZERO;
> -    if ((int64_t)dst < 0)
> +    if (dst < 0)
>         ret |= PSR_NEG;
>     return ret;
>  }
> @@ -953,25 +953,21 @@ static uint32_t compute_C_div(void)
>     return 0;
>  }
>
> -/* carry = (src1[31] & src2[31]) | ( ~dst[31] & (src1[31] | src2[31])) */

This is how it is specified in the SPARC V8 documentation.

> -static inline uint32_t get_C_add_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_C_add_icc(uint32_t dst, uint32_t src1)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 & (1ULL << 31)) & (src2 & (1ULL << 31)))
> -        | ((~(dst & (1ULL << 31)))
> -           & ((src1 & (1ULL << 31)) | (src2 & (1ULL << 31)))))
> +    if (dst < src1)
>         ret |= PSR_CARRY;
>     return ret;
>  }
>
> -static inline uint32_t get_V_add_icc(target_ulong dst, target_ulong src1,
> -                                         target_ulong src2)
> +static inline uint32_t get_V_add_icc(uint32_t dst, uint32_t src1,
> +                                     uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 31))
> +    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1U << 31))
>         ret |= PSR_OVF;
>     return ret;
>  }
> @@ -1017,14 +1013,14 @@ static uint32_t compute_all_add(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>
>  static uint32_t compute_C_add(void)
>  {
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_add_icc(CC_DST, CC_SRC);
>  }
>
>  #ifdef TARGET_SPARC64
> @@ -1049,6 +1045,26 @@ static uint32_t compute_C_addx_xcc(void)
>  }
>  #endif
>
> +static uint32_t compute_all_addx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_NZ_icc(CC_DST);
> +    ret |= get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
> +    ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return ret;
> +}
> +
> +static uint32_t compute_C_addx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_C_add_icc(CC_DST - CC_SRC2, CC_SRC);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
> +    return ret;
> +}
> +
>  static inline uint32_t get_V_tag_icc(target_ulong src1, target_ulong src2)
>  {
>     uint32_t ret = 0;
> @@ -1063,7 +1079,7 @@ static uint32_t compute_all_tadd(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>     return ret;
> @@ -1071,7 +1087,7 @@ static uint32_t compute_all_tadd(void)
>
>  static uint32_t compute_C_tadd(void)
>  {
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_add_icc(CC_DST, CC_SRC);
>  }
>
>  static uint32_t compute_all_taddtv(void)
> @@ -1079,34 +1095,30 @@ static uint32_t compute_all_taddtv(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     return ret;
>  }
>
>  static uint32_t compute_C_taddtv(void)
>  {
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_add_icc(CC_DST, CC_SRC);
>  }
>
> -/* carry = (~src1[31] & src2[31]) | ( dst[31]  & (~src1[31] | src2[31])) */

This was a copy & paste from the docs too.

> -static inline uint32_t get_C_sub_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_C_sub_icc(uint32_t src1, uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((~(src1 & (1ULL << 31))) & (src2 & (1ULL << 31)))
> -        | ((dst & (1ULL << 31)) & (( ~(src1 & (1ULL << 31)))
> -                                   | (src2 & (1ULL << 31)))))
> +    if (src1 < src2)
>         ret |= PSR_CARRY;
>     return ret;
>  }
>
> -static inline uint32_t get_V_sub_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_V_sub_icc(uint32_t dst, uint32_t src1,
> +                                     uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 31))
> +    if (((src1 ^ src2) & (src1 ^ dst)) & (1U << 31))
>         ret |= PSR_OVF;
>     return ret;
>  }
> @@ -1153,14 +1165,14 @@ static uint32_t compute_all_sub(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>
>  static uint32_t compute_C_sub(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>  }
>
>  #ifdef TARGET_SPARC64
> @@ -1185,12 +1197,32 @@ static uint32_t compute_C_subx_xcc(void)
>  }
>  #endif
>
> +static uint32_t compute_all_subx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_NZ_icc(CC_DST);
> +    ret |= get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
> +    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
> +    ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return ret;
> +}
> +
> +static uint32_t compute_C_subx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_C_sub_icc(CC_DST - CC_SRC2, CC_SRC);
> +    ret |= get_C_sub_icc(CC_DST, CC_SRC2);
> +    return ret;
> +}
> +
>  static uint32_t compute_all_tsub(void)
>  {
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>     return ret;
> @@ -1198,7 +1230,7 @@ static uint32_t compute_all_tsub(void)
>
>  static uint32_t compute_C_tsub(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>  }
>
>  static uint32_t compute_all_tsubtv(void)
> @@ -1206,13 +1238,13 @@ static uint32_t compute_all_tsubtv(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>     return ret;
>  }
>
>  static uint32_t compute_C_tsubtv(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>  }
>
>  static uint32_t compute_all_logic(void)
> @@ -1242,11 +1274,11 @@ static const CCTable icc_table[CC_OP_NB] = {
>     [CC_OP_FLAGS] = { compute_all_flags, compute_C_flags },
>     [CC_OP_DIV] = { compute_all_div, compute_C_div },
>     [CC_OP_ADD] = { compute_all_add, compute_C_add },
> -    [CC_OP_ADDX] = { compute_all_add, compute_C_add },
> +    [CC_OP_ADDX] = { compute_all_addx, compute_C_addx },
>     [CC_OP_TADD] = { compute_all_tadd, compute_C_tadd },
>     [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_taddtv },
>     [CC_OP_SUB] = { compute_all_sub, compute_C_sub },
> -    [CC_OP_SUBX] = { compute_all_sub, compute_C_sub },
> +    [CC_OP_SUBX] = { compute_all_subx, compute_C_subx },
>     [CC_OP_TSUB] = { compute_all_tsub, compute_C_tsub },
>     [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_tsubtv },
>     [CC_OP_LOGIC] = { compute_all_logic, compute_C_logic },
> --
> 1.7.0.1
>
>
>



-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] Re: [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX.
  2010-05-11 19:28   ` [Qemu-devel] " Blue Swirl
@ 2010-05-12 14:48     ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 14:48 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

> The new code doesn't update dc->cc_op; shouldn't that happen if the
> condition codes are changed? For example, 'addx' in the sequence
> 'addcc; addxcc; addx' needs the C flag from the second addxcc,
> not from the first addcc.

Oops, yes, that needs updating too.  Will fix.
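
To sketch that hazard in a few lines of C (a simplified, standalone model,
not the actual QEMU structures or helper names):

#include <stdint.h>

/* Which instruction last set the integer condition codes.  */
enum cc_op { CC_OP_ADD, CC_OP_ADDX };

struct cc_state {
    enum cc_op op;              /* must be kept up to date by the translator */
    uint32_t dst, src1, src2;   /* operands recorded by that instruction */
};

/* Recompute the carry flag from the recorded state.  */
static uint32_t carry_of(const struct cc_state *s)
{
    if (s->op == CC_OP_ADD) {
        /* plain add: carry iff the 32-bit result wrapped */
        return s->dst < s->src1;
    }
    /* add with carry-in: need the full boolean form over bit 31 */
    return (((s->src1 & s->src2) | (~s->dst & (s->src1 | s->src2))) >> 31) & 1;
}

/* In 'addcc; addxcc; addx', the final addx must call carry_of() on the
   state recorded by the addxcc.  If op is left at CC_OP_ADD, the carry
   is recomputed from the first addcc's operands, i.e. a stale value.  */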


r~

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-11 21:31   ` [Qemu-devel] " Artyom Tarasenko
@ 2010-05-12 14:56     ` Richard Henderson
  2010-05-12 15:18       ` Artyom Tarasenko
  2010-05-12 15:04     ` Richard Henderson
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 14:56 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: blauwirbel, qemu-devel

On 05/11/2010 02:31 PM, Artyom Tarasenko wrote:
> Nack. It looks like you reverted carry generation to the previous
> (broken) behavior.

Oh?  I suppose I should go back and look at the logs, but the way
it's written there sure seems to match 5.1.5.1 of the sparcv9 manual:
You'll only get carry into the high bit on X - Y if X < Y.
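
In C terms that rule is a single unsigned compare; a minimal sketch
(not the patch itself):

#include <stdint.h>

/* Borrow (PSR.C after subcc) out of x - y with no carry-in: set iff x < y
   as unsigned 32-bit values.  Same test as get_C_sub_icc in the patch.  */
static inline uint32_t sub_borrow32(uint32_t x, uint32_t y)
{
    return x < y ? 1 : 0;
}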

In any case, if you meant to fix carry computation for subtraction,
you missed changing the 64-bit carry computation too.  It is still
written the same way before and after my patch, and matches both 
the expectation I have from the v9 manual and the post-patch code
along the 32-bit path.



r~

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-11 21:31   ` [Qemu-devel] " Artyom Tarasenko
  2010-05-12 14:56     ` Richard Henderson
@ 2010-05-12 15:04     ` Richard Henderson
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 15:04 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: blauwirbel, qemu-devel

On 05/11/2010 02:31 PM, Artyom Tarasenko wrote:
> Nack. It looks like you reverted carry generation to the previous
> (broken) behavior.

Perhaps you could point out the change I'm reverting?  I don't see
any change to the actual computation of the flags since 
f0f26a06d51b7e7764f8951cdbf67ac9ad507f6d, last year.


r~

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-12 14:56     ` Richard Henderson
@ 2010-05-12 15:18       ` Artyom Tarasenko
  2010-05-12 17:08         ` Richard Henderson
  0 siblings, 1 reply; 18+ messages in thread
From: Artyom Tarasenko @ 2010-05-12 15:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: blauwirbel, qemu-devel

2010/5/12 Richard Henderson <rth@twiddle.net>:
> On 05/11/2010 02:31 PM, Artyom Tarasenko wrote:
>> Nack. It looks like you reverted carry generation to the previous
>> (broken) behavior.
>
> Oh?  I suppose I should go back and look at the logs, but the way
> it's written there sure seems to match 5.1.5.1 of the sparcv9 manual:
> You'll only get carry into the high bit on X - Y if X < Y.
>
> In any case, if you meant to fix carry computation for subtraction,
> you missed changing the 64-bit carry computation too.

That may very well be the case. I cared only about sparcv8. I was sort of
expecting that someone would stop me, telling me I broke v9...

> It is still
> written the same way before and after my patch, and matches both
> the expectation I have from the v9 manual and the post-patch code
> along the 32-bit path.

Probably v9 differs from v8?

> Perhaps you could point out the change I'm reverting?  I don't
> see any change to the actual computation of the flags since
> f0f26a06d51b7e7764f8951cdbf67ac9ad507f6d, last year.

It is from last year, but it's
 3e6ba503400c34cbe0f9ad6e289921688bf303a3


-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation; fix ADDX carry generation.
  2010-05-12 15:18       ` Artyom Tarasenko
@ 2010-05-12 17:08         ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 17:08 UTC (permalink / raw)
  To: Artyom Tarasenko; +Cc: blauwirbel, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

On 05/12/2010 08:18 AM, Artyom Tarasenko wrote:
> It is last year, but
>  3e6ba503400c34cbe0f9ad6e289921688bf303a3

>     The page 108 of the SPARC Version 8 Architecture Manual describes
>     that addcc and addxcc shall compute carry flag the same way.
>     The page 110 claims the same about subcc and subxcc instructions.
>     This patch fixes carry computation in corner cases and removes redundant code.
>     The most visible effect of the patch is enabling Solaris boot when using OBP

I'll contradict you right there by asserting that the manual
says no such thing about computing the carry flag "the same way".

What it says is:

p29:
# Carry is set on addition if there is a carry out of bit 31.
# Carry is set on subtraction if there is borrow into bit 31.

p108:
# ADDX and ADDXcc also add the PSR's carry (c) bit; that is
# they compute "r[rs1] + r[rs2] + c"...

How are the two computations "the same"?  First, it says that
carry is set if there is carry out.  Second, it says that there
is the extra addition of the (previous) C bit.  If there is an
extra addition, there must be an extra chance for carry, and
thus the carry flag *cannot* be computed in the same way.
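
A small standalone C test makes the difference concrete; it reuses the same
expressions as the helpers in this series, but is only a sketch, not QEMU code:

#include <assert.h>
#include <stdint.h>

/* Carry out of a 32-bit add with no carry-in: the result wrapped past src1. */
static uint32_t c_add(uint32_t dst, uint32_t src1)
{
    return dst < src1;
}

/* Carry out of a 32-bit add *with* carry-in: full boolean form over bit 31. */
static uint32_t c_addx(uint32_t dst, uint32_t src1, uint32_t src2)
{
    return (((src1 & src2) | (~dst & (src1 | src2))) >> 31) & 1;
}

int main(void)
{
    uint32_t src1 = 0, src2 = 0xffffffffu, c_in = 1;
    uint32_t dst = src1 + src2 + c_in;     /* wraps around to 0 */

    assert(c_add(dst, src1) == 0);         /* the plain-ADD rule misses the carry */
    assert(c_addx(dst, src1, src2) == 1);  /* the boolean form reports it */
    return 0;
}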

Looking closely at the generation of the addx carry (with the
attached test program), I see that the original definition of
the ADDX carry, as still seen in the _xcc versions, does not
generate correct values, and your definition of the carry does.
However, that does not mean that we need to use your version
of the carry for plain ADD, only for ADDX.

I'll send a revised patch sequence shortly.


r~

[-- Attachment #2: z.c --]
[-- Type: text/x-csrc, Size: 1117 bytes --]

#include <stdio.h>


static inline int test_rolw(unsigned short *s)
{
  int i, start, end;
  asm volatile("rdtsc\n\t"
               "movl %%eax, %1\n\t"
               "movzwl %3,%2\n\t"
               "rolw $8, %w2\n\t"
               "addl $1,%2\n\t"
               "rdtsc"
               : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
  return end - start;
}

static inline int test_bswap(unsigned short *s)
{
  int i, start, end;
  asm volatile("rdtsc\n\t"
               "movl %%eax, %1\n\t"
               "movzwl %3,%2\n\t"
               "bswap %2\n\t"
               "shl $16,%2\n\t"
               "addl $1,%2\n\t"
               "rdtsc"
               : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
  return end - start;
}

#define N 10
int main()
{
  unsigned t_r[N], t_b[N];
  unsigned short s = 0;
  int i;

  for (i = 0; i < N; ++i)
    t_r[i] = test_rolw(&s);

  for (i = 0; i < N; ++i)
    t_b[i] = test_bswap(&s);

  printf("rolw\t");
  for (i = 0; i < N; ++i)
    printf("%5u", t_r[i]);
  printf("\nbswap\t");
  for (i = 0; i < N; ++i)
    printf("%5u", t_b[i]);
  printf("\n");
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2
  2010-05-10 22:23 [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2010-05-10 22:23 ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
@ 2010-05-12 18:04 ` Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
                     ` (2 more replies)
  3 siblings, 3 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 18:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, atar4qemu

Changes v1->v2:
  * Fix ADDX carry generation properly, i.e. use the previous ADD
    ICC carry computation for ADDX ICC and XCC.
  * Tidy PSR generators wrt CODING_STYLE, other minor improvements.
  * Set CC_OP properly in patch 3.


r~



Richard Henderson (3):
  target-sparc: Fix compilation with --enable-debug.
  target-sparc: Simplify ICC generation.
  target-sparc: Inline some generation of carry for ADDX/SUBX.

 target-sparc/op_helper.c |  220 ++++++++++++++++++++++++--------------
 target-sparc/translate.c |  272 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 338 insertions(+), 154 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug.
  2010-05-12 18:04 ` [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2 Richard Henderson
@ 2010-05-12 18:04   ` Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
  2 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 18:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, atar4qemu

Return a target_ulong from compute_C_icc to match the width of the users.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/helper.h    |    2 +-
 target-sparc/op_helper.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 6f103e7..04c1306 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
 #undef VIS_HELPER
 #undef VIS_CMPHELPER
 DEF_HELPER_0(compute_psr, void);
-DEF_HELPER_0(compute_C_icc, i32);
+DEF_HELPER_0(compute_C_icc, tl);
 
 #include "def-helper.h"
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index fcfd3f3..09449c5 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -1282,7 +1282,7 @@ void helper_compute_psr(void)
     CC_OP = CC_OP_FLAGS;
 }
 
-uint32_t helper_compute_C_icc(void)
+target_ulong helper_compute_C_icc(void)
 {
     uint32_t ret;
 
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation.
  2010-05-12 18:04 ` [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2 Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
@ 2010-05-12 18:04   ` Richard Henderson
  2010-05-14  9:04     ` [Qemu-devel] " Artyom Tarasenko
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 18:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, atar4qemu

Use int32 types instead of target_ulong when computing ICC.  This
simplifies the generated code for 32-bit host and 64-bit guest.
Use the same simplified expressions for ICC as were already used
for XCC in carry flag generation.

Simplify the ADD carry generation to not consider a possible carry-in.
Use the more complex carry computation for ADDX only.  Use the same
carry algorithm for the XCC result of ADDX.  Similarly for SUB/SUBX.

Use the ADD carry generation functions for TADD/TADDTV.  Similarly
for SUB and TSUB/TSUBTV.

Tidy the code with respect to CODING_STYLE.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/op_helper.c |  220 +++++++++++++++++++++++++++++-----------------
 1 files changed, 140 insertions(+), 80 deletions(-)

diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index 09449c5..3783b02 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -896,14 +896,15 @@ static uint32_t compute_C_flags(void)
     return env->psr & PSR_CARRY;
 }
 
-static inline uint32_t get_NZ_icc(target_ulong dst)
+static inline uint32_t get_NZ_icc(int32_t dst)
 {
     uint32_t ret = 0;
 
-    if (!(dst & 0xffffffffULL))
-        ret |= PSR_ZERO;
-    if ((int32_t) (dst & 0xffffffffULL) < 0)
-        ret |= PSR_NEG;
+    if (dst == 0) {
+        ret = PSR_ZERO;
+    } else if (dst < 0) {
+        ret = PSR_NEG;
+    }
     return ret;
 }
 
@@ -918,14 +919,15 @@ static uint32_t compute_C_flags_xcc(void)
     return env->xcc & PSR_CARRY;
 }
 
-static inline uint32_t get_NZ_xcc(target_ulong dst)
+static inline uint32_t get_NZ_xcc(target_long dst)
 {
     uint32_t ret = 0;
 
-    if (!dst)
-        ret |= PSR_ZERO;
-    if ((int64_t)dst < 0)
-        ret |= PSR_NEG;
+    if (!dst) {
+        ret = PSR_ZERO;
+    } else if (dst < 0) {
+        ret = PSR_NEG;
+    }
     return ret;
 }
 #endif
@@ -934,8 +936,9 @@ static inline uint32_t get_V_div_icc(target_ulong src2)
 {
     uint32_t ret = 0;
 
-    if (src2 != 0)
-        ret |= PSR_OVF;
+    if (src2 != 0) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -953,26 +956,35 @@ static uint32_t compute_C_div(void)
     return 0;
 }
 
-/* carry = (src1[31] & src2[31]) | ( ~dst[31] & (src1[31] | src2[31])) */
-static inline uint32_t get_C_add_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_C_add_icc(uint32_t dst, uint32_t src1)
 {
     uint32_t ret = 0;
 
-    if (((src1 & (1ULL << 31)) & (src2 & (1ULL << 31)))
-        | ((~(dst & (1ULL << 31)))
-           & ((src1 & (1ULL << 31)) | (src2 & (1ULL << 31)))))
-        ret |= PSR_CARRY;
+    if (dst < src1) {
+        ret = PSR_CARRY;
+    }
     return ret;
 }
 
-static inline uint32_t get_V_add_icc(target_ulong dst, target_ulong src1,
-                                         target_ulong src2)
+static inline uint32_t get_C_addx_icc(uint32_t dst, uint32_t src1,
+                                      uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 31))
-        ret |= PSR_OVF;
+    if (((src1 & src2) | (~dst & (src1 | src2))) & (1U << 31)) {
+        ret = PSR_CARRY;
+    }
+    return ret;
+}
+
+static inline uint32_t get_V_add_icc(uint32_t dst, uint32_t src1,
+                                     uint32_t src2)
+{
+    uint32_t ret = 0;
+
+    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1U << 31)) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -981,8 +993,20 @@ static inline uint32_t get_C_add_xcc(target_ulong dst, target_ulong src1)
 {
     uint32_t ret = 0;
 
-    if (dst < src1)
-        ret |= PSR_CARRY;
+    if (dst < src1) {
+        ret = PSR_CARRY;
+    }
+    return ret;
+}
+
+static inline uint32_t get_C_addx_xcc(target_ulong dst, target_ulong src1,
+                                      target_ulong src2)
+{
+    uint32_t ret = 0;
+
+    if (((src1 & src2) | (~dst & (src1 | src2))) & (1ULL << 63)) {
+        ret = PSR_CARRY;
+    }
     return ret;
 }
 
@@ -991,8 +1015,9 @@ static inline uint32_t get_V_add_xcc(target_ulong dst, target_ulong src1,
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 63))
-        ret |= PSR_OVF;
+    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 63)) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -1017,14 +1042,14 @@ static uint32_t compute_all_add(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 
 static uint32_t compute_C_add(void)
 {
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_add_icc(CC_DST, CC_SRC);
 }
 
 #ifdef TARGET_SPARC64
@@ -1033,8 +1058,7 @@ static uint32_t compute_all_addx_xcc(void)
     uint32_t ret;
 
     ret = get_NZ_xcc(CC_DST);
-    ret |= get_C_add_xcc(CC_DST - CC_SRC2, CC_SRC);
-    ret |= get_C_add_xcc(CC_DST, CC_SRC);
+    ret |= get_C_addx_xcc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_add_xcc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
@@ -1043,18 +1067,36 @@ static uint32_t compute_C_addx_xcc(void)
 {
     uint32_t ret;
 
-    ret = get_C_add_xcc(CC_DST - CC_SRC2, CC_SRC);
-    ret |= get_C_add_xcc(CC_DST, CC_SRC);
+    ret = get_C_addx_xcc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 #endif
 
+static uint32_t compute_all_addx(void)
+{
+    uint32_t ret;
+
+    ret = get_NZ_icc(CC_DST);
+    ret |= get_C_addx_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    return ret;
+}
+
+static uint32_t compute_C_addx(void)
+{
+    uint32_t ret;
+
+    ret = get_C_addx_icc(CC_DST, CC_SRC, CC_SRC2);
+    return ret;
+}
+
 static inline uint32_t get_V_tag_icc(target_ulong src1, target_ulong src2)
 {
     uint32_t ret = 0;
 
-    if ((src1 | src2) & 0x3)
-        ret |= PSR_OVF;
+    if ((src1 | src2) & 0x3) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -1063,51 +1105,50 @@ static uint32_t compute_all_tadd(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
     return ret;
 }
 
-static uint32_t compute_C_tadd(void)
-{
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
-}
-
 static uint32_t compute_all_taddtv(void)
 {
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_add_icc(CC_DST, CC_SRC);
     return ret;
 }
 
-static uint32_t compute_C_taddtv(void)
+static inline uint32_t get_C_sub_icc(uint32_t src1, uint32_t src2)
 {
-    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
+    uint32_t ret = 0;
+
+    if (src1 < src2) {
+        ret = PSR_CARRY;
+    }
+    return ret;
 }
 
-/* carry = (~src1[31] & src2[31]) | ( dst[31]  & (~src1[31] | src2[31])) */
-static inline uint32_t get_C_sub_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_C_subx_icc(uint32_t dst, uint32_t src1,
+                                      uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((~(src1 & (1ULL << 31))) & (src2 & (1ULL << 31)))
-        | ((dst & (1ULL << 31)) & (( ~(src1 & (1ULL << 31)))
-                                   | (src2 & (1ULL << 31)))))
-        ret |= PSR_CARRY;
+    if (((~src1 & src2) | (dst & (~src1 | src2))) & (1U << 31)) {
+        ret = PSR_CARRY;
+    }
     return ret;
 }
 
-static inline uint32_t get_V_sub_icc(target_ulong dst, target_ulong src1,
-                                     target_ulong src2)
+static inline uint32_t get_V_sub_icc(uint32_t dst, uint32_t src1,
+                                     uint32_t src2)
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 31))
-        ret |= PSR_OVF;
+    if (((src1 ^ src2) & (src1 ^ dst)) & (1U << 31)) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -1117,8 +1158,20 @@ static inline uint32_t get_C_sub_xcc(target_ulong src1, target_ulong src2)
 {
     uint32_t ret = 0;
 
-    if (src1 < src2)
-        ret |= PSR_CARRY;
+    if (src1 < src2) {
+        ret = PSR_CARRY;
+    }
+    return ret;
+}
+
+static inline uint32_t get_C_subx_xcc(target_ulong dst, target_ulong src1,
+                                      target_ulong src2)
+{
+    uint32_t ret = 0;
+
+    if (((~src1 & src2) | (dst & (~src1 | src2))) & (1ULL << 63)) {
+        ret = PSR_CARRY;
+    }
     return ret;
 }
 
@@ -1127,8 +1180,9 @@ static inline uint32_t get_V_sub_xcc(target_ulong dst, target_ulong src1,
 {
     uint32_t ret = 0;
 
-    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 63))
-        ret |= PSR_OVF;
+    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 63)) {
+        ret = PSR_OVF;
+    }
     return ret;
 }
 
@@ -1153,14 +1207,14 @@ static uint32_t compute_all_sub(void)
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 
 static uint32_t compute_C_sub(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    return get_C_sub_icc(CC_SRC, CC_SRC2);
 }
 
 #ifdef TARGET_SPARC64
@@ -1169,8 +1223,7 @@ static uint32_t compute_all_subx_xcc(void)
     uint32_t ret;
 
     ret = get_NZ_xcc(CC_DST);
-    ret |= get_C_sub_xcc(CC_DST - CC_SRC2, CC_SRC);
-    ret |= get_C_sub_xcc(CC_DST, CC_SRC2);
+    ret |= get_C_subx_xcc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_sub_xcc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
@@ -1179,40 +1232,47 @@ static uint32_t compute_C_subx_xcc(void)
 {
     uint32_t ret;
 
-    ret = get_C_sub_xcc(CC_DST - CC_SRC2, CC_SRC);
-    ret |= get_C_sub_xcc(CC_DST, CC_SRC2);
+    ret = get_C_subx_xcc(CC_DST, CC_SRC, CC_SRC2);
     return ret;
 }
 #endif
 
-static uint32_t compute_all_tsub(void)
+static uint32_t compute_all_subx(void)
 {
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_subx_icc(CC_DST, CC_SRC, CC_SRC2);
     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
-    ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
     return ret;
 }
 
-static uint32_t compute_C_tsub(void)
+static uint32_t compute_C_subx(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    uint32_t ret;
+
+    ret = get_C_subx_icc(CC_DST, CC_SRC, CC_SRC2);
+    return ret;
 }
 
-static uint32_t compute_all_tsubtv(void)
+static uint32_t compute_all_tsub(void)
 {
     uint32_t ret;
 
     ret = get_NZ_icc(CC_DST);
-    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
+    ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
     return ret;
 }
 
-static uint32_t compute_C_tsubtv(void)
+static uint32_t compute_all_tsubtv(void)
 {
-    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
+    uint32_t ret;
+
+    ret = get_NZ_icc(CC_DST);
+    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
+    return ret;
 }
 
 static uint32_t compute_all_logic(void)
@@ -1242,13 +1302,13 @@ static const CCTable icc_table[CC_OP_NB] = {
     [CC_OP_FLAGS] = { compute_all_flags, compute_C_flags },
     [CC_OP_DIV] = { compute_all_div, compute_C_div },
     [CC_OP_ADD] = { compute_all_add, compute_C_add },
-    [CC_OP_ADDX] = { compute_all_add, compute_C_add },
-    [CC_OP_TADD] = { compute_all_tadd, compute_C_tadd },
-    [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_taddtv },
+    [CC_OP_ADDX] = { compute_all_addx, compute_C_addx },
+    [CC_OP_TADD] = { compute_all_tadd, compute_C_add },
+    [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_add },
     [CC_OP_SUB] = { compute_all_sub, compute_C_sub },
-    [CC_OP_SUBX] = { compute_all_sub, compute_C_sub },
-    [CC_OP_TSUB] = { compute_all_tsub, compute_C_tsub },
-    [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_tsubtv },
+    [CC_OP_SUBX] = { compute_all_subx, compute_C_subx },
+    [CC_OP_TSUB] = { compute_all_tsub, compute_C_sub },
+    [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_sub },
     [CC_OP_LOGIC] = { compute_all_logic, compute_C_logic },
 };
 
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX.
  2010-05-12 18:04 ` [Qemu-devel] [PATCH 0/3] Fix ADDX compilation plus improvements, v2 Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 1/3] target-sparc: Fix compilation with --enable-debug Richard Henderson
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation Richard Henderson
@ 2010-05-12 18:04   ` Richard Henderson
  2010-05-20 19:59     ` [Qemu-devel] " Blue Swirl
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2010-05-12 18:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, atar4qemu

Computing carry is trivial for some inputs.  By avoiding an
external function call, we generate near-optimal code for
the common cases of add+addx (double-word arithmetic) and
cmp+addx (a setcc pattern).
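
For illustration only, a standalone C model of those two idioms; the
function names are made up, not QEMU's:

#include <stdint.h>

/* Double-word add on 32-bit registers: addcc sets C from the low half,
   addx folds it into the high half.  */
void add64(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
           uint32_t *r_hi, uint32_t *r_lo)
{
    uint32_t lo = a_lo + b_lo;     /* addcc  %a_lo, %b_lo, %r_lo */
    uint32_t c  = lo < a_lo;       /* carry out of the low word */

    *r_lo = lo;
    *r_hi = a_hi + b_hi + c;       /* addx   %a_hi, %b_hi, %r_hi */
}

/* cmp + addx as a setcc: produce (a < b) as 0 or 1.  */
uint32_t set_ltu(uint32_t a, uint32_t b)
{
    uint32_t c = a < b;            /* subcc  %a, %b, %g0 sets C */
    return 0 + 0 + c;              /* addx   %g0, 0, %dst copies C into dst */
}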

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/helper.h    |    2 +-
 target-sparc/op_helper.c |    2 +-
 target-sparc/translate.c |  272 +++++++++++++++++++++++++++++++++-------------
 3 files changed, 200 insertions(+), 76 deletions(-)

diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 04c1306..6f103e7 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
 #undef VIS_HELPER
 #undef VIS_CMPHELPER
 DEF_HELPER_0(compute_psr, void);
-DEF_HELPER_0(compute_C_icc, tl);
+DEF_HELPER_0(compute_C_icc, i32);
 
 #include "def-helper.h"
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index 3783b02..125cd67 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -1342,7 +1342,7 @@ void helper_compute_psr(void)
     CC_OP = CC_OP_FLAGS;
 }
 
-target_ulong helper_compute_C_icc(void)
+uint32_t helper_compute_C_icc(void)
 {
     uint32_t ret;
 
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index ea7c71b..713d3e1 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -332,24 +332,132 @@ static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
     tcg_gen_mov_tl(dst, cpu_cc_dst);
 }
 
-static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
+static TCGv_i32 gen_add32_carry32(void)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_movi_tl(cpu_cc_src2, src2);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
+
+    /* Carry is computed from a previous add: (dst < src)  */
+#if TARGET_LONG_BITS == 64
+    cc_src1_32 = tcg_temp_new_i32();
+    cc_src2_32 = tcg_temp_new_i32();
+    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_dst);
+    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src);
+#else
+    cc_src1_32 = cpu_cc_dst;
+    cc_src2_32 = cpu_cc_src;
+#endif
+
+    carry_32 = tcg_temp_new_i32();
+    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
+
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free_i32(cc_src1_32);
+    tcg_temp_free_i32(cc_src2_32);
+#endif
+
+    return carry_32;
 }
 
-static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
+static TCGv_i32 gen_sub32_carry32(void)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_mov_tl(cpu_cc_src2, src2);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
+
+    /* Carry is computed from a previous borrow: (src1 < src2)  */
+#if TARGET_LONG_BITS == 64
+    cc_src1_32 = tcg_temp_new_i32();
+    cc_src2_32 = tcg_temp_new_i32();
+    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_src);
+    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src2);
+#else
+    cc_src1_32 = cpu_cc_src;
+    cc_src2_32 = cpu_cc_src2;
+#endif
+
+    carry_32 = tcg_temp_new_i32();
+    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
+
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free_i32(cc_src1_32);
+    tcg_temp_free_i32(cc_src2_32);
+#endif
+
+    return carry_32;
+}
+
+static void gen_op_addx_int(DisasContext *dc, TCGv dst, TCGv src1,
+                            TCGv src2, int update_cc)
+{
+    TCGv_i32 carry_32;
+    TCGv carry;
+
+    switch (dc->cc_op) {
+    case CC_OP_DIV:
+    case CC_OP_LOGIC:
+        /* Carry is known to be zero.  Fall back to plain ADD.  */
+        if (update_cc) {
+            gen_op_add_cc(dst, src1, src2);
+        } else {
+            tcg_gen_add_tl(dst, src1, src2);
+        }
+        return;
+
+    case CC_OP_ADD:
+    case CC_OP_TADD:
+    case CC_OP_TADDTV:
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+        {
+            /* For 32-bit hosts, we can re-use the host's hardware carry
+               generation by using an ADD2 opcode.  We discard the low
+               part of the output.  Ideally we'd combine this operation
+               with the add that generated the carry in the first place.  */
+            TCGv dst_low = tcg_temp_new();
+            tcg_gen_op6_i32(INDEX_op_add2_i32, dst_low, dst, 
+                            cpu_cc_src, src1, cpu_cc_src2, src2);
+            tcg_temp_free(dst_low);
+            goto add_done;
+        }
+#endif
+        carry_32 = gen_add32_carry32();
+        break;
+
+    case CC_OP_SUB:
+    case CC_OP_TSUB:
+    case CC_OP_TSUBTV:
+        carry_32 = gen_sub32_carry32();
+        break;
+
+    default:
+        /* We need external help to produce the carry.  */
+        carry_32 = tcg_temp_new_i32();
+        gen_helper_compute_C_icc(carry_32);
+        break;
+    }
+
+#if TARGET_LONG_BITS == 64
+    carry = tcg_temp_new();
+    tcg_gen_extu_i32_i64(carry, carry_32);
+#else
+    carry = carry_32;
+#endif
+
+    tcg_gen_add_tl(dst, src1, src2);
+    tcg_gen_add_tl(dst, dst, carry);
+
+    tcg_temp_free_i32(carry_32);
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free(carry);
+#endif
+
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+ add_done:
+#endif
+    if (update_cc) {
+        tcg_gen_mov_tl(cpu_cc_src, src1);
+        tcg_gen_mov_tl(cpu_cc_src2, src2);
+        tcg_gen_mov_tl(cpu_cc_dst, dst);
+        tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
+        dc->cc_op = CC_OP_ADDX;
+    }
 }
 
 static inline void gen_op_tadd_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -415,24 +523,80 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
     tcg_gen_mov_tl(dst, cpu_cc_dst);
 }
 
-static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
+static void gen_op_subx_int(DisasContext *dc, TCGv dst, TCGv src1,
+                            TCGv src2, int update_cc)
 {
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_movi_tl(cpu_cc_src2, src2);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
-}
+    TCGv_i32 carry_32;
+    TCGv carry;
 
-static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
-{
-    gen_helper_compute_C_icc(cpu_tmp0);
-    tcg_gen_mov_tl(cpu_cc_src, src1);
-    tcg_gen_mov_tl(cpu_cc_src2, src2);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
-    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
-    tcg_gen_mov_tl(dst, cpu_cc_dst);
+    switch (dc->cc_op) {
+    case CC_OP_DIV:
+    case CC_OP_LOGIC:
+        /* Carry is known to be zero.  Fall back to plain SUB.  */
+        if (update_cc) {
+            gen_op_sub_cc(dst, src1, src2);
+        } else {
+            tcg_gen_sub_tl(dst, src1, src2);
+        }
+        return;
+
+    case CC_OP_ADD:
+    case CC_OP_TADD:
+    case CC_OP_TADDTV:
+        carry_32 = gen_add32_carry32();
+        break;
+
+    case CC_OP_SUB:
+    case CC_OP_TSUB:
+    case CC_OP_TSUBTV:
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+        {
+            /* For 32-bit hosts, we can re-use the host's hardware carry
+               generation by using a SUB2 opcode.  We discard the low
+               part of the output.  Ideally we'd combine this operation
+               with the add that generated the carry in the first place.  */
+            TCGv dst_low = tcg_temp_new();
+            tcg_gen_op6_i32(INDEX_op_sub2_i32, dst_low, dst, 
+                            cpu_cc_src, src1, cpu_cc_src2, src2);
+            tcg_temp_free(dst_low);
+            goto sub_done;
+        }
+#endif
+        carry_32 = gen_sub32_carry32();
+        break;
+
+    default:
+        /* We need external help to produce the carry.  */
+        carry_32 = tcg_temp_new_i32();
+        gen_helper_compute_C_icc(carry_32);
+        break;
+    }
+
+#if TARGET_LONG_BITS == 64
+    carry = tcg_temp_new();
+    tcg_gen_extu_i32_i64(carry, carry_32);
+#else
+    carry = carry_32;
+#endif
+
+    tcg_gen_sub_tl(dst, src1, src2);
+    tcg_gen_sub_tl(dst, dst, carry);
+
+    tcg_temp_free_i32(carry_32);
+#if TARGET_LONG_BITS == 64
+    tcg_temp_free(carry);
+#endif
+
+#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
+ sub_done:
+#endif
+    if (update_cc) {
+        tcg_gen_mov_tl(cpu_cc_src, src1);
+        tcg_gen_mov_tl(cpu_cc_src2, src2);
+        tcg_gen_mov_tl(cpu_cc_dst, dst);
+        tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
+        dc->cc_op = CC_OP_SUBX;
+    }
 }
 
 static inline void gen_op_tsub_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -2950,28 +3114,8 @@ static void disas_sparc_insn(DisasContext * dc)
                         }
                         break;
                     case 0x8: /* addx, V9 addc */
-                        if (IS_IMM) {
-                            simm = GET_FIELDs(insn, 19, 31);
-                            if (xop & 0x10) {
-                                gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
-                                dc->cc_op = CC_OP_ADDX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
-                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        } else {
-                            if (xop & 0x10) {
-                                gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
-                                dc->cc_op = CC_OP_ADDX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
-                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        }
+                        gen_op_addx_int(dc, cpu_dst, cpu_src1, cpu_src2,
+                                        (xop & 0x10));
                         break;
 #ifdef TARGET_SPARC64
                     case 0x9: /* V9 mulx */
@@ -3002,28 +3146,8 @@ static void disas_sparc_insn(DisasContext * dc)
                         }
                         break;
                     case 0xc: /* subx, V9 subc */
-                        if (IS_IMM) {
-                            simm = GET_FIELDs(insn, 19, 31);
-                            if (xop & 0x10) {
-                                gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
-                                dc->cc_op = CC_OP_SUBX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
-                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        } else {
-                            if (xop & 0x10) {
-                                gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
-                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
-                                dc->cc_op = CC_OP_SUBX;
-                            } else {
-                                gen_helper_compute_C_icc(cpu_tmp0);
-                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
-                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
-                            }
-                        }
+                        gen_op_subx_int(dc, cpu_dst, cpu_src1, cpu_src2,
+                                        (xop & 0x10));
                         break;
 #ifdef TARGET_SPARC64
                     case 0xd: /* V9 udivx */
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] Re: [PATCH 2/3] target-sparc: Simplify ICC generation.
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 2/3] target-sparc: Simplify ICC generation Richard Henderson
@ 2010-05-14  9:04     ` Artyom Tarasenko
  0 siblings, 0 replies; 18+ messages in thread
From: Artyom Tarasenko @ 2010-05-14  9:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: blauwirbel, qemu-devel

2010/5/12 Richard Henderson <rth@twiddle.net>:
> Use int32 types instead of target_ulong when computing ICC.  This
> simplifies the generated code for 32-bit host and 64-bit guest.
> Use the same simplified expressions for ICC as were already used
> for XCC in carry flag generation.
>
> Simplify the ADD carry generation to not consider a possible carry-in.
> Use the more complex carry computation for ADDX only.  Use the same
> carry algorithm for the XCC result of ADDX.  Similarly for SUB/SUBX.
>
> Use the ADD carry generation functions for TADD/TADDTV.  Similarly
> for SUB and TSUB/TSUBTV.
>
> Tidy the code with respect to CODING_STYLE.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-sparc/op_helper.c |  220 +++++++++++++++++++++++++++++-----------------
>  1 files changed, 140 insertions(+), 80 deletions(-)
>
> diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> index 09449c5..3783b02 100644
> --- a/target-sparc/op_helper.c
> +++ b/target-sparc/op_helper.c
> @@ -896,14 +896,15 @@ static uint32_t compute_C_flags(void)
>     return env->psr & PSR_CARRY;
>  }
>
> -static inline uint32_t get_NZ_icc(target_ulong dst)
> +static inline uint32_t get_NZ_icc(int32_t dst)
>  {
>     uint32_t ret = 0;
>
> -    if (!(dst & 0xffffffffULL))
> -        ret |= PSR_ZERO;
> -    if ((int32_t) (dst & 0xffffffffULL) < 0)
> -        ret |= PSR_NEG;
> +    if (dst == 0) {
> +        ret = PSR_ZERO;
> +    } else if (dst < 0) {
> +        ret = PSR_NEG;
> +    }
>     return ret;
>  }
>
> @@ -918,14 +919,15 @@ static uint32_t compute_C_flags_xcc(void)
>     return env->xcc & PSR_CARRY;
>  }
>
> -static inline uint32_t get_NZ_xcc(target_ulong dst)
> +static inline uint32_t get_NZ_xcc(target_long dst)
>  {
>     uint32_t ret = 0;
>
> -    if (!dst)
> -        ret |= PSR_ZERO;
> -    if ((int64_t)dst < 0)
> -        ret |= PSR_NEG;
> +    if (!dst) {
> +        ret = PSR_ZERO;
> +    } else if (dst < 0) {
> +        ret = PSR_NEG;
> +    }
>     return ret;
>  }
>  #endif
> @@ -934,8 +936,9 @@ static inline uint32_t get_V_div_icc(target_ulong src2)
>  {
>     uint32_t ret = 0;
>
> -    if (src2 != 0)
> -        ret |= PSR_OVF;
> +    if (src2 != 0) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -953,26 +956,35 @@ static uint32_t compute_C_div(void)
>     return 0;
>  }
>
> -/* carry = (src1[31] & src2[31]) | ( ~dst[31] & (src1[31] | src2[31])) */
> -static inline uint32_t get_C_add_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_C_add_icc(uint32_t dst, uint32_t src1)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 & (1ULL << 31)) & (src2 & (1ULL << 31)))
> -        | ((~(dst & (1ULL << 31)))
> -           & ((src1 & (1ULL << 31)) | (src2 & (1ULL << 31)))))
> -        ret |= PSR_CARRY;
> +    if (dst < src1) {
> +        ret = PSR_CARRY;
> +    }
>     return ret;
>  }
>
> -static inline uint32_t get_V_add_icc(target_ulong dst, target_ulong src1,
> -                                         target_ulong src2)
> +static inline uint32_t get_C_addx_icc(uint32_t dst, uint32_t src1,
> +                                      uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 31))
> -        ret |= PSR_OVF;
> +    if (((src1 & src2) | (~dst & (src1 | src2))) & (1U << 31)) {
> +        ret = PSR_CARRY;
> +    }
> +    return ret;
> +}
> +
> +static inline uint32_t get_V_add_icc(uint32_t dst, uint32_t src1,
> +                                     uint32_t src2)
> +{
> +    uint32_t ret = 0;
> +
> +    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1U << 31)) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -981,8 +993,20 @@ static inline uint32_t get_C_add_xcc(target_ulong dst, target_ulong src1)
>  {
>     uint32_t ret = 0;
>
> -    if (dst < src1)
> -        ret |= PSR_CARRY;
> +    if (dst < src1) {
> +        ret = PSR_CARRY;
> +    }
> +    return ret;
> +}
> +
> +static inline uint32_t get_C_addx_xcc(target_ulong dst, target_ulong src1,
> +                                      target_ulong src2)
> +{
> +    uint32_t ret = 0;
> +
> +    if (((src1 & src2) | (~dst & (src1 | src2))) & (1ULL << 63)) {
> +        ret = PSR_CARRY;
> +    }
>     return ret;
>  }
>
> @@ -991,8 +1015,9 @@ static inline uint32_t get_V_add_xcc(target_ulong dst, target_ulong src1,
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 63))
> -        ret |= PSR_OVF;
> +    if (((src1 ^ src2 ^ -1) & (src1 ^ dst)) & (1ULL << 63)) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -1017,14 +1042,14 @@ static uint32_t compute_all_add(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>
>  static uint32_t compute_C_add(void)
>  {
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_add_icc(CC_DST, CC_SRC);
>  }
>
>  #ifdef TARGET_SPARC64
> @@ -1033,8 +1058,7 @@ static uint32_t compute_all_addx_xcc(void)
>     uint32_t ret;
>
>     ret = get_NZ_xcc(CC_DST);
> -    ret |= get_C_add_xcc(CC_DST - CC_SRC2, CC_SRC);
> -    ret |= get_C_add_xcc(CC_DST, CC_SRC);
> +    ret |= get_C_addx_xcc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_add_xcc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
> @@ -1043,18 +1067,36 @@ static uint32_t compute_C_addx_xcc(void)
>  {
>     uint32_t ret;
>
> -    ret = get_C_add_xcc(CC_DST - CC_SRC2, CC_SRC);
> -    ret |= get_C_add_xcc(CC_DST, CC_SRC);
> +    ret = get_C_addx_xcc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>  #endif
>
> +static uint32_t compute_all_addx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_NZ_icc(CC_DST);
> +    ret |= get_C_addx_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return ret;
> +}
> +
> +static uint32_t compute_C_addx(void)
> +{
> +    uint32_t ret;
> +
> +    ret = get_C_addx_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return ret;
> +}
> +
>  static inline uint32_t get_V_tag_icc(target_ulong src1, target_ulong src2)
>  {
>     uint32_t ret = 0;
>
> -    if ((src1 | src2) & 0x3)
> -        ret |= PSR_OVF;
> +    if ((src1 | src2) & 0x3) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -1063,51 +1105,50 @@ static uint32_t compute_all_tadd(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     ret |= get_V_add_icc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>     return ret;
>  }
>
> -static uint32_t compute_C_tadd(void)
> -{
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> -}
> -
>  static uint32_t compute_all_taddtv(void)
>  {
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_add_icc(CC_DST, CC_SRC);
>     return ret;
>  }
>
> -static uint32_t compute_C_taddtv(void)
> +static inline uint32_t get_C_sub_icc(uint32_t src1, uint32_t src2)
>  {
> -    return get_C_add_icc(CC_DST, CC_SRC, CC_SRC2);
> +    uint32_t ret = 0;
> +
> +    if (src1 < src2) {
> +        ret = PSR_CARRY;
> +    }
> +    return ret;
>  }
>
> -/* carry = (~src1[31] & src2[31]) | ( dst[31]  & (~src1[31] | src2[31])) */
> -static inline uint32_t get_C_sub_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_C_subx_icc(uint32_t dst, uint32_t src1,
> +                                      uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((~(src1 & (1ULL << 31))) & (src2 & (1ULL << 31)))
> -        | ((dst & (1ULL << 31)) & (( ~(src1 & (1ULL << 31)))
> -                                   | (src2 & (1ULL << 31)))))
> -        ret |= PSR_CARRY;
> +    if (((~src1 & src2) | (dst & (~src1 | src2))) & (1U << 31)) {
> +        ret = PSR_CARRY;
> +    }
>     return ret;
>  }
>
> -static inline uint32_t get_V_sub_icc(target_ulong dst, target_ulong src1,
> -                                     target_ulong src2)
> +static inline uint32_t get_V_sub_icc(uint32_t dst, uint32_t src1,
> +                                     uint32_t src2)
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 31))
> -        ret |= PSR_OVF;
> +    if (((src1 ^ src2) & (src1 ^ dst)) & (1U << 31)) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -1117,8 +1158,20 @@ static inline uint32_t get_C_sub_xcc(target_ulong src1, target_ulong src2)
>  {
>     uint32_t ret = 0;
>
> -    if (src1 < src2)
> -        ret |= PSR_CARRY;
> +    if (src1 < src2) {
> +        ret = PSR_CARRY;
> +    }
> +    return ret;
> +}
> +
> +static inline uint32_t get_C_subx_xcc(target_ulong dst, target_ulong src1,
> +                                      target_ulong src2)
> +{
> +    uint32_t ret = 0;
> +
> +    if (((~src1 & src2) | (dst & (~src1 | src2))) & (1ULL << 63)) {
> +        ret = PSR_CARRY;
> +    }
>     return ret;
>  }
>
> @@ -1127,8 +1180,9 @@ static inline uint32_t get_V_sub_xcc(target_ulong dst, target_ulong src1,
>  {
>     uint32_t ret = 0;
>
> -    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 63))
> -        ret |= PSR_OVF;
> +    if (((src1 ^ src2) & (src1 ^ dst)) & (1ULL << 63)) {
> +        ret = PSR_OVF;
> +    }
>     return ret;
>  }
>
> @@ -1153,14 +1207,14 @@ static uint32_t compute_all_sub(void)
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
>     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>
>  static uint32_t compute_C_sub(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return get_C_sub_icc(CC_SRC, CC_SRC2);
>  }
>
>  #ifdef TARGET_SPARC64
> @@ -1169,8 +1223,7 @@ static uint32_t compute_all_subx_xcc(void)
>     uint32_t ret;
>
>     ret = get_NZ_xcc(CC_DST);
> -    ret |= get_C_sub_xcc(CC_DST - CC_SRC2, CC_SRC);
> -    ret |= get_C_sub_xcc(CC_DST, CC_SRC2);
> +    ret |= get_C_subx_xcc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_sub_xcc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
> @@ -1179,40 +1232,47 @@ static uint32_t compute_C_subx_xcc(void)
>  {
>     uint32_t ret;
>
> -    ret = get_C_sub_xcc(CC_DST - CC_SRC2, CC_SRC);
> -    ret |= get_C_sub_xcc(CC_DST, CC_SRC2);
> +    ret = get_C_subx_xcc(CC_DST, CC_SRC, CC_SRC2);
>     return ret;
>  }
>  #endif
>
> -static uint32_t compute_all_tsub(void)
> +static uint32_t compute_all_subx(void)
>  {
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_subx_icc(CC_DST, CC_SRC, CC_SRC2);
>     ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> -    ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>     return ret;
>  }
>
> -static uint32_t compute_C_tsub(void)
> +static uint32_t compute_C_subx(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    uint32_t ret;
> +
> +    ret = get_C_subx_icc(CC_DST, CC_SRC, CC_SRC2);
> +    return ret;
>  }
>
> -static uint32_t compute_all_tsubtv(void)
> +static uint32_t compute_all_tsub(void)
>  {
>     uint32_t ret;
>
>     ret = get_NZ_icc(CC_DST);
> -    ret |= get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
> +    ret |= get_V_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    ret |= get_V_tag_icc(CC_SRC, CC_SRC2);
>     return ret;
>  }
>
> -static uint32_t compute_C_tsubtv(void)
> +static uint32_t compute_all_tsubtv(void)
>  {
> -    return get_C_sub_icc(CC_DST, CC_SRC, CC_SRC2);
> +    uint32_t ret;
> +
> +    ret = get_NZ_icc(CC_DST);
> +    ret |= get_C_sub_icc(CC_SRC, CC_SRC2);
> +    return ret;
>  }
>
>  static uint32_t compute_all_logic(void)
> @@ -1242,13 +1302,13 @@ static const CCTable icc_table[CC_OP_NB] = {
>     [CC_OP_FLAGS] = { compute_all_flags, compute_C_flags },
>     [CC_OP_DIV] = { compute_all_div, compute_C_div },
>     [CC_OP_ADD] = { compute_all_add, compute_C_add },
> -    [CC_OP_ADDX] = { compute_all_add, compute_C_add },
> -    [CC_OP_TADD] = { compute_all_tadd, compute_C_tadd },
> -    [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_taddtv },
> +    [CC_OP_ADDX] = { compute_all_addx, compute_C_addx },
> +    [CC_OP_TADD] = { compute_all_tadd, compute_C_add },
> +    [CC_OP_TADDTV] = { compute_all_taddtv, compute_C_add },
>     [CC_OP_SUB] = { compute_all_sub, compute_C_sub },
> -    [CC_OP_SUBX] = { compute_all_sub, compute_C_sub },
> -    [CC_OP_TSUB] = { compute_all_tsub, compute_C_tsub },
> -    [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_tsubtv },
> +    [CC_OP_SUBX] = { compute_all_subx, compute_C_subx },
> +    [CC_OP_TSUB] = { compute_all_tsub, compute_C_sub },
> +    [CC_OP_TSUBTV] = { compute_all_tsubtv, compute_C_sub },
>     [CC_OP_LOGIC] = { compute_all_logic, compute_C_logic },
>  };
>
> --
> 1.7.0.1
>
>

This one looks much better and passes the smoke test.
Haven't tested 3/3 yet.

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] Re: [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX.
  2010-05-12 18:04   ` [Qemu-devel] [PATCH 3/3] target-sparc: Inline some generation of carry for ADDX/SUBX Richard Henderson
@ 2010-05-20 19:59     ` Blue Swirl
  0 siblings, 0 replies; 18+ messages in thread
From: Blue Swirl @ 2010-05-20 19:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, atar4qemu

Thanks, applied.

On Wed, May 12, 2010 at 6:04 PM, Richard Henderson <rth@twiddle.net> wrote:
> Computing carry is trivial for some inputs.  By avoiding an
> external function call, we generate near-optimal code for
> the common cases of add+addx (double-word arithmetic) and
> cmp+addx (a setcc pattern).
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-sparc/helper.h    |    2 +-
>  target-sparc/op_helper.c |    2 +-
>  target-sparc/translate.c |  272 +++++++++++++++++++++++++++++++++-------------
>  3 files changed, 200 insertions(+), 76 deletions(-)
>
> diff --git a/target-sparc/helper.h b/target-sparc/helper.h
> index 04c1306..6f103e7 100644
> --- a/target-sparc/helper.h
> +++ b/target-sparc/helper.h
> @@ -158,6 +158,6 @@ VIS_CMPHELPER(cmpne);
>  #undef VIS_HELPER
>  #undef VIS_CMPHELPER
>  DEF_HELPER_0(compute_psr, void);
> -DEF_HELPER_0(compute_C_icc, tl);
> +DEF_HELPER_0(compute_C_icc, i32);
>
>  #include "def-helper.h"
> diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> index 3783b02..125cd67 100644
> --- a/target-sparc/op_helper.c
> +++ b/target-sparc/op_helper.c
> @@ -1342,7 +1342,7 @@ void helper_compute_psr(void)
>     CC_OP = CC_OP_FLAGS;
>  }
>
> -target_ulong helper_compute_C_icc(void)
> +uint32_t helper_compute_C_icc(void)
>  {
>     uint32_t ret;
>
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index ea7c71b..713d3e1 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -332,24 +332,132 @@ static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
>     tcg_gen_mov_tl(dst, cpu_cc_dst);
>  }
>
> -static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
> +static TCGv_i32 gen_add32_carry32(void)
>  {
> -    gen_helper_compute_C_icc(cpu_tmp0);
> -    tcg_gen_mov_tl(cpu_cc_src, src1);
> -    tcg_gen_movi_tl(cpu_cc_src2, src2);
> -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> -    tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
> -    tcg_gen_mov_tl(dst, cpu_cc_dst);
> +    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
> +
> +    /* Carry is computed from a previous add: (dst < src)  */
> +#if TARGET_LONG_BITS == 64
> +    cc_src1_32 = tcg_temp_new_i32();
> +    cc_src2_32 = tcg_temp_new_i32();
> +    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_dst);
> +    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src);
> +#else
> +    cc_src1_32 = cpu_cc_dst;
> +    cc_src2_32 = cpu_cc_src;
> +#endif
> +
> +    carry_32 = tcg_temp_new_i32();
> +    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
> +
> +#if TARGET_LONG_BITS == 64
> +    tcg_temp_free_i32(cc_src1_32);
> +    tcg_temp_free_i32(cc_src2_32);
> +#endif
> +
> +    return carry_32;
>  }
>
> -static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
> +static TCGv_i32 gen_sub32_carry32(void)
>  {
> -    gen_helper_compute_C_icc(cpu_tmp0);
> -    tcg_gen_mov_tl(cpu_cc_src, src1);
> -    tcg_gen_mov_tl(cpu_cc_src2, src2);
> -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> -    tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
> -    tcg_gen_mov_tl(dst, cpu_cc_dst);
> +    TCGv_i32 carry_32, cc_src1_32, cc_src2_32;
> +
> +    /* Carry is computed from a previous borrow: (src1 < src2)  */
> +#if TARGET_LONG_BITS == 64
> +    cc_src1_32 = tcg_temp_new_i32();
> +    cc_src2_32 = tcg_temp_new_i32();
> +    tcg_gen_trunc_i64_i32(cc_src1_32, cpu_cc_src);
> +    tcg_gen_trunc_i64_i32(cc_src2_32, cpu_cc_src2);
> +#else
> +    cc_src1_32 = cpu_cc_src;
> +    cc_src2_32 = cpu_cc_src2;
> +#endif
> +
> +    carry_32 = tcg_temp_new_i32();
> +    tcg_gen_setcond_i32(TCG_COND_LTU, carry_32, cc_src1_32, cc_src2_32);
> +
> +#if TARGET_LONG_BITS == 64
> +    tcg_temp_free_i32(cc_src1_32);
> +    tcg_temp_free_i32(cc_src2_32);
> +#endif
> +
> +    return carry_32;
> +}
> +
> +static void gen_op_addx_int(DisasContext *dc, TCGv dst, TCGv src1,
> +                            TCGv src2, int update_cc)
> +{
> +    TCGv_i32 carry_32;
> +    TCGv carry;
> +
> +    switch (dc->cc_op) {
> +    case CC_OP_DIV:
> +    case CC_OP_LOGIC:
> +        /* Carry is known to be zero.  Fall back to plain ADD.  */
> +        if (update_cc) {
> +            gen_op_add_cc(dst, src1, src2);
> +        } else {
> +            tcg_gen_add_tl(dst, src1, src2);
> +        }
> +        return;
> +
> +    case CC_OP_ADD:
> +    case CC_OP_TADD:
> +    case CC_OP_TADDTV:
> +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
> +        {
> +            /* For 32-bit hosts, we can re-use the host's hardware carry
> +               generation by using an ADD2 opcode.  We discard the low
> +               part of the output.  Ideally we'd combine this operation
> +               with the add that generated the carry in the first place.  */
> +            TCGv dst_low = tcg_temp_new();
> +            tcg_gen_op6_i32(INDEX_op_add2_i32, dst_low, dst,
> +                            cpu_cc_src, src1, cpu_cc_src2, src2);
> +            tcg_temp_free(dst_low);
> +            goto add_done;
> +        }
> +#endif
> +        carry_32 = gen_add32_carry32();
> +        break;
> +
> +    case CC_OP_SUB:
> +    case CC_OP_TSUB:
> +    case CC_OP_TSUBTV:
> +        carry_32 = gen_sub32_carry32();
> +        break;
> +
> +    default:
> +        /* We need external help to produce the carry.  */
> +        carry_32 = tcg_temp_new_i32();
> +        gen_helper_compute_C_icc(carry_32);
> +        break;
> +    }
> +
> +#if TARGET_LONG_BITS == 64
> +    carry = tcg_temp_new();
> +    tcg_gen_extu_i32_i64(carry, carry_32);
> +#else
> +    carry = carry_32;
> +#endif
> +
> +    tcg_gen_add_tl(dst, src1, src2);
> +    tcg_gen_add_tl(dst, dst, carry);
> +
> +    tcg_temp_free_i32(carry_32);
> +#if TARGET_LONG_BITS == 64
> +    tcg_temp_free(carry);
> +#endif
> +
> +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
> + add_done:
> +#endif
> +    if (update_cc) {
> +        tcg_gen_mov_tl(cpu_cc_src, src1);
> +        tcg_gen_mov_tl(cpu_cc_src2, src2);
> +        tcg_gen_mov_tl(cpu_cc_dst, dst);
> +        tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
> +        dc->cc_op = CC_OP_ADDX;
> +    }
>  }
>
>  static inline void gen_op_tadd_cc(TCGv dst, TCGv src1, TCGv src2)
> @@ -415,24 +523,80 @@ static inline void gen_op_sub_cc(TCGv dst, TCGv src1, TCGv src2)
>     tcg_gen_mov_tl(dst, cpu_cc_dst);
>  }
>
> -static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
> +static void gen_op_subx_int(DisasContext *dc, TCGv dst, TCGv src1,
> +                            TCGv src2, int update_cc)
>  {
> -    gen_helper_compute_C_icc(cpu_tmp0);
> -    tcg_gen_mov_tl(cpu_cc_src, src1);
> -    tcg_gen_movi_tl(cpu_cc_src2, src2);
> -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> -    tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
> -    tcg_gen_mov_tl(dst, cpu_cc_dst);
> -}
> +    TCGv_i32 carry_32;
> +    TCGv carry;
>
> -static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
> -{
> -    gen_helper_compute_C_icc(cpu_tmp0);
> -    tcg_gen_mov_tl(cpu_cc_src, src1);
> -    tcg_gen_mov_tl(cpu_cc_src2, src2);
> -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
> -    tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
> -    tcg_gen_mov_tl(dst, cpu_cc_dst);
> +    switch (dc->cc_op) {
> +    case CC_OP_DIV:
> +    case CC_OP_LOGIC:
> +        /* Carry is known to be zero.  Fall back to plain SUB.  */
> +        if (update_cc) {
> +            gen_op_sub_cc(dst, src1, src2);
> +        } else {
> +            tcg_gen_sub_tl(dst, src1, src2);
> +        }
> +        return;
> +
> +    case CC_OP_ADD:
> +    case CC_OP_TADD:
> +    case CC_OP_TADDTV:
> +        carry_32 = gen_add32_carry32();
> +        break;
> +
> +    case CC_OP_SUB:
> +    case CC_OP_TSUB:
> +    case CC_OP_TSUBTV:
> +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
> +        {
> +            /* For 32-bit hosts, we can re-use the host's hardware carry
> +               generation by using a SUB2 opcode.  We discard the low
> +               part of the output.  Ideally we'd combine this operation
> +               with the subtraction that generated the borrow in the first place.  */
> +            TCGv dst_low = tcg_temp_new();
> +            tcg_gen_op6_i32(INDEX_op_sub2_i32, dst_low, dst,
> +                            cpu_cc_src, src1, cpu_cc_src2, src2);
> +            tcg_temp_free(dst_low);
> +            goto sub_done;
> +        }
> +#endif
> +        carry_32 = gen_sub32_carry32();
> +        break;
> +
> +    default:
> +        /* We need external help to produce the carry.  */
> +        carry_32 = tcg_temp_new_i32();
> +        gen_helper_compute_C_icc(carry_32);
> +        break;
> +    }
> +
> +#if TARGET_LONG_BITS == 64
> +    carry = tcg_temp_new();
> +    tcg_gen_extu_i32_i64(carry, carry_32);
> +#else
> +    carry = carry_32;
> +#endif
> +
> +    tcg_gen_sub_tl(dst, src1, src2);
> +    tcg_gen_sub_tl(dst, dst, carry);
> +
> +    tcg_temp_free_i32(carry_32);
> +#if TARGET_LONG_BITS == 64
> +    tcg_temp_free(carry);
> +#endif
> +
> +#if TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 32
> + sub_done:
> +#endif
> +    if (update_cc) {
> +        tcg_gen_mov_tl(cpu_cc_src, src1);
> +        tcg_gen_mov_tl(cpu_cc_src2, src2);
> +        tcg_gen_mov_tl(cpu_cc_dst, dst);
> +        tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
> +        dc->cc_op = CC_OP_SUBX;
> +    }
>  }
>
>  static inline void gen_op_tsub_cc(TCGv dst, TCGv src1, TCGv src2)
> @@ -2950,28 +3114,8 @@ static void disas_sparc_insn(DisasContext * dc)
>                         }
>                         break;
>                     case 0x8: /* addx, V9 addc */
> -                        if (IS_IMM) {
> -                            simm = GET_FIELDs(insn, 19, 31);
> -                            if (xop & 0x10) {
> -                                gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
> -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
> -                                dc->cc_op = CC_OP_ADDX;
> -                            } else {
> -                                gen_helper_compute_C_icc(cpu_tmp0);
> -                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
> -                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
> -                            }
> -                        } else {
> -                            if (xop & 0x10) {
> -                                gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
> -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
> -                                dc->cc_op = CC_OP_ADDX;
> -                            } else {
> -                                gen_helper_compute_C_icc(cpu_tmp0);
> -                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
> -                                tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
> -                            }
> -                        }
> +                        gen_op_addx_int(dc, cpu_dst, cpu_src1, cpu_src2,
> +                                        (xop & 0x10));
>                         break;
>  #ifdef TARGET_SPARC64
>                     case 0x9: /* V9 mulx */
> @@ -3002,28 +3146,8 @@ static void disas_sparc_insn(DisasContext * dc)
>                         }
>                         break;
>                     case 0xc: /* subx, V9 subc */
> -                        if (IS_IMM) {
> -                            simm = GET_FIELDs(insn, 19, 31);
> -                            if (xop & 0x10) {
> -                                gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
> -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
> -                                dc->cc_op = CC_OP_SUBX;
> -                            } else {
> -                                gen_helper_compute_C_icc(cpu_tmp0);
> -                                tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
> -                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
> -                            }
> -                        } else {
> -                            if (xop & 0x10) {
> -                                gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
> -                                tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
> -                                dc->cc_op = CC_OP_SUBX;
> -                            } else {
> -                                gen_helper_compute_C_icc(cpu_tmp0);
> -                                tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
> -                                tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
> -                            }
> -                        }
> +                        gen_op_subx_int(dc, cpu_dst, cpu_src1, cpu_src2,
> +                                        (xop & 0x10));
>                         break;
>  #ifdef TARGET_SPARC64
>                     case 0xd: /* V9 udivx */
> --
> 1.7.0.1
>
>
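For anyone reading along, here is a small standalone C sketch (my own
illustration, not part of the patch) of the two identities the new
gen_add32_carry32/gen_sub32_carry32 paths rely on, plus the add2 trick
used on 32-bit hosts.  All names below are local to the example; only
the arithmetic is meant to match what the generated TCG code does.

#include <assert.h>
#include <stdint.h>

/* Carry out of a previous 32-bit add d = a + b is exactly (d < a);
   this is what gen_add32_carry32 recomputes with setcond(LTU).  */
static uint32_t carry_of_add(uint32_t a, uint32_t b)
{
    uint32_t d = a + b;         /* wraps mod 2^32 */
    return d < a;
}

/* Borrow out of a previous 32-bit sub a - b is exactly (a < b);
   this is what gen_sub32_carry32 recomputes.  */
static uint32_t borrow_of_sub(uint32_t a, uint32_t b)
{
    return a < b;
}

int main(void)
{
    uint32_t cc_src = 0xffffffffu, cc_src2 = 2;   /* previous addcc */
    uint32_t cc_dst = cc_src + cc_src2;           /* wrapped result */

    /* The setcond test (cc_dst < cc_src) equals the carry flag.  */
    assert(carry_of_add(cc_src, cc_src2) == 1);
    assert((cc_dst < cc_src) == 1);

    /* The 32-bit-host add2 path: one double-word add of
       {src1:cc_src} + {src2:cc_src2} leaves src1 + src2 + C in the
       high half, i.e. the ADDX result; the low half is discarded.  */
    uint32_t src1 = 10, src2 = 20;
    uint64_t wide = (((uint64_t)src1 << 32) | cc_src)
                  + (((uint64_t)src2 << 32) | cc_src2);
    assert((uint32_t)(wide >> 32) == src1 + src2 + 1);

    /* Borrow for the SUBX-after-subcc case.  */
    assert(borrow_of_sub(1, 2) == 1);
    assert(borrow_of_sub(2, 1) == 0);
    return 0;
}

(This compiles and passes as a plain C program; it is only meant to show
why the setcond/add2 sequences compute the same carry that
helper_compute_C_icc would have returned for these CC_OP values.)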

^ permalink raw reply	[flat|nested] 18+ messages in thread
