* [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvements
@ 2015-02-19 21:14 Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 01/11] target-arm: Introduce DisasCompare Richard Henderson
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

While doing the mechanics of a previous patch set converting
translators to use TCGLabel pointers, I was reminded of
several outstanding OPTME comments in the aarch64 translator.

I had started with the csel change, which at first failed and
took quite some time to debug.  See the comment for patch 1.

Since this depends on the outstanding TCGLabel patch set, the
full tree is available at

  git://github.com/rth7680/qemu.git arm-movcond


r~


Richard Henderson (11):
  target-arm: Introduce DisasCompare
  target-arm: Extend NZCF to 64 bits
  target-arm: Handle always condition codes within arm_test_cc
  target-arm: Recognize SXTB, SXTH, SXTW, ASR
  target-arm: Recognize UXTB, UXTH, LSR, LSL
  target-arm: Eliminate unnecessary zero-extend in disas_bitfield
  target-arm: Recognize ROR
  target-arm: Use setcond and movcond for csel
  target-arm: Implement ccmp branchless
  target-arm: Implement fccmp branchless
  target-arm: Implement fcsel with movcond

 target-arm/cpu.h           |  21 +-
 target-arm/helper.c        |  18 +-
 target-arm/translate-a64.c | 688 ++++++++++++++++++++++++++-------------------
 target-arm/translate.c     | 151 ++++++----
 target-arm/translate.h     |   2 -
 5 files changed, 524 insertions(+), 356 deletions(-)

-- 
2.1.0


* [Qemu-devel] [PATCH 01/11] target-arm: Introduce DisasCompare
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvements Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits Richard Henderson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Split arm_gen_test_cc into three functions so that the condition
logic can be reused for non-branch TCG comparisons.

Note that this is also a bug fix for aarch64.  At present, we have branches
using the 32-bit (translate.c) versions of cpu_[NZCV]F, but we set the flags
using the 64-bit (translate-a64.c) versions of cpu_[NZCV]F.  From the point
of view of the TCG code generator, these are unrelated variables.

The bug is hard to see because we currently only read these variables from
branches, and upon reaching a branch TCG will first spill live variables and
then reload the arguments of the branch.  Since the 32-bit versions were
never live until reaching the branch, we'd re-read the data that had just
been spilled from the 64-bit versions.

Accept the code duplication for now, as the 64-bit functions will diverge.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
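
The (cond, value) reduction described in the commit message can be
sketched outside of TCG.  The following is only an illustration in plain
C, not code from this series: flags_t, cond_t, cmp_t, invert_cond,
test_cc and eval_cmp are hypothetical names.  The flag encoding follows
the comments in cpu.h (CF is 0 or 1, N and V are bit 31 of NF and VF,
Z is set when ZF == 0), and every condition becomes a single comparison
of one value against zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag cache as documented in cpu.h: CF is 0 or 1, N and V are bit 31
 * of NF and VF, and Z is set when ZF == 0. */
typedef struct { uint32_t NF, ZF, CF, VF; } flags_t;

/* Hypothetical stand-in for TCGCond; every comparison is against 0. */
typedef enum { C_EQ, C_NE, C_LT, C_GE } cond_t;

/* Hypothetical stand-in for DisasCompare (without value_global). */
typedef struct { cond_t cond; uint32_t value; } cmp_t;

static cond_t invert_cond(cond_t c)
{
    switch (c) {
    case C_EQ: return C_NE;
    case C_NE: return C_EQ;
    case C_LT: return C_GE;
    default:   return C_LT;
    }
}

/* Reduce ARM condition code cc (0..13) to one comparison against 0. */
static cmp_t test_cc(const flags_t *f, int cc)
{
    cmp_t cmp;
    uint32_t nv;

    assert(cc >= 0 && cc <= 13);
    switch (cc >> 1) {
    case 0: cmp.cond = C_EQ; cmp.value = f->ZF; break;  /* eq / ne */
    case 1: cmp.cond = C_NE; cmp.value = f->CF; break;  /* cs / cc */
    case 2: cmp.cond = C_LT; cmp.value = f->NF; break;  /* mi / pl */
    case 3: cmp.cond = C_LT; cmp.value = f->VF; break;  /* vs / vc */
    case 4: /* hi: C && !Z.  -CF is all-ones exactly when C is set. */
        cmp.cond = C_NE; cmp.value = (0u - f->CF) & f->ZF; break;
    case 5: /* ge: N == V, so bit 31 of N ^ V is clear. */
        cmp.cond = C_GE; cmp.value = f->VF ^ f->NF; break;
    default: /* gt: !Z && N == V.  Replicate bit 31 of N ^ V. */
        nv = ((f->VF ^ f->NF) >> 31) ? UINT32_MAX : 0;
        cmp.cond = C_NE; cmp.value = f->ZF & ~nv; break;
    }
    if (cc & 1) {
        cmp.cond = invert_cond(cmp.cond);   /* odd codes are inverses */
    }
    return cmp;
}

/* What a branch, setcond or movcond would evaluate. */
static bool eval_cmp(cmp_t cmp)
{
    switch (cmp.cond) {
    case C_EQ: return cmp.value == 0;
    case C_NE: return cmp.value != 0;
    case C_LT: return (cmp.value >> 31) != 0;
    default:   return (cmp.value >> 31) == 0;
    }
}
```

Because the result is a plain (cond, value) pair rather than a jump, the
same reduction can feed a branch or a conditional move, which is the
reuse the split is aiming for.
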
 target-arm/translate-a64.c | 102 +++++++++++++++++++++++++++++++++++++++
 target-arm/translate.c     | 116 +++++++++++++++++++++++++++------------------
 target-arm/translate.h     |   2 -
 3 files changed, 172 insertions(+), 48 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 0b192a1..dbca12a 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1036,6 +1036,108 @@ static inline void gen_check_sp_alignment(DisasContext *s)
      */
 }
 
+typedef struct DisasCompare {
+    TCGCond cond;
+    TCGv_i32 value;
+    bool value_global;
+} DisasCompare;
+
+/*
+ * generate a conditional based on ARM condition code cc.
+ * This is common between ARM and Aarch64 targets.
+ */
+static void arm_test_cc(DisasCompare *cmp, int cc)
+{
+    TCGv_i32 value;
+    TCGCond cond;
+    bool global = true;
+
+    switch (cc) {
+    case 0: /* eq: Z */
+    case 1: /* ne: !Z */
+        cond = TCG_COND_EQ;
+        value = cpu_ZF;
+        break;
+
+    case 2: /* cs: C */
+    case 3: /* cc: !C */
+        cond = TCG_COND_NE;
+        value = cpu_CF;
+        break;
+
+    case 4: /* mi: N */
+    case 5: /* pl: !N */
+        cond = TCG_COND_LT;
+        value = cpu_NF;
+        break;
+
+    case 6: /* vs: V */
+    case 7: /* vc: !V */
+        cond = TCG_COND_LT;
+        value = cpu_VF;
+        break;
+
+    case 8: /* hi: C && !Z */
+    case 9: /* ls: !C || Z */
+        cond = TCG_COND_NE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_neg_i32(value, cpu_CF);
+        tcg_gen_and_i32(value, value, cpu_ZF);
+        break;
+
+    case 10: /* ge: N == V -> N ^ V == 0 */
+    case 11: /* lt: N != V -> N ^ V != 0 */
+        cond = TCG_COND_GE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
+        break;
+
+    case 12: /* gt: !Z && N == V */
+    case 13: /* le: Z || N != V */
+        cond = TCG_COND_NE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
+        tcg_gen_sari_i32(value, value, 31);
+        tcg_gen_andc_i32(value, cpu_ZF, value);
+        break;
+
+    default:
+        fprintf(stderr, "Bad condition code 0x%x\n", cc);
+        abort();
+    }
+
+    if (cc & 1) {
+        cond = tcg_invert_cond(cond);
+    }
+
+    cmp->cond = cond;
+    cmp->value = value;
+    cmp->value_global = global;
+}
+
+static void arm_free_cc(DisasCompare *cmp)
+{
+    if (!cmp->value_global) {
+        tcg_temp_free_i32(cmp->value);
+    }
+}
+
+static void arm_jump_cc(DisasCompare *cmp, TCGLabel *label)
+{
+    tcg_gen_brcondi_i32(cmp->cond, cmp->value, 0, label);
+}
+
+static void arm_gen_test_cc(int cc, TCGLabel *label)
+{
+    DisasCompare cmp;
+    arm_test_cc(&cmp, cc);
+    arm_jump_cc(&cmp, label);
+    arm_free_cc(&cmp);
+}
+
 /*
  * This provides a simple table based table lookup decoder. It is
  * intended to be used when the relevant bits for decode are too
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 381d896..dd4d80f 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -732,82 +732,106 @@ static void gen_thumb2_parallel_addsub(int op1, int op2, TCGv_i32 a, TCGv_i32 b)
 }
 #undef PAS_OP
 
+typedef struct DisasCompare {
+    TCGCond cond;
+    TCGv_i32 value;
+    bool value_global;
+} DisasCompare;
+
 /*
- * generate a conditional branch based on ARM condition code cc.
+ * generate a conditional based on ARM condition code cc.
  * This is common between ARM and Aarch64 targets.
  */
-void arm_gen_test_cc(int cc, TCGLabel *label)
+static void arm_test_cc(DisasCompare *cmp, int cc)
 {
-    TCGv_i32 tmp;
-    TCGLabel *inv;
+    TCGv_i32 value;
+    TCGCond cond;
+    bool global = true;
 
     switch (cc) {
     case 0: /* eq: Z */
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ZF, 0, label);
-        break;
     case 1: /* ne: !Z */
-        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_ZF, 0, label);
+        cond = TCG_COND_EQ;
+        value = cpu_ZF;
         break;
+
     case 2: /* cs: C */
-        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_CF, 0, label);
-        break;
     case 3: /* cc: !C */
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_CF, 0, label);
+        cond = TCG_COND_NE;
+        value = cpu_CF;
         break;
+
     case 4: /* mi: N */
-        tcg_gen_brcondi_i32(TCG_COND_LT, cpu_NF, 0, label);
-        break;
     case 5: /* pl: !N */
-        tcg_gen_brcondi_i32(TCG_COND_GE, cpu_NF, 0, label);
+        cond = TCG_COND_LT;
+        value = cpu_NF;
         break;
+
     case 6: /* vs: V */
-        tcg_gen_brcondi_i32(TCG_COND_LT, cpu_VF, 0, label);
-        break;
     case 7: /* vc: !V */
-        tcg_gen_brcondi_i32(TCG_COND_GE, cpu_VF, 0, label);
+        cond = TCG_COND_LT;
+        value = cpu_VF;
         break;
+
     case 8: /* hi: C && !Z */
-        inv = gen_new_label();
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_CF, 0, inv);
-        tcg_gen_brcondi_i32(TCG_COND_NE, cpu_ZF, 0, label);
-        gen_set_label(inv);
-        break;
-    case 9: /* ls: !C || Z */
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_CF, 0, label);
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ZF, 0, label);
+    case 9: /* ls: !C || Z -> !(C && !Z) */
+        cond = TCG_COND_NE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_neg_i32(value, cpu_CF);
+        tcg_gen_and_i32(value, value, cpu_ZF);
         break;
+
     case 10: /* ge: N == V -> N ^ V == 0 */
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-        tcg_gen_brcondi_i32(TCG_COND_GE, tmp, 0, label);
-        tcg_temp_free_i32(tmp);
-        break;
     case 11: /* lt: N != V -> N ^ V != 0 */
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-        tcg_gen_brcondi_i32(TCG_COND_LT, tmp, 0, label);
-        tcg_temp_free_i32(tmp);
+        cond = TCG_COND_GE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
         break;
+
     case 12: /* gt: !Z && N == V */
-        inv = gen_new_label();
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ZF, 0, inv);
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-        tcg_gen_brcondi_i32(TCG_COND_GE, tmp, 0, label);
-        tcg_temp_free_i32(tmp);
-        gen_set_label(inv);
-        break;
     case 13: /* le: Z || N != V */
-        tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_ZF, 0, label);
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
-        tcg_gen_brcondi_i32(TCG_COND_LT, tmp, 0, label);
-        tcg_temp_free_i32(tmp);
+        cond = TCG_COND_NE;
+        value = tcg_temp_new_i32();
+        global = false;
+        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
+        tcg_gen_sari_i32(value, value, 31);
+        tcg_gen_andc_i32(value, cpu_ZF, value);
         break;
+
     default:
         fprintf(stderr, "Bad condition code 0x%x\n", cc);
         abort();
     }
+
+    if (cc & 1) {
+        cond = tcg_invert_cond(cond);
+    }
+
+    cmp->cond = cond;
+    cmp->value = value;
+    cmp->value_global = global;
+}
+
+static void arm_free_cc(DisasCompare *cmp)
+{
+    if (!cmp->value_global) {
+        tcg_temp_free_i32(cmp->value);
+    }
+}
+
+static void arm_jump_cc(DisasCompare *cmp, TCGLabel *label)
+{
+    tcg_gen_brcondi_i32(cmp->cond, cmp->value, 0, label);
+}
+
+static void arm_gen_test_cc(int cc, TCGLabel *label)
+{
+    DisasCompare cmp;
+    arm_test_cc(&cmp, cc);
+    arm_jump_cc(&cmp, label);
+    arm_free_cc(&cmp);
 }
 
 static const uint8_t table_logic_cc[16] = {
diff --git a/target-arm/translate.h b/target-arm/translate.h
index 9829576..e6c7048 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -119,6 +119,4 @@ static inline void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
 }
 #endif
 
-void arm_gen_test_cc(int cc, TCGLabel *label);
-
 #endif /* TARGET_ARM_TRANSLATE_H */
-- 
2.1.0


* [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvements Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 01/11] target-arm: Introduce DisasCompare Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-03-10 16:08   ` Peter Maydell
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 03/11] target-arm: Handle always condition codes within arm_test_cc Richard Henderson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The resulting aarch64 translation is a bit cleaner.
Sign-extending from 32 bits is simpler than having
to use setcond to narrow from 64 bits.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
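
As an illustration of the widened representation, here is a plain C
sketch, not code from this series.  flags64_t, set_nz, read_nzcv and
write_nzcv are hypothetical names; the read and write helpers mirror
the NZCV part of pstate_read and pstate_write in the diff below, and
set_nz mirrors the N and Z handling of a logical operation.  It assumes
the usual two's complement conversion when narrowing to int32_t.

```c
#include <stdint.h>

/* 64-bit flag cache: CF is 0 or 1, N and V are bit 63 of NF and VF,
 * and Z is set when ZF == 0. */
typedef struct { uint64_t NF, ZF, CF, VF; } flags64_t;

/* N and Z for a logical operation; sf selects a 64-bit (1) or
 * 32-bit (0) result.  C and V are cleared. */
static void set_nz(flags64_t *f, int sf, uint64_t result)
{
    if (sf) {
        f->NF = result;
    } else {
        /* Sign-extend the low 32 bits so that N lands in bit 63. */
        f->NF = (uint64_t)(int64_t)(int32_t)(uint32_t)result;
    }
    f->ZF = f->NF;
    f->CF = 0;
    f->VF = 0;
}

/* Pack the cached flags into bits 31..28, as pstate_read does. */
static uint32_t read_nzcv(const flags64_t *f)
{
    uint32_t n = (f->NF >> 63) != 0;
    uint32_t z = (f->ZF == 0);
    uint32_t c = (uint32_t)f->CF;
    uint32_t v = (f->VF >> 63) != 0;

    return (n << 31) | (z << 30) | (c << 29) | (v << 28);
}

/* Unpack bits 31..28 into the cached flags, as pstate_write does. */
static void write_nzcv(flags64_t *f, uint32_t val)
{
    f->NF = (uint64_t)val << 32;               /* bit 31 -> bit 63 */
    f->ZF = (~val) & (UINT32_C(1) << 30);      /* zero iff Z is set */
    f->CF = (val >> 29) & 1;
    f->VF = (uint64_t)val << (32 + (31 - 28)); /* bit 28 -> bit 63 */
}
```

With this layout a 32-bit result only needs one sign extension to be
usable as NF, and a 64-bit result can be stored unchanged.
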
 target-arm/cpu.h           |  21 ++--
 target-arm/helper.c        |  18 ++-
 target-arm/translate-a64.c | 297 ++++++++++++++++++---------------------------
 target-arm/translate.c     |  26 +++-
 4 files changed, 163 insertions(+), 199 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 11845a6..74835f4 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -167,10 +167,10 @@ typedef struct CPUARMState {
     uint32_t fiq_regs[5];
 
     /* cpsr flag cache for faster execution */
-    uint32_t CF; /* 0 or 1 */
-    uint32_t VF; /* V is the bit 31. All other bits are undefined */
-    uint32_t NF; /* N is bit 31. All other bits are undefined.  */
-    uint32_t ZF; /* Z set if zero.  */
+    uint64_t CF; /* 0 or 1 */
+    uint64_t VF; /* V is the bit 63. All other bits are undefined */
+    uint64_t NF; /* N is bit 63. All other bits are undefined.  */
+    uint64_t ZF; /* Z set if zero.  */
     uint32_t QF; /* 0 or 1 */
     uint32_t GE; /* cpsr[19:16] */
     uint32_t thumb; /* cpsr[5]. 0 = arm mode, 1 = thumb mode. */
@@ -666,20 +666,21 @@ static inline unsigned int aarch64_pstate_mode(unsigned int el, bool handler)
  */
 static inline uint32_t pstate_read(CPUARMState *env)
 {
-    int ZF;
+    unsigned ZF = (env->ZF == 0);
+    unsigned NF = ((int64_t)env->NF < 0);
+    unsigned VF = ((int64_t)env->VF < 0);
+    unsigned CF = env->CF;
 
-    ZF = (env->ZF == 0);
-    return (env->NF & 0x80000000) | (ZF << 30)
-        | (env->CF << 29) | ((env->VF & 0x80000000) >> 3)
+    return (NF << 31) | (ZF << 30) | (CF << 29) | (VF << 28)
         | env->pstate | env->daif;
 }
 
 static inline void pstate_write(CPUARMState *env, uint32_t val)
 {
+    env->NF = (uint64_t)val << 32;
     env->ZF = (~val) & PSTATE_Z;
-    env->NF = val;
     env->CF = (val >> 29) & 1;
-    env->VF = (val << 3) & 0x80000000;
+    env->VF = (uint64_t)val << (32 + (31 - 28));
     env->daif = val & PSTATE_DAIF;
     env->pstate = val & ~CACHED_PSTATE_BITS;
 }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3bc20af..1b28108 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3876,10 +3876,13 @@ static int bad_mode_switch(CPUARMState *env, int mode)
 
 uint32_t cpsr_read(CPUARMState *env)
 {
-    int ZF;
-    ZF = (env->ZF == 0);
-    return env->uncached_cpsr | (env->NF & 0x80000000) | (ZF << 30) |
-        (env->CF << 29) | ((env->VF & 0x80000000) >> 3) | (env->QF << 27)
+    unsigned ZF = (env->ZF == 0);
+    unsigned NF = ((int64_t)env->NF < 0);
+    unsigned VF = ((int64_t)env->VF < 0);
+    unsigned CF = env->CF;
+
+    return env->uncached_cpsr | (NF << 31) | (ZF << 30)
+        | (CF << 29) | (VF << 28) | (env->QF << 27)
         | (env->thumb << 5) | ((env->condexec_bits & 3) << 25)
         | ((env->condexec_bits & 0xfc) << 8)
         | (env->GE << 16) | (env->daif & CPSR_AIF);
@@ -3890,10 +3893,10 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
     uint32_t changed_daif;
 
     if (mask & CPSR_NZCV) {
+        env->NF = (uint64_t)val << 32;
         env->ZF = (~val) & CPSR_Z;
-        env->NF = val;
         env->CF = (val >> 29) & 1;
-        env->VF = (val << 3) & 0x80000000;
+        env->VF = (uint64_t)val << (32 + (31 - 28));
     }
     if (mask & CPSR_Q)
         env->QF = ((val & CPSR_Q) != 0);
@@ -4545,6 +4548,9 @@ void aarch64_sync_64_to_32(CPUARMState *env)
         env->regs[i] = env->xregs[i];
     }
 
+    /* Need to compress Z into the low bits.  */
+    env->ZF = (env->ZF != 0);
+
     /* Unless we are in FIQ mode, r8-r12 come from the user registers x8-x12.
      * Otherwise, we copy x8-x12 into the banked user regs.
      */
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index dbca12a..763bf35 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -39,7 +39,7 @@
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
-static TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
+static TCGv_i64 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
 
 /* Load/store exclusive handling */
 static TCGv_i64 cpu_exclusive_addr;
@@ -104,10 +104,10 @@ void a64_translate_init(void)
                                           regnames[i]);
     }
 
-    cpu_NF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, NF), "NF");
-    cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
-    cpu_CF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, CF), "CF");
-    cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
+    cpu_NF = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, NF), "NF");
+    cpu_ZF = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
+    cpu_CF = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, CF), "CF");
+    cpu_VF = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
 
     cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
@@ -515,78 +515,55 @@ static TCGv_ptr get_fpstatus_ptr(void)
     return statusptr;
 }
 
-/* Set ZF and NF based on a 64 bit result. This is alas fiddlier
- * than the 32 bit equivalent.
- */
-static inline void gen_set_NZ64(TCGv_i64 result)
-{
-    TCGv_i64 flag = tcg_temp_new_i64();
-
-    tcg_gen_setcondi_i64(TCG_COND_NE, flag, result, 0);
-    tcg_gen_trunc_i64_i32(cpu_ZF, flag);
-    tcg_gen_shri_i64(flag, result, 32);
-    tcg_gen_trunc_i64_i32(cpu_NF, flag);
-    tcg_temp_free_i64(flag);
-}
-
 /* Set NZCV as for a logical operation: NZ as per result, CV cleared. */
 static inline void gen_logic_CC(int sf, TCGv_i64 result)
 {
     if (sf) {
-        gen_set_NZ64(result);
+        tcg_gen_mov_i64(cpu_NF, result);
+        tcg_gen_mov_i64(cpu_ZF, result);
     } else {
-        tcg_gen_trunc_i64_i32(cpu_ZF, result);
-        tcg_gen_trunc_i64_i32(cpu_NF, result);
+        tcg_gen_ext32s_i64(cpu_NF, result);
+        tcg_gen_mov_i64(cpu_ZF, cpu_NF);
     }
-    tcg_gen_movi_i32(cpu_CF, 0);
-    tcg_gen_movi_i32(cpu_VF, 0);
+    tcg_gen_movi_i64(cpu_CF, 0);
+    tcg_gen_movi_i64(cpu_VF, 0);
 }
 
 /* dest = T0 + T1; compute C, N, V and Z flags */
 static void gen_add_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
     if (sf) {
-        TCGv_i64 result, flag, tmp;
-        result = tcg_temp_new_i64();
-        flag = tcg_temp_new_i64();
-        tmp = tcg_temp_new_i64();
+        TCGv_i64 tmp = tcg_temp_new_i64();
 
         tcg_gen_movi_i64(tmp, 0);
-        tcg_gen_add2_i64(result, flag, t0, tmp, t1, tmp);
-
-        tcg_gen_trunc_i64_i32(cpu_CF, flag);
-
-        gen_set_NZ64(result);
-
-        tcg_gen_xor_i64(flag, result, t0);
+        tcg_gen_add2_i64(cpu_NF, cpu_CF, t0, tmp, t1, tmp);
+        tcg_gen_mov_i64(cpu_ZF, cpu_NF);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0);
         tcg_gen_xor_i64(tmp, t0, t1);
-        tcg_gen_andc_i64(flag, flag, tmp);
+        tcg_gen_andc_i64(cpu_VF, cpu_VF, tmp);
         tcg_temp_free_i64(tmp);
-        tcg_gen_shri_i64(flag, flag, 32);
-        tcg_gen_trunc_i64_i32(cpu_VF, flag);
-
-        tcg_gen_mov_i64(dest, result);
-        tcg_temp_free_i64(result);
-        tcg_temp_free_i64(flag);
+        tcg_gen_mov_i64(dest, cpu_NF);
     } else {
         /* 32 bit arithmetic */
-        TCGv_i32 t0_32 = tcg_temp_new_i32();
-        TCGv_i32 t1_32 = tcg_temp_new_i32();
-        TCGv_i32 tmp = tcg_temp_new_i32();
-
-        tcg_gen_movi_i32(tmp, 0);
-        tcg_gen_trunc_i64_i32(t0_32, t0);
-        tcg_gen_trunc_i64_i32(t1_32, t1);
-        tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, t1_32, tmp);
-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-        tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
-        tcg_gen_extu_i32_i64(dest, cpu_NF);
-
-        tcg_temp_free_i32(tmp);
-        tcg_temp_free_i32(t0_32);
-        tcg_temp_free_i32(t1_32);
+        TCGv_i64 t0_32 = tcg_temp_new_i64();
+        TCGv_i64 t1_32 = tcg_temp_new_i64();
+        TCGv_i64 tmp;
+
+        tcg_gen_ext32u_i64(t0_32, t0);
+        tcg_gen_ext32u_i64(t1_32, t1);
+        tcg_gen_add_i64(cpu_NF, t0_32, t1_32);
+        tcg_gen_shri_i64(cpu_CF, cpu_NF, 32);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0_32);
+        tmp = tcg_temp_new_i64();
+        tcg_gen_xor_i64(tmp, t0_32, t1_32);
+        tcg_gen_andc_i64(cpu_VF, cpu_VF, tmp);
+        tcg_temp_free_i64(tmp);
+        tcg_temp_free_i64(t0_32);
+        tcg_temp_free_i64(t1_32);
+        tcg_gen_ext32u_i64(cpu_ZF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_NF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_VF, cpu_VF);
+        tcg_gen_mov_i64(dest, cpu_ZF);
     }
 }
 
@@ -595,58 +572,46 @@ static void gen_sub_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
     if (sf) {
         /* 64 bit arithmetic */
-        TCGv_i64 result, flag, tmp;
-
-        result = tcg_temp_new_i64();
-        flag = tcg_temp_new_i64();
-        tcg_gen_sub_i64(result, t0, t1);
-
-        gen_set_NZ64(result);
-
-        tcg_gen_setcond_i64(TCG_COND_GEU, flag, t0, t1);
-        tcg_gen_trunc_i64_i32(cpu_CF, flag);
+        TCGv_i64 tmp;
 
-        tcg_gen_xor_i64(flag, result, t0);
+        tcg_gen_sub_i64(cpu_NF, t0, t1);
+        tcg_gen_mov_i64(cpu_ZF, cpu_NF);
+        tcg_gen_setcond_i64(TCG_COND_GEU, cpu_CF, t0, t1);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0);
         tmp = tcg_temp_new_i64();
         tcg_gen_xor_i64(tmp, t0, t1);
-        tcg_gen_and_i64(flag, flag, tmp);
+        tcg_gen_and_i64(cpu_VF, cpu_VF, tmp);
         tcg_temp_free_i64(tmp);
-        tcg_gen_shri_i64(flag, flag, 32);
-        tcg_gen_trunc_i64_i32(cpu_VF, flag);
-        tcg_gen_mov_i64(dest, result);
-        tcg_temp_free_i64(flag);
-        tcg_temp_free_i64(result);
+        tcg_gen_mov_i64(dest, cpu_NF);
     } else {
         /* 32 bit arithmetic */
-        TCGv_i32 t0_32 = tcg_temp_new_i32();
-        TCGv_i32 t1_32 = tcg_temp_new_i32();
-        TCGv_i32 tmp;
-
-        tcg_gen_trunc_i64_i32(t0_32, t0);
-        tcg_gen_trunc_i64_i32(t1_32, t1);
-        tcg_gen_sub_i32(cpu_NF, t0_32, t1_32);
-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-        tcg_gen_setcond_i32(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-        tmp = tcg_temp_new_i32();
-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-        tcg_temp_free_i32(t0_32);
-        tcg_temp_free_i32(t1_32);
-        tcg_gen_and_i32(cpu_VF, cpu_VF, tmp);
-        tcg_temp_free_i32(tmp);
-        tcg_gen_extu_i32_i64(dest, cpu_NF);
+        TCGv_i64 t0_32 = tcg_temp_new_i64();
+        TCGv_i64 t1_32 = tcg_temp_new_i64();
+        TCGv_i64 tmp;
+
+        tcg_gen_ext32u_i64(t0_32, t0);
+        tcg_gen_ext32u_i64(t1_32, t1);
+        tcg_gen_sub_i64(cpu_NF, t0_32, t1_32);
+        tcg_gen_setcond_i64(TCG_COND_GEU, cpu_CF, t0_32, t1_32);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0_32);
+        tmp = tcg_temp_new_i64();
+        tcg_gen_xor_i64(tmp, t0_32, t1_32);
+        tcg_gen_and_i64(cpu_VF, cpu_VF, tmp);
+        tcg_temp_free_i64(tmp);
+        tcg_temp_free_i64(t0_32);
+        tcg_temp_free_i64(t1_32);
+        tcg_gen_ext32u_i64(cpu_ZF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_NF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_VF, cpu_VF);
+        tcg_gen_mov_i64(dest, cpu_ZF);
     }
 }
 
 /* dest = T0 + T1 + CF; do not compute flags. */
 static void gen_adc(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
-    TCGv_i64 flag = tcg_temp_new_i64();
-    tcg_gen_extu_i32_i64(flag, cpu_CF);
     tcg_gen_add_i64(dest, t0, t1);
-    tcg_gen_add_i64(dest, dest, flag);
-    tcg_temp_free_i64(flag);
-
+    tcg_gen_add_i64(dest, dest, cpu_CF);
     if (!sf) {
         tcg_gen_ext32u_i64(dest, dest);
     }
@@ -656,50 +621,37 @@ static void gen_adc(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 static void gen_adc_CC(int sf, TCGv_i64 dest, TCGv_i64 t0, TCGv_i64 t1)
 {
     if (sf) {
-        TCGv_i64 result, cf_64, vf_64, tmp;
-        result = tcg_temp_new_i64();
-        cf_64 = tcg_temp_new_i64();
-        vf_64 = tcg_temp_new_i64();
-        tmp = tcg_const_i64(0);
-
-        tcg_gen_extu_i32_i64(cf_64, cpu_CF);
-        tcg_gen_add2_i64(result, cf_64, t0, tmp, cf_64, tmp);
-        tcg_gen_add2_i64(result, cf_64, result, cf_64, t1, tmp);
-        tcg_gen_trunc_i64_i32(cpu_CF, cf_64);
-        gen_set_NZ64(result);
-
-        tcg_gen_xor_i64(vf_64, result, t0);
-        tcg_gen_xor_i64(tmp, t0, t1);
-        tcg_gen_andc_i64(vf_64, vf_64, tmp);
-        tcg_gen_shri_i64(vf_64, vf_64, 32);
-        tcg_gen_trunc_i64_i32(cpu_VF, vf_64);
-
-        tcg_gen_mov_i64(dest, result);
+        TCGv_i64 tmp = tcg_const_i64(0);
 
+        tcg_gen_add2_i64(cpu_NF, cpu_CF, t0, tmp, cpu_CF, tmp);
+        tcg_gen_add2_i64(cpu_NF, cpu_CF, cpu_NF, cpu_CF, t1, tmp);
+        tcg_gen_mov_i64(cpu_ZF, cpu_NF);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0);
+        tcg_gen_xor_i64(tmp, t0, t1);
+        tcg_gen_andc_i64(cpu_VF, cpu_VF, tmp);
         tcg_temp_free_i64(tmp);
-        tcg_temp_free_i64(vf_64);
-        tcg_temp_free_i64(cf_64);
-        tcg_temp_free_i64(result);
+        tcg_gen_mov_i64(dest, cpu_NF);
     } else {
-        TCGv_i32 t0_32, t1_32, tmp;
-        t0_32 = tcg_temp_new_i32();
-        t1_32 = tcg_temp_new_i32();
-        tmp = tcg_const_i32(0);
-
-        tcg_gen_trunc_i64_i32(t0_32, t0);
-        tcg_gen_trunc_i64_i32(t1_32, t1);
-        tcg_gen_add2_i32(cpu_NF, cpu_CF, t0_32, tmp, cpu_CF, tmp);
-        tcg_gen_add2_i32(cpu_NF, cpu_CF, cpu_NF, cpu_CF, t1_32, tmp);
-
-        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
-        tcg_gen_xor_i32(cpu_VF, cpu_NF, t0_32);
-        tcg_gen_xor_i32(tmp, t0_32, t1_32);
-        tcg_gen_andc_i32(cpu_VF, cpu_VF, tmp);
-        tcg_gen_extu_i32_i64(dest, cpu_NF);
-
-        tcg_temp_free_i32(tmp);
-        tcg_temp_free_i32(t1_32);
-        tcg_temp_free_i32(t0_32);
+        TCGv_i64 t0_32 = tcg_temp_new_i64();
+        TCGv_i64 t1_32 = tcg_temp_new_i64();
+        TCGv_i64 tmp;
+
+        tcg_gen_ext32u_i64(t0_32, t0);
+        tcg_gen_ext32u_i64(t1_32, t1);
+        tcg_gen_add_i64(cpu_NF, t0_32, cpu_CF);
+        tcg_gen_add_i64(cpu_NF, cpu_NF, t1_32);
+        tcg_gen_shri_i64(cpu_CF, cpu_NF, 32);
+        tcg_gen_xor_i64(cpu_VF, cpu_NF, t0_32);
+        tmp = tcg_temp_new_i64();
+        tcg_gen_xor_i64(tmp, t0_32, t1_32);
+        tcg_gen_andc_i64(cpu_VF, cpu_VF, tmp);
+        tcg_temp_free_i64(tmp);
+        tcg_temp_free_i64(t0_32);
+        tcg_temp_free_i64(t1_32);
+        tcg_gen_ext32u_i64(cpu_ZF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_NF, cpu_NF);
+        tcg_gen_ext32s_i64(cpu_VF, cpu_VF);
+        tcg_gen_mov_i64(dest, cpu_ZF);
     }
 }
 
@@ -1038,7 +990,7 @@ static inline void gen_check_sp_alignment(DisasContext *s)
 
 typedef struct DisasCompare {
     TCGCond cond;
-    TCGv_i32 value;
+    TCGv_i64 value;
     bool value_global;
 } DisasCompare;
 
@@ -1048,7 +1000,7 @@ typedef struct DisasCompare {
  */
 static void arm_test_cc(DisasCompare *cmp, int cc)
 {
-    TCGv_i32 value;
+    TCGv_i64 value;
     TCGCond cond;
     bool global = true;
 
@@ -1080,28 +1032,28 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
     case 8: /* hi: C && !Z */
     case 9: /* ls: !C || Z */
         cond = TCG_COND_NE;
-        value = tcg_temp_new_i32();
+        value = tcg_temp_new_i64();
         global = false;
-        tcg_gen_neg_i32(value, cpu_CF);
-        tcg_gen_and_i32(value, value, cpu_ZF);
+        tcg_gen_neg_i64(value, cpu_CF);
+        tcg_gen_and_i64(value, value, cpu_ZF);
         break;
 
     case 10: /* ge: N == V -> N ^ V == 0 */
     case 11: /* lt: N != V -> N ^ V != 0 */
         cond = TCG_COND_GE;
-        value = tcg_temp_new_i32();
+        value = tcg_temp_new_i64();
         global = false;
-        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
+        tcg_gen_xor_i64(value, cpu_VF, cpu_NF);
         break;
 
     case 12: /* gt: !Z && N == V */
     case 13: /* le: Z || N != V */
         cond = TCG_COND_NE;
-        value = tcg_temp_new_i32();
+        value = tcg_temp_new_i64();
         global = false;
-        tcg_gen_xor_i32(value, cpu_VF, cpu_NF);
-        tcg_gen_sari_i32(value, value, 31);
-        tcg_gen_andc_i32(value, cpu_ZF, value);
+        tcg_gen_xor_i64(value, cpu_VF, cpu_NF);
+        tcg_gen_sari_i64(value, value, 63);
+        tcg_gen_andc_i64(value, cpu_ZF, value);
         break;
 
     default:
@@ -1121,13 +1073,13 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
 static void arm_free_cc(DisasCompare *cmp)
 {
     if (!cmp->value_global) {
-        tcg_temp_free_i32(cmp->value);
+        tcg_temp_free_i64(cmp->value);
     }
 }
 
 static void arm_jump_cc(DisasCompare *cmp, TCGLabel *label)
 {
-    tcg_gen_brcondi_i32(cmp->cond, cmp->value, 0, label);
+    tcg_gen_brcondi_i64(cmp->cond, cmp->value, 0, label);
 }
 
 static void arm_gen_test_cc(int cc, TCGLabel *label)
@@ -1369,46 +1321,35 @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
 
 static void gen_get_nzcv(TCGv_i64 tcg_rt)
 {
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    TCGv_i32 nzcv = tcg_temp_new_i32();
+    TCGv_i64 tmp = tcg_temp_new_i64();
 
     /* build bit 31, N */
-    tcg_gen_andi_i32(nzcv, cpu_NF, (1U << 31));
+    tcg_gen_shri_i64(tmp, cpu_NF, 63);
+    tcg_gen_shli_i64(tcg_rt, tmp, 31);
     /* build bit 30, Z */
-    tcg_gen_setcondi_i32(TCG_COND_EQ, tmp, cpu_ZF, 0);
-    tcg_gen_deposit_i32(nzcv, nzcv, tmp, 30, 1);
+    tcg_gen_setcondi_i64(TCG_COND_EQ, tmp, cpu_ZF, 0);
+    tcg_gen_deposit_i64(tcg_rt, tcg_rt, tmp, 30, 1);
     /* build bit 29, C */
-    tcg_gen_deposit_i32(nzcv, nzcv, cpu_CF, 29, 1);
+    tcg_gen_deposit_i64(tcg_rt, tcg_rt, cpu_CF, 29, 1);
     /* build bit 28, V */
-    tcg_gen_shri_i32(tmp, cpu_VF, 31);
-    tcg_gen_deposit_i32(nzcv, nzcv, tmp, 28, 1);
-    /* generate result */
-    tcg_gen_extu_i32_i64(tcg_rt, nzcv);
+    tcg_gen_shri_i64(tmp, cpu_VF, 63);
+    tcg_gen_deposit_i64(tcg_rt, tcg_rt, tmp, 28, 1);
 
-    tcg_temp_free_i32(nzcv);
-    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(tmp);
 }
 
 static void gen_set_nzcv(TCGv_i64 tcg_rt)
-
 {
-    TCGv_i32 nzcv = tcg_temp_new_i32();
-
-    /* take NZCV from R[t] */
-    tcg_gen_trunc_i64_i32(nzcv, tcg_rt);
-
     /* bit 31, N */
-    tcg_gen_andi_i32(cpu_NF, nzcv, (1U << 31));
+    tcg_gen_shli_i64(cpu_NF, tcg_rt, 32);
     /* bit 30, Z */
-    tcg_gen_andi_i32(cpu_ZF, nzcv, (1 << 30));
-    tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_ZF, cpu_ZF, 0);
+    tcg_gen_not_i64(cpu_ZF, tcg_rt);
+    tcg_gen_andi_i64(cpu_ZF, cpu_ZF, 1 << 30);
     /* bit 29, C */
-    tcg_gen_andi_i32(cpu_CF, nzcv, (1 << 29));
-    tcg_gen_shri_i32(cpu_CF, cpu_CF, 29);
+    tcg_gen_shri_i64(cpu_CF, tcg_rt, 29);
+    tcg_gen_andi_i64(cpu_CF, cpu_CF, 1);
     /* bit 28, V */
-    tcg_gen_andi_i32(cpu_VF, nzcv, (1 << 28));
-    tcg_gen_shli_i32(cpu_VF, cpu_VF, 3);
-    tcg_temp_free_i32(nzcv);
+    tcg_gen_shli_i64(cpu_VF, tcg_rt, 32 + (31 - 28));
 }
 
 /* C5.6.129 MRS - move from system register
diff --git a/target-arm/translate.c b/target-arm/translate.c
index dd4d80f..0d0a4d1 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -85,7 +85,7 @@ static const char *regnames[] =
 /* initialize TCG globals.  */
 void arm_translate_init(void)
 {
-    int i;
+    int i, be, le;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
 
@@ -94,10 +94,26 @@ void arm_translate_init(void)
                                           offsetof(CPUARMState, regs[i]),
                                           regnames[i]);
     }
-    cpu_CF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, CF), "CF");
-    cpu_NF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, NF), "NF");
-    cpu_VF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, VF), "VF");
-    cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUARMState, ZF), "ZF");
+
+#ifdef HOST_WORDS_BIGENDIAN
+    be = 0;
+    le = 4;
+#else
+    le = 0;
+    be = 4;
+#endif
+
+    /* Place CF and ZF at the low end of the 64-bit variable, and NF
+       and VF at the high end.  The other halves are ignore(able) in
+       32-bit mode and synced during mode transition.  */
+    cpu_CF = tcg_global_mem_new_i32(TCG_AREG0,
+                                    offsetof(CPUARMState, CF) + le, "CF");
+    cpu_ZF = tcg_global_mem_new_i32(TCG_AREG0,
+                                    offsetof(CPUARMState, ZF) + le, "ZF");
+    cpu_NF = tcg_global_mem_new_i32(TCG_AREG0,
+                                    offsetof(CPUARMState, NF) + be, "NF");
+    cpu_VF = tcg_global_mem_new_i32(TCG_AREG0,
+                                    offsetof(CPUARMState, VF) + be, "VF");
 
     cpu_exclusive_addr = tcg_global_mem_new_i64(TCG_AREG0,
         offsetof(CPUARMState, exclusive_addr), "exclusive_addr");
-- 
2.1.0

* [Qemu-devel] [PATCH 03/11] target-arm: Handle always condition codes within arm_test_cc
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 01/11] target-arm: Introduce DisasCompare Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 04/11] target-arm: Recognize SXTB, SXTH, SXTW, ASR Richard Henderson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Handling this with TCG_COND_ALWAYS will allow these unlikely
cases to be handled without special cases in the rest of the
translator.  The TCG optimizer ought to be able to reduce
these ALWAYS conditions completely.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 9 +++++++++
 target-arm/translate.c     | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 763bf35..219e257 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1056,6 +1056,14 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
         tcg_gen_andc_i64(value, cpu_ZF, value);
         break;
 
+    case 14: /* always */
+    case 15: /* always */
+        /* Use the ALWAYS condition, which will fold early.
+           It doesn't matter what we use for the value.  */
+        cond = TCG_COND_ALWAYS;
+        value = cpu_ZF;
+        goto no_invert;
+
     default:
         fprintf(stderr, "Bad condition code 0x%x\n", cc);
         abort();
@@ -1065,6 +1073,7 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
         cond = tcg_invert_cond(cond);
     }
 
+ no_invert:
     cmp->cond = cond;
     cmp->value = value;
     cmp->value_global = global;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 0d0a4d1..54edc33 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -816,6 +816,14 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
         tcg_gen_andc_i32(value, cpu_ZF, value);
         break;
 
+    case 14: /* always */
+    case 15: /* always */
+        /* Use the ALWAYS condition, which will fold early.
+           It doesn't matter what we use for the value.  */
+        cond = TCG_COND_ALWAYS;
+        value = cpu_ZF;
+        goto no_invert;
+
     default:
         fprintf(stderr, "Bad condition code 0x%x\n", cc);
         abort();
@@ -825,6 +833,7 @@ static void arm_test_cc(DisasCompare *cmp, int cc)
         cond = tcg_invert_cond(cond);
     }
 
+ no_invert:
     cmp->cond = cond;
     cmp->value = value;
     cmp->value_global = global;
-- 
2.1.0

* [Qemu-devel] [PATCH 04/11] target-arm: Recognize SXTB, SXTH, SXTW, ASR
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (2 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 03/11] target-arm: Handle always condition codes within arm_test_cc Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 05/11] target-arm: Recognize UXTB, UXTH, LSR, LSL Richard Henderson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

... as aliases of SBFM, and special-case them.
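
For illustration only (not part of the patch): a standalone C sketch of
why these aliases can be special-cased.  sbfm64() and the four alias
helpers are hypothetical names written for this note, not QEMU functions.
The sketch assumes two's complement, modular narrowing conversions, and an
arithmetic >> on signed values, as gcc and clang provide.

```c
#include <stdint.h>

/* Hypothetical reference model of SBFM Xd, Xn, #immr, #imms with sf = 1.
   Requires immr, imms < 64.  */
static int64_t sbfm64(uint64_t src, unsigned immr, unsigned imms)
{
    unsigned len, pos;

    if (imms >= immr) {
        /* Extract bits [imms:immr] down to bit 0.  */
        len = imms - immr + 1;
        pos = 0;
        src >>= immr;
    } else {
        /* Insert bits [imms:0] at bit 64 - immr.  */
        len = imms + 1;
        pos = 64 - immr;
    }
    /* Move the field to the top, then shift back down arithmetically so
       that it lands at POS with its sign bit replicated above it.  */
    return (int64_t)(src << (64 - len)) >> (64 - len - pos);
}

/* The aliases, written directly with C conversions and shifts.  */
static int64_t sxtb64(uint64_t x) { return (int8_t)x; }
static int64_t sxth64(uint64_t x) { return (int16_t)x; }
static int64_t sxtw64(uint64_t x) { return (int32_t)x; }
static int64_t asr64(uint64_t x, unsigned r) { return (int64_t)x >> r; }
```

With immr == 0 and imms in {7, 15, 31} the model reduces to a plain
sign-extension, and with imms == 63 to an arithmetic shift right by immr,
which is what the ext8s/ext16s/ext32s/sari special cases emit.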

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 219e257..0cb60a2 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3036,7 +3036,28 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
     tcg_rd = cpu_reg(s, rd);
     tcg_tmp = read_cpu_reg(s, rn, sf);
 
-    /* OPTME: probably worth recognizing common cases of ext{8,16,32}{u,s} */
+    /* Recognize the common aliases.  */
+    if (opc == 0) { /* SBFM */
+        if (ri == 0) {
+            if (si == 7) { /* SXTB */
+                tcg_gen_ext8s_i64(tcg_rd, tcg_tmp);
+                goto done;
+            } else if (si == 15) { /* SXTH */
+                tcg_gen_ext16s_i64(tcg_rd, tcg_tmp);
+                goto done;
+            } else if (si == 31) { /* SXTW */
+                tcg_gen_ext32s_i64(tcg_rd, tcg_tmp);
+                goto done;
+            }
+        }
+        if (si == 63 || (si == 31 && ri <= si)) { /* ASR */
+            if (si == 31) {
+                tcg_gen_ext32s_i64(tcg_tmp, tcg_tmp);
+            }
+            tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
+            goto done;
+        }
+    }
 
     if (opc != 1) { /* SBFM or UBFM */
         tcg_gen_movi_i64(tcg_rd, 0);
@@ -3061,6 +3082,7 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
         tcg_gen_sari_i64(tcg_rd, tcg_rd, 64 - (pos + len));
     }
 
+ done:
     if (!sf) { /* zero extend final result */
         tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
     }
-- 
2.1.0

* [Qemu-devel] [PATCH 05/11] target-arm: Recognize UXTB, UXTH, LSR, LSL
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (3 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 04/11] target-arm: Recognize SXTB, SXTH, SXTW, ASR Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 06/11] target-arm: Eliminate unnecessary zero-extend in disas_bitfield Richard Henderson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 0cb60a2..54290ad 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3057,6 +3057,23 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
             tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
             goto done;
         }
+    } else if (opc == 2) { /* UBFM */
+        if (ri == 0) { /* UXTB, UXTH, plus non-canonical AND */
+            tcg_gen_andi_i64(tcg_rd, tcg_tmp, bitmask64(si + 1));
+            return;
+        }
+        if (si == 63 || (si == 31 && ri <= si)) { /* LSR */
+            if (si == 31) {
+                tcg_gen_ext32u_i64(tcg_tmp, tcg_tmp);
+            }
+            tcg_gen_shri_i64(tcg_rd, tcg_tmp, ri);
+            return;
+        }
+        if (si + 1 == ri && si != bitsize - 1) { /* LSL */
+            int shift = bitsize - 1 - si;
+            tcg_gen_shli_i64(tcg_rd, tcg_tmp, shift);
+            goto done;
+        }
     }
 
     if (opc != 1) { /* SBFM or UBFM */
-- 
2.1.0

* [Qemu-devel] [PATCH 06/11] target-arm: Eliminate unnecessary zero-extend in disas_bitfield
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (4 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 05/11] target-arm: Recognize UXTB, UXTH, LSR, LSL Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 07/11] target-arm: Recognize ROR Richard Henderson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

For !SF, this initial ext32u can't be optimized away by the
current TCG code generator.  (It would require backward bit
liveness propagation.)

But since the range of bits for !SF is already constrained by
unallocated_encoding, we'll never reference the high bits anyway.
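
For illustration only (not part of the patch): a standalone C sketch of
the argument for the simple extract case.  ubfx32_via64() is a
hypothetical helper written for this note; it models a 32-bit unsigned
bitfield extract done with 64-bit operations on a register whose high
half has not been zero-extended, assuming ri <= si < 32.

```c
#include <stdint.h>

/* Extract bits [si:ri] of the low 32 bits of XREG using 64-bit ops.
   Requires ri <= si < 32, which is what the !sf encoding allows.  */
static uint32_t ubfx32_via64(uint64_t xreg, unsigned ri, unsigned si)
{
    unsigned len = si - ri + 1;                    /* 1..32 */
    uint64_t mask = (UINT64_C(1) << len) - 1;

    /* ri + len <= 32, so no bit above bit 31 survives the mask.  */
    return (uint32_t)((xreg >> ri) & mask);
}
```

Whatever sits in bits 63:32 of the source is shifted out of range of the
mask, so the result matches what a zero-extended input would give.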

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 54290ad..ed97ed6 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3034,7 +3034,11 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
     }
 
     tcg_rd = cpu_reg(s, rd);
-    tcg_tmp = read_cpu_reg(s, rn, sf);
+
+    /* Suppress the zero-extend for !sf.  Since RI and SI are constrained
+       to be smaller than bitsize, we'll never reference data outside the
+       low 32-bits anyway.  */
+    tcg_tmp = read_cpu_reg(s, rn, 1);
 
     /* Recognize the common aliases.  */
     if (opc == 0) { /* SBFM */
-- 
2.1.0

* [Qemu-devel] [PATCH 07/11] target-arm: Recognize ROR
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (5 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 06/11] target-arm: Eliminate unnecessary zero-extend in disas_bitfield Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 08/11] target-arm: Use setcond and movcond for csel Richard Henderson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ed97ed6..d139b2d 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3136,17 +3136,7 @@ static void disas_extract(DisasContext *s, uint32_t insn)
 
         tcg_rd = cpu_reg(s, rd);
 
-        if (imm) {
-            /* OPTME: we can special case rm==rn as a rotate */
-            tcg_rm = read_cpu_reg(s, rm, sf);
-            tcg_rn = read_cpu_reg(s, rn, sf);
-            tcg_gen_shri_i64(tcg_rm, tcg_rm, imm);
-            tcg_gen_shli_i64(tcg_rn, tcg_rn, bitsize - imm);
-            tcg_gen_or_i64(tcg_rd, tcg_rm, tcg_rn);
-            if (!sf) {
-                tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
-            }
-        } else {
+        if (unlikely(imm == 0)) {
             /* tcg shl_i32/shl_i64 is undefined for 32/64 bit shifts,
              * so an extract from bit 0 is a special case.
              */
@@ -3155,8 +3145,27 @@ static void disas_extract(DisasContext *s, uint32_t insn)
             } else {
                 tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, rm));
             }
+        } else if (rm == rn) { /* ROR */
+            tcg_rm = cpu_reg(s, rm);
+            if (sf) {
+                tcg_gen_rotri_i64(tcg_rd, tcg_rm, imm);
+            } else {
+                TCGv_i32 tmp = tcg_temp_new_i32();
+                tcg_gen_trunc_i64_i32(tmp, tcg_rm);
+                tcg_gen_rotri_i32(tmp, tmp, imm);
+                tcg_gen_extu_i32_i64(tcg_rd, tmp);
+                tcg_temp_free_i32(tmp);
+            }
+        } else {
+            tcg_rm = read_cpu_reg(s, rm, sf);
+            tcg_rn = read_cpu_reg(s, rn, sf);
+            tcg_gen_shri_i64(tcg_rm, tcg_rm, imm);
+            tcg_gen_shli_i64(tcg_rn, tcg_rn, bitsize - imm);
+            tcg_gen_or_i64(tcg_rd, tcg_rm, tcg_rn);
+            if (!sf) {
+                tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
+            }
         }
-
     }
 }
 
-- 
2.1.0

* [Qemu-devel] [PATCH 08/11] target-arm: Use setcond and movcond for csel
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (6 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 07/11] target-arm: Recognize ROR Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 09/11] target-arm: Implement ccmp branchless Richard Henderson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 60 +++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 36 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index d139b2d..7549267 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3703,7 +3703,8 @@ static void disas_cc(DisasContext *s, uint32_t insn)
 static void disas_cond_select(DisasContext *s, uint32_t insn)
 {
     unsigned int sf, else_inv, rm, cond, else_inc, rn, rd;
-    TCGv_i64 tcg_rd, tcg_src;
+    TCGv_i64 tcg_rd, zero;
+    DisasCompare c;
 
     if (extract32(insn, 29, 1) || extract32(insn, 11, 1)) {
         /* S == 1 or op2<1> == 1 */
@@ -3718,48 +3719,35 @@ static void disas_cond_select(DisasContext *s, uint32_t insn)
     rn = extract32(insn, 5, 5);
     rd = extract32(insn, 0, 5);
 
-    if (rd == 31) {
-        /* silly no-op write; until we use movcond we must special-case
-         * this to avoid a dead temporary across basic blocks.
-         */
-        return;
-    }
-
     tcg_rd = cpu_reg(s, rd);
 
-    if (cond >= 0x0e) { /* condition "always" */
-        tcg_src = read_cpu_reg(s, rn, sf);
-        tcg_gen_mov_i64(tcg_rd, tcg_src);
-    } else {
-        /* OPTME: we could use movcond here, at the cost of duplicating
-         * a lot of the arm_gen_test_cc() logic.
-         */
-        TCGLabel *label_match = gen_new_label();
-        TCGLabel *label_continue = gen_new_label();
-
-        arm_gen_test_cc(cond, label_match);
-        /* nomatch: */
-        tcg_src = cpu_reg(s, rm);
+    arm_test_cc(&c, cond);
+    zero = tcg_const_i64(0);
 
+    if (rn == 31 && rm == 31 && (else_inc ^ else_inv)) {
+        /* CSET & CSETM.  */
+        tcg_gen_setcond_i64(tcg_invert_cond(c.cond), tcg_rd, c.value, zero);
+        if (else_inv) {
+            tcg_gen_neg_i64(tcg_rd, tcg_rd);
+        }
+    } else {
+        TCGv_i64 t_true = cpu_reg(s, rn);
+        TCGv_i64 t_false = read_cpu_reg(s, rm, 1);
         if (else_inv && else_inc) {
-            tcg_gen_neg_i64(tcg_rd, tcg_src);
+            tcg_gen_neg_i64(t_false, t_false);
         } else if (else_inv) {
-            tcg_gen_not_i64(tcg_rd, tcg_src);
+            tcg_gen_not_i64(t_false, t_false);
         } else if (else_inc) {
-            tcg_gen_addi_i64(tcg_rd, tcg_src, 1);
-        } else {
-            tcg_gen_mov_i64(tcg_rd, tcg_src);
-        }
-        if (!sf) {
-            tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
+            tcg_gen_addi_i64(t_false, t_false, 1);
         }
-        tcg_gen_br(label_continue);
-        /* match: */
-        gen_set_label(label_match);
-        tcg_src = read_cpu_reg(s, rn, sf);
-        tcg_gen_mov_i64(tcg_rd, tcg_src);
-        /* continue: */
-        gen_set_label(label_continue);
+        tcg_gen_movcond_i64(c.cond, tcg_rd, c.value, zero, t_true, t_false);
+    }
+
+    tcg_temp_free_i64(zero);
+    arm_free_cc(&c);
+
+    if (!sf) {
+        tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
     }
 }
 
-- 
2.1.0

* [Qemu-devel] [PATCH 09/11] target-arm: Implement ccmp branchless
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (7 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 08/11] target-arm: Use setcond and movcond for csel Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless Richard Henderson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This can allow much of a ccmp to be elided when particular
flags are subsequently dead.
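
For illustration only (not part of the patch): a standalone C sketch of
the masking scheme.  The Flags struct, ccmp_select() and pack_nzcv() are
hypothetical names written for this note.  The sketch assumes the flag
representation from patch 2: N and V live in bit 63 of NF and VF, Z is
set when ZF == 0, and CF holds 0 or 1.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t nf, zf, cf, vf;
} Flags;

/* CMP holds the flags of the new comparison.  If COND is false, force
   them to the immediate NZCV without a runtime branch.  The tests on
   NZCV are resolved at translation time, since NZCV is a constant.  */
static Flags ccmp_select(Flags cmp, bool cond, unsigned nzcv)
{
    uint64_t t0 = !cond;     /* COND ? 0 : 1 */
    uint64_t t1 = -t0;       /* COND ? 0 : -1 */
    uint64_t t2 = t0 - 1;    /* COND ? -1 : 0 */

    cmp.nf = (nzcv & 8) ? (cmp.nf | t1) : (cmp.nf & t2);
    cmp.zf = (nzcv & 4) ? (cmp.zf & t2) : (cmp.zf | t0);
    cmp.cf = (nzcv & 2) ? (cmp.cf | t0) : (cmp.cf & t2);
    cmp.vf = (nzcv & 1) ? (cmp.vf | t1) : (cmp.vf & t2);
    return cmp;
}

/* Collapse the split representation back into a 4-bit NZCV value.  */
static unsigned pack_nzcv(Flags f)
{
    return (unsigned)(f.nf >> 63) << 3
         | (unsigned)(f.zf == 0) << 2
         | (unsigned)(f.cf & 1) << 1
         | (unsigned)(f.vf >> 63);
}
```

Each flag is updated by its own and/or, so a flag that is dead afterwards
takes only its own operation with it, instead of keeping a whole branch
alive.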

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 62 ++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 7549267..8171a1f 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3641,8 +3641,8 @@ static void disas_adc_sbc(DisasContext *s, uint32_t insn)
 static void disas_cc(DisasContext *s, uint32_t insn)
 {
     unsigned int sf, op, y, cond, rn, nzcv, is_imm;
-    TCGLabel *label_continue = NULL;
-    TCGv_i64 tcg_tmp, tcg_y, tcg_rn;
+    TCGv_i64 tcg_t0, tcg_t1, tcg_t2, tcg_y, tcg_rn;
+    DisasCompare c;
 
     if (!extract32(insn, 29, 1)) {
         unallocated_encoding(s);
@@ -3660,19 +3660,13 @@ static void disas_cc(DisasContext *s, uint32_t insn)
     rn = extract32(insn, 5, 5);
     nzcv = extract32(insn, 0, 4);
 
-    if (cond < 0x0e) { /* not always */
-        TCGLabel *label_match = gen_new_label();
-        label_continue = gen_new_label();
-        arm_gen_test_cc(cond, label_match);
-        /* nomatch: */
-        tcg_tmp = tcg_temp_new_i64();
-        tcg_gen_movi_i64(tcg_tmp, nzcv << 28);
-        gen_set_nzcv(tcg_tmp);
-        tcg_temp_free_i64(tcg_tmp);
-        tcg_gen_br(label_continue);
-        gen_set_label(label_match);
-    }
-    /* match, or condition is always */
+    /* Set T0 = !COND.  */
+    tcg_t0 = tcg_temp_new_i64();
+    arm_test_cc(&c, cond);
+    tcg_gen_setcondi_i64(tcg_invert_cond(c.cond), tcg_t0, c.value, 0);
+    arm_free_cc(&c);
+
+    /* Load the arguments for the new comparison.  */
     if (is_imm) {
         tcg_y = new_tmp_a64(s);
         tcg_gen_movi_i64(tcg_y, y);
@@ -3681,17 +3675,43 @@ static void disas_cc(DisasContext *s, uint32_t insn)
     }
     tcg_rn = cpu_reg(s, rn);
 
-    tcg_tmp = tcg_temp_new_i64();
+    /* Set the flags for the new comparison.  */
+    tcg_t1 = tcg_temp_new_i64();
     if (op) {
-        gen_sub_CC(sf, tcg_tmp, tcg_rn, tcg_y);
+        gen_sub_CC(sf, tcg_t1, tcg_rn, tcg_y);
     } else {
-        gen_add_CC(sf, tcg_tmp, tcg_rn, tcg_y);
+        gen_add_CC(sf, tcg_t1, tcg_rn, tcg_y);
     }
-    tcg_temp_free_i64(tcg_tmp);
 
-    if (cond < 0x0e) { /* continue */
-        gen_set_label(label_continue);
+    /* If COND was false, force the flags to #nzcv.
+       Note that T1 = (COND ? 0 : -1), T2 = (COND ? -1 : 0).  */
+    tcg_t2 = tcg_temp_new_i64();
+    tcg_gen_neg_i64(tcg_t1, tcg_t0);
+    tcg_gen_subi_i64(tcg_t2, tcg_t0, 1);
+
+    if (nzcv & 8) { /* N */
+        tcg_gen_or_i64(cpu_NF, cpu_NF, tcg_t1);
+    } else {
+        tcg_gen_and_i64(cpu_NF, cpu_NF, tcg_t2);
+    }
+    if (nzcv & 4) { /* Z */
+        tcg_gen_and_i64(cpu_ZF, cpu_ZF, tcg_t2);
+    } else {
+        tcg_gen_or_i64(cpu_ZF, cpu_ZF, tcg_t0);
+    }
+    if (nzcv & 2) { /* C */
+        tcg_gen_or_i64(cpu_CF, cpu_CF, tcg_t0);
+    } else {
+        tcg_gen_and_i64(cpu_CF, cpu_CF, tcg_t2);
+    }
+    if (nzcv & 1) { /* V */
+        tcg_gen_or_i64(cpu_VF, cpu_VF, tcg_t1);
+    } else {
+        tcg_gen_and_i64(cpu_VF, cpu_VF, tcg_t2);
     }
+    tcg_temp_free_i64(tcg_t0);
+    tcg_temp_free_i64(tcg_t1);
+    tcg_temp_free_i64(tcg_t2);
 }
 
 /* C3.5.6 Conditional select
-- 
2.1.0

* [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (8 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 09/11] target-arm: Implement ccmp branchless Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-20 13:57   ` Laurent Desnogues
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 11/11] target-arm: Implement fcsel with movcond Richard Henderson
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 50 +++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 8171a1f..5539ae3 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -4128,11 +4128,11 @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
     }
 }
 
-static void handle_fp_compare(DisasContext *s, bool is_double,
-                              unsigned int rn, unsigned int rm,
-                              bool cmp_with_zero, bool signal_all_nans)
+static void handle_fp_compare_1(DisasContext *s, TCGv_i64 tcg_flags,
+                                bool is_double, unsigned int rn,
+                                unsigned int rm, bool cmp_with_zero,
+                                bool signal_all_nans)
 {
-    TCGv_i64 tcg_flags = tcg_temp_new_i64();
     TCGv_ptr fpst = get_fpstatus_ptr();
 
     if (is_double) {
@@ -4170,7 +4170,16 @@ static void handle_fp_compare(DisasContext *s, bool is_double,
     }
 
     tcg_temp_free_ptr(fpst);
+}
 
+static void handle_fp_compare(DisasContext *s, bool is_double,
+                              unsigned int rn, unsigned int rm,
+                              bool cmp_with_zero, bool signal_all_nans)
+{
+    TCGv_i64 tcg_flags = tcg_temp_new_i64();
+
+    handle_fp_compare_1(s, tcg_flags, is_double, rn, rm,
+                        cmp_with_zero, signal_all_nans);
     gen_set_nzcv(tcg_flags);
 
     tcg_temp_free_i64(tcg_flags);
@@ -4215,8 +4224,8 @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
 static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
 {
     unsigned int mos, type, rm, cond, rn, op, nzcv;
-    TCGv_i64 tcg_flags;
-    TCGLabel *label_continue = NULL;
+    TCGv_i64 t_flags, t_zero, t_nzcv;
+    DisasCompare c;
 
     mos = extract32(insn, 29, 3);
     type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
@@ -4235,23 +4244,22 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
         return;
     }
 
-    if (cond < 0x0e) { /* not always */
-        TCGLabel *label_match = gen_new_label();
-        label_continue = gen_new_label();
-        arm_gen_test_cc(cond, label_match);
-        /* nomatch: */
-        tcg_flags = tcg_const_i64(nzcv << 28);
-        gen_set_nzcv(tcg_flags);
-        tcg_temp_free_i64(tcg_flags);
-        tcg_gen_br(label_continue);
-        gen_set_label(label_match);
-    }
+    /* Perform the new compare, but don't write the result back to flags. */
+    t_flags = tcg_temp_new_i64();
+    handle_fp_compare_1(s, t_flags, type, rn, rm, false, op);
 
-    handle_fp_compare(s, type, rn, rm, false, op);
+    /* If the condition is false, force the flags to #nzcv.  */
+    arm_test_cc(&c, cond);
+    t_zero = tcg_const_i64(0);
+    t_nzcv = tcg_const_i64(nzcv << 28);
+    tcg_gen_movcond_i64(c.cond, t_flags, c.value, t_zero, t_flags, t_nzcv);
+    tcg_temp_free_i64(t_zero);
+    tcg_temp_free_i64(t_nzcv);
+    arm_free_cc(&c);
 
-    if (cond < 0x0e) {
-        gen_set_label(label_continue);
-    }
+    /* Write back the new flags.  */
+    gen_set_nzcv(t_flags);
+    tcg_temp_free_i64(t_flags);
 }
 
 /* copy src FP register to dst FP register; type specifies single or double */
-- 
2.1.0

* [Qemu-devel] [PATCH 11/11] target-arm: Implement fcsel with movcond
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (9 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless Richard Henderson
@ 2015-02-19 21:14 ` Richard Henderson
  2015-02-19 23:52 ` [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Peter Maydell
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-02-19 21:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 48 ++++++++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 5539ae3..1302cec 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -4262,20 +4262,6 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(t_flags);
 }
 
-/* copy src FP register to dst FP register; type specifies single or double */
-static void gen_mov_fp2fp(DisasContext *s, int type, int dst, int src)
-{
-    if (type) {
-        TCGv_i64 v = read_fp_dreg(s, src);
-        write_fp_dreg(s, dst, v);
-        tcg_temp_free_i64(v);
-    } else {
-        TCGv_i32 v = read_fp_sreg(s, src);
-        write_fp_sreg(s, dst, v);
-        tcg_temp_free_i32(v);
-    }
-}
-
 /* C3.6.24 Floating point conditional select
  *   31  30  29 28       24 23  22  21 20  16 15  12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+------+-----+------+------+
@@ -4285,7 +4271,8 @@ static void gen_mov_fp2fp(DisasContext *s, int type, int dst, int src)
 static void disas_fp_csel(DisasContext *s, uint32_t insn)
 {
     unsigned int mos, type, rm, cond, rn, rd;
-    TCGLabel *label_continue = NULL;
+    TCGv_i64 t_true, t_false, t_zero;
+    DisasCompare c;
 
     mos = extract32(insn, 29, 3);
     type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
@@ -4303,21 +4290,28 @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
         return;
     }
 
-    if (cond < 0x0e) { /* not always */
-        TCGLabel *label_match = gen_new_label();
-        label_continue = gen_new_label();
-        arm_gen_test_cc(cond, label_match);
-        /* nomatch: */
-        gen_mov_fp2fp(s, type, rd, rm);
-        tcg_gen_br(label_continue);
-        gen_set_label(label_match);
+    if (type) {
+        t_true = read_fp_dreg(s, rn);
+        t_false = read_fp_dreg(s, rm);
+    } else {
+        /* Zero-extend sreg inputs to 64-bits now.  */
+        t_true = tcg_temp_new_i64();
+        t_false = tcg_temp_new_i64();
+        tcg_gen_ld32u_i64(t_true, cpu_env, fp_reg_offset(s, rn, MO_32));
+        tcg_gen_ld32u_i64(t_false, cpu_env, fp_reg_offset(s, rm, MO_32));
     }
 
-    gen_mov_fp2fp(s, type, rd, rn);
+    arm_test_cc(&c, cond);
+    t_zero = tcg_const_i64(0);
+    tcg_gen_movcond_i64(c.cond, t_true, c.value, t_zero, t_true, t_false);
+    tcg_temp_free_i64(t_zero);
+    tcg_temp_free_i64(t_false);
+    arm_free_cc(&c);
 
-    if (cond < 0x0e) { /* continue */
-        gen_set_label(label_continue);
-    }
+    /* Note that sregs write back zeros to the high bits,
+       and we've already done the zero-extension.  */
+    write_fp_dreg(s, rd, t_true);
+    tcg_temp_free_i64(t_true);
 }
 
 /* C3.6.25 Floating-point data-processing (1 source) - single precision */
-- 
2.1.0

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (10 preceding siblings ...)
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 11/11] target-arm: Implement fcsel with movcond Richard Henderson
@ 2015-02-19 23:52 ` Peter Maydell
  2015-02-20 16:50   ` Alex Bennée
  2015-02-20 10:00 ` Laurent Desnogues
  2015-02-23  7:49 ` Laurent Desnogues
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2015-02-19 23:52 UTC (permalink / raw)
  To: Alex Bennée; +Cc: QEMU Developers, Richard Henderson

On 20 February 2015 at 06:14, Richard Henderson <rth@twiddle.net> wrote:
> While doing the mechanics of a previous patch set converting
> translators to use to TCGLabel pointers, I was reminded of
> several outstanding OPTME comments in the aarch64 translator.
>
> I had started with the csel change, which at first failed and
> took quite some time to debug.  See the comment for patch 1.
>
> Since this depends on the outstanding TCGLabel patch set, the
> full tree is available at
>
>   git://github.com/rth7680/qemu.git arm-movcond

Alex, could you run this lot through the A64 risu testsuite,
please?

thanks
-- PMM

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (11 preceding siblings ...)
  2015-02-19 23:52 ` [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Peter Maydell
@ 2015-02-20 10:00 ` Laurent Desnogues
  2015-02-20 10:54   ` Laurent Desnogues
  2015-02-23  7:49 ` Laurent Desnogues
  13 siblings, 1 reply; 23+ messages in thread
From: Laurent Desnogues @ 2015-02-20 10:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Feb 19, 2015 at 10:14 PM, Richard Henderson <rth@twiddle.net> wrote:
> While doing the mechanics of a previous patch set converting
> translators to use to TCGLabel pointers, I was reminded of
> several outstanding OPTME comments in the aarch64 translator.
>
> I had started with the csel change, which at first failed and
> took quite some time to debug.  See the comment for patch 1.
>
> Since this depends on the outstanding TCGLabel patch set, the
> full tree is available at
>
>   git://github.com/rth7680/qemu.git arm-movcond

Tested on both integer and FP tests.  No regression found.

On the other hand, aarch64-linux-user seems to be significantly
slower on a linux-user test I ran:

176.gcc with 166.i
Host: CPU E5-2650 v2 with CentOS 6.6 64-bit
time for standard QEMU: ~29s
time for RTH QEMU: ~33s

Is this expected?

Thanks,

Laurent

>
> r~
>
>
> Richard Henderson (11):
>   target-arm: Introduce DisasCompare
>   target-arm: Extend NZCF to 64 bits
>   target-arm: Handle always condition codes within arm_test_cc
>   target-arm: Recognize SXTB, SXTH, SXTW, ASR
>   target-arm: Recognize UXTB, UXTH, LSR, LSL
>   target-arm: Eliminate unnecessary zero-extend in disas_bitfield
>   target-arm: Recognize ROR
>   target-arm: Use setcond and movcond for csel
>   target-arm: Implement ccmp branchless
>   target-arm: Implement fccmp branchless
>   target-arm: Implement fcsel with movcond
>
>  target-arm/cpu.h           |  21 +-
>  target-arm/helper.c        |  18 +-
>  target-arm/translate-a64.c | 688 ++++++++++++++++++++++++++-------------------
>  target-arm/translate.c     | 151 ++++++----
>  target-arm/translate.h     |   2 -
>  5 files changed, 524 insertions(+), 356 deletions(-)
>
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-20 10:00 ` Laurent Desnogues
@ 2015-02-20 10:54   ` Laurent Desnogues
  0 siblings, 0 replies; 23+ messages in thread
From: Laurent Desnogues @ 2015-02-20 10:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Fri, Feb 20, 2015 at 11:00 AM, Laurent Desnogues
<laurent.desnogues@gmail.com> wrote:
> On Thu, Feb 19, 2015 at 10:14 PM, Richard Henderson <rth@twiddle.net> wrote:
>> While doing the mechanics of a previous patch set converting
>> translators to use to TCGLabel pointers, I was reminded of
>> several outstanding OPTME comments in the aarch64 translator.
>>
>> I had started with the csel change, which at first failed and
>> took quite some time to debug.  See the comment for patch 1.
>>
>> Since this depends on the outstanding TCGLabel patch set, the
>> full tree is available at
>>
>>   git://github.com/rth7680/qemu.git arm-movcond
>
> Tested on both integer and FP tests.  No regression found.
>
> On the other hand, aarch64-linux-user seems to be significantly
> slower on a linux-user test I ran:
>
> 176.gcc with 166.i
> Host: CPU E5-2650 v2 with CentOS 6.6 64-bit
> time for standard QEMU: ~29s
> time for RTH QEMU: ~33s
>
> Is this expected?

Forget this, I had forgotten to switch to the arm-movcond branch...

This patch set breaks fccmp and various integer tests.  I'll take a
look this afternoon.


Laurent

> Thanks,
>
> Laurent
>
>>
>> r~
>>
>>
>> Richard Henderson (11):
>>   target-arm: Introduce DisasCompare
>>   target-arm: Extend NZCF to 64 bits
>>   target-arm: Handle always condition codes within arm_test_cc
>>   target-arm: Recognize SXTB, SXTH, SXTW, ASR
>>   target-arm: Recognize UXTB, UXTH, LSR, LSL
>>   target-arm: Eliminate unnecessary zero-extend in disas_bitfield
>>   target-arm: Recognize ROR
>>   target-arm: Use setcond and movcond for csel
>>   target-arm: Implement ccmp branchless
>>   target-arm: Implement fccmp branchless
>>   target-arm: Implement fcsel with movcond
>>
>>  target-arm/cpu.h           |  21 +-
>>  target-arm/helper.c        |  18 +-
>>  target-arm/translate-a64.c | 688 ++++++++++++++++++++++++++-------------------
>>  target-arm/translate.c     | 151 ++++++----
>>  target-arm/translate.h     |   2 -
>>  5 files changed, 524 insertions(+), 356 deletions(-)
>>
>> --
>> 2.1.0
>>
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless Richard Henderson
@ 2015-02-20 13:57   ` Laurent Desnogues
  2015-02-20 15:53     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Laurent Desnogues @ 2015-02-20 13:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

Hi Richard,

On Thu, Feb 19, 2015 at 10:14 PM, Richard Henderson <rth@twiddle.net> wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate-a64.c | 50 +++++++++++++++++++++++++++-------------------
>  1 file changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 8171a1f..5539ae3 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -4128,11 +4128,11 @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
>      }
>  }
>
> -static void handle_fp_compare(DisasContext *s, bool is_double,
> -                              unsigned int rn, unsigned int rm,
> -                              bool cmp_with_zero, bool signal_all_nans)
> +static void handle_fp_compare_1(DisasContext *s, TCGv_i64 tcg_flags,
> +                                bool is_double, unsigned int rn,
> +                                unsigned int rm, bool cmp_with_zero,
> +                                bool signal_all_nans)
>  {
> -    TCGv_i64 tcg_flags = tcg_temp_new_i64();
>      TCGv_ptr fpst = get_fpstatus_ptr();
>
>      if (is_double) {
> @@ -4170,7 +4170,16 @@ static void handle_fp_compare(DisasContext *s, bool is_double,
>      }
>
>      tcg_temp_free_ptr(fpst);
> +}
>
> +static void handle_fp_compare(DisasContext *s, bool is_double,
> +                              unsigned int rn, unsigned int rm,
> +                              bool cmp_with_zero, bool signal_all_nans)
> +{
> +    TCGv_i64 tcg_flags = tcg_temp_new_i64();
> +
> +    handle_fp_compare_1(s, tcg_flags, is_double, rn, rm,
> +                        cmp_with_zero, signal_all_nans);
>      gen_set_nzcv(tcg_flags);
>
>      tcg_temp_free_i64(tcg_flags);
> @@ -4215,8 +4224,8 @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
>  static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
>  {
>      unsigned int mos, type, rm, cond, rn, op, nzcv;
> -    TCGv_i64 tcg_flags;
> -    TCGLabel *label_continue = NULL;
> +    TCGv_i64 t_flags, t_zero, t_nzcv;
> +    DisasCompare c;
>
>      mos = extract32(insn, 29, 3);
>      type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
> @@ -4235,23 +4244,22 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
>          return;
>      }
>
> -    if (cond < 0x0e) { /* not always */
> -        TCGLabel *label_match = gen_new_label();
> -        label_continue = gen_new_label();
> -        arm_gen_test_cc(cond, label_match);
> -        /* nomatch: */
> -        tcg_flags = tcg_const_i64(nzcv << 28);
> -        gen_set_nzcv(tcg_flags);
> -        tcg_temp_free_i64(tcg_flags);
> -        tcg_gen_br(label_continue);
> -        gen_set_label(label_match);
> -    }
> +    /* Perform the new compare, but don't write the result back to flags. */
> +    t_flags = tcg_temp_new_i64();
> +    handle_fp_compare_1(s, t_flags, type, rn, rm, false, op);

The problem with this approach is that you'll always call the FP
compare which might result in FP flags corruption.

The ARMv8 manual clearly states that the FP compare should only be
called if the condition holds.
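
To make the concern concrete, here is a minimal standalone sketch in plain C
(not QEMU or TCG code).  ToyCPU, toy_fcmpe and both toy_fccmpe_* functions
are hypothetical names invented for this illustration, and the sticky
"invalid" bit only models the cumulative exception flags loosely.  The
branchless variant always runs the compare, so a NaN operand raises the
invalid flag even when the condition is false; the guarded variant does not.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest's sticky FP exception flags. */
enum { FP_INVALID = 1u << 0 };

typedef struct {
    uint32_t nzcv;      /* N,Z,C,V packed into bits [3:0] for simplicity */
    uint32_t fp_flags;  /* cumulative exception flags */
} ToyCPU;

/* Signalling compare: any NaN operand raises Invalid.  Returns NZCV. */
static uint32_t toy_fcmpe(ToyCPU *cpu, double a, double b)
{
    if (isnan(a) || isnan(b)) {
        cpu->fp_flags |= FP_INVALID;
        return 0x3;             /* unordered: C and V set */
    }
    if (a == b) {
        return 0x6;             /* equal: Z and C set */
    }
    return a < b ? 0x8 : 0x2;   /* less: N set; greater: C set */
}

/* Branchless shape: compare unconditionally, then select the result. */
static void toy_fccmpe_branchless(ToyCPU *cpu, bool cond, double a,
                                  double b, uint32_t nzcv_imm)
{
    assert(nzcv_imm <= 0xf);
    uint32_t flags = toy_fcmpe(cpu, a, b);  /* side effect always happens */
    cpu->nzcv = cond ? flags : nzcv_imm;
}

/* Guarded shape: the compare runs only when the condition holds. */
static void toy_fccmpe_guarded(ToyCPU *cpu, bool cond, double a,
                               double b, uint32_t nzcv_imm)
{
    assert(nzcv_imm <= 0xf);
    cpu->nzcv = cond ? toy_fcmpe(cpu, a, b) : nzcv_imm;
}
```

With cond false and a NaN operand, both variants leave nzcv equal to the
immediate, but only the branchless one ends up with FP_INVALID set.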

Thanks,

Laurent

> -    handle_fp_compare(s, type, rn, rm, false, op);
> +    /* If the condition is false, force the flags to #nzcv.  */
> +    arm_test_cc(&c, cond);
> +    t_zero = tcg_const_i64(0);
> +    t_nzcv = tcg_const_i64(nzcv << 28);
> +    tcg_gen_movcond_i64(c.cond, t_flags, c.value, t_zero, t_flags, t_nzcv);
> +    tcg_temp_free_i64(t_zero);
> +    tcg_temp_free_i64(t_nzcv);
> +    arm_free_cc(&c);
>
> -    if (cond < 0x0e) {
> -        gen_set_label(label_continue);
> -    }
> +    /* Write back the new flags.  */
> +    gen_set_nzcv(t_flags);
> +    tcg_temp_free_i64(t_flags);
>  }
>
>  /* copy src FP register to dst FP register; type specifies single or double */
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless
  2015-02-20 13:57   ` Laurent Desnogues
@ 2015-02-20 15:53     ` Richard Henderson
  2015-02-23  7:43       ` Laurent Desnogues
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2015-02-20 15:53 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: Peter Maydell, qemu-devel

On 02/20/2015 05:57 AM, Laurent Desnogues wrote:
> The problem with this approach is that you'll always call the FP
> compare which might result in FP flags corruption.
> 
> The ARMv8 manual clearly states that the FP compare should only be
> called if the condition holds.

Ah, I hadn't considered that.  Consider this patch dropped.

Thanks,


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-19 23:52 ` [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Peter Maydell
@ 2015-02-20 16:50   ` Alex Bennée
  2015-02-20 17:50     ` Alex Bennée
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Bennée @ 2015-02-20 16:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

It's running through LAVA now:

https://validation.linaro.org/scheduler/job/268253


On 19 February 2015 at 23:52, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 20 February 2015 at 06:14, Richard Henderson <rth@twiddle.net> wrote:
>> While doing the mechanics of a previous patch set converting
>> translators to use to TCGLabel pointers, I was reminded of
>> several outstanding OPTME comments in the aarch64 translator.
>>
>> I had started with the csel change, which at first failed and
>> took quite some time to debug.  See the comment for patch 1.
>>
>> Since this depends on the outstanding TCGLabel patch set, the
>> full tree is available at
>>
>>   git://github.com/rth7680/qemu.git arm-movcond
>
> Alex, could you run this lot through the A64 risu testsuite,
> please?
>
> thanks
> -- PMM



-- 
Alex Bennée
KVM/QEMU Hacker for Linaro

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-20 16:50   ` Alex Bennée
@ 2015-02-20 17:50     ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2015-02-20 17:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

https://validation.linaro.org/dashboard/streams/anonymous/alex.bennee/bundles/52315b57f77238f924b5528ad16cc549d93a9d31/2a7f8984-0d1c-4268-8249-ba86c56dcbb7/?search=&length=100#table

All passing on the current test set. I've got more to add when I get a chance.

On 20 February 2015 at 16:50, Alex Bennée <alex.bennee@linaro.org> wrote:
> It's running through LAVA now:
>
> https://validation.linaro.org/scheduler/job/268253
>
>
> On 19 February 2015 at 23:52, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 20 February 2015 at 06:14, Richard Henderson <rth@twiddle.net> wrote:
>>> While doing the mechanics of a previous patch set converting
>>> translators to use to TCGLabel pointers, I was reminded of
>>> several outstanding OPTME comments in the aarch64 translator.
>>>
>>> I had started with the csel change, which at first failed and
>>> took quite some time to debug.  See the comment for patch 1.
>>>
>>> Since this depends on the outstanding TCGLabel patch set, the
>>> full tree is available at
>>>
>>>   git://github.com/rth7680/qemu.git arm-movcond
>>
>> Alex, could you run this lot through the A64 risu testsuite,
>> please?
>>
>> thanks
>> -- PMM
>
>
>
> --
> Alex Bennée
> KVM/QEMU Hacker for Linaro



-- 
Alex Bennée
KVM/QEMU Hacker for Linaro

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless
  2015-02-20 15:53     ` Richard Henderson
@ 2015-02-23  7:43       ` Laurent Desnogues
  0 siblings, 0 replies; 23+ messages in thread
From: Laurent Desnogues @ 2015-02-23  7:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

Hi Richard,

On Fri, Feb 20, 2015 at 4:53 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 02/20/2015 05:57 AM, Laurent Desnogues wrote:
>> The problem with this approach is that you'll always call the FP
>> compare which might result in FP flags corruption.
>>
>> The ARMv8 manual clearly states that the FP compare should only be
>> called if the condition holds.
>
> Ah, I hadn't considered that.  Consider this patch dropped.

With this patch removed, all the FP tests pass as on master.

Thanks,

Laurent

> Thanks,
>
>
> r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments
  2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
                   ` (12 preceding siblings ...)
  2015-02-20 10:00 ` Laurent Desnogues
@ 2015-02-23  7:49 ` Laurent Desnogues
  13 siblings, 0 replies; 23+ messages in thread
From: Laurent Desnogues @ 2015-02-23  7:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

Hi Richard,

On Thu, Feb 19, 2015 at 10:14 PM, Richard Henderson <rth@twiddle.net> wrote:
> While doing the mechanics of a previous patch set converting
> translators to use to TCGLabel pointers, I was reminded of
> several outstanding OPTME comments in the aarch64 translator.
>
> I had started with the csel change, which at first failed and
> took quite some time to debug.  See the comment for patch 1.
>
> Since this depends on the outstanding TCGLabel patch set, the
> full tree is available at
>
>   git://github.com/rth7680/qemu.git arm-movcond

With patch 10 (fccmp) removed and the handling of ALWAYS
and NEVER added to tcg-op.c, I no longer get any failures (and
no significant speed change).

Thanks,

Laurent

>
> r~
>
>
> Richard Henderson (11):
>   target-arm: Introduce DisasCompare
>   target-arm: Extend NZCF to 64 bits
>   target-arm: Handle always condition codes within arm_test_cc
>   target-arm: Recognize SXTB, SXTH, SXTW, ASR
>   target-arm: Recognize UXTB, UXTH, LSR, LSL
>   target-arm: Eliminate unnecessary zero-extend in disas_bitfield
>   target-arm: Recognize ROR
>   target-arm: Use setcond and movcond for csel
>   target-arm: Implement ccmp branchless
>   target-arm: Implement fccmp branchless
>   target-arm: Implement fcsel with movcond
>
>  target-arm/cpu.h           |  21 +-
>  target-arm/helper.c        |  18 +-
>  target-arm/translate-a64.c | 688 ++++++++++++++++++++++++++-------------------
>  target-arm/translate.c     | 151 ++++++----
>  target-arm/translate.h     |   2 -
>  5 files changed, 524 insertions(+), 356 deletions(-)
>
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits
  2015-02-19 21:14 ` [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits Richard Henderson
@ 2015-03-10 16:08   ` Peter Maydell
  2015-03-10 18:18     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2015-03-10 16:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On 19 February 2015 at 21:14, Richard Henderson <rth@twiddle.net> wrote:
> The resulting aarch64 translation is a bit cleaner.
> Sign-extending from 32-bits is simpler than having
> to use setcond to narrow from 64-bits.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>


> @@ -4545,6 +4548,9 @@ void aarch64_sync_64_to_32(CPUARMState *env)
>          env->regs[i] = env->xregs[i];
>      }
>
> +    /* Need to compress Z into the low bits.  */
> +    env->ZF = (env->ZF != 0);
> +

I really don't like this. Having state with a different format
in 32-bit and 64-bit modes is asking for trouble -- the bits
we already have to convert are already awkward enough.

I'd much rather we stuck with a format where env->ZF is
the same regardless of register width, as we have now.

-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits
  2015-03-10 16:08   ` Peter Maydell
@ 2015-03-10 18:18     ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2015-03-10 18:18 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 03/10/2015 09:08 AM, Peter Maydell wrote:
> On 19 February 2015 at 21:14, Richard Henderson <rth@twiddle.net> wrote:
>> The resulting aarch64 translation is a bit cleaner.
>> Sign-extending from 32-bits is simpler than having
>> to use setcond to narrow from 64-bits.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> 
>> @@ -4545,6 +4548,9 @@ void aarch64_sync_64_to_32(CPUARMState *env)
>>          env->regs[i] = env->xregs[i];
>>      }
>>
>> +    /* Need to compress Z into the low bits.  */
>> +    env->ZF = (env->ZF != 0);
>> +
> 
> I really don't like this. Having state with a different format
> in 32-bit and 64-bit modes is asking for trouble -- the bits
> we already have to convert are already awkward enough.
> 
> I'd much rather we stuck with a format where env->ZF is
> the same regardless of register width, as we have now.

Err.. it is the same format, from the viewpoint of outside TCG generated code.

From the viewpoint inside TCG generated code, for AArch32, ZF is only 32 bits
wide.  For AArch64, ZF is 64 bits wide.  So when we transition from AArch64 to
AArch32, we must make sure that if ZF != 0, then ZF <= 0xffffffff.

It's a similar concept for NF and VF, except there I can arrange for the sign
bit of the 32-bit AArch32 NF/VF to line up with the 64-bit AArch64 NF/VF in memory.
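
As a rough standalone illustration of the ZF point (plain C, not QEMU code),
assume the convention that the Z flag is set exactly when the stored ZF
value is zero.  The helper names below are made up for this sketch.  A
64-bit ZF whose only set bits are in the upper half reads as zero once
truncated to 32 bits, which would wrongly report Z as set; compressing with
(zf != 0) keeps the meaning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed convention: Z is set when the stored value is zero. */
static bool z_set_64(uint64_t zf) { return zf == 0; }
static bool z_set_32(uint32_t zf) { return zf == 0; }

/* Naive narrowing: just drop the high 32 bits. */
static uint32_t zf_truncate(uint64_t zf)
{
    return (uint32_t)zf;
}

/* Narrowing that preserves "is it zero?", as in the hunk quoted above. */
static uint32_t zf_compress(uint64_t zf)
{
    uint32_t r = (zf != 0);
    assert(z_set_32(r) == z_set_64(zf));  /* meaning of Z is unchanged */
    return r;
}
```

For zf = 0x100000000, zf_truncate() yields 0 (Z appears set) while
zf_compress() yields 1 (Z stays clear).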

If that's not what you mean... then I don't know what you mean.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-03-10 18:18 UTC | newest]

Thread overview: 23+ messages
2015-02-19 21:14 [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 01/11] target-arm: Introduce DisasCompare Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 02/11] target-arm: Extend NZCF to 64 bits Richard Henderson
2015-03-10 16:08   ` Peter Maydell
2015-03-10 18:18     ` Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 03/11] target-arm: Handle always condition codes within arm_test_cc Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 04/11] target-arm: Recognize SXTB, SXTH, SXTW, ASR Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 05/11] target-arm: Recognize UXTB, UXTH, LSR, LSL Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 06/11] target-arm: Eliminate unnecessary zero-extend in disas_bitfield Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 07/11] target-arm: Recognize ROR Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 08/11] target-arm: Use setcond and movcond for csel Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 09/11] target-arm: Implement ccmp branchless Richard Henderson
2015-02-19 21:14 ` [Qemu-devel] [PATCH 10/11] target-arm: Implement fccmp branchless Richard Henderson
2015-02-20 13:57   ` Laurent Desnogues
2015-02-20 15:53     ` Richard Henderson
2015-02-23  7:43       ` Laurent Desnogues
2015-02-19 21:14 ` [Qemu-devel] [PATCH 11/11] target-arm: Implement fcsel with movcond Richard Henderson
2015-02-19 23:52 ` [Qemu-devel] [PATCH 00/11] target-aarch64 fix and improvments Peter Maydell
2015-02-20 16:50   ` Alex Bennée
2015-02-20 17:50     ` Alex Bennée
2015-02-20 10:00 ` Laurent Desnogues
2015-02-20 10:54   ` Laurent Desnogues
2015-02-23  7:49 ` Laurent Desnogues
