* [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG
@ 2014-08-28 17:14 Paolo Bonzini
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 01/17] ppc: do not look at the MMU index Paolo Bonzini
                   ` (17 more replies)
  0 siblings, 18 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Hi everyone,

These patches provide a speedup of around 20% when running PPC softmmu
emulation on x86 machines (around 10% for user-mode emulation).  There
are actually two separate speedups here:

* avoiding TLB flushing on every kernel<->user transition (patches 1-2)

* rewriting CR handling to use 32 1-bit registers instead of 8
  4-bit registers (patches 3-16)
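
To illustrate the second item, here is a minimal sketch, outside QEMU,
of how one 32-bit CR value maps onto eight 4-bit fields versus
thirty-two 1-bit values.  The function names are hypothetical, and the
ordering (field 0 in the most significant nibble, bit 0 as the most
significant bit) is assumed from the usual PowerPC numbering; it is not
the code from patch 15.

```c
#include <stdint.h>

/* Hypothetical model of the old layout: eight 4-bit fields. */
static void cr_to_fields(uint32_t cr, uint8_t crf[8])
{
    for (int i = 0; i < 8; i++) {
        crf[i] = (cr >> (28 - 4 * i)) & 0xF;
    }
}

/* Hypothetical model of the new layout: thirty-two 1-bit values. */
static void cr_to_bits(uint32_t cr, uint8_t bit[32])
{
    for (int i = 0; i < 32; i++) {
        bit[i] = (cr >> (31 - i)) & 1;
    }
}

/* Reassemble the 32-bit CR value from the 1-bit representation. */
static uint32_t bits_to_cr(const uint8_t bit[32])
{
    uint32_t cr = 0;
    for (int i = 0; i < 32; i++) {
        cr |= (uint32_t)bit[i] << (31 - i);
    }
    return cr;
}
```

With 1-bit values, an instruction that tests or sets a single CR bit
needs no masking or shifting of a 4-bit field, which is presumably
where the speedup comes from.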

They must not be too shoddy; they boot a Linux guest fine. :) And the
speedup is very interesting of course.  The three problems with them are:

* I don't have a good testsuite, so floating-point, decimal and SPE
  are mostly untested.

* I don't have much time to work on them (they are about a year old and
  I have just rebased them).

* Patch 15 is a monster and hard to review, but I have no idea how to
  split it.

Please take a look and, if you are interested, help in any way you can. :)

I think patches 1-13 can be separated, as the two optimizations are
independent and patches 3-13 are mostly bug fixes and cleanups.

Paolo

Paolo Bonzini (17):
  ppc: do not look at the MMU index
  ppc: avoid excessive TLB flushing
  ppc: fix monitor access to CR
  ppc: use ARRAY_SIZE in gdbstub.c
  ppc: use CRF_* in fpu_helper.c
  ppc: use CRF_* in int_helper.c
  ppc: fix result of DLMZB when no zero bytes are found
  ppc: introduce helpers for mfocrf/mtocrf
  ppc: reorganize gen_compute_fprf
  ppc: introduce gen_op_mfcr/gen_op_mtcr
  ppc: rename gen_set_cr6_from_fpscr
  ppc: use movcond for isel
  ppc: compute mask from BI using right shift
  ppc: introduce ppc_get_crf and ppc_set_crf
  ppc: store CR registers in 32 1-bit registers
  ppc: inline ppc_get_crf/ppc_set_crf when clearer
  ppc: dump all 32 CR bits

 cputlb.c                    |  19 ++
 hw/ppc/spapr_hcall.c        |   6 +-
 include/exec/exec-all.h     |   5 +
 linux-user/elfload.c        |   4 +-
 linux-user/main.c           |   9 +-
 linux-user/signal.c         |   8 +-
 monitor.c                   |   2 +-
 target-ppc/cpu.h            |  43 +++-
 target-ppc/excp_helper.c    |   8 +-
 target-ppc/fpu_helper.c     |  73 +++----
 target-ppc/gdbstub.c        |   8 +-
 target-ppc/helper.h         |   9 +-
 target-ppc/helper_regs.h    |  52 +++--
 target-ppc/int_helper.c     |  64 ++++--
 target-ppc/kvm.c            |  10 +-
 target-ppc/machine.c        |   9 +
 target-ppc/translate.c      | 470 ++++++++++++++++++++++----------------------
 target-ppc/translate_init.c |   5 +
 18 files changed, 461 insertions(+), 343 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH 01/17] ppc: do not look at the MMU index
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
@ 2014-08-28 17:14 ` Paolo Bonzini
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing Paolo Bonzini
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

The MMU index is an internal detail that should not be
needed by the translator.  Look at the MSR directly.
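
A minimal standalone sketch of the computation, assuming the PR and HV
bits are available as plain 0/1 integers.  The function name is
hypothetical and not part of the patch; the expression is the one the
patch assigns to ctx.mem_idx.

```c
/* Hypothetical standalone model of the expression used in the patch. */
static int mem_idx_from_msr(int pr, int hv)
{
    /*
     * Hypervisor state (PR = 0, HV = 1) selects index 2.  Otherwise
     * supervisor state (PR = 0) selects 1 and problem state (PR = 1)
     * selects 0, regardless of HV.
     */
    return (!pr && hv) ? 2 : 1 - pr;
}
```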

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index c07bb01..5a8267a 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -11261,7 +11261,7 @@ static inline void gen_intermediate_code_internal(PowerPCCPU *cpu,
     ctx.tb = tb;
     ctx.exception = POWERPC_EXCP_NONE;
     ctx.spr_cb = env->spr_cb;
-    ctx.mem_idx = env->mmu_idx;
+    ctx.mem_idx = (!msr_pr && msr_hv) ? 2 : 1 - msr_pr;
     ctx.insns_flags = env->insns_flags;
     ctx.insns_flags2 = env->insns_flags2;
     ctx.access_type = -1;
-- 
1.8.3.1


* [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 01/17] ppc: do not look at the MMU index Paolo Bonzini
@ 2014-08-28 17:14 ` Paolo Bonzini
  2014-08-28 17:30   ` Peter Maydell
  2014-09-05  7:10   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR Paolo Bonzini
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

PowerPC TCG flushes the TLB on every IR/DR change, which basically
means on every user<->kernel context switch.  Use the 6-element
TLB array as a cache, where each MMU index is mapped to a different
state of the IR/DR/PR/HV bits.

This brings the number of TLB flushes down from ~900000 to ~50000
for starting up the Debian installer, which is in line with x86
and gives a ~10% performance improvement.
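
The lookup-with-FIFO-replacement idea can be sketched outside QEMU as
follows.  This is a simplified model, not the patch itself: the struct
and function names are hypothetical, all slots start out invalid, and
the masked MSR value is assumed never to equal UINT32_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

#define NB_MODES 6              /* mirrors NB_MMU_MODES in the patch */

struct mmu_cache {
    uint32_t msr[NB_MODES];     /* IR/DR/PR/HV bits cached per index */
    int fifo;                   /* slot replaced most recently */
    int idx;                    /* index currently in use */
};

static void mmu_cache_reset(struct mmu_cache *c)
{
    for (int i = 0; i < NB_MODES; i++) {
        c->msr[i] = UINT32_MAX; /* marks the slot as invalid */
    }
    c->fifo = NB_MODES - 1;     /* so the first miss fills slot 0 */
    c->idx = 0;
}

/*
 * Select the index for the given MSR bits.  Returns true when a slot
 * had to be recycled, i.e. when the caller would flush that slot's TLB.
 */
static bool mmu_cache_select(struct mmu_cache *c, uint32_t msr_bits)
{
    for (int i = 0; i < NB_MODES; i++) {
        if (c->msr[i] == msr_bits) {
            c->idx = i;
            return false;       /* hit: the existing TLB stays valid */
        }
    }

    /* Miss: take the next slot in FIFO order. */
    c->fifo = (c->fifo == NB_MODES - 1) ? 0 : c->fifo + 1;
    c->msr[c->fifo] = msr_bits;
    c->idx = c->fifo;
    return true;
}
```

As long as the guest alternates among at most six IR/DR/PR/HV
combinations, every switch after the first is a hit and no flush is
needed.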

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cputlb.c                    | 19 +++++++++++++++++
 hw/ppc/spapr_hcall.c        |  6 +++++-
 include/exec/exec-all.h     |  5 +++++
 target-ppc/cpu.h            |  4 +++-
 target-ppc/excp_helper.c    |  6 +-----
 target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
 target-ppc/translate_init.c |  5 +++++
 7 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index afd3705..17e1b03 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
     tlb_flush_count++;
 }
 
+void tlb_flush_idx(CPUState *cpu, int mmu_idx)
+{
+    CPUArchState *env = cpu->env_ptr;
+
+#if defined(DEBUG_TLB)
+    printf("tlb_flush_idx %d:\n", mmu_idx);
+#endif
+    /* must reset current TB so that interrupts cannot modify the
+       links while we are modifying them */
+    cpu->current_tb = NULL;
+
+    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
+    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+
+    env->tlb_flush_addr = -1;
+    env->tlb_flush_mask = 0;
+    tlb_flush_count++;
+}
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
     if (addr == (tlb_entry->addr_read &
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 467858c..b95961c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 {
     CPUPPCState *env = &cpu->env;
     CPUState *cs = CPU(cpu);
+    bool flush;
 
     env->msr |= (1ULL << MSR_EE);
-    hreg_compute_hflags(env);
+    flush = hreg_compute_hflags(env);
     if (!cpu_has_work(cs)) {
         cs->halted = 1;
         cs->exception_index = EXCP_HLT;
         cs->exit_request = 1;
+    } else if (flush) {
+        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+        cs->exit_request = 1;
     }
     return H_SUCCESS;
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5e5d86e..629a550 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -100,6 +100,7 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
 void tlb_flush(CPUState *cpu, int flush_global);
+void tlb_flush_idx(CPUState *cpu, int mmu_idx);
 void tlb_set_page(CPUState *cpu, target_ulong vaddr,
                   hwaddr paddr, int prot,
                   int mmu_idx, target_ulong size);
@@ -112,6 +113,10 @@ static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 static inline void tlb_flush(CPUState *cpu, int flush_global)
 {
 }
+
+static inline void tlb_flush_idx(CPUState *cpu, int mmu_idx)
+{
+}
 #endif
 
 #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index b64c652..c1cb27f 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -922,7 +922,7 @@ struct ppc_segment_page_sizes {
 
 /*****************************************************************************/
 /* The whole PowerPC CPU context */
-#define NB_MMU_MODES 3
+#define NB_MMU_MODES 6
 
 #define PPC_CPU_OPCODES_LEN 0x40
 
@@ -1085,6 +1085,8 @@ struct CPUPPCState {
     target_ulong hflags;      /* hflags is a MSR & HFLAGS_MASK         */
     target_ulong hflags_nmsr; /* specific hflags, not coming from MSR */
     int mmu_idx;         /* precomputed MMU index to speed up mem accesses */
+    uint32_t mmu_msr[NB_MMU_MODES];  /* ir/dr/hv/pr values for TLBs */
+    int mmu_fifo;  /* for replacement in mmu_msr */
 
     /* Power management */
     int (*check_pow)(CPUPPCState *env);
diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
index be71590..bf25d44 100644
--- a/target-ppc/excp_helper.c
+++ b/target-ppc/excp_helper.c
@@ -623,9 +623,6 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
 
     if (env->spr[SPR_LPCR] & LPCR_AIL) {
         new_msr |= (1 << MSR_IR) | (1 << MSR_DR);
-    } else if (msr & ((1 << MSR_IR) | (1 << MSR_DR))) {
-        /* If we disactivated any translation, flush TLBs */
-        tlb_flush(cs, 1);
     }
 
 #ifdef TARGET_PPC64
@@ -678,8 +675,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
     if ((env->mmu_model == POWERPC_MMU_BOOKE) ||
         (env->mmu_model == POWERPC_MMU_BOOKE206)) {
         /* XXX: The BookE changes address space when switching modes,
-                we should probably implement that as different MMU indexes,
-                but for the moment we do it the slow way and flush all.  */
+                TODO: still needed?!?  */
         tlb_flush(cs, 1);
     }
 }
diff --git a/target-ppc/helper_regs.h b/target-ppc/helper_regs.h
index 271fddf..291f9c1 100644
--- a/target-ppc/helper_regs.h
+++ b/target-ppc/helper_regs.h
@@ -39,17 +39,38 @@ static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
     env->tgpr[3] = tmp;
 }
 
-static inline void hreg_compute_mem_idx(CPUPPCState *env)
+static inline bool hreg_compute_mem_idx(CPUPPCState *env)
 {
-    /* Precompute MMU index */
-    if (msr_pr == 0 && msr_hv != 0) {
-        env->mmu_idx = 2;
-    } else {
-        env->mmu_idx = 1 - msr_pr;
+    CPUState *cs = CPU(ppc_env_get_cpu(env));
+    int msr = env->msr;
+    int i;
+
+    if (!tcg_enabled()) {
+        return false;
+    }
+
+    msr &= (1 << MSR_IR) | (1 << MSR_DR) | (1 << MSR_PR) | MSR_HVB;
+    if (msr_pr == 1) {
+        msr &= ~MSR_HVB;
     }
+
+    for (i = 0; i < NB_MMU_MODES; i++) {
+        if (env->mmu_msr[i] == msr) {
+            env->mmu_idx = i;
+            return false;
+        }
+    }
+
+    /* Use a new index with FIFO replacement.  */
+    i = (env->mmu_fifo == NB_MMU_MODES - 1 ? 0 : env->mmu_fifo + 1);
+    env->mmu_fifo = i;
+    env->mmu_msr[i] = msr;
+    env->mmu_idx = i;
+    tlb_flush_idx(cs, i);
+    return true;
 }
 
-static inline void hreg_compute_hflags(CPUPPCState *env)
+static inline bool hreg_compute_hflags(CPUPPCState *env)
 {
     target_ulong hflags_mask;
 
@@ -58,10 +79,10 @@ static inline void hreg_compute_hflags(CPUPPCState *env)
         (1 << MSR_PR) | (1 << MSR_FP) | (1 << MSR_SE) | (1 << MSR_BE) |
         (1 << MSR_LE) | (1 << MSR_VSX);
     hflags_mask |= (1ULL << MSR_CM) | (1ULL << MSR_SF) | MSR_HVB;
-    hreg_compute_mem_idx(env);
     env->hflags = env->msr & hflags_mask;
     /* Merge with hflags coming from other registers */
     env->hflags |= env->hflags_nmsr;
+    return hreg_compute_mem_idx(env);
 }
 
 static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
@@ -80,13 +101,6 @@ static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
         value &= ~MSR_HVB;
         value |= env->msr & MSR_HVB;
     }
-    if (((value >> MSR_IR) & 1) != msr_ir ||
-        ((value >> MSR_DR) & 1) != msr_dr) {
-        /* Flush all tlb when changing translation mode */
-        tlb_flush(cs, 1);
-        excp = POWERPC_EXCP_NONE;
-        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
-    }
     if (unlikely((env->flags & POWERPC_FLAG_TGPR) &&
                  ((value ^ env->msr) & (1 << MSR_TGPR)))) {
         /* Swap temporary saved registers with GPRs */
@@ -98,7 +112,13 @@ static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
     }
 #endif
     env->msr = value;
-    hreg_compute_hflags(env);
+    if (hreg_compute_hflags(env)) {
+#if !defined(CONFIG_USER_ONLY)
+        /* TLB was flushed, exit the current translation block.  */
+        excp = POWERPC_EXCP_NONE;
+        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+#endif
+    }
 #if !defined(CONFIG_USER_ONLY)
     if (unlikely(msr_pow == 1)) {
         if (!env->pending_interrupts && (*env->check_pow)(env)) {
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 48177ed..1c2ded9 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9472,6 +9472,7 @@ static void ppc_cpu_reset(CPUState *s)
         /* XXX: find a suitable condition to enable the hypervisor mode */
         msr |= (target_ulong)MSR_HVB;
     }
+
     msr |= (target_ulong)0 << MSR_AP; /* TO BE CHECKED */
     msr |= (target_ulong)0 << MSR_SA; /* TO BE CHECKED */
     msr |= (target_ulong)1 << MSR_EP;
@@ -9504,6 +9505,10 @@ static void ppc_cpu_reset(CPUState *s)
     }
 #endif
 
+    for (i = 1; i < NB_MMU_MODES; i++) {
+        env->mmu_msr[i] = -1;
+    }
+
     hreg_store_msr(env, msr, 1);
 
 #if !defined(CONFIG_USER_ONLY)
-- 
1.8.3.1


* [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 01/17] ppc: do not look at the MMU index Paolo Bonzini
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing Paolo Bonzini
@ 2014-08-28 17:14 ` Paolo Bonzini
  2014-09-03 18:21   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c Paolo Bonzini
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

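The shift used to assemble CR from the eight crf fields was off by
one: field 0 was shifted left by 32 bits instead of 28.  Use the same
expression as gdbstub.c.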
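
A minimal standalone sketch of the corrected packing, assuming the
fields are held in a uint32_t array as in the diff below.  The function
name is hypothetical.  Note that with the old expression the first
iteration shifted a 32-bit value by 32, which is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical helper: pack eight 4-bit CR fields into one word. */
static uint32_t pack_cr(const uint32_t crf[8])
{
    uint32_t u = 0;

    for (int i = 0; i < 8; i++) {
        /* Field 0 lands in bits 31..28, field 7 in bits 3..0. */
        u |= crf[i] << (32 - 4 * (i + 1));
    }
    return u;
}
```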

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index 34cee74..ec73dd4 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2968,7 +2968,7 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
 
     u = 0;
     for (i = 0; i < 8; i++)
-        u |= env->crf[i] << (32 - (4 * i));
+        u |= env->crf[i] << (32 - (4 * (i + 1)));
 
     return u;
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:21   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c Paolo Bonzini
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Match the idiom used by linux-user/signal.c and
linux-user/elfload.c.
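
For readers unfamiliar with the idiom, a minimal standalone sketch.
The macro definition here is the common sizeof-based one and is an
assumption, as are the struct and function names; QEMU provides its own
ARRAY_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

/* Common definition of the idiom; valid only for true arrays. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for the CPU state that holds the CR fields. */
struct cpu_state {
    uint32_t crf[8];
};

/* Split a 32-bit CR value into the 4-bit fields, field 0 first. */
static void write_cr(struct cpu_state *env, uint32_t cr)
{
    for (size_t i = 0; i < ARRAY_SIZE(env->crf); i++) {
        env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
    }
}
```

The loop bound now follows the array declaration, so it cannot drift
out of sync if the array is resized.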

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/gdbstub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
index 14675f4..bad49ae 100644
--- a/target-ppc/gdbstub.c
+++ b/target-ppc/gdbstub.c
@@ -138,7 +138,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
             {
                 uint32_t cr = 0;
                 int i;
-                for (i = 0; i < 8; i++) {
+                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
                     cr |= env->crf[i] << (32 - ((i + 1) * 4));
                 }
                 gdb_get_reg32(mem_buf, cr);
@@ -246,7 +246,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
             {
                 uint32_t cr = ldl_p(mem_buf);
                 int i;
-                for (i = 0; i < 8; i++) {
+                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
                     env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
                 }
                 break;
-- 
1.8.3.1


* [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (3 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:21   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c Paolo Bonzini
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/fpu_helper.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index da93d12..0fe006a 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1043,7 +1043,7 @@ uint32_t helper_ftdiv(uint64_t fra, uint64_t frb)
         }
     }
 
-    return 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0);
+    return (1 << CRF_LT) | (fg_flag << CRF_GT) | (fe_flag << CRF_EQ);
 }
 
 uint32_t helper_ftsqrt(uint64_t frb)
@@ -1074,7 +1074,7 @@ uint32_t helper_ftsqrt(uint64_t frb)
         }
     }
 
-    return 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0);
+    return (1 << CRF_LT) | (fg_flag << CRF_GT) | (fe_flag << CRF_EQ);
 }
 
 void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
@@ -1088,19 +1088,19 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     if (unlikely(float64_is_any_nan(farg1.d) ||
                  float64_is_any_nan(farg2.d))) {
-        ret = 0x01UL;
+        ret = CRF_SO;
     } else if (float64_lt(farg1.d, farg2.d, &env->fp_status)) {
-        ret = 0x08UL;
+        ret = CRF_LT;
     } else if (!float64_le(farg1.d, farg2.d, &env->fp_status)) {
-        ret = 0x04UL;
+        ret = CRF_GT;
     } else {
-        ret = 0x02UL;
+        ret = CRF_EQ;
     }
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
-    env->fpscr |= ret << FPSCR_FPRF;
-    env->crf[crfD] = ret;
-    if (unlikely(ret == 0x01UL
+    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
+    env->crf[crfD] = (1 << ret);
+    if (unlikely(ret == CRF_SO
                  && (float64_is_signaling_nan(farg1.d) ||
                      float64_is_signaling_nan(farg2.d)))) {
         /* sNaN comparison */
@@ -1119,19 +1119,19 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     if (unlikely(float64_is_any_nan(farg1.d) ||
                  float64_is_any_nan(farg2.d))) {
-        ret = 0x01UL;
+        ret = CRF_SO;
     } else if (float64_lt(farg1.d, farg2.d, &env->fp_status)) {
-        ret = 0x08UL;
+        ret = CRF_LT;
     } else if (!float64_le(farg1.d, farg2.d, &env->fp_status)) {
-        ret = 0x04UL;
+        ret = CRF_GT;
     } else {
-        ret = 0x02UL;
+        ret = CRF_EQ;
     }
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
-    env->fpscr |= ret << FPSCR_FPRF;
-    env->crf[crfD] = ret;
-    if (unlikely(ret == 0x01UL)) {
+    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
+    env->crf[crfD] = (1 << ret);
+    if (unlikely(ret == CRF_SO)) {
         if (float64_is_signaling_nan(farg1.d) ||
             float64_is_signaling_nan(farg2.d)) {
             /* sNaN comparison */
-- 
1.8.3.1


* [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (4 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:28   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found Paolo Bonzini
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/int_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index f6e8846..9c1c5cd 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2303,25 +2303,25 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
         if (sgna == sgnb) {
             result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
             zero = bcd_add_mag(&result, a, b, &invalid, &overflow);
-            cr = (sgna > 0) ? 4 : 8;
+            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
         } else if (bcd_cmp_mag(a, b) > 0) {
             result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
             zero = bcd_sub_mag(&result, a, b, &invalid, &overflow);
-            cr = (sgna > 0) ? 4 : 8;
+            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
         } else {
             result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgnb, ps);
             zero = bcd_sub_mag(&result, b, a, &invalid, &overflow);
-            cr = (sgnb > 0) ? 4 : 8;
+            cr = (sgnb > 0) ? 1 << CRF_GT : 1 << CRF_LT;
         }
     }
 
     if (unlikely(invalid)) {
         result.u64[HI_IDX] = result.u64[LO_IDX] = -1;
-        cr = 1;
+        cr = 1 << CRF_SO;
     } else if (overflow) {
-        cr |= 1;
+        cr |= 1 << CRF_SO;
     } else if (zero) {
-        cr = 2;
+        cr = 1 << CRF_EQ;
     }
 
     *r = result;
-- 
1.8.3.1


* [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (5 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:28   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf Paolo Bonzini
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

It must return 8 and place 8 in XER, but the current code uses
i directly, which is 9 at this point in the code.
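
A simplified standalone model of the byte count, to show where the 9
comes from.  The function name is hypothetical, and the CR0 and XER
updates done by the real helper are left out.

```c
#include <stdint.h>

/*
 * Hypothetical model: scan the eight bytes of high:low from the left
 * and return the 1-based position of the first zero byte, or 8 when
 * there is none.
 */
static int dlmzb_count(uint32_t high, uint32_t low)
{
    int i = 1;

    for (uint32_t mask = 0xFF000000u; mask != 0; mask >>= 8) {
        if ((high & mask) == 0) {
            return i;
        }
        i++;
    }
    for (uint32_t mask = 0xFF000000u; mask != 0; mask >>= 8) {
        if ((low & mask) == 0) {
            return i;
        }
        i++;
    }

    /* Both loops ran to completion, so i is 9 here; the result is 8. */
    return 8;
}
```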

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/int_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 9c1c5cd..7955bf7 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2573,6 +2573,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
         }
         i++;
     }
+    i = 8;
     if (update_Rc) {
         env->crf[0] = 0x2;
     }
-- 
1.8.3.1


* [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (6 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:28   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf Paolo Bonzini
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/helper.h     |  3 +++
 target-ppc/int_helper.c | 22 ++++++++++++++++++++++
 target-ppc/translate.c  | 31 ++++---------------------------
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 509eae5..5342f13 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -60,6 +60,9 @@ DEF_HELPER_2(fpscr_setbit, void, env, i32)
 DEF_HELPER_2(float64_to_float32, i32, env, i64)
 DEF_HELPER_2(float32_to_float64, i64, env, i32)
 
+DEF_HELPER_1(mfocrf, tl, env)
+DEF_HELPER_3(mtocrf, void, env, tl, i32)
+
 DEF_HELPER_4(fcmpo, void, env, i64, i64, i32)
 DEF_HELPER_4(fcmpu, void, env, i64, i64, i32)
 
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 7955bf7..5fa10c7 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -306,6 +306,28 @@ target_ulong helper_popcntw(target_ulong val)
 }
 #endif
 
+void helper_mtocrf(CPUPPCState *env, target_ulong cr, uint32_t mask)
+{
+    int i;
+    for (i = 7; i >= 0; i--) {
+        if (mask & 1) {
+            env->crf[i] = cr & 0x0F;
+        }
+        cr >>= 4;
+        mask >>= 1;
+    }
+}
+
+target_ulong helper_mfocrf(CPUPPCState *env)
+{
+    uint32_t cr = 0;
+    int i;
+    for (i = 0; i < 8; i++) {
+        cr |= env->crf[i] << (32 - (i + 1) * 4);
+    }
+    return cr;
+}
+
 /*****************************************************************************/
 /* PowerPC 601 specific instructions (POWER bridge) */
 target_ulong helper_div(CPUPPCState *env, target_ulong arg1, target_ulong arg2)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 5a8267a..0a85a23 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -4145,24 +4145,7 @@ static void gen_mfcr(DisasContext *ctx)
                             cpu_gpr[rD(ctx->opcode)], crn * 4);
         }
     } else {
-        TCGv_i32 t0 = tcg_temp_new_i32();
-        tcg_gen_mov_i32(t0, cpu_crf[0]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[1]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[2]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[3]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[4]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[5]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[6]);
-        tcg_gen_shli_i32(t0, t0, 4);
-        tcg_gen_or_i32(t0, t0, cpu_crf[7]);
-        tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);
-        tcg_temp_free_i32(t0);
+        gen_helper_mfocrf(cpu_gpr[rD(ctx->opcode)], cpu_env);
     }
 }
 
@@ -4257,15 +4240,9 @@ static void gen_mtcrf(DisasContext *ctx)
             tcg_temp_free_i32(temp);
         }
     } else {
-        TCGv_i32 temp = tcg_temp_new_i32();
-        tcg_gen_trunc_tl_i32(temp, cpu_gpr[rS(ctx->opcode)]);
-        for (crn = 0 ; crn < 8 ; crn++) {
-            if (crm & (1 << crn)) {
-                    tcg_gen_shri_i32(cpu_crf[7 - crn], temp, crn * 4);
-                    tcg_gen_andi_i32(cpu_crf[7 - crn], cpu_crf[7 - crn], 0xf);
-            }
-        }
-        tcg_temp_free_i32(temp);
+        TCGv_i32 t0 = tcg_const_i32(crm);
+        gen_helper_mtocrf(cpu_env, cpu_gpr[rS(ctx->opcode)], t0);
+        tcg_temp_free_i32(t0);
     }
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (7 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:29   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr Paolo Bonzini
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/translate.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 0a85a23..afbd336 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -253,21 +253,19 @@ static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
 {
     TCGv_i32 t0 = tcg_temp_new_i32();
 
-    if (set_fprf != 0) {
-        /* This case might be optimized later */
-        tcg_gen_movi_i32(t0, 1);
-        gen_helper_compute_fprf(t0, cpu_env, arg, t0);
-        if (unlikely(set_rc)) {
-            tcg_gen_mov_i32(cpu_crf[1], t0);
-        }
-        gen_helper_float_check_status(cpu_env);
-    } else if (unlikely(set_rc)) {
-        /* We always need to compute fpcc */
-        tcg_gen_movi_i32(t0, 0);
-        gen_helper_compute_fprf(t0, cpu_env, arg, t0);
+    if (set_fprf == 0 && !set_rc) {
+        return;
+    }
+
+    tcg_gen_movi_i32(t0, set_fprf != 0);
+    gen_helper_compute_fprf(t0, cpu_env, arg, t0);
+    if (set_rc) {
         tcg_gen_mov_i32(cpu_crf[1], t0);
     }
 
+    if (set_fprf != 0) {
+        gen_helper_float_check_status(cpu_env);
+    }
     tcg_temp_free_i32(t0);
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (8 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 18:58   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr Paolo Bonzini
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/translate.c | 60 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 18 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index afbd336..8def0ae 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -249,6 +249,21 @@ static inline void gen_reset_fpstatus(void)
     gen_helper_reset_fpstatus(cpu_env);
 }
 
+static inline void gen_op_mfcr(TCGv dest, int first_cr, int shift)
+{
+    tcg_gen_shli_i32(dest, cpu_crf[first_cr >> 2], shift);
+}
+
+static inline void gen_op_mtcr(int first_cr, TCGv src, int shift)
+{
+    if (shift) {
+        tcg_gen_shri_i32(cpu_crf[first_cr >> 2], src, shift);
+        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], cpu_crf[first_cr >> 2], 0x0F);
+    } else {
+        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], src, 0x0F);
+    }
+}
+
 static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
 {
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -260,7 +275,7 @@ static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
     tcg_gen_movi_i32(t0, set_fprf != 0);
     gen_helper_compute_fprf(t0, cpu_env, arg, t0);
     if (set_rc) {
-        tcg_gen_mov_i32(cpu_crf[1], t0);
+        gen_op_mtcr(4, t0, 0);
     }
 
     if (set_fprf != 0) {
@@ -2428,6 +2443,7 @@ static void gen_fmrgow(DisasContext *ctx)
 static void gen_mcrfs(DisasContext *ctx)
 {
     TCGv tmp = tcg_temp_new();
+    TCGv_i32 tmp32 = tcg_temp_new_i32();
     int bfa;
 
     if (unlikely(!ctx->fpu_enabled)) {
@@ -2436,10 +2452,11 @@ static void gen_mcrfs(DisasContext *ctx)
     }
     bfa = 4 * (7 - crfS(ctx->opcode));
     tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
-    tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
+    tcg_gen_trunc_tl_i32(tmp32, tmp);
     tcg_temp_free(tmp);
-    tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 0xf);
+    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp32, 0);
     tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
+    tcg_temp_free(tmp32);
 }
 
 /* mffs */
@@ -2474,8 +2491,10 @@ static void gen_mtfsb0(DisasContext *ctx)
         tcg_temp_free_i32(t0);
     }
     if (unlikely(Rc(ctx->opcode) != 0)) {
-        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
-        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+        TCGv_i32 tmp32 = tcg_temp_new_i32();
+        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
+        gen_op_mtcr(4, tmp32, FPSCR_OX);
+        tcg_temp_free_i32(tmp32);
     }
 }
 
@@ -2500,8 +2519,10 @@ static void gen_mtfsb1(DisasContext *ctx)
         tcg_temp_free_i32(t0);
     }
     if (unlikely(Rc(ctx->opcode) != 0)) {
-        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
-        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+        TCGv_i32 tmp32 = tcg_temp_new_i32();
+        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
+        gen_op_mtcr(4, tmp32, FPSCR_OX);
+        tcg_temp_free_i32(tmp32);
     }
     /* We can raise a differed exception */
     gen_helper_float_check_status(cpu_env);
@@ -2535,8 +2556,10 @@ static void gen_mtfsf(DisasContext *ctx)
     gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
     tcg_temp_free_i32(t0);
     if (unlikely(Rc(ctx->opcode) != 0)) {
-        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
-        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+        TCGv_i32 tmp32 = tcg_temp_new_i32();
+        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
+        gen_op_mtcr(4, tmp32, FPSCR_OX);
+        tcg_temp_free_i32(tmp32);
     }
     /* We can raise a differed exception */
     gen_helper_float_check_status(cpu_env);
@@ -2569,8 +2592,10 @@ static void gen_mtfsfi(DisasContext *ctx)
     tcg_temp_free_i64(t0);
     tcg_temp_free_i32(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
-        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
-        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
+        TCGv_i32 tmp32 = tcg_temp_new_i32();
+        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
+        gen_op_mtcr(4, tmp32, FPSCR_OX);
+        tcg_temp_free_i32(tmp32);
     }
     /* We can raise a differed exception */
     gen_helper_float_check_status(cpu_env);
@@ -4137,10 +4162,10 @@ static void gen_mfcr(DisasContext *ctx)
     if (likely(ctx->opcode & 0x00100000)) {
         crm = CRM(ctx->opcode);
         if (likely(crm && ((crm & (crm - 1)) == 0))) {
+            TCGv_i32 t0 = tcg_temp_new_i32();
             crn = ctz32 (crm);
-            tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], cpu_crf[7 - crn]);
-            tcg_gen_shli_tl(cpu_gpr[rD(ctx->opcode)],
-                            cpu_gpr[rD(ctx->opcode)], crn * 4);
+            gen_op_mfcr(t0, (7 - crn) * 4, crn * 4);
+            tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);
         }
     } else {
         gen_helper_mfocrf(cpu_gpr[rD(ctx->opcode)], cpu_env);
@@ -4233,8 +4258,7 @@ static void gen_mtcrf(DisasContext *ctx)
             TCGv_i32 temp = tcg_temp_new_i32();
             crn = ctz32 (crm);
             tcg_gen_trunc_tl_i32(temp, cpu_gpr[rS(ctx->opcode)]);
-            tcg_gen_shri_i32(temp, temp, crn * 4);
-            tcg_gen_andi_i32(cpu_crf[7 - crn], temp, 0xf);
+            gen_op_mtcr((7 - crn) * 4, temp, crn * 4);
             tcg_temp_free_i32(temp);
         }
     } else {
@@ -8159,13 +8183,13 @@ static void gen_set_cr6_from_fpscr(DisasContext *ctx)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
     tcg_gen_trunc_tl_i32(tmp, cpu_fpscr);
-    tcg_gen_shri_i32(cpu_crf[1], tmp, 28);
+    gen_op_mtcr(4, tmp, 28);
     tcg_temp_free_i32(tmp);
 }
 #else
 static void gen_set_cr6_from_fpscr(DisasContext *ctx)
 {
-        tcg_gen_shri_tl(cpu_crf[1], cpu_fpscr, 28);
+    gen_op_mtcr(4, cpu_fpscr, 28);
 }
 #endif
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (9 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 19:41   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel Paolo Bonzini
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

It sets CR1, not CR6 (and the spec agrees).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
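A note for reviewers, not part of the commit: the rename follows from the
arguments to gen_op_mtcr(4, tmp, 28) in the function body. The first
argument is a CR bit number, so bit 4 selects field index 4 >> 2 = 1, and
the shift of 28 moves the top nibble of the low 32 bits of FPSCR into that
field. A minimal plain-C model of that arithmetic, assuming nothing beyond
what the diff shows (cr1_from_fpscr and crf_index are hypothetical names,
not QEMU functions):

```c
#include <stdint.h>

/* Field index addressed by a CR bit number, as in cpu_crf[first_cr >> 2]. */
static int crf_index(int first_cr)
{
    return first_cr >> 2;
}

/* Value written to the field: shift right by 28, then keep four bits. */
static uint32_t cr1_from_fpscr(uint32_t fpscr)
{
    return (fpscr >> 28) & 0x0F;
}
```

With these definitions crf_index(4) is 1, which is why the helper name
should say CR1.
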
 target-ppc/translate.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 8def0ae..67f13f7 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -8179,7 +8179,7 @@ static inline TCGv_ptr gen_fprp_ptr(int reg)
 }
 
 #if defined(TARGET_PPC64)
-static void gen_set_cr6_from_fpscr(DisasContext *ctx)
+static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
     tcg_gen_trunc_tl_i32(tmp, cpu_fpscr);
@@ -8187,7 +8187,7 @@ static void gen_set_cr6_from_fpscr(DisasContext *ctx)
     tcg_temp_free_i32(tmp);
 }
 #else
-static void gen_set_cr6_from_fpscr(DisasContext *ctx)
+static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 {
     gen_op_mtcr(4, cpu_fpscr, 28);
 }
@@ -8207,7 +8207,7 @@ static void gen_##name(DisasContext *ctx)        \
     rb = gen_fprp_ptr(rB(ctx->opcode));          \
     gen_helper_##name(cpu_env, rd, ra, rb);      \
     if (unlikely(Rc(ctx->opcode) != 0)) {        \
-        gen_set_cr6_from_fpscr(ctx);             \
+        gen_set_cr1_from_fpscr(ctx);             \
     }                                            \
     tcg_temp_free_ptr(rd);                       \
     tcg_temp_free_ptr(ra);                       \
@@ -8265,7 +8265,7 @@ static void gen_##name(DisasContext *ctx)             \
     u32_2 = tcg_const_i32(u32f2(ctx->opcode));        \
     gen_helper_##name(cpu_env, rt, rb, u32_1, u32_2); \
     if (unlikely(Rc(ctx->opcode) != 0)) {             \
-        gen_set_cr6_from_fpscr(ctx);                  \
+        gen_set_cr1_from_fpscr(ctx);                  \
     }                                                 \
     tcg_temp_free_ptr(rt);                            \
     tcg_temp_free_ptr(rb);                            \
@@ -8289,7 +8289,7 @@ static void gen_##name(DisasContext *ctx)        \
     i32 = tcg_const_i32(i32fld(ctx->opcode));    \
     gen_helper_##name(cpu_env, rt, ra, rb, i32); \
     if (unlikely(Rc(ctx->opcode) != 0)) {        \
-        gen_set_cr6_from_fpscr(ctx);             \
+        gen_set_cr1_from_fpscr(ctx);             \
     }                                            \
     tcg_temp_free_ptr(rt);                       \
     tcg_temp_free_ptr(rb);                       \
@@ -8310,7 +8310,7 @@ static void gen_##name(DisasContext *ctx)        \
     rb = gen_fprp_ptr(rB(ctx->opcode));          \
     gen_helper_##name(cpu_env, rt, rb);          \
     if (unlikely(Rc(ctx->opcode) != 0)) {        \
-        gen_set_cr6_from_fpscr(ctx);             \
+        gen_set_cr1_from_fpscr(ctx);             \
     }                                            \
     tcg_temp_free_ptr(rt);                       \
     tcg_temp_free_ptr(rb);                       \
@@ -8331,7 +8331,7 @@ static void gen_##name(DisasContext *ctx)          \
     i32 = tcg_const_i32(i32fld(ctx->opcode));      \
     gen_helper_##name(cpu_env, rt, rs, i32);       \
     if (unlikely(Rc(ctx->opcode) != 0)) {          \
-        gen_set_cr6_from_fpscr(ctx);               \
+        gen_set_cr1_from_fpscr(ctx);               \
     }                                              \
     tcg_temp_free_ptr(rt);                         \
     tcg_temp_free_ptr(rs);                         \
-- 
1.8.3.1

* [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (10 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-08-29 18:30   ` Richard Henderson
  2014-09-03 19:41   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift Paolo Bonzini
                   ` (5 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/translate.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 67f13f7..48c7b66 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -789,27 +789,26 @@ static void gen_cmpli(DisasContext *ctx)
 /* isel (PowerPC 2.03 specification) */
 static void gen_isel(DisasContext *ctx)
 {
-    int l1, l2;
     uint32_t bi = rC(ctx->opcode);
     uint32_t mask;
     TCGv_i32 t0;
-
-    l1 = gen_new_label();
-    l2 = gen_new_label();
+    TCGv t1, true_op, zero;
 
     mask = 1 << (3 - (bi & 0x03));
     t0 = tcg_temp_new_i32();
     tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
-    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+    t1 = tcg_temp_new();
+    tcg_gen_extu_i32_tl(t1, t0);
+    zero = tcg_const_tl(0);
     if (rA(ctx->opcode) == 0)
-        tcg_gen_movi_tl(cpu_gpr[rD(ctx->opcode)], 0);
+        true_op = zero;
     else
-        tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    tcg_gen_br(l2);
-    gen_set_label(l1);
-    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
-    gen_set_label(l2);
-    tcg_temp_free_i32(t0);
+        true_op = cpu_gpr[rA(ctx->opcode)];
+
+    tcg_gen_movcond_tl(TCG_COND_NE, cpu_gpr[rD(ctx->opcode)], t1, zero,
+                       true_op, cpu_gpr[rB(ctx->opcode)]);
+    tcg_temp_free(t1);
+    tcg_temp_free(zero);
 }
 
 /* cmpb: PowerPC 2.05 specification */
-- 
1.8.3.1

* [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (11 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-03 20:59   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf Paolo Bonzini
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

This will match the code we use in fpu_helper.c when we flip
CRF_* bit-endianness.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
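A note for reviewers, not part of the commit: the two mask expressions are
interchangeable for every CR bit number, which a small stand-alone check can
confirm. This is only a sketch; crf_mask_shl, crf_mask_shr and
check_crf_masks are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Old formulation: left shift by the distance from the low bit. */
static uint32_t crf_mask_shl(uint32_t bi)
{
    return 1u << (3 - (bi & 0x03));
}

/* New formulation: right shift from the high bit of the 4-bit field. */
static uint32_t crf_mask_shr(uint32_t bi)
{
    return 0x08u >> (bi & 0x03);
}

/* Compare both formulations for all 32 CR bit numbers. */
static void check_crf_masks(void)
{
    for (uint32_t bi = 0; bi < 32; bi++) {
        assert(crf_mask_shl(bi) == crf_mask_shr(bi));
    }
}
```

Bit 0 of a field maps to mask 0x08 and bit 3 to mask 0x01 in both forms.
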
 target-ppc/translate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 48c7b66..4ce7af4 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -794,7 +794,7 @@ static void gen_isel(DisasContext *ctx)
     TCGv_i32 t0;
     TCGv t1, true_op, zero;
 
-    mask = 1 << (3 - (bi & 0x03));
+    mask = 0x08 >> (bi & 0x03);
     t0 = tcg_temp_new_i32();
     tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
     t1 = tcg_temp_new();
@@ -3870,7 +3870,7 @@ static inline void gen_bcond(DisasContext *ctx, int type)
     if ((bo & 0x10) == 0) {
         /* Test CR */
         uint32_t bi = BI(ctx->opcode);
-        uint32_t mask = 1 << (3 - (bi & 0x03));
+        uint32_t mask = 0x08 >> (bi & 0x03);
         TCGv_i32 temp = tcg_temp_new_i32();
 
         if (bo & 0x8) {
@@ -3952,7 +3952,7 @@ static void glue(gen_, name)(DisasContext *ctx)
     else                                                                      \
         tcg_gen_mov_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2]);                 \
     tcg_op(t0, t0, t1);                                                       \
-    bitmask = 1 << (3 - (crbD(ctx->opcode) & 0x03));                          \
+    bitmask = 0x08 >> (crbD(ctx->opcode) & 0x03);                             \
     tcg_gen_andi_i32(t0, t0, bitmask);                                        \
     tcg_gen_andi_i32(t1, cpu_crf[crbD(ctx->opcode) >> 2], ~bitmask);          \
     tcg_gen_or_i32(cpu_crf[crbD(ctx->opcode) >> 2], t0, t1);                  \
-- 
1.8.3.1

* [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (12 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-04 18:26   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers Paolo Bonzini
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Once the representation of the condition register changes, these two
accessors will pack four CR bits into a single 4-bit field value and
unpack them again.  For now they simply read and write env->crf[i].

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
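A note for reviewers, not part of the commit: most call sites converted
below follow the same pattern, assembling or splitting the 32-bit CR image
one 4-bit field at a time through the accessors. A self-contained sketch of
that pattern, assuming a hypothetical struct cr_state in place of
CPUPPCState and hypothetical names get_crf, set_crf, pack_cr and unpack_cr:

```c
#include <stdint.h>

/* Hypothetical stand-in for the CR part of CPUPPCState. */
struct cr_state {
    uint32_t crf[8];
};

static inline uint32_t get_crf(const struct cr_state *s, int i)
{
    return s->crf[i];
}

static inline void set_crf(struct cr_state *s, int i, uint32_t val)
{
    s->crf[i] = val;
}

/* Build the 32-bit CR image; field 0 occupies the most significant nibble. */
static uint32_t pack_cr(const struct cr_state *s)
{
    uint32_t cr = 0;
    for (int i = 0; i < 8; i++) {
        cr |= get_crf(s, i) << (32 - ((i + 1) * 4));
    }
    return cr;
}

/* Split a 32-bit CR image back into the eight fields. */
static void unpack_cr(struct cr_state *s, uint32_t cr)
{
    for (int i = 0; i < 8; i++) {
        set_crf(s, i, (cr >> (32 - ((i + 1) * 4))) & 0xf);
    }
}
```

Because callers only go through get_crf and set_crf, the storage behind
them can change later without touching pack_cr or unpack_cr.
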
 linux-user/elfload.c     |  2 +-
 linux-user/main.c        |  2 +-
 linux-user/signal.c      |  4 ++--
 monitor.c                |  2 +-
 target-ppc/cpu.h         | 10 ++++++++++
 target-ppc/excp_helper.c |  2 +-
 target-ppc/fpu_helper.c  |  6 ++++--
 target-ppc/gdbstub.c     |  4 ++--
 target-ppc/int_helper.c  | 16 ++++++++--------
 target-ppc/kvm.c         |  4 ++--
 target-ppc/translate.c   | 13 +++++++------
 11 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index bea803b..3769ae6 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -858,7 +858,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
     (*regs)[37] = tswapreg(env->xer);
 
     for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        ccr |= env->crf[i] << (32 - ((i + 1) * 4));
+        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
     }
     (*regs)[38] = tswapreg(ccr);
 }
diff --git a/linux-user/main.c b/linux-user/main.c
index 472a16d..152c031 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -1550,7 +1550,7 @@ static int do_store_exclusive(CPUPPCState *env)
                 }
             }
         }
-        env->crf[0] = (stored << 1) | xer_so;
+        ppc_set_crf(env, 0, (stored << 1) | xer_so);
         env->reserve_addr = (target_ulong)-1;
     }
     if (!segv) {
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 26929c5..4f5d79f 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -4512,7 +4512,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame,
     __put_user(env->xer, &frame->mc_gregs[TARGET_PT_XER]);
 
     for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        ccr |= env->crf[i] << (32 - ((i + 1) * 4));
+        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
     }
     __put_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
 
@@ -4591,7 +4591,7 @@ static void restore_user_regs(CPUPPCState *env,
     __get_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
 
     for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        env->crf[i] = (ccr >> (32 - ((i + 1) * 4))) & 0xf;
+        ppc_set_crf(env, i, (ccr >> (32 - ((i + 1) * 4))) & 0xf);
     }
 
     if (!sig) {
diff --git a/monitor.c b/monitor.c
index ec73dd4..97d72f4 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2968,7 +2968,7 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
 
     u = 0;
     for (i = 0; i < 8; i++)
-        u |= env->crf[i] << (32 - (4 * (i + 1)));
+        u |= ppc_get_crf(env, i) << (32 - (4 * (i + 1)));
 
     return u;
 }
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index c1cb27f..05c29b2 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1198,6 +1198,16 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
 
 void store_fpscr(CPUPPCState *env, uint64_t arg, uint32_t mask);
 
+static inline uint32_t ppc_get_crf(const CPUPPCState *env, int i)
+{
+    return env->crf[i];
+}
+
+static inline void ppc_set_crf(CPUPPCState *env, int i, uint32_t val)
+{
+    env->crf[i] = val;
+}
+
 static inline uint64_t ppc_dump_gpr(CPUPPCState *env, int gprn)
 {
     uint64_t gprv;
diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
index bf25d44..522fce4 100644
--- a/target-ppc/excp_helper.c
+++ b/target-ppc/excp_helper.c
@@ -504,7 +504,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
                          env->error_code);
             }
 #endif
-            msr |= env->crf[0] << 28;
+            msr |= ppc_get_crf(env, 0) << 28;
             msr |= env->error_code; /* key, D/I, S/L bits */
             /* Set way using a LRU mechanism */
             msr |= ((env->last_way + 1) & (env->nb_ways - 1)) << 17;
diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 0fe006a..1ccbcf3 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1099,7 +1099,8 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
     env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
-    env->crf[crfD] = (1 << ret);
+    ppc_set_crf(env, crfD, 1 << ret);
+
     if (unlikely(ret == CRF_SO
                  && (float64_is_signaling_nan(farg1.d) ||
                      float64_is_signaling_nan(farg2.d)))) {
@@ -1130,7 +1131,8 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
     env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
-    env->crf[crfD] = (1 << ret);
+    ppc_set_crf(env, crfD, 1 << ret);
+
     if (unlikely(ret == CRF_SO)) {
         if (float64_is_signaling_nan(farg1.d) ||
             float64_is_signaling_nan(farg2.d)) {
diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
index bad49ae..e0f340c 100644
--- a/target-ppc/gdbstub.c
+++ b/target-ppc/gdbstub.c
@@ -139,7 +139,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
                 uint32_t cr = 0;
                 int i;
                 for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-                    cr |= env->crf[i] << (32 - ((i + 1) * 4));
+                    cr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
                 }
                 gdb_get_reg32(mem_buf, cr);
                 break;
@@ -247,7 +247,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
                 uint32_t cr = ldl_p(mem_buf);
                 int i;
                 for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-                    env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
+                    ppc_set_crf(env, i, (cr >> (32 - ((i + 1) * 4))) & 0xF);
                 }
                 break;
             }
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 5fa10c7..2287064 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -311,7 +311,7 @@ void helper_mtocrf(CPUPPCState *env, target_ulong cr, uint32_t mask)
     int i;
     for (i = 7; i >= 0; i--) {
         if (mask & 1) {
-            env->crf[i] = cr & 0x0F;
+            ppc_set_crf(env, i, cr & 0x0F);
         }
         cr >>= 4;
         mask >>= 1;
@@ -323,7 +323,7 @@ target_ulong helper_mfocrf(CPUPPCState *env)
     uint32_t cr = 0;
     int i;
     for (i = 0; i < 8; i++) {
-        cr |= env->crf[i] << (32 - (i + 1) * 4);
+        cr |= ppc_get_crf(env, i) << (32 - (i + 1) * 4);
     }
     return cr;
 }
@@ -679,7 +679,7 @@ VCF(sx, int32_to_float32, s32)
             none |= result;                                             \
         }                                                               \
         if (record) {                                                   \
-            env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);       \
+            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
         }                                                               \
     }
 #define VCMP(suffix, compare, element)          \
@@ -725,7 +725,7 @@ VCMP(gtsd, >, s64)
             none |= result;                                             \
         }                                                               \
         if (record) {                                                   \
-            env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);       \
+            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
         }                                                               \
     }
 #define VCMPFP(suffix, compare, order)          \
@@ -759,7 +759,7 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
         }
     }
     if (record) {
-        env->crf[6] = (all_in == 0) << 1;
+        ppc_set_crf(env, 6, (all_in == 0) << 1);
     }
 }
 
@@ -2580,7 +2580,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
         if ((high & mask) == 0) {
             if (update_Rc) {
-                env->crf[0] = 0x4;
+                ppc_set_crf(env, 0, 0x4);
             }
             goto done;
         }
@@ -2589,7 +2589,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
         if ((low & mask) == 0) {
             if (update_Rc) {
-                env->crf[0] = 0x8;
+                ppc_set_crf(env, 0, 0x8);
             }
             goto done;
         }
@@ -2597,7 +2597,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     }
     i = 8;
     if (update_Rc) {
-        env->crf[0] = 0x2;
+        ppc_set_crf(env, 0, 0x2);
     }
  done:
     env->xer = (env->xer & ~0x7F) | i;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 42718f7..a4eca17 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -795,7 +795,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 
     regs.cr = 0;
     for (i = 0; i < 8; i++) {
-        regs.cr |= (env->crf[i] & 15) << (4 * (7 - i));
+        regs.cr |= ppc_get_crf(env, i) << (4 * (7 - i));
     }
 
     ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
@@ -914,7 +914,7 @@ int kvm_arch_get_registers(CPUState *cs)
 
     cr = regs.cr;
     for (i = 7; i >= 0; i--) {
-        env->crf[i] = cr & 15;
+        ppc_set_crf(env, i, cr & 15);
         cr >>= 4;
     }
 
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 4ce7af4..1ed6a8f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -11071,18 +11071,19 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
             cpu_fprintf(f, "\n");
     }
     cpu_fprintf(f, "CR ");
-    for (i = 0; i < 8; i++)
-        cpu_fprintf(f, "%01x", env->crf[i]);
+    for (i = 0; i < 8; i++) {
+        cpu_fprintf(f, "%01x", ppc_get_crf(env, i));
+    }
     cpu_fprintf(f, "  [");
     for (i = 0; i < 8; i++) {
         char a = '-';
-        if (env->crf[i] & 0x08)
+        if (ppc_get_crf(env, i) & 0x08)
             a = 'L';
-        else if (env->crf[i] & 0x04)
+        else if (ppc_get_crf(env, i) & 0x04)
             a = 'G';
-        else if (env->crf[i] & 0x02)
+        else if (ppc_get_crf(env, i) & 0x02)
             a = 'E';
-        cpu_fprintf(f, " %c%c", a, env->crf[i] & 0x01 ? 'O' : ' ');
+        cpu_fprintf(f, " %c%c", a, ppc_get_crf(env, i) & 0x01 ? 'O' : ' ');
     }
     cpu_fprintf(f, " ]             RES " TARGET_FMT_lx "\n",
                 env->reserve_addr);
-- 
1.8.3.1

* [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (13 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-09-04 18:27   ` Tom Musta
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 16/17] ppc: inline ppc_get_crf/ppc_set_crf when clearer Paolo Bonzini
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Storing each CR bit in its own register makes the code generated for
comparisons much smaller and faster.  The speedup is approximately 10%
for user-mode emulation on an x86 host, and 3-4% on PPC.

Note that CRF_* constants are flipped to match PowerPC's big
bit-endianness.  Previously, the CR register was effectively stored
in mixed endianness, so now there is less indirection going on.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
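A note for reviewers, not part of the commit: the new layout keeps one flag
per CR bit, indexed in PowerPC bit order (LT, GT, EQ, SO within each field).
The sketch below mirrors the ppc_get_crf/ppc_set_crf bodies from this patch
on a hypothetical struct cr_bits, and adds a hypothetical crf_is_eq to show
that testing one condition no longer needs a mask.

```c
#include <stdint.h>

/* Bit positions inside a field, matching the flipped CRF_* constants. */
enum { BIT_LT = 0, BIT_GT = 1, BIT_EQ = 2, BIT_SO = 3 };

/* Hypothetical stand-in for the CR part of CPUPPCState. */
struct cr_bits {
    uint32_t cr[32];    /* each element holds 0 or 1 */
};

/* Pack the four flags of field i into a 4-bit value, LT in bit 3. */
static uint32_t get_crf(const struct cr_bits *s, int i)
{
    uint32_t r = s->cr[i * 4];
    r = (r << 1) | s->cr[i * 4 + 1];
    r = (r << 1) | s->cr[i * 4 + 2];
    r = (r << 1) | s->cr[i * 4 + 3];
    return r;
}

/* Spread a 4-bit value over the four flags of field i. */
static void set_crf(struct cr_bits *s, int i, uint32_t val)
{
    s->cr[i * 4 + 0] = (val & 0x08) != 0;
    s->cr[i * 4 + 1] = (val & 0x04) != 0;
    s->cr[i * 4 + 2] = (val & 0x02) != 0;
    s->cr[i * 4 + 3] = (val & 0x01) != 0;
}

/* A single condition is one array element, with no shift or mask. */
static int crf_is_eq(const struct cr_bits *s, int field)
{
    return s->cr[field * 4 + BIT_EQ] != 0;
}
```

The 4-bit view is then only needed at the boundaries (gdbstub, signal
frames, KVM, migration), while translated code can read single flags.
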
 linux-user/main.c       |   4 +-
 target-ppc/cpu.h        |  33 ++++--
 target-ppc/fpu_helper.c |  39 ++----
 target-ppc/helper.h     |   6 -
 target-ppc/int_helper.c |   2 +-
 target-ppc/machine.c    |   9 ++
 target-ppc/translate.c  | 307 +++++++++++++++++++++++++-----------------------
 7 files changed, 204 insertions(+), 196 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 152c031..b403f24 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -1929,7 +1929,7 @@ void cpu_loop(CPUPPCState *env)
              * PPC ABI uses overflow flag in cr0 to signal an error
              * in syscalls.
              */
-            env->crf[0] &= ~0x1;
+            env->cr[CRF_SO] = 0;
             ret = do_syscall(env, env->gpr[0], env->gpr[3], env->gpr[4],
                              env->gpr[5], env->gpr[6], env->gpr[7],
                              env->gpr[8], 0, 0);
@@ -1939,7 +1939,7 @@ void cpu_loop(CPUPPCState *env)
                 break;
             }
             if (ret > (target_ulong)(-515)) {
-                env->crf[0] |= 0x1;
+                env->cr[CRF_SO] = 1;
                 ret = -ret;
             }
             env->gpr[3] = ret;
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 05c29b2..67510e8 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -939,7 +939,7 @@ struct CPUPPCState {
     /* CTR */
     target_ulong ctr;
     /* condition register */
-    uint32_t crf[8];
+    uint32_t cr[32];
 #if defined(TARGET_PPC64)
     /* CFAR */
     target_ulong cfar;
@@ -1058,6 +1058,9 @@ struct CPUPPCState {
     uint64_t dtl_addr, dtl_size;
 #endif /* TARGET_PPC64 */
 
+    /* condition register, for migration compatibility */
+    uint32_t crf[8];
+
     int error_code;
     uint32_t pending_interrupts;
 #if !defined(CONFIG_USER_ONLY)
@@ -1200,12 +1203,20 @@ void store_fpscr(CPUPPCState *env, uint64_t arg, uint32_t mask);
 
 static inline uint32_t ppc_get_crf(const CPUPPCState *env, int i)
 {
-    return env->crf[i];
+    uint32_t r;
+    r = env->cr[i * 4];
+    r = (r << 1) | (env->cr[i * 4 + 1]);
+    r = (r << 1) | (env->cr[i * 4 + 2]);
+    r = (r << 1) | (env->cr[i * 4 + 3]);
+    return r;
 }
 
 static inline void ppc_set_crf(CPUPPCState *env, int i, uint32_t val)
 {
-    env->crf[i] = val;
+    env->cr[i * 4 + 0] = (val & 0x08) != 0;
+    env->cr[i * 4 + 1] = (val & 0x04) != 0;
+    env->cr[i * 4 + 2] = (val & 0x02) != 0;
+    env->cr[i * 4 + 3] = (val & 0x01) != 0;
 }
 
 static inline uint64_t ppc_dump_gpr(CPUPPCState *env, int gprn)
@@ -1256,14 +1267,14 @@ static inline int cpu_mmu_index (CPUPPCState *env)
 
 /*****************************************************************************/
 /* CRF definitions */
-#define CRF_LT        3
-#define CRF_GT        2
-#define CRF_EQ        1
-#define CRF_SO        0
-#define CRF_CH        (1 << CRF_LT)
-#define CRF_CL        (1 << CRF_GT)
-#define CRF_CH_OR_CL  (1 << CRF_EQ)
-#define CRF_CH_AND_CL (1 << CRF_SO)
+#define CRF_LT        0
+#define CRF_GT        1
+#define CRF_EQ        2
+#define CRF_SO        3
+#define CRF_CH        CRF_LT
+#define CRF_CL        CRF_GT
+#define CRF_CH_OR_CL  CRF_EQ
+#define CRF_CH_AND_CL CRF_SO
 
 /* XER definitions */
 #define XER_SO  31
diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 1ccbcf3..9574ebe 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1098,8 +1098,8 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
     }
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
-    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
-    ppc_set_crf(env, crfD, 1 << ret);
+    env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
+    ppc_set_crf(env, crfD, 0x08 >> ret);
 
     if (unlikely(ret == CRF_SO
                  && (float64_is_signaling_nan(farg1.d) ||
@@ -1130,8 +1130,8 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
     }
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
-    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
-    ppc_set_crf(env, crfD, 1 << ret);
+    env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
+    ppc_set_crf(env, crfD, 0x08 >> ret);
 
     if (unlikely(ret == CRF_SO)) {
         if (float64_is_signaling_nan(farg1.d) ||
@@ -1403,7 +1403,7 @@ static inline uint32_t efscmplt(CPUPPCState *env, uint32_t op1, uint32_t op2)
 
     u1.l = op1;
     u2.l = op2;
-    return float32_lt(u1.f, u2.f, &env->vec_status) ? 4 : 0;
+    return float32_lt(u1.f, u2.f, &env->vec_status);
 }
 
 static inline uint32_t efscmpgt(CPUPPCState *env, uint32_t op1, uint32_t op2)
@@ -1412,7 +1412,7 @@ static inline uint32_t efscmpgt(CPUPPCState *env, uint32_t op1, uint32_t op2)
 
     u1.l = op1;
     u2.l = op2;
-    return float32_le(u1.f, u2.f, &env->vec_status) ? 0 : 4;
+    return !float32_le(u1.f, u2.f, &env->vec_status);
 }
 
 static inline uint32_t efscmpeq(CPUPPCState *env, uint32_t op1, uint32_t op2)
@@ -1421,7 +1421,7 @@ static inline uint32_t efscmpeq(CPUPPCState *env, uint32_t op1, uint32_t op2)
 
     u1.l = op1;
     u2.l = op2;
-    return float32_eq(u1.f, u2.f, &env->vec_status) ? 4 : 0;
+    return float32_eq(u1.f, u2.f, &env->vec_status);
 }
 
 static inline uint32_t efststlt(CPUPPCState *env, uint32_t op1, uint32_t op2)
@@ -1465,25 +1465,6 @@ static inline uint32_t evcmp_merge(int t0, int t1)
     return (t0 << 3) | (t1 << 2) | ((t0 | t1) << 1) | (t0 & t1);
 }
 
-#define HELPER_VECTOR_SPE_CMP(name)                                     \
-    uint32_t helper_ev##name(CPUPPCState *env, uint64_t op1, uint64_t op2) \
-    {                                                                   \
-        return evcmp_merge(e##name(env, op1 >> 32, op2 >> 32),          \
-                           e##name(env, op1, op2));                     \
-    }
-/* evfststlt */
-HELPER_VECTOR_SPE_CMP(fststlt);
-/* evfststgt */
-HELPER_VECTOR_SPE_CMP(fststgt);
-/* evfststeq */
-HELPER_VECTOR_SPE_CMP(fststeq);
-/* evfscmplt */
-HELPER_VECTOR_SPE_CMP(fscmplt);
-/* evfscmpgt */
-HELPER_VECTOR_SPE_CMP(fscmpgt);
-/* evfscmpeq */
-HELPER_VECTOR_SPE_CMP(fscmpeq);
-
 /* Double-precision floating-point conversion */
 uint64_t helper_efdcfsi(CPUPPCState *env, uint32_t val)
 {
@@ -1725,7 +1706,7 @@ uint32_t helper_efdtstlt(CPUPPCState *env, uint64_t op1, uint64_t op2)
 
     u1.ll = op1;
     u2.ll = op2;
-    return float64_lt(u1.d, u2.d, &env->vec_status) ? 4 : 0;
+    return float64_lt(u1.d, u2.d, &env->vec_status);
 }
 
 uint32_t helper_efdtstgt(CPUPPCState *env, uint64_t op1, uint64_t op2)
@@ -1734,7 +1715,7 @@ uint32_t helper_efdtstgt(CPUPPCState *env, uint64_t op1, uint64_t op2)
 
     u1.ll = op1;
     u2.ll = op2;
-    return float64_le(u1.d, u2.d, &env->vec_status) ? 0 : 4;
+    return !float64_le(u1.d, u2.d, &env->vec_status);
 }
 
 uint32_t helper_efdtsteq(CPUPPCState *env, uint64_t op1, uint64_t op2)
@@ -1743,7 +1724,7 @@ uint32_t helper_efdtsteq(CPUPPCState *env, uint64_t op1, uint64_t op2)
 
     u1.ll = op1;
     u2.ll = op2;
-    return float64_eq_quiet(u1.d, u2.d, &env->vec_status) ? 4 : 0;
+    return float64_eq_quiet(u1.d, u2.d, &env->vec_status);
 }
 
 uint32_t helper_efdcmplt(CPUPPCState *env, uint64_t op1, uint64_t op2)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 5342f13..8d6a92b 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -493,12 +493,6 @@ DEF_HELPER_3(efststeq, i32, env, i32, i32)
 DEF_HELPER_3(efscmplt, i32, env, i32, i32)
 DEF_HELPER_3(efscmpgt, i32, env, i32, i32)
 DEF_HELPER_3(efscmpeq, i32, env, i32, i32)
-DEF_HELPER_3(evfststlt, i32, env, i64, i64)
-DEF_HELPER_3(evfststgt, i32, env, i64, i64)
-DEF_HELPER_3(evfststeq, i32, env, i64, i64)
-DEF_HELPER_3(evfscmplt, i32, env, i64, i64)
-DEF_HELPER_3(evfscmpgt, i32, env, i64, i64)
-DEF_HELPER_3(evfscmpeq, i32, env, i64, i64)
 DEF_HELPER_2(efdcfsi, i64, env, i32)
 DEF_HELPER_2(efdcfsid, i64, env, i64)
 DEF_HELPER_2(efdcfui, i64, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 2287064..d3ace6a 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2602,7 +2602,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
  done:
     env->xer = (env->xer & ~0x7F) | i;
     if (update_Rc) {
-        env->crf[0] |= xer_so;
+        env->cr[CRF_SO] = xer_so;
     }
     return i;
 }
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index c801b82..9fa309a 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -132,6 +132,10 @@ static void cpu_pre_save(void *opaque)
     CPUPPCState *env = &cpu->env;
     int i;
 
+    for (i = 0; i < 8; i++) {
+        env->crf[i] = ppc_get_crf(env, i);
+    }
+
     env->spr[SPR_LR] = env->lr;
     env->spr[SPR_CTR] = env->ctr;
     env->spr[SPR_XER] = env->xer;
@@ -165,6 +169,11 @@ static int cpu_post_load(void *opaque, int version_id)
      * software has to take care of running QEMU in a compatible mode.
      */
     env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
+
+    for (i = 0; i < 8; i++) {
+        ppc_set_crf(env, i, env->crf[i]);
+    }
+
     env->lr = env->spr[SPR_LR];
     env->ctr = env->spr[SPR_CTR];
     env->xer = env->spr[SPR_XER];
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 1ed6a8f..dd19b39 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -53,13 +53,13 @@ static char cpu_reg_names[10*3 + 22*4 /* GPR */
     + 10*4 + 22*5 /* FPR */
     + 2*(10*6 + 22*7) /* AVRh, AVRl */
     + 10*5 + 22*6 /* VSR */
-    + 8*5 /* CRF */];
+    + 32*8 /* CR */];
 static TCGv cpu_gpr[32];
 static TCGv cpu_gprh[32];
 static TCGv_i64 cpu_fpr[32];
 static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
 static TCGv_i64 cpu_vsr[32];
-static TCGv_i32 cpu_crf[8];
+static TCGv_i32 cpu_cr[32];
 static TCGv cpu_nip;
 static TCGv cpu_msr;
 static TCGv cpu_ctr;
@@ -89,12 +89,13 @@ void ppc_translate_init(void)
     p = cpu_reg_names;
     cpu_reg_names_size = sizeof(cpu_reg_names);
 
-    for (i = 0; i < 8; i++) {
-        snprintf(p, cpu_reg_names_size, "crf%d", i);
-        cpu_crf[i] = tcg_global_mem_new_i32(TCG_AREG0,
-                                            offsetof(CPUPPCState, crf[i]), p);
-        p += 5;
-        cpu_reg_names_size -= 5;
+    for (i = 0; i < 32; i++) {
+        static const char names[] = "lt\0gt\0eq\0so";
+        snprintf(p, cpu_reg_names_size, "cr%d[%s]", i >> 2, names + (i & 3) * 3);
+        cpu_cr[i] = tcg_global_mem_new_i32(TCG_AREG0,
+                                           offsetof(CPUPPCState, cr[i]), p);
+        p += 8;
+        cpu_reg_names_size -= 8;
     }
 
     for (i = 0; i < 32; i++) {
@@ -251,17 +252,32 @@ static inline void gen_reset_fpstatus(void)
 
 static inline void gen_op_mfcr(TCGv dest, int first_cr, int shift)
 {
-    tcg_gen_shli_i32(dest, cpu_crf[first_cr >> 2], shift);
+    TCGv_i32 t0 = tcg_temp_new_i32();
+
+    tcg_gen_shli_i32(dest, cpu_cr[first_cr + 3], shift);
+    tcg_gen_shli_i32(t0, cpu_cr[first_cr + 2], shift + 1);
+    tcg_gen_or_i32(dest, dest, t0);
+    tcg_gen_shli_i32(t0, cpu_cr[first_cr + 1], shift + 2);
+    tcg_gen_or_i32(dest, dest, t0);
+    tcg_gen_shli_i32(t0, cpu_cr[first_cr], shift + 3);
+    tcg_gen_or_i32(dest, dest, t0);
+    tcg_temp_free_i32(t0);
 }
 
 static inline void gen_op_mtcr(int first_cr, TCGv src, int shift)
 {
     if (shift) {
-        tcg_gen_shri_i32(cpu_crf[first_cr >> 2], src, shift);
-        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], cpu_crf[first_cr >> 2], 0x0F);
+        tcg_gen_shri_i32(cpu_cr[first_cr + 3], src, shift);
+        tcg_gen_andi_i32(cpu_cr[first_cr + 3], cpu_cr[first_cr + 3], 1);
     } else {
-        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], src, 0x0F);
+        tcg_gen_andi_i32(cpu_cr[first_cr + 3], src, 1);
     }
+    tcg_gen_shri_i32(cpu_cr[first_cr + 2], src, shift + 1);
+    tcg_gen_andi_i32(cpu_cr[first_cr + 2], cpu_cr[first_cr + 2], 1);
+    tcg_gen_shri_i32(cpu_cr[first_cr + 1], src, shift + 2);
+    tcg_gen_andi_i32(cpu_cr[first_cr + 1], cpu_cr[first_cr + 1], 1);
+    tcg_gen_shri_i32(cpu_cr[first_cr], src, shift + 3);
+    tcg_gen_andi_i32(cpu_cr[first_cr], cpu_cr[first_cr], 1);
 }
 
 static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
@@ -675,27 +689,19 @@ static bool is_user_mode(DisasContext *ctx)
 static inline void gen_op_cmp(TCGv arg0, TCGv arg1, int s, int crf)
 {
     TCGv t0 = tcg_temp_new();
-    TCGv_i32 t1 = tcg_temp_new_i32();
 
-    tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
 
     tcg_gen_setcond_tl((s ? TCG_COND_LT: TCG_COND_LTU), t0, arg0, arg1);
-    tcg_gen_trunc_tl_i32(t1, t0);
-    tcg_gen_shli_i32(t1, t1, CRF_LT);
-    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_LT], t0);
 
     tcg_gen_setcond_tl((s ? TCG_COND_GT: TCG_COND_GTU), t0, arg0, arg1);
-    tcg_gen_trunc_tl_i32(t1, t0);
-    tcg_gen_shli_i32(t1, t1, CRF_GT);
-    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_GT], t0);
 
     tcg_gen_setcond_tl(TCG_COND_EQ, t0, arg0, arg1);
-    tcg_gen_trunc_tl_i32(t1, t0);
-    tcg_gen_shli_i32(t1, t1, CRF_EQ);
-    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_EQ], t0);
 
     tcg_temp_free(t0);
-    tcg_temp_free_i32(t1);
 }
 
 static inline void gen_op_cmpi(TCGv arg0, target_ulong arg1, int s, int crf)
@@ -707,17 +713,22 @@ static inline void gen_op_cmpi(TCGv arg0, target_ulong arg1, int s, int crf)
 
 static inline void gen_op_cmp32(TCGv arg0, TCGv arg1, int s, int crf)
 {
-    TCGv t0, t1;
+    TCGv_i32 t0, t1;
+
-    t0 = tcg_temp_new();
-    t1 = tcg_temp_new();
+    t0 = tcg_temp_new_i32();
+    t1 = tcg_temp_new_i32();
-    if (s) {
-        tcg_gen_ext32s_tl(t0, arg0);
-        tcg_gen_ext32s_tl(t1, arg1);
-    } else {
-        tcg_gen_ext32u_tl(t0, arg0);
-        tcg_gen_ext32u_tl(t1, arg1);
-    }
-    gen_op_cmp(t0, t1, s, crf);
+    tcg_gen_trunc_tl_i32(t0, arg0);
+    tcg_gen_trunc_tl_i32(t1, arg1);
+
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
+
+    tcg_gen_setcond_i32((s ? TCG_COND_LT: TCG_COND_LTU),
+                        cpu_cr[crf * 4 + CRF_LT], t0, t1);
+    tcg_gen_setcond_i32((s ? TCG_COND_GT: TCG_COND_GTU),
+                        cpu_cr[crf * 4 + CRF_GT], t0, t1);
+    tcg_gen_setcond_i32(TCG_COND_EQ,
+                        cpu_cr[crf * 4 + CRF_EQ], t0, t1);
+
-    tcg_temp_free(t1);
-    tcg_temp_free(t0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t0);
 }
@@ -790,15 +801,10 @@ static void gen_cmpli(DisasContext *ctx)
 static void gen_isel(DisasContext *ctx)
 {
     uint32_t bi = rC(ctx->opcode);
-    uint32_t mask;
-    TCGv_i32 t0;
     TCGv t1, true_op, zero;
 
-    mask = 0x08 >> (bi & 0x03);
-    t0 = tcg_temp_new_i32();
-    tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
     t1 = tcg_temp_new();
-    tcg_gen_extu_i32_tl(t1, t0);
+    tcg_gen_extu_i32_tl(t1, cpu_cr[bi]);
     zero = tcg_const_tl(0);
     if (rA(ctx->opcode) == 0)
         true_op = zero;
@@ -2288,21 +2294,29 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
 
 static void gen_ftdiv(DisasContext *ctx)
 {
+    TCGv_i32 crf;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
+    crf = tcg_temp_new_i32();
+    gen_helper_ftdiv(crf, cpu_fpr[rA(ctx->opcode)],
                      cpu_fpr[rB(ctx->opcode)]);
+    gen_op_mtcr(crfD(ctx->opcode) << 2, crf, 0);
+    tcg_temp_free_i32(crf);
 }
 
 static void gen_ftsqrt(DisasContext *ctx)
 {
+    TCGv_i32 crf;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    crf = tcg_temp_new_i32();
+    gen_helper_ftsqrt(crf, cpu_fpr[rB(ctx->opcode)]);
+    gen_op_mtcr(crfD(ctx->opcode) << 2, crf, 0);
+    tcg_temp_free_i32(crf);
 }
 
 
@@ -3300,10 +3314,13 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
 {
     int l1;
 
-    tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+    tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
+    tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
+    tcg_gen_movi_i32(cpu_cr[CRF_EQ], 0);
+    tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
     l1 = gen_new_label();
     tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
-    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
+    tcg_gen_movi_i32(cpu_cr[CRF_EQ], 1);
 #if defined(TARGET_PPC64)
     if (size == 8) {
         gen_qemu_st64(ctx, cpu_gpr[reg], EA);
@@ -3870,17 +3887,11 @@ static inline void gen_bcond(DisasContext *ctx, int type)
     if ((bo & 0x10) == 0) {
         /* Test CR */
         uint32_t bi = BI(ctx->opcode);
-        uint32_t mask = 0x08 >> (bi & 0x03);
-        TCGv_i32 temp = tcg_temp_new_i32();
-
         if (bo & 0x8) {
-            tcg_gen_andi_i32(temp, cpu_crf[bi >> 2], mask);
-            tcg_gen_brcondi_i32(TCG_COND_EQ, temp, 0, l1);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[bi], 0, l1);
         } else {
-            tcg_gen_andi_i32(temp, cpu_crf[bi >> 2], mask);
-            tcg_gen_brcondi_i32(TCG_COND_NE, temp, 0, l1);
+            tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cr[bi], 0, l1);
         }
-        tcg_temp_free_i32(temp);
     }
     gen_update_cfar(ctx, ctx->nip);
     if (type == BCOND_IM) {
@@ -3929,35 +3940,11 @@ static void gen_bctar(DisasContext *ctx)
 }
 
 /***                      Condition register logical                       ***/
-#define GEN_CRLOGIC(name, tcg_op, opc)                                        \
-static void glue(gen_, name)(DisasContext *ctx)                                       \
-{                                                                             \
-    uint8_t bitmask;                                                          \
-    int sh;                                                                   \
-    TCGv_i32 t0, t1;                                                          \
-    sh = (crbD(ctx->opcode) & 0x03) - (crbA(ctx->opcode) & 0x03);             \
-    t0 = tcg_temp_new_i32();                                                  \
-    if (sh > 0)                                                               \
-        tcg_gen_shri_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2], sh);            \
-    else if (sh < 0)                                                          \
-        tcg_gen_shli_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2], -sh);           \
-    else                                                                      \
-        tcg_gen_mov_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2]);                 \
-    t1 = tcg_temp_new_i32();                                                  \
-    sh = (crbD(ctx->opcode) & 0x03) - (crbB(ctx->opcode) & 0x03);             \
-    if (sh > 0)                                                               \
-        tcg_gen_shri_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2], sh);            \
-    else if (sh < 0)                                                          \
-        tcg_gen_shli_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2], -sh);           \
-    else                                                                      \
-        tcg_gen_mov_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2]);                 \
-    tcg_op(t0, t0, t1);                                                       \
-    bitmask = 0x08 >> (crbD(ctx->opcode) & 0x03);                             \
-    tcg_gen_andi_i32(t0, t0, bitmask);                                        \
-    tcg_gen_andi_i32(t1, cpu_crf[crbD(ctx->opcode) >> 2], ~bitmask);          \
-    tcg_gen_or_i32(cpu_crf[crbD(ctx->opcode) >> 2], t0, t1);                  \
-    tcg_temp_free_i32(t0);                                                    \
-    tcg_temp_free_i32(t1);                                                    \
+#define GEN_CRLOGIC(name, tcg_op, opc)                                         \
+static void glue(gen_, name)(DisasContext *ctx)                                \
+{                                                                              \
+    tcg_op(cpu_cr[crbD(ctx->opcode)], cpu_cr[crbA(ctx->opcode)],               \
+           cpu_cr[crbB(ctx->opcode)]);                                         \
 }
 
 /* crand */
@@ -3980,7 +3967,11 @@ GEN_CRLOGIC(crxor, tcg_gen_xor_i32, 0x06);
 /* mcrf */
 static void gen_mcrf(DisasContext *ctx)
 {
-    tcg_gen_mov_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfS(ctx->opcode)]);
+    int i;
+    for (i = 0; i < 4; i++) {
+        tcg_gen_mov_i32(cpu_cr[crfD(ctx->opcode) * 4 + i],
+                        cpu_cr[crfS(ctx->opcode) * 4 + i]);
+    }
 }
 
 /***                           System linkage                              ***/
@@ -4133,20 +4124,12 @@ static void gen_write_xer(TCGv src)
 /* mcrxr */
 static void gen_mcrxr(DisasContext *ctx)
 {
-    TCGv_i32 t0 = tcg_temp_new_i32();
-    TCGv_i32 t1 = tcg_temp_new_i32();
-    TCGv_i32 dst = cpu_crf[crfD(ctx->opcode)];
-
-    tcg_gen_trunc_tl_i32(t0, cpu_so);
-    tcg_gen_trunc_tl_i32(t1, cpu_ov);
-    tcg_gen_trunc_tl_i32(dst, cpu_ca);
-    tcg_gen_shli_i32(t0, t0, 3);
-    tcg_gen_shli_i32(t1, t1, 2);
-    tcg_gen_shli_i32(dst, dst, 1);
-    tcg_gen_or_i32(dst, dst, t0);
-    tcg_gen_or_i32(dst, dst, t1);
-    tcg_temp_free_i32(t0);
-    tcg_temp_free_i32(t1);
+    int crf = crfD(ctx->opcode);
+
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_LT], cpu_so);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_GT], cpu_ov);
+    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_EQ], cpu_ca);
+    tcg_gen_movi_i32(cpu_cr[crf * 4 + CRF_SO], 0);
 
     tcg_gen_movi_tl(cpu_so, 0);
     tcg_gen_movi_tl(cpu_ov, 0);
@@ -6320,11 +6303,13 @@ static void gen_tlbsx_40x(DisasContext *ctx)
     gen_helper_4xx_tlbsx(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
     tcg_temp_free(t0);
     if (Rc(ctx->opcode)) {
-        int l1 = gen_new_label();
-        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
-        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[rD(ctx->opcode)], -1, l1);
-        tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 0x02);
-        gen_set_label(l1);
+        t0 = tcg_temp_new();
+        tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
+        tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
+        tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
+        tcg_gen_setcondi_tl(TCG_COND_NE, t0, cpu_gpr[rD(ctx->opcode)], -1);
+        tcg_gen_trunc_tl_i32(cpu_cr[CRF_EQ], t0);
+        tcg_temp_free(t0);
     }
 #endif
 }
@@ -6401,11 +6386,13 @@ static void gen_tlbsx_440(DisasContext *ctx)
     gen_helper_440_tlbsx(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
     tcg_temp_free(t0);
     if (Rc(ctx->opcode)) {
-        int l1 = gen_new_label();
-        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
-        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[rD(ctx->opcode)], -1, l1);
-        tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 0x02);
-        gen_set_label(l1);
+        t0 = tcg_temp_new();
+        tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
+        tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
+        tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
+        tcg_gen_setcondi_tl(TCG_COND_NE, t0, cpu_gpr[rD(ctx->opcode)], -1);
+        tcg_gen_trunc_tl_i32(cpu_cr[CRF_EQ], t0);
+        tcg_temp_free(t0);
     }
 #endif
 }
@@ -7371,7 +7358,7 @@ GEN_VXFORM(vpmsumd, 4, 19)
 static void gen_##op(DisasContext *ctx)             \
 {                                                   \
     TCGv_ptr ra, rb, rd;                            \
-    TCGv_i32 ps;                                    \
+    TCGv_i32 ps, crf;                               \
                                                     \
     if (unlikely(!ctx->altivec_enabled)) {          \
         gen_exception(ctx, POWERPC_EXCP_VPU);       \
@@ -7383,13 +7370,16 @@ static void gen_##op(DisasContext *ctx)             \
     rd = gen_avr_ptr(rD(ctx->opcode));              \
                                                     \
     ps = tcg_const_i32((ctx->opcode & 0x200) != 0); \
+    crf = tcg_temp_new_i32();                       \
                                                     \
-    gen_helper_##op(cpu_crf[6], rd, ra, rb, ps);    \
+    gen_helper_##op(crf, rd, ra, rb, ps);           \
+    gen_op_mtcr(6 << 2, crf, 0);                    \
                                                     \
     tcg_temp_free_ptr(ra);                          \
     tcg_temp_free_ptr(rb);                          \
     tcg_temp_free_ptr(rd);                          \
     tcg_temp_free_i32(ps);                          \
+    tcg_temp_free_i32(crf);                         \
 }
 
 GEN_BCD(bcdadd)
@@ -8217,6 +8207,7 @@ static void gen_##name(DisasContext *ctx)        \
 static void gen_##name(DisasContext *ctx)         \
 {                                                 \
     TCGv_ptr ra, rb;                              \
+    TCGv_i32 tmp;                                 \
     if (unlikely(!ctx->fpu_enabled)) {            \
         gen_exception(ctx, POWERPC_EXCP_FPU);     \
         return;                                   \
@@ -8224,8 +8215,10 @@ static void gen_##name(DisasContext *ctx)         \
     gen_update_nip(ctx, ctx->nip - 4);            \
     ra = gen_fprp_ptr(rA(ctx->opcode));           \
     rb = gen_fprp_ptr(rB(ctx->opcode));           \
-    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], \
-                      cpu_env, ra, rb);           \
+    tmp = tcg_temp_new_i32();                     \
+    gen_helper_##name(tmp, cpu_env, ra, rb);      \
+    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp, 0);  \
+    tcg_temp_free_i32(tmp);                       \
     tcg_temp_free_ptr(ra);                        \
     tcg_temp_free_ptr(rb);                        \
 }
@@ -8234,7 +8227,7 @@ static void gen_##name(DisasContext *ctx)         \
 static void gen_##name(DisasContext *ctx)         \
 {                                                 \
     TCGv_ptr ra;                                  \
-    TCGv_i32 dcm;                                 \
+    TCGv_i32 dcm, tmp;                            \
     if (unlikely(!ctx->fpu_enabled)) {            \
         gen_exception(ctx, POWERPC_EXCP_FPU);     \
         return;                                   \
@@ -8242,8 +8235,10 @@ static void gen_##name(DisasContext *ctx)         \
     gen_update_nip(ctx, ctx->nip - 4);            \
     ra = gen_fprp_ptr(rA(ctx->opcode));           \
     dcm = tcg_const_i32(DCM(ctx->opcode));        \
-    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], \
-                      cpu_env, ra, dcm);          \
+    tmp = tcg_temp_new_i32();                     \
+    gen_helper_##name(tmp, cpu_env, ra, dcm);     \
+    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp, 0);  \
+    tcg_temp_free_i32(tmp);                       \
     tcg_temp_free_ptr(ra);                        \
     tcg_temp_free_i32(dcm);                       \
 }
@@ -8668,37 +8663,32 @@ GEN_SPEOP_ARITH_IMM2(evsubifw, tcg_gen_subi_i32);
 #define GEN_SPEOP_COMP(name, tcg_cond)                                        \
 static inline void gen_##name(DisasContext *ctx)                              \
 {                                                                             \
+    TCGv tmp = tcg_temp_new();                                                \
+                                                                              \
     if (unlikely(!ctx->spe_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_SPEU);                                \
         return;                                                               \
     }                                                                         \
-    int l1 = gen_new_label();                                                 \
-    int l2 = gen_new_label();                                                 \
-    int l3 = gen_new_label();                                                 \
-    int l4 = gen_new_label();                                                 \
                                                                               \
     tcg_gen_ext32s_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);    \
     tcg_gen_ext32s_tl(cpu_gpr[rB(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);    \
     tcg_gen_ext32s_tl(cpu_gprh[rA(ctx->opcode)], cpu_gprh[rA(ctx->opcode)]);  \
     tcg_gen_ext32s_tl(cpu_gprh[rB(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);  \
                                                                               \
-    tcg_gen_brcond_tl(tcg_cond, cpu_gpr[rA(ctx->opcode)],                     \
-                       cpu_gpr[rB(ctx->opcode)], l1);                         \
-    tcg_gen_movi_i32(cpu_crf[crfD(ctx->opcode)], 0);                          \
-    tcg_gen_br(l2);                                                           \
-    gen_set_label(l1);                                                        \
-    tcg_gen_movi_i32(cpu_crf[crfD(ctx->opcode)],                              \
-                     CRF_CL | CRF_CH_OR_CL | CRF_CH_AND_CL);                  \
-    gen_set_label(l2);                                                        \
-    tcg_gen_brcond_tl(tcg_cond, cpu_gprh[rA(ctx->opcode)],                    \
-                       cpu_gprh[rB(ctx->opcode)], l3);                        \
-    tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)],  \
-                     ~(CRF_CH | CRF_CH_AND_CL));                              \
-    tcg_gen_br(l4);                                                           \
-    gen_set_label(l3);                                                        \
-    tcg_gen_ori_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)],   \
-                    CRF_CH | CRF_CH_OR_CL);                                   \
-    gen_set_label(l4);                                                        \
+    tcg_gen_setcond_tl(tcg_cond, tmp,                                         \
+                       cpu_gpr[rA(ctx->opcode)],                              \
+                       cpu_gpr[rB(ctx->opcode)]);                             \
+    tcg_gen_trunc_tl_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL], tmp);        \
+    tcg_gen_setcond_tl(tcg_cond, tmp,                                         \
+                       cpu_gprh[rA(ctx->opcode)],                             \
+                       cpu_gprh[rB(ctx->opcode)]);                            \
+    tcg_gen_trunc_tl_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH], tmp);        \
+    tcg_gen_or_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_OR_CL],              \
+                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                    \
+                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                   \
+    tcg_gen_and_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_AND_CL],            \
+                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                   \
+                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                  \
 }
 GEN_SPEOP_COMP(evcmpgtu, TCG_COND_GTU);
 GEN_SPEOP_COMP(evcmpgts, TCG_COND_GT);
@@ -8769,22 +8759,20 @@ static inline void gen_evsel(DisasContext *ctx)
     int l2 = gen_new_label();
     int l3 = gen_new_label();
     int l4 = gen_new_label();
-    TCGv_i32 t0 = tcg_temp_local_new_i32();
-    tcg_gen_andi_i32(t0, cpu_crf[ctx->opcode & 0x07], 1 << 3);
-    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
+
+    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[(ctx->opcode & 0x07) * 4], 0, l1);
     tcg_gen_mov_tl(cpu_gprh[rD(ctx->opcode)], cpu_gprh[rA(ctx->opcode)]);
     tcg_gen_br(l2);
     gen_set_label(l1);
     tcg_gen_mov_tl(cpu_gprh[rD(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);
     gen_set_label(l2);
-    tcg_gen_andi_i32(t0, cpu_crf[ctx->opcode & 0x07], 1 << 2);
-    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l3);
+
+    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[(ctx->opcode & 0x07) * 4 + 1], 0, l3);
     tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
     tcg_gen_br(l4);
     gen_set_label(l3);
     tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
     gen_set_label(l4);
-    tcg_temp_free_i32(t0);
 }
 
 static void gen_evsel0(DisasContext *ctx)
@@ -9366,9 +9354,12 @@ static inline void gen_##name(DisasContext *ctx)                              \
     t0 = tcg_temp_new_i32();                                                  \
     t1 = tcg_temp_new_i32();                                                  \
                                                                               \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_LT], 0);              \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_GT], 0);              \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_SO], 0);              \
     tcg_gen_trunc_tl_i32(t0, cpu_gpr[rA(ctx->opcode)]);                       \
     tcg_gen_trunc_tl_i32(t1, cpu_gpr[rB(ctx->opcode)]);                       \
-    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], cpu_env, t0, t1);           \
+    gen_helper_##name(cpu_cr[crfD(ctx->opcode) * 4 + CRF_EQ], cpu_env, t0, t1); \
                                                                               \
     tcg_temp_free_i32(t0);                                                    \
     tcg_temp_free_i32(t1);                                                    \
@@ -9385,10 +9376,32 @@ static inline void gen_##name(DisasContext *ctx)                              \
     t1 = tcg_temp_new_i64();                                                  \
     gen_load_gpr64(t0, rA(ctx->opcode));                                      \
     gen_load_gpr64(t1, rB(ctx->opcode));                                      \
-    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], cpu_env, t0, t1);           \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_LT], 0);              \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_GT], 0);              \
+    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_SO], 0);              \
+    gen_helper_##name(cpu_cr[crfD(ctx->opcode) * 4 + CRF_EQ], cpu_env,        \
+                      t0, t1);                                                \
     tcg_temp_free_i64(t0);                                                    \
     tcg_temp_free_i64(t1);                                                    \
 }
+#define GEN_SPEFPUOP_COMP_V64(name, helper)                                   \
+static inline void gen_##name(DisasContext *ctx)                              \
+{                                                                             \
+    if (unlikely(!ctx->spe_enabled)) {                                        \
+        gen_exception(ctx, POWERPC_EXCP_SPEU);                                \
+        return;                                                               \
+    }                                                                         \
+    gen_helper_##helper(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL], cpu_env,      \
+                        cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);  \
+    gen_helper_##helper(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH], cpu_env,      \
+                        cpu_gprh[rA(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);\
+    tcg_gen_or_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_OR_CL],              \
+                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                    \
+                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                   \
+    tcg_gen_and_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_AND_CL],            \
+                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                   \
+                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                  \
+}
 
 /* Single precision floating-point vectors operations */
 /* Arithmetic */
@@ -9443,12 +9456,12 @@ GEN_SPEFPUOP_CONV_64_64(evfsctuiz);
 GEN_SPEFPUOP_CONV_64_64(evfsctsiz);
 
 /* Comparison */
-GEN_SPEFPUOP_COMP_64(evfscmpgt);
-GEN_SPEFPUOP_COMP_64(evfscmplt);
-GEN_SPEFPUOP_COMP_64(evfscmpeq);
-GEN_SPEFPUOP_COMP_64(evfststgt);
-GEN_SPEFPUOP_COMP_64(evfststlt);
-GEN_SPEFPUOP_COMP_64(evfststeq);
+GEN_SPEFPUOP_COMP_V64(evfscmpgt, efscmpgt);
+GEN_SPEFPUOP_COMP_V64(evfscmplt, efscmplt);
+GEN_SPEFPUOP_COMP_V64(evfscmpeq, efscmpeq);
+GEN_SPEFPUOP_COMP_V64(evfststgt, efststgt);
+GEN_SPEFPUOP_COMP_V64(evfststlt, efststlt);
+GEN_SPEFPUOP_COMP_V64(evfststeq, efststeq);
 
 /* Opcodes definitions */
 GEN_SPE(evfsadd,   evfssub,   0x00, 0x0A, 0x00000000, 0x00000000, PPC_SPE_SINGLE); //
-- 
1.8.3.1

* [Qemu-devel] [PATCH 16/17] ppc: inline ppc_get_crf/ppc_set_crf when clearer
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (14 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 17/17] ppc: dump all 32 CR bits Paolo Bonzini
  2014-08-28 18:05 ` [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Tom Musta
  17 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
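A note on the CCR loops changed below: with one bit per cr[] element,
bit i of the architected CR sits at shift (31 - i), so packing bit by
bit should give the same word as the old per-field loop around
ppc_get_crf().  The following is only a standalone sketch of that
equivalence, not QEMU code.  ModelCR, model_get_crf, model_set_crf and
the pack/unpack helpers are hypothetical names, and the CRF_* indices
(LT first, SO last) are an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of each bit inside a 4-bit CR field, MSB first. */
enum { CRF_LT = 0, CRF_GT = 1, CRF_EQ = 2, CRF_SO = 3 };

/* Hypothetical stand-in for the CR part of CPUPPCState. */
typedef struct {
    uint32_t cr[32];            /* each element holds 0 or 1 */
} ModelCR;

/* Model of ppc_get_crf(): rebuild one 4-bit field from four cells. */
static uint32_t model_get_crf(const ModelCR *s, int crf)
{
    return (s->cr[crf * 4 + CRF_LT] << 3) |
           (s->cr[crf * 4 + CRF_GT] << 2) |
           (s->cr[crf * 4 + CRF_EQ] << 1) |
            s->cr[crf * 4 + CRF_SO];
}

/* Model of ppc_set_crf(): scatter a 4-bit field into four cells. */
static void model_set_crf(ModelCR *s, int crf, uint32_t val)
{
    s->cr[crf * 4 + CRF_LT] = (val >> 3) & 1;
    s->cr[crf * 4 + CRF_GT] = (val >> 2) & 1;
    s->cr[crf * 4 + CRF_EQ] = (val >> 1) & 1;
    s->cr[crf * 4 + CRF_SO] = val & 1;
}

/* Old style: eight 4-bit fields, field 0 in the top nibble. */
static uint32_t pack_by_field(const ModelCR *s)
{
    uint32_t ccr = 0;
    for (int i = 0; i < 8; i++) {
        ccr |= model_get_crf(s, i) << (32 - ((i + 1) * 4));
    }
    return ccr;
}

/* New style: thirty-two 1-bit cells, cell 0 in the top bit. */
static uint32_t pack_by_bit(const ModelCR *s)
{
    uint32_t ccr = 0;
    for (int i = 0; i < 32; i++) {
        ccr |= s->cr[i] << (31 - i);
    }
    return ccr;
}

/* Inverse of pack_by_bit(), as a restore path would need. */
static void unpack_by_bit(ModelCR *s, uint32_t ccr)
{
    for (int i = 0; i < 32; i++) {
        s->cr[i] = (ccr >> (31 - i)) & 1;
    }
}

/* Round-trip check: both packings must reproduce the input word. */
static void check_packings_agree(uint32_t ccr)
{
    ModelCR s;
    unpack_by_bit(&s, ccr);
    assert(pack_by_bit(&s) == ccr);
    assert(pack_by_field(&s) == ccr);
}
```

Calling check_packings_agree() on a few CCR values is enough to
exercise both loops; the model says nothing about the TCG side.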
 linux-user/elfload.c    |  4 ++--
 linux-user/main.c       |  5 ++++-
 linux-user/signal.c     |  8 ++++----
 monitor.c               |  2 +-
 target-ppc/fpu_helper.c | 12 ++++++++++--
 target-ppc/gdbstub.c    |  8 ++++----
 target-ppc/int_helper.c | 31 +++++++++++++++++++++++--------
 target-ppc/kvm.c        | 10 +++++-----
 8 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3769ae6..73a3189 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -857,8 +857,8 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
     (*regs)[36] = tswapreg(env->lr);
     (*regs)[37] = tswapreg(env->xer);
 
-    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
+    for (i = 0; i < ARRAY_SIZE(env->cr); i++) {
+        ccr |= env->cr[i] << (31 - i);
     }
     (*regs)[38] = tswapreg(ccr);
 }
diff --git a/linux-user/main.c b/linux-user/main.c
index b403f24..5a0b31f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -1550,7 +1550,10 @@ static int do_store_exclusive(CPUPPCState *env)
                 }
             }
         }
-        ppc_set_crf(env, 0, (stored << 1) | xer_so);
+        env->cr[CRF_LT] = 0;
+        env->cr[CRF_GT] = 0;
+        env->cr[CRF_EQ] = stored;
+        env->cr[CRF_SO] = xer_so;
         env->reserve_addr = (target_ulong)-1;
     }
     if (!segv) {
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 4f5d79f..5d7914c 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -4511,8 +4511,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame,
     __put_user(env->lr, &frame->mc_gregs[TARGET_PT_LNK]);
     __put_user(env->xer, &frame->mc_gregs[TARGET_PT_XER]);
 
-    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
+    for (i = 0; i < ARRAY_SIZE(env->cr); i++) {
+        ccr |= env->cr[i] << (31 - i);
     }
     __put_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
 
@@ -4590,8 +4590,8 @@ static void restore_user_regs(CPUPPCState *env,
     __get_user(env->xer, &frame->mc_gregs[TARGET_PT_XER]);
     __get_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
 
-    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-        ppc_set_crf(env, i, (ccr >> (32 - ((i + 1) * 4))) & 0xf);
+    for (i = 0; i < ARRAY_SIZE(env->cr); i++) {
+        env->cr[i] = (ccr >> (31 - i)) & 1;
     }
 
     if (!sig) {
diff --git a/monitor.c b/monitor.c
index 97d72f4..b9def76 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2967,8 +2967,8 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
     int i;
 
     u = 0;
-    for (i = 0; i < 8; i++)
-        u |= ppc_get_crf(env, i) << (32 - (4 * (i + 1)));
+    for (i = 0; i < 32; i++)
+        u |= env->cr[i] << (31 - i);
 
     return u;
 }
diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 9574ebe..2d2239f 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1099,7 +1099,11 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
     env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
-    ppc_set_crf(env, crfD, 0x08 >> ret);
+
+    env->cr[crfD * 4 + CRF_LT] = (ret == CRF_LT);
+    env->cr[crfD * 4 + CRF_GT] = (ret == CRF_GT);
+    env->cr[crfD * 4 + CRF_EQ] = (ret == CRF_EQ);
+    env->cr[crfD * 4 + CRF_SO] = (ret == CRF_SO);
 
     if (unlikely(ret == CRF_SO
                  && (float64_is_signaling_nan(farg1.d) ||
@@ -1131,7 +1135,11 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
 
     env->fpscr &= ~(0x0F << FPSCR_FPRF);
     env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
-    ppc_set_crf(env, crfD, 0x08 >> ret);
+
+    env->cr[crfD * 4 + CRF_LT] = (ret == CRF_LT);
+    env->cr[crfD * 4 + CRF_GT] = (ret == CRF_GT);
+    env->cr[crfD * 4 + CRF_EQ] = (ret == CRF_EQ);
+    env->cr[crfD * 4 + CRF_SO] = (ret == CRF_SO);
 
     if (unlikely(ret == CRF_SO)) {
         if (float64_is_signaling_nan(farg1.d) ||
diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
index e0f340c..4457f81 100644
--- a/target-ppc/gdbstub.c
+++ b/target-ppc/gdbstub.c
@@ -138,8 +138,8 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
             {
                 uint32_t cr = 0;
                 int i;
-                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-                    cr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
+                for (i = 0; i < ARRAY_SIZE(env->cr); i++) {
+                    cr |= env->cr[i] << (31 - i);
                 }
                 gdb_get_reg32(mem_buf, cr);
                 break;
@@ -246,8 +246,8 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
             {
                 uint32_t cr = ldl_p(mem_buf);
                 int i;
-                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
-                    ppc_set_crf(env, i, (cr >> (32 - ((i + 1) * 4))) & 0xF);
+                for (i = 0; i < ARRAY_SIZE(env->cr); i++) {
+                    env->cr[i] = (cr >> (31 - i)) & 1;
                 }
                 break;
             }
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index d3ace6a..4b8dbcb 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -322,8 +322,8 @@ target_ulong helper_mfocrf(CPUPPCState *env)
 {
     uint32_t cr = 0;
     int i;
-    for (i = 0; i < 8; i++) {
-        cr |= ppc_get_crf(env, i) << (32 - (i + 1) * 4);
+    for (i = 0; i < 32; i++) {
+        cr |= env->cr[i] << (31 - i);
     }
     return cr;
 }
@@ -679,7 +679,10 @@ VCF(sx, int32_to_float32, s32)
             none |= result;                                             \
         }                                                               \
         if (record) {                                                   \
-            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
+            env->cr[24 + CRF_LT] = (all != 0);                          \
+            env->cr[24 + CRF_GT] = 0;                                   \
+            env->cr[24 + CRF_EQ] = (none == 0);                         \
+            env->cr[24 + CRF_SO] = 0;                                   \
         }                                                               \
     }
 #define VCMP(suffix, compare, element)          \
@@ -725,7 +728,10 @@ VCMP(gtsd, >, s64)
             none |= result;                                             \
         }                                                               \
         if (record) {                                                   \
-            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
+            env->cr[24 + CRF_LT] = (all != 0);                          \
+            env->cr[24 + CRF_GT] = 0;                                   \
+            env->cr[24 + CRF_EQ] = (none == 0);                         \
+            env->cr[24 + CRF_SO] = 0;                                   \
         }                                                               \
     }
 #define VCMPFP(suffix, compare, order)          \
@@ -759,7 +765,10 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
         }
     }
     if (record) {
-        ppc_set_crf(env, 6, (all_in == 0) << 1);
+        env->cr[24 + CRF_LT] = 0;
+        env->cr[24 + CRF_GT] = 0;
+        env->cr[24 + CRF_EQ] = (all_in == 0);
+        env->cr[24 + CRF_SO] = 0;
     }
 }
 
@@ -2580,7 +2589,9 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
         if ((high & mask) == 0) {
             if (update_Rc) {
-                ppc_set_crf(env, 0, 0x4);
+                env->cr[CRF_LT] = 0;
+                env->cr[CRF_GT] = 1;
+                env->cr[CRF_EQ] = 0;
             }
             goto done;
         }
@@ -2589,7 +2600,9 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
         if ((low & mask) == 0) {
             if (update_Rc) {
-                ppc_set_crf(env, 0, 0x8);
+                env->cr[CRF_LT] = 1;
+                env->cr[CRF_GT] = 0;
+                env->cr[CRF_EQ] = 0;
             }
             goto done;
         }
@@ -2597,7 +2610,9 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
     }
     i = 8;
     if (update_Rc) {
-        ppc_set_crf(env, 0, 0x2);
+        env->cr[CRF_LT] = 0;
+        env->cr[CRF_GT] = 0;
+        env->cr[CRF_EQ] = 1;
     }
  done:
     env->xer = (env->xer & ~0x7F) | i;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index a4eca17..f3feef7 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -794,8 +794,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
         regs.gpr[i] = env->gpr[i];
 
     regs.cr = 0;
-    for (i = 0; i < 8; i++) {
-        regs.cr |= ppc_get_crf(env, i) << (4 * (7 - i));
+    for (i = 0; i < 32; i++) {
+        regs.cr |= env->cr[i] << (31 - i);
     }
 
     ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
@@ -913,9 +913,9 @@ int kvm_arch_get_registers(CPUState *cs)
         return ret;
 
     cr = regs.cr;
-    for (i = 7; i >= 0; i--) {
-        ppc_set_crf(env->cr[i], cr & 15);
-        cr >>= 4;
+    for (i = 31; i >= 0; i--) {
+        env->cr[i] = cr & 1;
+        cr >>= 1;
     }
 
     env->ctr = regs.ctr;
-- 
1.8.3.1


* [Qemu-devel] [PATCH 17/17] ppc: dump all 32 CR bits
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (15 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 16/17] ppc: inline ppc_get_crf/ppc_set_crf when clearer Paolo Bonzini
@ 2014-08-28 17:15 ` Paolo Bonzini
  2014-08-28 18:05 ` [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Tom Musta
  17 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

This is more precise when bits have been modified with CR
boolean operations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-ppc/translate.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index dd19b39..35e7a8b 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -11087,18 +11087,14 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
     for (i = 0; i < 8; i++) {
         cpu_fprintf(f, "%01x", ppc_get_crf(env, i));
     }
-    cpu_fprintf(f, "  [");
-    for (i = 0; i < 8; i++) {
-        char a = '-';
-        if (ppc_get_crf(env, i) & 0x08)
-            a = 'L';
-        else if (ppc_get_crf(env, i) & 0x04)
-            a = 'G';
-        else if (ppc_get_crf(env, i) & 0x02)
-            a = 'E';
-        cpu_fprintf(f, " %c%c", a, ppc_get_crf(env, i) & 0x01 ? 'O' : ' ');
-    }
-    cpu_fprintf(f, " ]             RES " TARGET_FMT_lx "\n",
+    cpu_fprintf(f, "  ");
+    for (i = 0; i < 32; i++) {
+        if ((i & 3) == 0) {
+            cpu_fprintf(f, "%c", i ? ' ' : '[');
+        }
+        cpu_fprintf(f, "%c", env->cr[i] ? "LGEO"[i&3] : '.');
+    }
+    cpu_fprintf(f, "]       RES " TARGET_FMT_lx "\n",
                 env->reserve_addr);
     for (i = 0; i < 32; i++) {
         if ((i & (RFPL - 1)) == 0)
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing Paolo Bonzini
@ 2014-08-28 17:30   ` Peter Maydell
  2014-08-28 19:35     ` Paolo Bonzini
  2014-09-05  7:10   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  1 sibling, 1 reply; 50+ messages in thread
From: Peter Maydell @ 2014-08-28 17:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: dgibson, qemu-ppc, QEMU Developers, Tom Musta

On 28 August 2014 18:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
> PowerPC TCG flushes the TLB on every IR/DR change, which basically
> means on every user<->kernel context switch.  Use the 6-element
> TLB array as a cache, where each MMU index is mapped to a different
> state of the IR/DR/PR/HV bits.
>
> This brings the number of TLB flushes down from ~900000 to ~50000
> for starting up the Debian installer, which is in line with x86
> and gives a ~10% performance improvement.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cputlb.c                    | 19 +++++++++++++++++
>  hw/ppc/spapr_hcall.c        |  6 +++++-
>  include/exec/exec-all.h     |  5 +++++
>  target-ppc/cpu.h            |  4 +++-
>  target-ppc/excp_helper.c    |  6 +-----
>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>  target-ppc/translate_init.c |  5 +++++
>  7 files changed, 74 insertions(+), 23 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index afd3705..17e1b03 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>      tlb_flush_count++;
>  }
>
> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
> +{
> +    CPUArchState *env = cpu->env_ptr;
> +
> +#if defined(DEBUG_TLB)
> +    printf("tlb_flush_idx %d:\n", mmu_idx);
> +#endif
> +    /* must reset current TB so that interrupts cannot modify the
> +       links while we are modifying them */
> +    cpu->current_tb = NULL;
> +
> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> +
> +    env->tlb_flush_addr = -1;
> +    env->tlb_flush_mask = 0;

Isn't this going to break huge page support? Consider
the case:
 * set up huge pages in one TLB index (causing tlb_flush_addr
   and tlb_flush_mask to be set to cover that range)
 * switch to a different TLB index
 * tlb_flush_idx() for that index (causing flush_addr/mask to
   be reset)
 * switch back to first TLB index
 * do tlb_flush_page for an address inside the huge-page
    region

I think you need the flush addr/mask to be per-TLB-index
if you want this to work.
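
To make the suggestion concrete, here is a minimal standalone sketch of
per-index large-page tracking. It is not the QEMU code: TlbSketch,
LargePageRange, vaddr_t and the sketch_* functions are hypothetical names,
and NB_MMU_MODES is assumed to be 6 only because the commit message mentions
a 6-element TLB array.

```c
#include <stdbool.h>
#include <stdint.h>

#define NB_MMU_MODES 6          /* assumption: "6-element TLB array" */

typedef uint64_t vaddr_t;       /* hypothetical stand-in for target_ulong */

/* Hypothetical per-index record of the region covered by large pages. */
typedef struct {
    vaddr_t addr;               /* region base, (vaddr_t)-1 when empty */
    vaddr_t mask;               /* mask selecting the region's address bits */
} LargePageRange;

typedef struct {
    LargePageRange large[NB_MMU_MODES];
} TlbSketch;

/* Forget the large-page region of one MMU index only. */
static void sketch_flush_idx(TlbSketch *t, int mmu_idx)
{
    t->large[mmu_idx].addr = (vaddr_t)-1;
    t->large[mmu_idx].mask = 0;
}

static void sketch_init(TlbSketch *t)
{
    int i;
    for (i = 0; i < NB_MMU_MODES; i++) {
        sketch_flush_idx(t, i);
    }
}

/* Record a large page (size must be a power of two) under one MMU index. */
static void sketch_add_large_page(TlbSketch *t, int mmu_idx,
                                  vaddr_t vaddr, vaddr_t size)
{
    LargePageRange *r = &t->large[mmu_idx];
    vaddr_t mask = ~(size - 1);

    if (r->addr == (vaddr_t)-1) {
        r->addr = vaddr & mask;
        r->mask = mask;
        return;
    }
    /* Widen the mask until it covers both the old region and the new page. */
    mask &= r->mask;
    while (((r->addr ^ vaddr) & mask) != 0) {
        mask <<= 1;
    }
    r->addr &= mask;
    r->mask = mask;
}

/* True if addr falls inside the region tracked for this MMU index. */
static bool sketch_in_large_region(const TlbSketch *t, int mmu_idx,
                                   vaddr_t addr)
{
    const LargePageRange *r = &t->large[mmu_idx];
    return (addr & r->mask) == r->addr;
}
```

With this layout, flushing index 1 leaves the region recorded for index 0
untouched, so a later page flush inside the huge page can still be detected
for index 0.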

Personally I would put the "implement new feature in core
code" in a separate patch from "use new feature in PPC code".

Does PPC hardware do lots of TLB flushes on user-kernel
transitions, or does it have some sort of info in the TLB
entry about whether it should match or not? (I'm wondering
if there's a generalisation possible here that might help ARM
too.)

-- PMM


* Re: [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG
  2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
                   ` (16 preceding siblings ...)
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 17/17] ppc: dump all 32 CR bits Paolo Bonzini
@ 2014-08-28 18:05 ` Tom Musta
  17 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-08-28 18:05 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:14 PM, Paolo Bonzini wrote:
> Hi everyone,
> 
> these patches provide a speedup around 20% when running PPC softmmu
> emulation on x86 machines (10% for user-mode emulation).  There are
> actually two separate speedups here:
> 
> * avoiding TLB flushing on every kernel<->user transition (patches 1-2)
> 
> * rewriting CR handling to use 32 1-bit registers instead of 8
>   4-bit registers (patches 3-16)
> 
> They must not be too shoddy; they boot a Linux guest fine. :) And the
> speedup is very interesting of course.  The three problems with it are:
> 
> * I don't have a good testsuite.  So floating-point, decimal and SPE
>   are mostly untested
> 
> * I don't have much time to work on them (they are about a year old and
>   I have just rebased them).
> 
> * Patch 15 is a monster and hard to review, but I have no idea how to
>   split it.
> 
> Please take a look and if you are interested help in any way you can. :)

Paolo:  I will carve out some time to help with both testing and review.

> 

[ ... ]


* Re: [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-08-28 17:30   ` Peter Maydell
@ 2014-08-28 19:35     ` Paolo Bonzini
  2014-09-05  6:00       ` David Gibson
  0 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-08-28 19:35 UTC (permalink / raw)
  To: Peter Maydell; +Cc: dgibson, qemu-ppc, QEMU Developers, Tom Musta

On 28/08/2014 19:30, Peter Maydell wrote:
> On 28 August 2014 18:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>> means on every user<->kernel context switch.  Use the 6-element
>> TLB array as a cache, where each MMU index is mapped to a different
>> state of the IR/DR/PR/HV bits.
>>
>> This brings the number of TLB flushes down from ~900000 to ~50000
>> for starting up the Debian installer, which is in line with x86
>> and gives a ~10% performance improvement.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  cputlb.c                    | 19 +++++++++++++++++
>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>  include/exec/exec-all.h     |  5 +++++
>>  target-ppc/cpu.h            |  4 +++-
>>  target-ppc/excp_helper.c    |  6 +-----
>>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>>  target-ppc/translate_init.c |  5 +++++
>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index afd3705..17e1b03 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>      tlb_flush_count++;
>>  }
>>
>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>> +{
>> +    CPUArchState *env = cpu->env_ptr;
>> +
>> +#if defined(DEBUG_TLB)
>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>> +#endif
>> +    /* must reset current TB so that interrupts cannot modify the
>> +       links while we are modifying them */
>> +    cpu->current_tb = NULL;
>> +
>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>> +
>> +    env->tlb_flush_addr = -1;
>> +    env->tlb_flush_mask = 0;
> 
> Isn't this going to break huge page support? Consider
> the case:
>  * set up huge pages in one TLB index (causing tlb_flush_addr
>    and tlb_flush_mask to be set to cover that range)
>  * switch to a different TLB index
>  * tlb_flush_idx() for that index (causing flush_addr/mask to
>    be reset)
>  * switch back to first TLB index
>  * do tlb_flush_page for an address inside the huge-page
>     region
> 
> I think you need the flush addr/mask to be per-TLB-index
> if you want this to work.

Yes, you're right.

> Personally I would put the "implement new feature in core
> code" in a separate patch from "use new feature in PPC code".

This too, of course.  The patches aren't quite ready; I wanted to post
early because the speedups are very appealing to me.

> Does PPC hardware do lots of TLB flushes on user-kernel
> transitions, or does it have some sort of info in the TLB
> entry about whether it should match or not?

The IR and DR bits simply disable paging for instructions and data,
respectively.  I suppose real hardware simply does not use the TLB when
paging is disabled.

IIRC each user->kernel transition disables paging, and then the kernel
can re-enable it (optionally only on data).  So the transition is
user->kernel unpaged->kernel paged, and the kernel unpaged->kernel paged
part is what triggers the TLB flush.  (Something like this; Alex
explained it to me a year ago when I asked why tlb_flush was always the
top function in the profile of qemu-system-ppc*.)
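
The caching idea from the commit message can be pictured with a small
standalone sketch. This is an illustration under assumptions, not the code
in the patch: SlotCache, slot_cache_select, the SK_* flags and the
round-robin eviction policy are all hypothetical. Each slot remembers the
IR/DR/PR/HV state it was filled under, and a flush is only counted when a
slot has to be reused for a different state.

```c
#include <stdint.h>
#include <string.h>

#define NB_SLOTS 6              /* assumption: one slot per TLB array element */

/* Hypothetical flags standing in for the MSR IR/DR/PR/HV bits. */
enum { SK_IR = 1, SK_DR = 2, SK_PR = 4, SK_HV = 8 };

typedef struct {
    uint8_t key[NB_SLOTS];      /* translation state each slot was filled under */
    uint8_t valid[NB_SLOTS];    /* nonzero once the slot has been assigned */
    int next_victim;            /* round-robin eviction pointer */
    unsigned flushes;           /* number of per-slot flushes performed */
} SlotCache;

static void slot_cache_init(SlotCache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Return the slot to use for the given translation state. */
static int slot_cache_select(SlotCache *c, uint8_t key)
{
    int i;

    /* Hit: the state was seen before, so its cached entries stay usable. */
    for (i = 0; i < NB_SLOTS; i++) {
        if (c->valid[i] && c->key[i] == key) {
            return i;
        }
    }

    /* Miss: reuse a slot; only a previously used slot needs flushing. */
    i = c->next_victim;
    c->next_victim = (i + 1) % NB_SLOTS;
    if (c->valid[i]) {
        c->flushes++;           /* a real implementation would flush slot i here */
    }
    c->key[i] = key;
    c->valid[i] = 1;
    return i;
}
```

In this model the user, kernel unpaged and kernel paged states each settle
into their own slot, so cycling between them performs no flushes once the
three slots are populated.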

Paolo


* Re: [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel Paolo Bonzini
@ 2014-08-29 18:30   ` Richard Henderson
  2014-09-03 19:41   ` Tom Musta
  1 sibling, 0 replies; 50+ messages in thread
From: Richard Henderson @ 2014-08-29 18:30 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc, tommusta

On 08/28/2014 10:15 AM, Paolo Bonzini wrote:
> +    TCGv t1, true_op, zero;
...
> +    tcg_temp_free_i32(t1);

Not _i32 for the free.


r~


* Re: [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR Paolo Bonzini
@ 2014-09-03 18:21   ` Tom Musta
  2014-09-05  7:10     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:21 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:14 PM, Paolo Bonzini wrote:
> This was off-by-one.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 34cee74..ec73dd4 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2968,7 +2968,7 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
>  
>      u = 0;
>      for (i = 0; i < 8; i++)
> -        u |= env->crf[i] << (32 - (4 * i));
> +        u |= env->crf[i] << (32 - (4 * (i + 1)));
>  
>      return u;
>  }
> 

Reviewed-by: Tom Musta <tommusta@gmail.com>


* Re: [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c Paolo Bonzini
@ 2014-09-03 18:21   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:21 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Match the idiom used by linux-user/signal.c and
> linux-user/elfload.c.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/gdbstub.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
> index 14675f4..bad49ae 100644
> --- a/target-ppc/gdbstub.c
> +++ b/target-ppc/gdbstub.c
> @@ -138,7 +138,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
>              {
>                  uint32_t cr = 0;
>                  int i;
> -                for (i = 0; i < 8; i++) {
> +                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
>                      cr |= env->crf[i] << (32 - ((i + 1) * 4));
>                  }
>                  gdb_get_reg32(mem_buf, cr);
> @@ -246,7 +246,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
>              {
>                  uint32_t cr = ldl_p(mem_buf);
>                  int i;
> -                for (i = 0; i < 8; i++) {
> +                for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
>                      env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
>                  }
>                  break;
> 

Since the same code appears in 3 different places, would it be better to implement a reusable function in target-ppc/cpu.h?

I.e.:

static inline uint32_t ppc_get_cr(const CPUPPCState *env) {
    uint32_t cr = 0;
    int i;
    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
        cr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
    }
    return cr;
}
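
For reference, a standalone sketch of such a helper together with its
inverse, using a stand-in struct rather than CPUPPCState. CRState,
sketch_get_cr and sketch_set_cr are hypothetical names, and the layout
assumes eight 4-bit fields with field 0 in the most significant nibble, as
in the loops quoted above.

```c
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for the CR part of CPUPPCState. */
typedef struct {
    uint32_t crf[8];            /* eight 4-bit condition register fields */
} CRState;

/* Pack the eight fields into one 32-bit CR image, field 0 on top. */
static inline uint32_t sketch_get_cr(const CRState *env)
{
    uint32_t cr = 0;
    unsigned i;

    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
        cr |= env->crf[i] << (32 - ((i + 1) * 4));
    }
    return cr;
}

/* Split a 32-bit CR image back into the eight 4-bit fields. */
static inline void sketch_set_cr(CRState *env, uint32_t cr)
{
    unsigned i;

    for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
        env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
    }
}
```

Having both directions in one place would let the gdbstub, signal and
elfload code share a single definition of the bit layout.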


* Re: [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c Paolo Bonzini
@ 2014-09-03 18:21   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:21 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/fpu_helper.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> index da93d12..0fe006a 100644
> --- a/target-ppc/fpu_helper.c
> +++ b/target-ppc/fpu_helper.c
> @@ -1043,7 +1043,7 @@ uint32_t helper_ftdiv(uint64_t fra, uint64_t frb)
>          }
>      }
>  
> -    return 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0);
> +    return (1 << CRF_LT) | (fg_flag << CRF_GT) | (fe_flag << CRF_EQ);
>  }
>  
>  uint32_t helper_ftsqrt(uint64_t frb)
> @@ -1074,7 +1074,7 @@ uint32_t helper_ftsqrt(uint64_t frb)
>          }
>      }
>  
> -    return 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0);
> +    return (1 << CRF_LT) | (fg_flag << CRF_GT) | (fe_flag << CRF_EQ);
>  }
>  
>  void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
> @@ -1088,19 +1088,19 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>  
>      if (unlikely(float64_is_any_nan(farg1.d) ||
>                   float64_is_any_nan(farg2.d))) {
> -        ret = 0x01UL;
> +        ret = CRF_SO;
>      } else if (float64_lt(farg1.d, farg2.d, &env->fp_status)) {
> -        ret = 0x08UL;
> +        ret = CRF_LT;
>      } else if (!float64_le(farg1.d, farg2.d, &env->fp_status)) {
> -        ret = 0x04UL;
> +        ret = CRF_GT;
>      } else {
> -        ret = 0x02UL;
> +        ret = CRF_EQ;
>      }
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
> -    env->fpscr |= ret << FPSCR_FPRF;
> -    env->crf[crfD] = ret;
> -    if (unlikely(ret == 0x01UL
> +    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> +    env->crf[crfD] = (1 << ret);
> +    if (unlikely(ret == CRF_SO
>                   && (float64_is_signaling_nan(farg1.d) ||
>                       float64_is_signaling_nan(farg2.d)))) {
>          /* sNaN comparison */
> @@ -1119,19 +1119,19 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>  
>      if (unlikely(float64_is_any_nan(farg1.d) ||
>                   float64_is_any_nan(farg2.d))) {
> -        ret = 0x01UL;
> +        ret = CRF_SO;
>      } else if (float64_lt(farg1.d, farg2.d, &env->fp_status)) {
> -        ret = 0x08UL;
> +        ret = CRF_LT;
>      } else if (!float64_le(farg1.d, farg2.d, &env->fp_status)) {
> -        ret = 0x04UL;
> +        ret = CRF_GT;
>      } else {
> -        ret = 0x02UL;
> +        ret = CRF_EQ;
>      }
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
> -    env->fpscr |= ret << FPSCR_FPRF;
> -    env->crf[crfD] = ret;
> -    if (unlikely(ret == 0x01UL)) {
> +    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> +    env->crf[crfD] = (1 << ret);
> +    if (unlikely(ret == CRF_SO)) {
>          if (float64_is_signaling_nan(farg1.d) ||
>              float64_is_signaling_nan(farg2.d)) {
>              /* sNaN comparison */
> 

I like this patch.

Nit: for the fcmp* functions, "ret" is not a very good name for the variable.  Since this is a cleanup patch, I would suggest renaming it to "fpcc".

Other than that ...

Reviewed-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>


* Re: [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c Paolo Bonzini
@ 2014-09-03 18:28   ` Tom Musta
  2014-09-05  7:12     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:28 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/int_helper.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index f6e8846..9c1c5cd 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2303,25 +2303,25 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
>          if (sgna == sgnb) {
>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
>              zero = bcd_add_mag(&result, a, b, &invalid, &overflow);
> -            cr = (sgna > 0) ? 4 : 8;
> +            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>          } else if (bcd_cmp_mag(a, b) > 0) {
>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
>              zero = bcd_sub_mag(&result, a, b, &invalid, &overflow);
> -            cr = (sgna > 0) ? 4 : 8;
> +            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>          } else {
>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgnb, ps);
>              zero = bcd_sub_mag(&result, b, a, &invalid, &overflow);
> -            cr = (sgnb > 0) ? 4 : 8;
> +            cr = (sgnb > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>          }
>      }
>  
>      if (unlikely(invalid)) {
>          result.u64[HI_IDX] = result.u64[LO_IDX] = -1;
> -        cr = 1;
> +        cr = 1 << CRF_SO;
>      } else if (overflow) {
> -        cr |= 1;
> +        cr |= 1 << CRF_SO;
>      } else if (zero) {
> -        cr = 2;
> +        cr = 1 << CRF_EQ;
>      }
>  
>      *r = result;
> 

Reviewed-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>


* Re: [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found Paolo Bonzini
@ 2014-09-03 18:28   ` Tom Musta
  2014-09-05  7:26     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:28 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> It must return 8 and place 8 in XER, but the current code uses
> i directly which is 9 at this point of the code.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/int_helper.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 9c1c5cd..7955bf7 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2573,6 +2573,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>          }
>          i++;
>      }
> +    i = 8;
>      if (update_Rc) {
>          env->crf[0] = 0x2;
>      }
> 

Reviewed-by: Tom Musta <tommusta@gmail.com>


* Re: [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf Paolo Bonzini
@ 2014-09-03 18:28   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:28 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/helper.h     |  3 +++
>  target-ppc/int_helper.c | 22 ++++++++++++++++++++++
>  target-ppc/translate.c  | 31 ++++---------------------------
>  3 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 509eae5..5342f13 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -60,6 +60,9 @@ DEF_HELPER_2(fpscr_setbit, void, env, i32)
>  DEF_HELPER_2(float64_to_float32, i32, env, i64)
>  DEF_HELPER_2(float32_to_float64, i64, env, i32)
>  
> +DEF_HELPER_1(mfocrf, tl, env)
> +DEF_HELPER_3(mtocrf, void, env, tl, i32)
> +
>  DEF_HELPER_4(fcmpo, void, env, i64, i64, i32)
>  DEF_HELPER_4(fcmpu, void, env, i64, i64, i32)
>  
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 7955bf7..5fa10c7 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -306,6 +306,28 @@ target_ulong helper_popcntw(target_ulong val)
>  }
>  #endif
>  
> +void helper_mtocrf(CPUPPCState *env, target_ulong cr, uint32_t mask)
> +{
> +    int i;
> +    for (i = 7; i >= 0; i--) {
> +        if (mask & 1) {
> +            env->crf[i] = cr & 0x0F;
> +        }
> +        cr >>= 4;
> +        mask >>= 1;
> +    }
> +}

Use ARRAY_SIZE?

> +
> +target_ulong helper_mfocrf(CPUPPCState *env)
> +{
> +    uint32_t cr = 0;
> +    int i;
> +    for (i = 0; i < 8; i++) {
> +        cr |= env->crf[i] << (32 - (i + 1) * 4);
> +    }
> +    return cr;
> +}
> +

Use ARRAY_SIZE?  Or better yet, reuse the utility that I recommended adding as part of patch 4.

>  /*****************************************************************************/
>  /* PowerPC 601 specific instructions (POWER bridge) */
>  target_ulong helper_div(CPUPPCState *env, target_ulong arg1, target_ulong arg2)
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 5a8267a..0a85a23 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -4145,24 +4145,7 @@ static void gen_mfcr(DisasContext *ctx)
>                              cpu_gpr[rD(ctx->opcode)], crn * 4);
>          }
>      } else {
> -        TCGv_i32 t0 = tcg_temp_new_i32();
> -        tcg_gen_mov_i32(t0, cpu_crf[0]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[1]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[2]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[3]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[4]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[5]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[6]);
> -        tcg_gen_shli_i32(t0, t0, 4);
> -        tcg_gen_or_i32(t0, t0, cpu_crf[7]);
> -        tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);
> -        tcg_temp_free_i32(t0);
> +        gen_helper_mfocrf(cpu_gpr[rD(ctx->opcode)], cpu_env);
>      }
>  }
>  
> @@ -4257,15 +4240,9 @@ static void gen_mtcrf(DisasContext *ctx)
>              tcg_temp_free_i32(temp);
>          }
>      } else {
> -        TCGv_i32 temp = tcg_temp_new_i32();
> -        tcg_gen_trunc_tl_i32(temp, cpu_gpr[rS(ctx->opcode)]);
> -        for (crn = 0 ; crn < 8 ; crn++) {
> -            if (crm & (1 << crn)) {
> -                    tcg_gen_shri_i32(cpu_crf[7 - crn], temp, crn * 4);
> -                    tcg_gen_andi_i32(cpu_crf[7 - crn], cpu_crf[7 - crn], 0xf);
> -            }
> -        }
> -        tcg_temp_free_i32(temp);
> +        TCGv_i32 t0 = tcg_const_i32(crm);
> +        gen_helper_mtocrf(cpu_env, cpu_gpr[rS(ctx->opcode)], t0);
> +        tcg_temp_free_i32(t0);
>      }
>  }
>  
> 
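
As a rough illustration of the ARRAY_SIZE suggestion above, the two loops could be bounded by the array itself rather than by the literals 7 and 8. This is only a standalone sketch: DemoCRState, demo_pack_cr and demo_unpack_cr are hypothetical names that stand in for CPUPPCState and the quoted helpers, and ARRAY_SIZE is redefined locally so the snippet compiles outside the QEMU tree.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's ARRAY_SIZE so the sketch is self-contained. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for the crf[] member of CPUPPCState. */
typedef struct {
    uint32_t crf[8];
} DemoCRState;

/* Pack the eight 4-bit fields into one 32-bit CR image, with CR0 in the
 * most significant nibble (same layout as the quoted helper_mfocrf). */
static uint32_t demo_pack_cr(const DemoCRState *s)
{
    uint32_t cr = 0;
    unsigned i;

    for (i = 0; i < ARRAY_SIZE(s->crf); i++) {
        cr |= (s->crf[i] & 0xF) << (32 - (i + 1) * 4);
    }
    return cr;
}

/* Write back only the fields selected by mask. Mask bit 0 selects CR7 and
 * mask bit 7 selects CR0, as in the quoted helper_mtocrf. */
static void demo_unpack_cr(DemoCRState *s, uint32_t cr, uint32_t mask)
{
    int i;

    assert(mask <= 0xFF);
    for (i = (int)ARRAY_SIZE(s->crf) - 1; i >= 0; i--) {
        if (mask & 1) {
            s->crf[i] = cr & 0xF;
        }
        cr >>= 4;
        mask >>= 1;
    }
}
```

With the bound taken from the array, the same pair of routines could in principle be shared by the gdbstub, monitor and signal code that currently open-codes the packing loop.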

* Re: [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf Paolo Bonzini
@ 2014-09-03 18:29   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:29 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/translate.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 0a85a23..afbd336 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -253,21 +253,19 @@ static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
>  {
>      TCGv_i32 t0 = tcg_temp_new_i32();
>  
> -    if (set_fprf != 0) {
> -        /* This case might be optimized later */
> -        tcg_gen_movi_i32(t0, 1);
> -        gen_helper_compute_fprf(t0, cpu_env, arg, t0);
> -        if (unlikely(set_rc)) {
> -            tcg_gen_mov_i32(cpu_crf[1], t0);
> -        }
> -        gen_helper_float_check_status(cpu_env);
> -    } else if (unlikely(set_rc)) {
> -        /* We always need to compute fpcc */
> -        tcg_gen_movi_i32(t0, 0);
> -        gen_helper_compute_fprf(t0, cpu_env, arg, t0);
> +    if (set_fprf == 0 && !set_rc) {
> +        return;
> +    }
> +
> +    tcg_gen_movi_i32(t0, set_fprf != 0);
> +    gen_helper_compute_fprf(t0, cpu_env, arg, t0);
> +    if (set_rc) {
>          tcg_gen_mov_i32(cpu_crf[1], t0);
>      }
>  
> +    if (set_fprf != 0) {
> +        gen_helper_float_check_status(cpu_env);
> +    }
>      tcg_temp_free_i32(t0);
>  }
>  
> 

This has a leak:

Opcode 3f 07 12 (fc00048e) leaked temporaries

I made this modification on top of your patch to fix it:

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 0911c18..ff9b966 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -251,12 +251,13 @@ static inline void gen_reset_fpstatus(void)

 static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
 {
-    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t0;

     if (set_fprf == 0 && !set_rc) {
         return;
     }

+    t0 = tcg_temp_new_i32();
     tcg_gen_movi_i32(t0, set_fprf != 0);
     gen_helper_compute_fprf(t0, cpu_env, arg, t0);
     if (set_rc) {
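
The shape of the leak and of the fix above can be modelled outside of TCG. The sketch below is not QEMU code: live_temps, demo_temp_new and demo_temp_free are hypothetical stand-ins for the temporary bookkeeping, and the two functions only mirror the control flow of gen_compute_fprf before and after the amendment.

```c
#include <assert.h>

/* Hypothetical counter standing in for TCG's tracking of live temporaries. */
static int live_temps;

static int demo_temp_new(void)
{
    live_temps++;
    return live_temps;
}

static void demo_temp_free(int temp)
{
    assert(temp > 0 && live_temps > 0);
    live_temps--;
}

/* Allocation at the declaration, as in the original patch: the early
 * return path skips the free. */
static void demo_leaky(int set_fprf, int set_rc)
{
    int t0 = demo_temp_new();

    if (set_fprf == 0 && !set_rc) {
        return; /* t0 is still live here */
    }
    demo_temp_free(t0);
}

/* Allocation moved below the early return, as in the amended version. */
static void demo_fixed(int set_fprf, int set_rc)
{
    int t0;

    if (set_fprf == 0 && !set_rc) {
        return;
    }
    t0 = demo_temp_new();
    demo_temp_free(t0);
}
```

Calling demo_leaky(0, 0) leaves one temporary outstanding, while demo_fixed(0, 0) leaves none, which matches the "leaked temporaries" report for the case where neither FPRF nor Rc is requested.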

* Re: [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr Paolo Bonzini
@ 2014-09-03 18:58   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-03 18:58 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This patch does not compile for 64-bit targets when TCG debug is enabled -- several places need to be more explicit about the "i32-ness" of variables.  There is also a leak of temporaries in mfcr.  Details are below.
> ---
>  target-ppc/translate.c | 60 +++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index afbd336..8def0ae 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -249,6 +249,21 @@ static inline void gen_reset_fpstatus(void)
>      gen_helper_reset_fpstatus(cpu_env);
>  }
>  
> +static inline void gen_op_mfcr(TCGv dest, int first_cr, int shift)

                              --> TCGv_i32 dest

> +{
> +    tcg_gen_shli_i32(dest, cpu_crf[first_cr >> 2], shift);
> +}
> +
> +static inline void gen_op_mtcr(int first_cr, TCGv src, int shift)

                                         -----> TCGv_i32 src
> +{
> +    if (shift) {
> +        tcg_gen_shri_i32(cpu_crf[first_cr >> 2], src, shift);
> +        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], cpu_crf[first_cr >> 2], 0x0F);
> +    } else {
> +        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], src, 0x0F);
> +    }
> +}
> +
>  static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
>  {
>      TCGv_i32 t0 = tcg_temp_new_i32();
> @@ -260,7 +275,7 @@ static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
>      tcg_gen_movi_i32(t0, set_fprf != 0);
>      gen_helper_compute_fprf(t0, cpu_env, arg, t0);
>      if (set_rc) {
> -        tcg_gen_mov_i32(cpu_crf[1], t0);
> +        gen_op_mtcr(4, t0, 0);
>      }
>  
>      if (set_fprf != 0) {
> @@ -2428,6 +2443,7 @@ static void gen_fmrgow(DisasContext *ctx)
>  static void gen_mcrfs(DisasContext *ctx)
>  {
>      TCGv tmp = tcg_temp_new();
> +    TCGv_i32 tmp32 = tcg_temp_new_i32();
>      int bfa;
>  
>      if (unlikely(!ctx->fpu_enabled)) {
> @@ -2436,10 +2452,11 @@ static void gen_mcrfs(DisasContext *ctx)
>      }
>      bfa = 4 * (7 - crfS(ctx->opcode));
>      tcg_gen_shri_tl(tmp, cpu_fpscr, bfa);
> -    tcg_gen_trunc_tl_i32(cpu_crf[crfD(ctx->opcode)], tmp);
> +    tcg_gen_trunc_tl_i32(tmp32, tmp);
>      tcg_temp_free(tmp);
> -    tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)], 0xf);
> +    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp32, 0);
>      tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
> +    tcg_temp_free(tmp32);

  -->  tcg_temp_free_i32(tmp32);

>  }
>  
>  /* mffs */
> @@ -2474,8 +2491,10 @@ static void gen_mtfsb0(DisasContext *ctx)
>          tcg_temp_free_i32(t0);
>      }
>      if (unlikely(Rc(ctx->opcode) != 0)) {
> -        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
> -        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
> +        TCGv_i32 tmp32 = tcg_temp_new_i32();
> +        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
> +        gen_op_mtcr(4, tmp32, FPSCR_OX);
> +        tcg_temp_free_i32(tmp32);
>      }
>  }
>  
> @@ -2500,8 +2519,10 @@ static void gen_mtfsb1(DisasContext *ctx)
>          tcg_temp_free_i32(t0);
>      }
>      if (unlikely(Rc(ctx->opcode) != 0)) {
> -        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
> -        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
> +        TCGv_i32 tmp32 = tcg_temp_new_i32();
> +        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
> +        gen_op_mtcr(4, tmp32, FPSCR_OX);
> +        tcg_temp_free_i32(tmp32);
>      }
>      /* We can raise a differed exception */
>      gen_helper_float_check_status(cpu_env);
> @@ -2535,8 +2556,10 @@ static void gen_mtfsf(DisasContext *ctx)
>      gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
>      tcg_temp_free_i32(t0);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
> -        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
> -        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
> +        TCGv_i32 tmp32 = tcg_temp_new_i32();
> +        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
> +        gen_op_mtcr(4, tmp32, FPSCR_OX);
> +        tcg_temp_free_i32(tmp32);
>      }
>      /* We can raise a differed exception */
>      gen_helper_float_check_status(cpu_env);
> @@ -2569,8 +2592,10 @@ static void gen_mtfsfi(DisasContext *ctx)
>      tcg_temp_free_i64(t0);
>      tcg_temp_free_i32(t1);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
> -        tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
> -        tcg_gen_shri_i32(cpu_crf[1], cpu_crf[1], FPSCR_OX);
> +        TCGv_i32 tmp32 = tcg_temp_new_i32();
> +        tcg_gen_trunc_tl_i32(tmp32, cpu_fpscr);
> +        gen_op_mtcr(4, tmp32, FPSCR_OX);
> +        tcg_temp_free_i32(tmp32);
>      }
>      /* We can raise a differed exception */
>      gen_helper_float_check_status(cpu_env);
> @@ -4137,10 +4162,10 @@ static void gen_mfcr(DisasContext *ctx)
>      if (likely(ctx->opcode & 0x00100000)) {
>          crm = CRM(ctx->opcode);
>          if (likely(crm && ((crm & (crm - 1)) == 0))) {
> +            TCGv_i32 t0 = tcg_temp_new_i32();
>              crn = ctz32 (crm);
> -            tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], cpu_crf[7 - crn]);
> -            tcg_gen_shli_tl(cpu_gpr[rD(ctx->opcode)],
> -                            cpu_gpr[rD(ctx->opcode)], crn * 4);
> +            gen_op_mfcr(t0, (7 - crn) * 4, crn * 4);
> +            tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);

               tcg_temp_free_i32(t0);  <<<<<<<< LEAKS WITHOUT THIS <<<<<<<<<<<<<<<
>          }
>      } else {
>          gen_helper_mfocrf(cpu_gpr[rD(ctx->opcode)], cpu_env);
> @@ -4233,8 +4258,7 @@ static void gen_mtcrf(DisasContext *ctx)
>              TCGv_i32 temp = tcg_temp_new_i32();
>              crn = ctz32 (crm);
>              tcg_gen_trunc_tl_i32(temp, cpu_gpr[rS(ctx->opcode)]);
> -            tcg_gen_shri_i32(temp, temp, crn * 4);
> -            tcg_gen_andi_i32(cpu_crf[7 - crn], temp, 0xf);
> +            gen_op_mtcr((7 - crn) * 4, temp, crn * 4);
>              tcg_temp_free_i32(temp);
>          }
>      } else {
> @@ -8159,13 +8183,13 @@ static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>  {
>      TCGv_i32 tmp = tcg_temp_new_i32();
>      tcg_gen_trunc_tl_i32(tmp, cpu_fpscr);
> -    tcg_gen_shri_i32(cpu_crf[1], tmp, 28);
> +    gen_op_mtcr(4, tmp, 28);
>      tcg_temp_free_i32(tmp);
>  }
>  #else
>  static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>  {
> -        tcg_gen_shri_tl(cpu_crf[1], cpu_fpscr, 28);
> +    gen_op_mtcr(4, cpu_fpscr, 28);
>  }
>  #endif
>  
> 


In case it is useful, here is my overall amendment to your patch:

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index e67f95c..f847432 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -249,12 +249,12 @@ static inline void gen_reset_fpstatus(void)
     gen_helper_reset_fpstatus(cpu_env);
 }

-static inline void gen_op_mfcr(TCGv dest, int first_cr, int shift)
+static inline void gen_op_mfcr(TCGv_i32 dest, int first_cr, int shift)
 {
     tcg_gen_shli_i32(dest, cpu_crf[first_cr >> 2], shift);
 }

-static inline void gen_op_mtcr(int first_cr, TCGv src, int shift)
+static inline void gen_op_mtcr(int first_cr, TCGv_i32 src, int shift)
 {
     if (shift) {
         tcg_gen_shri_i32(cpu_crf[first_cr >> 2], src, shift);
@@ -2498,7 +2498,7 @@ static void gen_mcrfs(DisasContext *ctx)
     tcg_temp_free(tmp);
     gen_op_mtcr(crfD(ctx->opcode) << 2, tmp32, 0);
     tcg_gen_andi_tl(cpu_fpscr, cpu_fpscr, ~(0xF << bfa));
-    tcg_temp_free(tmp32);
+    tcg_temp_free_i32(tmp32);
 }

 /* mffs */
@@ -4208,6 +4208,7 @@ static void gen_mfcr(DisasContext *ctx)
             crn = ctz32 (crm);
             gen_op_mfcr(t0, (7 - crn) * 4, crn * 4);
             tcg_gen_extu_i32_tl(cpu_gpr[rD(ctx->opcode)], t0);
+            tcg_temp_free_i32(t0);
         }
     } else {
         gen_helper_mfocrf(cpu_gpr[rD(ctx->opcode)], cpu_env);

* Re: [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel Paolo Bonzini
  2014-08-29 18:30   ` Richard Henderson
@ 2014-09-03 19:41   ` Tom Musta
  2014-09-15 13:39     ` Paolo Bonzini
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 19:41 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/translate.c | 23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 67f13f7..48c7b66 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -789,27 +789,26 @@ static void gen_cmpli(DisasContext *ctx)
>  /* isel (PowerPC 2.03 specification) */
>  static void gen_isel(DisasContext *ctx)
>  {
> -    int l1, l2;
>      uint32_t bi = rC(ctx->opcode);
>      uint32_t mask;
>      TCGv_i32 t0;
> -
> -    l1 = gen_new_label();
> -    l2 = gen_new_label();
> +    TCGv t1, true_op, zero;
>  
>      mask = 1 << (3 - (bi & 0x03));
>      t0 = tcg_temp_new_i32();

This leaks t0 (never freed).

>      tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
> -    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
> +    t1 = tcg_temp_new();
> +    tcg_gen_extu_i32_tl(t1, t0);
> +    zero = tcg_const_tl(0);
>      if (rA(ctx->opcode) == 0)
> -        tcg_gen_movi_tl(cpu_gpr[rD(ctx->opcode)], 0);
> +        true_op = zero;
>      else
> -        tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
> -    tcg_gen_br(l2);
> -    gen_set_label(l1);
> -    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
> -    gen_set_label(l2);
> -    tcg_temp_free_i32(t0);
> +        true_op = cpu_gpr[rA(ctx->opcode)];
> +
> +    tcg_gen_movcond_tl(cpu_gpr[rD(ctx->opcode)], t1, zero,
> +                       true_op, cpu_gpr[rB(ctx->opcode)], TCG_COND_NE);

This doesn't compile for me ... the order of the arguments does not match what is defined in tcg-op.h.

> +    tcg_temp_free_i32(t1);

Just tcg_temp_free(t1);

> +    tcg_temp_free(zero);
>  }
>  
>  /* cmpb: PowerPC 2.05 specification */
> 
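
To make the argument-order point concrete, here is a plain C model of what the movcond-based isel is meant to compute. It is a sketch only: it assumes the condition-first ordering (cond, ret, c1, c2, v1, v2), and demo_movcond, demo_isel and DemoCond are hypothetical names that operate on ordinary integers rather than TCG values.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DEMO_COND_EQ, DEMO_COND_NE } DemoCond;

/* Model of a conditional move: the condition comes first, then the
 * destination, the two compared values, and the two candidate results. */
static void demo_movcond(DemoCond cond, uint64_t *ret,
                         uint64_t c1, uint64_t c2,
                         uint64_t v1, uint64_t v2)
{
    int taken = (cond == DEMO_COND_EQ) ? (c1 == c2) : (c1 != c2);

    *ret = taken ? v1 : v2;
}

/* isel: pick (rA|0) if the CR bit selected by bi is set, else rB.
 * crf is the 4-bit CR field that contains the bit (field bi >> 2). */
static uint64_t demo_isel(const uint64_t gpr[32], uint32_t crf,
                          unsigned ra, unsigned rb, unsigned bi)
{
    uint32_t mask = 1u << (3 - (bi & 0x03));
    uint64_t true_op;
    uint64_t rd;

    assert(ra < 32 && rb < 32);
    true_op = (ra == 0) ? 0 : gpr[ra];
    demo_movcond(DEMO_COND_NE, &rd, crf & mask, 0, true_op, gpr[rb]);
    return rd;
}
```

The branch-free form has the same result as the label-based code it replaces; only the order in which the operands are passed differs from the quoted hunk.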

* Re: [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr Paolo Bonzini
@ 2014-09-03 19:41   ` Tom Musta
  2014-09-05  7:27     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 19:41 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> It sets CR1, not CR6 (and the spec agrees).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/translate.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 8def0ae..67f13f7 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -8179,7 +8179,7 @@ static inline TCGv_ptr gen_fprp_ptr(int reg)
>  }
>  
>  #if defined(TARGET_PPC64)
> -static void gen_set_cr6_from_fpscr(DisasContext *ctx)
> +static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>  {
>      TCGv_i32 tmp = tcg_temp_new_i32();
>      tcg_gen_trunc_tl_i32(tmp, cpu_fpscr);
> @@ -8187,7 +8187,7 @@ static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>      tcg_temp_free_i32(tmp);
>  }
>  #else
> -static void gen_set_cr6_from_fpscr(DisasContext *ctx)
> +static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>  {
>      gen_op_mtcr(4, cpu_fpscr, 28);
>  }
> @@ -8207,7 +8207,7 @@ static void gen_##name(DisasContext *ctx)        \
>      rb = gen_fprp_ptr(rB(ctx->opcode));          \
>      gen_helper_##name(cpu_env, rd, ra, rb);      \
>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
> -        gen_set_cr6_from_fpscr(ctx);             \
> +        gen_set_cr1_from_fpscr(ctx);             \
>      }                                            \
>      tcg_temp_free_ptr(rd);                       \
>      tcg_temp_free_ptr(ra);                       \
> @@ -8265,7 +8265,7 @@ static void gen_##name(DisasContext *ctx)             \
>      u32_2 = tcg_const_i32(u32f2(ctx->opcode));        \
>      gen_helper_##name(cpu_env, rt, rb, u32_1, u32_2); \
>      if (unlikely(Rc(ctx->opcode) != 0)) {             \
> -        gen_set_cr6_from_fpscr(ctx);                  \
> +        gen_set_cr1_from_fpscr(ctx);                  \
>      }                                                 \
>      tcg_temp_free_ptr(rt);                            \
>      tcg_temp_free_ptr(rb);                            \
> @@ -8289,7 +8289,7 @@ static void gen_##name(DisasContext *ctx)        \
>      i32 = tcg_const_i32(i32fld(ctx->opcode));    \
>      gen_helper_##name(cpu_env, rt, ra, rb, i32); \
>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
> -        gen_set_cr6_from_fpscr(ctx);             \
> +        gen_set_cr1_from_fpscr(ctx);             \
>      }                                            \
>      tcg_temp_free_ptr(rt);                       \
>      tcg_temp_free_ptr(rb);                       \
> @@ -8310,7 +8310,7 @@ static void gen_##name(DisasContext *ctx)        \
>      rb = gen_fprp_ptr(rB(ctx->opcode));          \
>      gen_helper_##name(cpu_env, rt, rb);          \
>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
> -        gen_set_cr6_from_fpscr(ctx);             \
> +        gen_set_cr1_from_fpscr(ctx);             \
>      }                                            \
>      tcg_temp_free_ptr(rt);                       \
>      tcg_temp_free_ptr(rb);                       \
> @@ -8331,7 +8331,7 @@ static void gen_##name(DisasContext *ctx)          \
>      i32 = tcg_const_i32(i32fld(ctx->opcode));      \
>      gen_helper_##name(cpu_env, rt, rs, i32);       \
>      if (unlikely(Rc(ctx->opcode) != 0)) {          \
> -        gen_set_cr6_from_fpscr(ctx);               \
> +        gen_set_cr1_from_fpscr(ctx);               \
>      }                                              \
>      tcg_temp_free_ptr(rt);                         \
>      tcg_temp_free_ptr(rs);                         \
> 

Reviewed-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>

* Re: [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift Paolo Bonzini
@ 2014-09-03 20:59   ` Tom Musta
  2014-09-05  7:29     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Musta @ 2014-09-03 20:59 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> This will match the code we use in fpu_helper.c when we flip
> CRF_* bit-endianness.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target-ppc/translate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 48c7b66..4ce7af4 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -794,7 +794,7 @@ static void gen_isel(DisasContext *ctx)
>      TCGv_i32 t0;
>      TCGv t1, true_op, zero;
>  
> -    mask = 1 << (3 - (bi & 0x03));
> +    mask = 0x08 >> (bi & 0x03);
>      t0 = tcg_temp_new_i32();
>      tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
>      t1 = tcg_temp_new();
> @@ -3870,7 +3870,7 @@ static inline void gen_bcond(DisasContext *ctx, int type)
>      if ((bo & 0x10) == 0) {
>          /* Test CR */
>          uint32_t bi = BI(ctx->opcode);
> -        uint32_t mask = 1 << (3 - (bi & 0x03));
> +        uint32_t mask = 0x08 >> (bi & 0x03);
>          TCGv_i32 temp = tcg_temp_new_i32();
>  
>          if (bo & 0x8) {
> @@ -3952,7 +3952,7 @@ static void glue(gen_, name)(DisasContext *ctx)
>      else                                                                      \
>          tcg_gen_mov_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2]);                 \
>      tcg_op(t0, t0, t1);                                                       \
> -    bitmask = 1 << (3 - (crbD(ctx->opcode) & 0x03));                          \
> +    bitmask = 0x08 >> (crbD(ctx->opcode) & 0x03);                             \
>      tcg_gen_andi_i32(t0, t0, bitmask);                                        \
>      tcg_gen_andi_i32(t1, cpu_crf[crbD(ctx->opcode) >> 2], ~bitmask);          \
>      tcg_gen_or_i32(cpu_crf[crbD(ctx->opcode) >> 2], t0, t1);                  \
> 

Reviewed-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
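
For anyone who wants to convince themselves that the two mask expressions are interchangeable, a small standalone check follows. The function names are hypothetical and exist only for this sketch; the two expressions are copied from the hunks above.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: left shift by the bit's distance from the low end of the field. */
static uint32_t demo_mask_left(uint32_t bi)
{
    return 1u << (3 - (bi & 0x03));
}

/* New form: right shift of the field's top bit. */
static uint32_t demo_mask_right(uint32_t bi)
{
    return 0x08u >> (bi & 0x03);
}

/* Compare both forms over every 5-bit BI value. */
static void demo_check_masks(void)
{
    uint32_t bi;

    for (bi = 0; bi < 32; bi++) {
        assert(demo_mask_left(bi) == demo_mask_right(bi));
    }
}
```

Both forms map bit index 0 within a field to 0x8 and index 3 to 0x1, so the change is purely cosmetic until the CRF_* bit order is flipped later in the series.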

* Re: [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf Paolo Bonzini
@ 2014-09-04 18:26   ` Tom Musta
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-04 18:26 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> These two functions will group together four CR bits into a single
> value, once we change the representation of condition registers.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  linux-user/elfload.c     |  2 +-
>  linux-user/main.c        |  2 +-
>  linux-user/signal.c      |  4 ++--
>  monitor.c                |  2 +-
>  target-ppc/cpu.h         | 10 ++++++++++
>  target-ppc/excp_helper.c |  2 +-
>  target-ppc/fpu_helper.c  |  6 ++++--
>  target-ppc/gdbstub.c     |  4 ++--
>  target-ppc/int_helper.c  | 16 ++++++++--------
>  target-ppc/kvm.c         |  4 ++--
>  target-ppc/translate.c   | 13 +++++++------
>  11 files changed, 39 insertions(+), 26 deletions(-)
> 

The patch doesn't pass checkpatch.pl

> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index bea803b..3769ae6 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -858,7 +858,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
>      (*regs)[37] = tswapreg(env->xer);
>  
>      for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
> -        ccr |= env->crf[i] << (32 - ((i + 1) * 4));
> +        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
>      }
>      (*regs)[38] = tswapreg(ccr);
>  }
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 472a16d..152c031 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -1550,7 +1550,7 @@ static int do_store_exclusive(CPUPPCState *env)
>                  }
>              }
>          }
> -        env->crf[0] = (stored << 1) | xer_so;
> +        ppc_set_crf(env, 0, (stored << 1) | xer_so);
>          env->reserve_addr = (target_ulong)-1;
>      }
>      if (!segv) {
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 26929c5..4f5d79f 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -4512,7 +4512,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame,
>      __put_user(env->xer, &frame->mc_gregs[TARGET_PT_XER]);
>  
>      for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
> -        ccr |= env->crf[i] << (32 - ((i + 1) * 4));
> +        ccr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
>      }
>      __put_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
>  
> @@ -4591,7 +4591,7 @@ static void restore_user_regs(CPUPPCState *env,
>      __get_user(ccr, &frame->mc_gregs[TARGET_PT_CCR]);
>  
>      for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
> -        env->crf[i] = (ccr >> (32 - ((i + 1) * 4))) & 0xf;
> +        ppc_set_crf(env, i, (ccr >> (32 - ((i + 1) * 4))) & 0xf);
>      }
>  
>      if (!sig) {
> diff --git a/monitor.c b/monitor.c
> index ec73dd4..97d72f4 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2968,7 +2968,7 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
>  
>      u = 0;
>      for (i = 0; i < 8; i++)

ARRAY_SIZE ?

> -        u |= env->crf[i] << (32 - (4 * (i + 1)));
> +        u |= ppc_get_crf(env, i) << (32 - (4 * (i + 1)));
>  
>      return u;
>  }
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index c1cb27f..05c29b2 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -1198,6 +1198,16 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
>  
>  void store_fpscr(CPUPPCState *env, uint64_t arg, uint32_t mask);
>  
> +static inline uint32_t ppc_get_crf(const CPUPPCState *env, int i)
> +{
> +    return env->crf[i];
> +}
> +
> +static inline void ppc_set_crf(CPUPPCState *env, int i, uint32_t val)
> +{
> +    env->crf[i] = val;
> +}
> +
>  static inline uint64_t ppc_dump_gpr(CPUPPCState *env, int gprn)
>  {
>      uint64_t gprv;
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index bf25d44..522fce4 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -504,7 +504,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>                           env->error_code);
>              }
>  #endif
> -            msr |= env->crf[0] << 28;
> +            msr |= ppc_get_crf(env, 0) << 28;
>              msr |= env->error_code; /* key, D/I, S/L bits */
>              /* Set way using a LRU mechanism */
>              msr |= ((env->last_way + 1) & (env->nb_ways - 1)) << 17;
> diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> index 0fe006a..1ccbcf3 100644
> --- a/target-ppc/fpu_helper.c
> +++ b/target-ppc/fpu_helper.c
> @@ -1099,7 +1099,8 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
>      env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> -    env->crf[crfD] = (1 << ret);
> +    ppc_set_crf(env, crfD, 1 << ret);
> +
>      if (unlikely(ret == CRF_SO
>                   && (float64_is_signaling_nan(farg1.d) ||
>                       float64_is_signaling_nan(farg2.d)))) {
> @@ -1130,7 +1131,8 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
>      env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> -    env->crf[crfD] = (1 << ret);
> +    ppc_set_crf(env, crfD, 1 << ret);
> +
>      if (unlikely(ret == CRF_SO)) {
>          if (float64_is_signaling_nan(farg1.d) ||
>              float64_is_signaling_nan(farg2.d)) {
> diff --git a/target-ppc/gdbstub.c b/target-ppc/gdbstub.c
> index bad49ae..e0f340c 100644
> --- a/target-ppc/gdbstub.c
> +++ b/target-ppc/gdbstub.c
> @@ -139,7 +139,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
>                  uint32_t cr = 0;
>                  int i;
>                  for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
> -                    cr |= env->crf[i] << (32 - ((i + 1) * 4));
> +                    cr |= ppc_get_crf(env, i) << (32 - ((i + 1) * 4));
>                  }
>                  gdb_get_reg32(mem_buf, cr);
>                  break;
> @@ -247,7 +247,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
>                  uint32_t cr = ldl_p(mem_buf);
>                  int i;
>                  for (i = 0; i < ARRAY_SIZE(env->crf); i++) {
> -                    env->crf[i] = (cr >> (32 - ((i + 1) * 4))) & 0xF;
> +                    ppc_set_crf(env, i, (cr >> (32 - ((i + 1) * 4))) & 0xF);
>                  }
>                  break;
>              }
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 5fa10c7..2287064 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -311,7 +311,7 @@ void helper_mtocrf(CPUPPCState *env, target_ulong cr, uint32_t mask)
>      int i;
>      for (i = 7; i >= 0; i--) {

ARRAY_SIZE ?

>          if (mask & 1) {
> -            env->crf[i] = cr & 0x0F;
> +            ppc_set_crf(env, i, cr & 0x0F);
>          }
>          cr >>= 4;
>          mask >>= 1;
> @@ -323,7 +323,7 @@ target_ulong helper_mfocrf(CPUPPCState *env)
>      uint32_t cr = 0;
>      int i;
>      for (i = 0; i < 8; i++) {

ARRAY_SIZE?

> -        cr |= env->crf[i] << (32 - (i + 1) * 4);
> +        cr |= ppc_get_crf(env, i) << (32 - (i + 1) * 4);
>      }
>      return cr;
>  }
> @@ -679,7 +679,7 @@ VCF(sx, int32_to_float32, s32)
>              none |= result;                                             \
>          }                                                               \
>          if (record) {                                                   \
> -            env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);       \
> +            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
>          }                                                               \
>      }
>  #define VCMP(suffix, compare, element)          \
> @@ -725,7 +725,7 @@ VCMP(gtsd, >, s64)
>              none |= result;                                             \
>          }                                                               \
>          if (record) {                                                   \
> -            env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);       \
> +            ppc_set_crf(env, 6, ((all != 0) << 3) | ((none == 0) << 1)); \
>          }                                                               \
>      }
>  #define VCMPFP(suffix, compare, order)          \
> @@ -759,7 +759,7 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
>          }
>      }
>      if (record) {
> -        env->crf[6] = (all_in == 0) << 1;
> +        ppc_set_crf(env, 6, (all_in == 0) << 1);
>      }
>  }
>  
> @@ -2580,7 +2580,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>      for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
>          if ((high & mask) == 0) {
>              if (update_Rc) {
> -                env->crf[0] = 0x4;
> +                ppc_set_crf(env, 0, 0x4);
>              }
>              goto done;
>          }
> @@ -2589,7 +2589,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>      for (mask = 0xFF000000; mask != 0; mask = mask >> 8) {
>          if ((low & mask) == 0) {
>              if (update_Rc) {
> -                env->crf[0] = 0x8;
> +                ppc_set_crf(env, 0, 0x8);
>              }
>              goto done;
>          }
> @@ -2597,7 +2597,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>      }
>      i = 8;
>      if (update_Rc) {
> -        env->crf[0] = 0x2;
> +        ppc_set_crf(env, 0, 0x2);
>      }
>   done:
>      env->xer = (env->xer & ~0x7F) | i;
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 42718f7..a4eca17 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -795,7 +795,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  
>      regs.cr = 0;
>      for (i = 0; i < 8; i++) {

ARRAY_SIZE ?

> -        regs.cr |= (env->crf[i] & 15) << (4 * (7 - i));
> +        regs.cr |= ppc_get_crf(env, i) << (4 * (7 - i));
>      }
>  
>      ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
> @@ -914,7 +914,7 @@ int kvm_arch_get_registers(CPUState *cs)
>  
>      cr = regs.cr;
>      for (i = 7; i >= 0; i--) {

ARRAY_SIZE ?

> -        env->crf[i] = cr & 15;
> +        ppc_set_crf(env->cr[i], cr & 15);

This doesn't compile ... did you mean this?

           ppc_set_crf(env, i, cr & 15);
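
For reference, here is a minimal standalone sketch of what the two KVM loops are expected to do once ppc_get_crf/ppc_set_crf work on 32 one-bit flags. ModelCPU and the model_* names are hypothetical stand-ins made up for illustration, not QEMU code; only the packing arithmetic is taken from the quoted hunks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CR part of CPUPPCState (not QEMU's struct). */
typedef struct {
    uint32_t cr[32];    /* one flag per CR bit, each holding 0 or 1 */
} ModelCPU;

/* Pack the four flags of field i into a nibble: LT=8, GT=4, EQ=2, SO=1. */
static uint32_t model_get_crf(const ModelCPU *env, int i)
{
    uint32_t r;

    assert(i >= 0 && i < 8);
    r = env->cr[i * 4];
    r = (r << 1) | env->cr[i * 4 + 1];
    r = (r << 1) | env->cr[i * 4 + 2];
    r = (r << 1) | env->cr[i * 4 + 3];
    return r;
}

/* Split a nibble back into the four flags of field i. */
static void model_set_crf(ModelCPU *env, int i, uint32_t val)
{
    assert(i >= 0 && i < 8);
    env->cr[i * 4 + 0] = (val & 0x08) != 0;
    env->cr[i * 4 + 1] = (val & 0x04) != 0;
    env->cr[i * 4 + 2] = (val & 0x02) != 0;
    env->cr[i * 4 + 3] = (val & 0x01) != 0;
}

/* Same shape as the kvm_arch_get_registers loop: field 7 is the low nibble. */
static void model_load_cr(ModelCPU *env, uint32_t cr)
{
    int i;

    for (i = 7; i >= 0; i--) {
        model_set_crf(env, i, cr & 15);
        cr >>= 4;
    }
}

/* Same shape as the kvm_arch_put_registers loop. */
static uint32_t model_store_cr(const ModelCPU *env)
{
    uint32_t cr = 0;
    int i;

    for (i = 0; i < 8; i++) {
        cr |= model_get_crf(env, i) << (4 * (7 - i));
    }
    return cr;
}
```

Loading a 32-bit CR image with model_load_cr and reading it back with model_store_cr should return the same value, which is a cheap sanity check for the corrected call.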

>          cr >>= 4;
>      }
>  
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4ce7af4..1ed6a8f 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -11071,18 +11071,19 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>              cpu_fprintf(f, "\n");
>      }
>      cpu_fprintf(f, "CR ");
> -    for (i = 0; i < 8; i++)
> -        cpu_fprintf(f, "%01x", env->crf[i]);
> +    for (i = 0; i < 8; i++) {
> +        cpu_fprintf(f, "%01x", ppc_get_crf(env, i));
> +    }
>      cpu_fprintf(f, "  [");
>      for (i = 0; i < 8; i++) {
>          char a = '-';
> -        if (env->crf[i] & 0x08)
> +        if (ppc_get_crf(env, i) & 0x08)
>              a = 'L';
> -        else if (env->crf[i] & 0x04)
> +        else if (ppc_get_crf(env, i) & 0x04)
>              a = 'G';
> -        else if (env->crf[i] & 0x02)
> +        else if (ppc_get_crf(env, i) & 0x02)
>              a = 'E';
> -        cpu_fprintf(f, " %c%c", a, env->crf[i] & 0x01 ? 'O' : ' ');
> +        cpu_fprintf(f, " %c%c", a, ppc_get_crf(env, i) & 0x01 ? 'O' : ' ');
>      }
>      cpu_fprintf(f, " ]             RES " TARGET_FMT_lx "\n",
>                  env->reserve_addr);
> 

* Re: [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-08-28 17:15 ` [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers Paolo Bonzini
@ 2014-09-04 18:27   ` Tom Musta
  2014-09-09 15:44     ` Paolo Bonzini
  2014-09-09 16:03     ` Richard Henderson
  0 siblings, 2 replies; 50+ messages in thread
From: Tom Musta @ 2014-09-04 18:27 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc, Richard Henderson

On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
> This makes comparisons much smaller and faster.  The speedup is
> approximately 10% on user-mode emulation on x86 host, 3-4% on PPC.
> 
> Note that CRF_* constants are flipped to match PowerPC's big
> bit-endianness.  Previously, the CR register was effectively stored
> in mixed endianness, so now there is less indirection going on.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

There are some issues with this patch: it doesn't compile because of some type mismatches, and there are also some functional problems.  Details are below.

(nit) It also doesn't pass checkpatch.pl.
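
To make the "flipped CRF_* constants" from the commit message concrete, here is a small sketch, assuming nothing beyond the values in the cpu.h hunk below. The OLD_/NEW_ prefixes and the helper names are hypothetical labels for this illustration only. It shows why 1 << ret becomes 0x08 >> ret in fcmpu/fcmpo: both expressions select the same bit of the 4-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Values before the patch: index counted from the least significant bit. */
enum { OLD_CRF_SO = 0, OLD_CRF_EQ = 1, OLD_CRF_GT = 2, OLD_CRF_LT = 3 };

/* Values after the patch: index counted from the most significant bit. */
enum { NEW_CRF_LT = 0, NEW_CRF_GT = 1, NEW_CRF_EQ = 2, NEW_CRF_SO = 3 };

/* Nibble mask under the old numbering, e.g. 1 << OLD_CRF_LT == 0x8. */
static uint32_t old_nibble_mask(int old_idx)
{
    assert(old_idx >= 0 && old_idx < 4);
    return 1u << old_idx;
}

/* Nibble mask under the new numbering, e.g. 0x08 >> NEW_CRF_LT == 0x8. */
static uint32_t new_nibble_mask(int new_idx)
{
    assert(new_idx >= 0 && new_idx < 4);
    return 0x08u >> new_idx;
}

/* Convert an old index to the new one; the mapping is its own inverse. */
static int flip_crf_index(int idx)
{
    assert(idx >= 0 && idx < 4);
    return 3 - idx;
}
```

The nibble layout seen by ppc_get_crf/ppc_set_crf callers is unchanged (LT=8, GT=4, EQ=2, SO=1); only code that uses CRF_* as a shift count or array index has to be converted.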

> ---
>  linux-user/main.c       |   4 +-
>  target-ppc/cpu.h        |  33 ++++--
>  target-ppc/fpu_helper.c |  39 ++----
>  target-ppc/helper.h     |   6 -
>  target-ppc/int_helper.c |   2 +-
>  target-ppc/machine.c    |   9 ++
>  target-ppc/translate.c  | 307 +++++++++++++++++++++++++-----------------------
>  7 files changed, 204 insertions(+), 196 deletions(-)
> 
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 152c031..b403f24 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -1929,7 +1929,7 @@ void cpu_loop(CPUPPCState *env)
>               * PPC ABI uses overflow flag in cr0 to signal an error
>               * in syscalls.
>               */
> -            env->crf[0] &= ~0x1;
> +            env->cr[CRF_SO] = 0;
>              ret = do_syscall(env, env->gpr[0], env->gpr[3], env->gpr[4],
>                               env->gpr[5], env->gpr[6], env->gpr[7],
>                               env->gpr[8], 0, 0);
> @@ -1939,7 +1939,7 @@ void cpu_loop(CPUPPCState *env)
>                  break;
>              }
>              if (ret > (target_ulong)(-515)) {
> -                env->crf[0] |= 0x1;
> +                env->cr[CRF_SO] = 1;
>                  ret = -ret;
>              }
>              env->gpr[3] = ret;
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 05c29b2..67510e8 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -939,7 +939,7 @@ struct CPUPPCState {
>      /* CTR */
>      target_ulong ctr;
>      /* condition register */
> -    uint32_t crf[8];
> +    uint32_t cr[32];
>  #if defined(TARGET_PPC64)
>      /* CFAR */
>      target_ulong cfar;
> @@ -1058,6 +1058,9 @@ struct CPUPPCState {
>      uint64_t dtl_addr, dtl_size;
>  #endif /* TARGET_PPC64 */
>  
> +    /* condition register, for migration compatibility */
> +    uint32_t crf[8];
> +
>      int error_code;
>      uint32_t pending_interrupts;
>  #if !defined(CONFIG_USER_ONLY)
> @@ -1200,12 +1203,20 @@ void store_fpscr(CPUPPCState *env, uint64_t arg, uint32_t mask);
>  
>  static inline uint32_t ppc_get_crf(const CPUPPCState *env, int i)
>  {
> -    return env->crf[i];
> +    uint32_t r;
> +    r = env->cr[i * 4];
> +    r = (r << 1) | (env->cr[i * 4 + 1]);
> +    r = (r << 1) | (env->cr[i * 4 + 2]);
> +    r = (r << 1) | (env->cr[i * 4 + 3]);
> +    return r;
>  }
>  
>  static inline void ppc_set_crf(CPUPPCState *env, int i, uint32_t val)
>  {
> -    env->crf[i] = val;
> +    env->cr[i * 4 + 0] = (val & 0x08) != 0;
> +    env->cr[i * 4 + 1] = (val & 0x04) != 0;
> +    env->cr[i * 4 + 2] = (val & 0x02) != 0;
> +    env->cr[i * 4 + 3] = (val & 0x01) != 0;
>  }
>  
>  static inline uint64_t ppc_dump_gpr(CPUPPCState *env, int gprn)
> @@ -1256,14 +1267,14 @@ static inline int cpu_mmu_index (CPUPPCState *env)
>  
>  /*****************************************************************************/
>  /* CRF definitions */
> -#define CRF_LT        3
> -#define CRF_GT        2
> -#define CRF_EQ        1
> -#define CRF_SO        0
> -#define CRF_CH        (1 << CRF_LT)
> -#define CRF_CL        (1 << CRF_GT)
> -#define CRF_CH_OR_CL  (1 << CRF_EQ)
> -#define CRF_CH_AND_CL (1 << CRF_SO)
> +#define CRF_LT        0
> +#define CRF_GT        1
> +#define CRF_EQ        2
> +#define CRF_SO        3
> +#define CRF_CH        CRF_LT
> +#define CRF_CL        CRF_GT
> +#define CRF_CH_OR_CL  CRF_EQ
> +#define CRF_CH_AND_CL CRF_SO
>  
>  /* XER definitions */
>  #define XER_SO  31
> diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
> index 1ccbcf3..9574ebe 100644
> --- a/target-ppc/fpu_helper.c
> +++ b/target-ppc/fpu_helper.c
> @@ -1098,8 +1098,8 @@ void helper_fcmpu(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>      }
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
> -    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> -    ppc_set_crf(env, crfD, 1 << ret);
> +    env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
> +    ppc_set_crf(env, crfD, 0x08 >> ret);
>  
>      if (unlikely(ret == CRF_SO
>                   && (float64_is_signaling_nan(farg1.d) ||
> @@ -1130,8 +1130,8 @@ void helper_fcmpo(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
>      }
>  
>      env->fpscr &= ~(0x0F << FPSCR_FPRF);
> -    env->fpscr |= (0x01 << FPSCR_FPRF) << ret;
> -    ppc_set_crf(env, crfD, 1 << ret);
> +    env->fpscr |= (0x08 << FPSCR_FPRF) >> ret;
> +    ppc_set_crf(env, crfD, 0x08 >> ret);
>  
>      if (unlikely(ret == CRF_SO)) {
>          if (float64_is_signaling_nan(farg1.d) ||
> @@ -1403,7 +1403,7 @@ static inline uint32_t efscmplt(CPUPPCState *env, uint32_t op1, uint32_t op2)
>  
>      u1.l = op1;
>      u2.l = op2;
> -    return float32_lt(u1.f, u2.f, &env->vec_status) ? 4 : 0;
> +    return float32_lt(u1.f, u2.f, &env->vec_status);
>  }
>  
>  static inline uint32_t efscmpgt(CPUPPCState *env, uint32_t op1, uint32_t op2)
> @@ -1412,7 +1412,7 @@ static inline uint32_t efscmpgt(CPUPPCState *env, uint32_t op1, uint32_t op2)
>  
>      u1.l = op1;
>      u2.l = op2;
> -    return float32_le(u1.f, u2.f, &env->vec_status) ? 0 : 4;
> +    return !float32_le(u1.f, u2.f, &env->vec_status);
>  }
>  
>  static inline uint32_t efscmpeq(CPUPPCState *env, uint32_t op1, uint32_t op2)
> @@ -1421,7 +1421,7 @@ static inline uint32_t efscmpeq(CPUPPCState *env, uint32_t op1, uint32_t op2)
>  
>      u1.l = op1;
>      u2.l = op2;
> -    return float32_eq(u1.f, u2.f, &env->vec_status) ? 4 : 0;
> +    return float32_eq(u1.f, u2.f, &env->vec_status);
>  }
>  
>  static inline uint32_t efststlt(CPUPPCState *env, uint32_t op1, uint32_t op2)
> @@ -1465,25 +1465,6 @@ static inline uint32_t evcmp_merge(int t0, int t1)
>      return (t0 << 3) | (t1 << 2) | ((t0 | t1) << 1) | (t0 & t1);
>  }
>  
> -#define HELPER_VECTOR_SPE_CMP(name)                                     \
> -    uint32_t helper_ev##name(CPUPPCState *env, uint64_t op1, uint64_t op2) \
> -    {                                                                   \
> -        return evcmp_merge(e##name(env, op1 >> 32, op2 >> 32),          \
> -                           e##name(env, op1, op2));                     \
> -    }
> -/* evfststlt */
> -HELPER_VECTOR_SPE_CMP(fststlt);
> -/* evfststgt */
> -HELPER_VECTOR_SPE_CMP(fststgt);
> -/* evfststeq */
> -HELPER_VECTOR_SPE_CMP(fststeq);
> -/* evfscmplt */
> -HELPER_VECTOR_SPE_CMP(fscmplt);
> -/* evfscmpgt */
> -HELPER_VECTOR_SPE_CMP(fscmpgt);
> -/* evfscmpeq */
> -HELPER_VECTOR_SPE_CMP(fscmpeq);
> -
>  /* Double-precision floating-point conversion */
>  uint64_t helper_efdcfsi(CPUPPCState *env, uint32_t val)
>  {
> @@ -1725,7 +1706,7 @@ uint32_t helper_efdtstlt(CPUPPCState *env, uint64_t op1, uint64_t op2)
>  
>      u1.ll = op1;
>      u2.ll = op2;
> -    return float64_lt(u1.d, u2.d, &env->vec_status) ? 4 : 0;
> +    return float64_lt(u1.d, u2.d, &env->vec_status);
>  }
>  
>  uint32_t helper_efdtstgt(CPUPPCState *env, uint64_t op1, uint64_t op2)
> @@ -1734,7 +1715,7 @@ uint32_t helper_efdtstgt(CPUPPCState *env, uint64_t op1, uint64_t op2)
>  
>      u1.ll = op1;
>      u2.ll = op2;
> -    return float64_le(u1.d, u2.d, &env->vec_status) ? 0 : 4;
> +    return !float64_le(u1.d, u2.d, &env->vec_status);
>  }
>  
>  uint32_t helper_efdtsteq(CPUPPCState *env, uint64_t op1, uint64_t op2)
> @@ -1743,7 +1724,7 @@ uint32_t helper_efdtsteq(CPUPPCState *env, uint64_t op1, uint64_t op2)
>  
>      u1.ll = op1;
>      u2.ll = op2;
> -    return float64_eq_quiet(u1.d, u2.d, &env->vec_status) ? 4 : 0;
> +    return float64_eq_quiet(u1.d, u2.d, &env->vec_status);
>  }
>  
>  uint32_t helper_efdcmplt(CPUPPCState *env, uint64_t op1, uint64_t op2)
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 5342f13..8d6a92b 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -493,12 +493,6 @@ DEF_HELPER_3(efststeq, i32, env, i32, i32)
>  DEF_HELPER_3(efscmplt, i32, env, i32, i32)
>  DEF_HELPER_3(efscmpgt, i32, env, i32, i32)
>  DEF_HELPER_3(efscmpeq, i32, env, i32, i32)
> -DEF_HELPER_3(evfststlt, i32, env, i64, i64)
> -DEF_HELPER_3(evfststgt, i32, env, i64, i64)
> -DEF_HELPER_3(evfststeq, i32, env, i64, i64)
> -DEF_HELPER_3(evfscmplt, i32, env, i64, i64)
> -DEF_HELPER_3(evfscmpgt, i32, env, i64, i64)
> -DEF_HELPER_3(evfscmpeq, i32, env, i64, i64)
>  DEF_HELPER_2(efdcfsi, i64, env, i32)
>  DEF_HELPER_2(efdcfsid, i64, env, i64)
>  DEF_HELPER_2(efdcfui, i64, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 2287064..d3ace6a 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2602,7 +2602,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>   done:
>      env->xer = (env->xer & ~0x7F) | i;
>      if (update_Rc) {
> -        env->crf[0] |= xer_so;
> +        env->cr[CRF_SO] = xer_so;
>      }
>      return i;
>  }
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index c801b82..9fa309a 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -132,6 +132,10 @@ static void cpu_pre_save(void *opaque)
>      CPUPPCState *env = &cpu->env;
>      int i;
>  
> +    for (i = 0; i < 8; i++) {
> +        env->crf[i] = ppc_get_crf(env, i);
> +    }
> +
>      env->spr[SPR_LR] = env->lr;
>      env->spr[SPR_CTR] = env->ctr;
>      env->spr[SPR_XER] = env->xer;
> @@ -165,6 +169,11 @@ static int cpu_post_load(void *opaque, int version_id)
>       * software has to take care of running QEMU in a compatible mode.
>       */
>      env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
> +
> +    for (i = 0; i < 8; i++) {
> +        ppc_set_crf(env, i, env->crf[i]);
> +    }
> +
>      env->lr = env->spr[SPR_LR];
>      env->ctr = env->spr[SPR_CTR];
>      env->xer = env->spr[SPR_XER];
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 1ed6a8f..dd19b39 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -53,13 +53,13 @@ static char cpu_reg_names[10*3 + 22*4 /* GPR */
>      + 10*4 + 22*5 /* FPR */
>      + 2*(10*6 + 22*7) /* AVRh, AVRl */
>      + 10*5 + 22*6 /* VSR */
> -    + 8*5 /* CRF */];
> +    + 32*8 /* CR */];
>  static TCGv cpu_gpr[32];
>  static TCGv cpu_gprh[32];
>  static TCGv_i64 cpu_fpr[32];
>  static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
>  static TCGv_i64 cpu_vsr[32];
> -static TCGv_i32 cpu_crf[8];
> +static TCGv_i32 cpu_cr[32];
>  static TCGv cpu_nip;
>  static TCGv cpu_msr;
>  static TCGv cpu_ctr;
> @@ -89,12 +89,13 @@ void ppc_translate_init(void)
>      p = cpu_reg_names;
>      cpu_reg_names_size = sizeof(cpu_reg_names);
>  
> -    for (i = 0; i < 8; i++) {
> -        snprintf(p, cpu_reg_names_size, "crf%d", i);
> -        cpu_crf[i] = tcg_global_mem_new_i32(TCG_AREG0,
> -                                            offsetof(CPUPPCState, crf[i]), p);
> -        p += 5;
> -        cpu_reg_names_size -= 5;
> +    for (i = 0; i < 32; i++) {
> +        static const char names[] = "lt\0gt\0eq\0so";
> +        snprintf(p, cpu_reg_names_size, "cr%d[%s]", i >> 2, names + (i & 3) * 3);
> +        cpu_cr[i] = tcg_global_mem_new_i32(TCG_AREG0,
> +                                           offsetof(CPUPPCState, cr[i]), p);
> +        p += 8;
> +        cpu_reg_names_size -= 8;
>      }
>  
>      for (i = 0; i < 32; i++) {
> @@ -251,17 +252,30 @@ static inline void gen_reset_fpstatus(void)
>  
>  static inline void gen_op_mfcr(TCGv dest, int first_cr, int shift)
>  {
> -    tcg_gen_shli_i32(dest, cpu_crf[first_cr >> 2], shift);
> +    TCGv_i32 t0 = tcg_temp_new_i32();
> +
> +    tcg_gen_shli_i32(dest, cpu_cr[first_cr + 3], shift);
> +    tcg_gen_shli_i32(t0, cpu_cr[first_cr + 2], shift + 1);
> +    tcg_gen_or_i32(dest, dest, t0);
> +    tcg_gen_shli_i32(t0, cpu_cr[first_cr + 1], shift + 2);
> +    tcg_gen_or_i32(dest, dest, t0);
> +    tcg_gen_shli_i32(t0, cpu_cr[first_cr], shift + 3);

This leaks t0; it needs a tcg_temp_free_i32(t0) before returning.

>  }
>  
>  static inline void gen_op_mtcr(int first_cr, TCGv src, int shift)
>  {
>      if (shift) {
> -        tcg_gen_shri_i32(cpu_crf[first_cr >> 2], src, shift);
> -        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], cpu_crf[first_cr >> 2], 0x0F);
> +        tcg_gen_shri_i32(cpu_cr[first_cr + 3], src, shift);
> +        tcg_gen_andi_i32(cpu_cr[first_cr + 3], cpu_cr[first_cr + 3], 1);
>      } else {
> -        tcg_gen_andi_i32(cpu_crf[first_cr >> 2], src, 0x0F);
> +        tcg_gen_andi_i32(cpu_cr[first_cr + 3], src, 1);
>      }
> +    tcg_gen_shri_i32(cpu_cr[first_cr + 2], src, shift + 1);
> +    tcg_gen_andi_i32(cpu_cr[first_cr + 2], cpu_cr[first_cr + 2], 1);
> +    tcg_gen_shri_i32(cpu_cr[first_cr + 1], src, shift + 2);
> +    tcg_gen_andi_i32(cpu_cr[first_cr + 1], cpu_cr[first_cr + 1], 1);
> +    tcg_gen_shri_i32(cpu_cr[first_cr], src, shift + 3);
> +    tcg_gen_andi_i32(cpu_cr[first_cr], cpu_cr[first_cr], 1);
>  }
>  
>  static inline void gen_compute_fprf(TCGv_i64 arg, int set_fprf, int set_rc)
> @@ -675,27 +689,19 @@ static bool is_user_mode(DisasContext *ctx)
>  static inline void gen_op_cmp(TCGv arg0, TCGv arg1, int s, int crf)
>  {
>      TCGv t0 = tcg_temp_new();
> -    TCGv_i32 t1 = tcg_temp_new_i32();
>  
> -    tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);

This looks correct to me but is causing problems.  The above statement seems to get dropped in the generated asm ... at least on a PPC host:

IN:
0x00000000100005b4:  cmpw    cr3,r30,r29

OUT: [size=160]
0x6041ad30:  lwz     r14,-4(r27)
0x6041ad34:  cmpwi   cr7,r14,0
0x6041ad38:  bne-    cr7,0x6041adbc
0x6041ad3c:  ld      r14,240(r27)   <<< r30
0x6041ad40:  ld      r15,232(r27)   <<< r29
0x6041ad44:  cmpw    cr7,r14,r15    <<< this is the TCG_COND_LTx code
0x6041ad48:  li      r16,1
0x6041ad4c:  li      r0,0
0x6041ad50:  isel    r16,r16,r0,28
0x6041ad54:  stw     r16,576(r27)   <<< store cpu_cr[LT]
0x6041ad58:  cmpw    cr7,r14,r15
0x6041ad5c:  li      r16,1
0x6041ad60:  li      r0,0
0x6041ad64:  isel    r16,r16,r0,29
0x6041ad68:  stw     r16,580(r27)   <<< store cpu_cr[GT]
0x6041ad6c:  cmplw   cr7,r14,r15
0x6041ad70:  li      r14,1
0x6041ad74:  li      r0,0
0x6041ad78:  isel    r14,r14,r0,30
0x6041ad7c:  stw     r14,584(r27)   <<< store cpu_cr[EQ]
0x6041ad80:  .long 0x0
0x6041ad84:  .long 0x0

Richard:  any ideas or hints on how to proceed?
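
For comparison, this is a plain C sketch of the four flags a 32-bit compare such as cmpw is expected to leave in the target field. CrField and model_cmp32 are hypothetical names for illustration; the point is only that SO is copied from XER[SO] alongside LT/GT/EQ, so a fourth store would be expected in the OUT block above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one CR field as four separate 0/1 flags. */
typedef struct {
    uint32_t lt, gt, eq, so;
} CrField;

/*
 * Model of a 32-bit compare: only the low 32 bits of each operand take part.
 * xer_so is the current XER[SO] value (0 or 1), copied into the SO flag.
 */
static CrField model_cmp32(uint64_t a, uint64_t b, int is_signed,
                           uint32_t xer_so)
{
    CrField f;
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;

    assert(xer_so <= 1);
    if (is_signed) {
        /* Assumes the usual two's complement conversion to int32_t. */
        int32_t sa = (int32_t)ua;
        int32_t sb = (int32_t)ub;
        f.lt = sa < sb;
        f.gt = sa > sb;
    } else {
        f.lt = ua < ub;
        f.gt = ua > ub;
    }
    f.eq = ua == ub;
    f.so = xer_so;
    return f;
}
```

With this model, a signed compare of 0xFFFFFFFF against 1 with XER[SO] set gives lt=1, gt=0, eq=0, so=1, while the unsigned variant gives lt=0, gt=1.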
>  
>      tcg_gen_setcond_tl((s ? TCG_COND_LT: TCG_COND_LTU), t0, arg0, arg1);
> -    tcg_gen_trunc_tl_i32(t1, t0);
> -    tcg_gen_shli_i32(t1, t1, CRF_LT);
> -    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_LT], t0);
>  
>      tcg_gen_setcond_tl((s ? TCG_COND_GT: TCG_COND_GTU), t0, arg0, arg1);
> -    tcg_gen_trunc_tl_i32(t1, t0);
> -    tcg_gen_shli_i32(t1, t1, CRF_GT);
> -    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_GT], t0);
>  
>      tcg_gen_setcond_tl(TCG_COND_EQ, t0, arg0, arg1);
> -    tcg_gen_trunc_tl_i32(t1, t0);
> -    tcg_gen_shli_i32(t1, t1, CRF_EQ);
> -    tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], t1);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_EQ], t0);
>  
>      tcg_temp_free(t0);
> -    tcg_temp_free_i32(t1);
>  }
>  
>  static inline void gen_op_cmpi(TCGv arg0, target_ulong arg1, int s, int crf)
> @@ -707,17 +713,22 @@ static inline void gen_op_cmpi(TCGv arg0, target_ulong arg1, int s, int crf)
>  
>  static inline void gen_op_cmp32(TCGv arg0, TCGv arg1, int s, int crf)
>  {
> -    TCGv t0, t1;
> +    TCGv_i32 t0, t1;
> +
>      t0 = tcg_temp_new();
>      t1 = tcg_temp_new();

Needs to be tcg_temp_new_i32() ....

> -    if (s) {
> -        tcg_gen_ext32s_tl(t0, arg0);
> -        tcg_gen_ext32s_tl(t1, arg1);
> -    } else {
> -        tcg_gen_ext32u_tl(t0, arg0);
> -        tcg_gen_ext32u_tl(t1, arg1);
> -    }
> -    gen_op_cmp(t0, t1, s, crf);
> +    tcg_gen_trunc_tl_i32(t0, arg0);
> +    tcg_gen_trunc_tl_i32(t1, arg1);
> +
> +    tcg_gen_setcond_i32((s ? TCG_COND_LT: TCG_COND_LTU), 
> +                        cpu_cr[crf * 4 + CRF_LT], t0, t1);
> +
> +    tcg_gen_setcond_i32((s ? TCG_COND_GT: TCG_COND_GTU), 
> +                        cpu_cr[crf * 4 + CRF_GT], t0, t1);
> +
> +    tcg_gen_setcond_i32(TCG_COND_EQ, 
> +                        cpu_cr[crf * 4 + CRF_EQ], t0, t1);
> +
>      tcg_temp_free(t1);
>      tcg_temp_free(t0);

... and tcg_temp_free_i32()

>  }
> @@ -790,15 +801,10 @@ static void gen_cmpli(DisasContext *ctx)
>  static void gen_isel(DisasContext *ctx)
>  {
>      uint32_t bi = rC(ctx->opcode);
> -    uint32_t mask;
> -    TCGv_i32 t0;
>      TCGv t1, true_op, zero;
>  
> -    mask = 0x08 >> (bi & 0x03);
> -    t0 = tcg_temp_new_i32();
> -    tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
>      t1 = tcg_temp_new();
> -    tcg_gen_extu_i32_tl(t1, t0);
> +    tcg_gen_extu_i32_tl(t1, cpu_cr[bi]);
>      zero = tcg_const_tl(0);
>      if (rA(ctx->opcode) == 0)
>          true_op = zero;




> @@ -2288,21 +2294,29 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
>  
>  static void gen_ftdiv(DisasContext *ctx)
>  {
> +    TCGv_i32 crf;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
> +    crf = tcg_temp_new_i32();
> +    gen_helper_ftdiv(crf, cpu_fpr[rA(ctx->opcode)],
>                       cpu_fpr[rB(ctx->opcode)]);
> +    gen_op_mtcr(crfD(ctx->opcode) << 2, crf, 0);
> +    tcg_temp_free_i32(crf);
>  }
>  
>  static void gen_ftsqrt(DisasContext *ctx)
>  {
> +    TCGv_i32 crf;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
> +    crf = tcg_temp_new_i32();
> +    gen_helper_ftsqrt(crf, cpu_fpr[rB(ctx->opcode)]);
> +    gen_op_mtcr(crfD(ctx->opcode) << 2, crf, 0);
> +    tcg_temp_free_i32(crf);
>  }
>  
>  
> @@ -3300,10 +3314,13 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
>  {
>      int l1;
>  
> -    tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> +    tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
> +    tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
> +    tcg_gen_movi_i32(cpu_cr[CRF_EQ], 0);
> +    tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
>      l1 = gen_new_label();
>      tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> -    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
> +    tcg_gen_movi_i32(cpu_cr[CRF_EQ], 1);
>  #if defined(TARGET_PPC64)
>      if (size == 8) {
>          gen_qemu_st64(ctx, cpu_gpr[reg], EA);
> @@ -3870,17 +3887,11 @@ static inline void gen_bcond(DisasContext *ctx, int type)
>      if ((bo & 0x10) == 0) {
>          /* Test CR */
>          uint32_t bi = BI(ctx->opcode);
> -        uint32_t mask = 0x08 >> (bi & 0x03);
> -        TCGv_i32 temp = tcg_temp_new_i32();
> -
>          if (bo & 0x8) {
> -            tcg_gen_andi_i32(temp, cpu_crf[bi >> 2], mask);
> -            tcg_gen_brcondi_i32(TCG_COND_EQ, temp, 0, l1);
> +            tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[bi], 0, l1);
>          } else {
> -            tcg_gen_andi_i32(temp, cpu_crf[bi >> 2], mask);
> -            tcg_gen_brcondi_i32(TCG_COND_NE, temp, 0, l1);
> +            tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cr[bi], 0, l1);
>          }
> -        tcg_temp_free_i32(temp);
>      }
>      gen_update_cfar(ctx, ctx->nip);
>      if (type == BCOND_IM) {
> @@ -3929,35 +3940,11 @@ static void gen_bctar(DisasContext *ctx)
>  }
>  
>  /***                      Condition register logical                       ***/
> -#define GEN_CRLOGIC(name, tcg_op, opc)                                        \
> -static void glue(gen_, name)(DisasContext *ctx)                                       \
> -{                                                                             \
> -    uint8_t bitmask;                                                          \
> -    int sh;                                                                   \
> -    TCGv_i32 t0, t1;                                                          \
> -    sh = (crbD(ctx->opcode) & 0x03) - (crbA(ctx->opcode) & 0x03);             \
> -    t0 = tcg_temp_new_i32();                                                  \
> -    if (sh > 0)                                                               \
> -        tcg_gen_shri_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2], sh);            \
> -    else if (sh < 0)                                                          \
> -        tcg_gen_shli_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2], -sh);           \
> -    else                                                                      \
> -        tcg_gen_mov_i32(t0, cpu_crf[crbA(ctx->opcode) >> 2]);                 \
> -    t1 = tcg_temp_new_i32();                                                  \
> -    sh = (crbD(ctx->opcode) & 0x03) - (crbB(ctx->opcode) & 0x03);             \
> -    if (sh > 0)                                                               \
> -        tcg_gen_shri_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2], sh);            \
> -    else if (sh < 0)                                                          \
> -        tcg_gen_shli_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2], -sh);           \
> -    else                                                                      \
> -        tcg_gen_mov_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2]);                 \
> -    tcg_op(t0, t0, t1);                                                       \
> -    bitmask = 0x08 >> (crbD(ctx->opcode) & 0x03);                             \
> -    tcg_gen_andi_i32(t0, t0, bitmask);                                        \
> -    tcg_gen_andi_i32(t1, cpu_crf[crbD(ctx->opcode) >> 2], ~bitmask);          \
> -    tcg_gen_or_i32(cpu_crf[crbD(ctx->opcode) >> 2], t0, t1);                  \
> -    tcg_temp_free_i32(t0);                                                    \
> -    tcg_temp_free_i32(t1);                                                    \
> +#define GEN_CRLOGIC(name, tcg_op, opc)                                         \
> +static void glue(gen_, name)(DisasContext *ctx)                                \
> +{                                                                              \
> +    tcg_op(cpu_cr[crbD(ctx->opcode)], cpu_cr[crbA(ctx->opcode)],               \
> +           cpu_cr[crbB(ctx->opcode)]);                                         \
>  }
>  

This is a very nice cleanup ... but it oversteers just a little.  For some CR logical instructions, the generated code can produce non-zero bits in the i32 cr variable in places other than the LSB.
For example, consider crnand, which produces the following on a PPC host:

IN:
0x0000000010000578:  crnand  4*cr7+so,4*cr7+lt,4*cr7+eq

OUT: [size=112]
0x6041a630:  lwz     r14,-4(r27)
0x6041a634:  cmpwi   cr7,r14,0
0x6041a638:  bne-    cr7,0x6041a68c
0x6041a63c:  lwz     r14,640(r27)
0x6041a640:  lwz     r15,648(r27)
0x6041a644:  nand    r14,r14,r15
0x6041a648:  andi.   r14,r14,1
0x6041a64c:  stw     r14,652(r27)
0x6041a650:  .long 0x0
0x6041a654:  .long 0x0
0x6041a658:  .long 0x0
0x6041a65c:  .long 0x0

The host nand operation will always produce an i32 value that has 1s in bits 0-30 (everything but the LSB), since those bits are presumably zero in both inputs.  A brute-force fix would be to add a tcg_gen_andi_i32(D, D, 1) to your macro, but I think this is required only for a subset of the instructions (crnand, crnor, creqv, crorc).
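
The claim about which operations need the extra mask can be checked outside of TCG. The sketch below models each CR logical operation on plain uint32_t values that hold 0 or 1; all names in it (op_*, stays_one_bit, masked) are hypothetical helpers for this illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*crlogic_fn)(uint32_t a, uint32_t b);

/* Operations whose result stays within bit 0 when both inputs are 0 or 1. */
static uint32_t op_and(uint32_t a, uint32_t b)  { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)   { return a | b; }
static uint32_t op_xor(uint32_t a, uint32_t b)  { return a ^ b; }
static uint32_t op_andc(uint32_t a, uint32_t b) { return a & ~b; }

/* Operations that complement a zero upper bit somewhere along the way. */
static uint32_t op_nand(uint32_t a, uint32_t b) { return ~(a & b); }
static uint32_t op_nor(uint32_t a, uint32_t b)  { return ~(a | b); }
static uint32_t op_eqv(uint32_t a, uint32_t b)  { return ~(a ^ b); }
static uint32_t op_orc(uint32_t a, uint32_t b)  { return a | ~b; }

/* Returns 1 if op never sets a bit above the LSB for any 1-bit inputs. */
static int stays_one_bit(crlogic_fn op)
{
    uint32_t a, b;

    for (a = 0; a <= 1; a++) {
        for (b = 0; b <= 1; b++) {
            if (op(a, b) & ~1u) {
                return 0;
            }
        }
    }
    return 1;
}

/* The brute-force fix: mask the result back down to a single bit. */
static uint32_t masked(crlogic_fn op, uint32_t a, uint32_t b)
{
    assert(a <= 1 && b <= 1);
    return op(a, b) & 1;
}
```

In this model stays_one_bit() holds for and, or, xor and andc, and fails for nand, nor, eqv and orc, which matches the subset listed above. Note that andc is safe only because the complemented operand is ANDed with a value that is already 0 or 1.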

>  /* crand */
> @@ -3980,7 +3967,11 @@ GEN_CRLOGIC(crxor, tcg_gen_xor_i32, 0x06);
>  /* mcrf */
>  static void gen_mcrf(DisasContext *ctx)
>  {
> -    tcg_gen_mov_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfS(ctx->opcode)]);
> +    int i;
> +    for (i = 0; i < 4; i++) {
> +        tcg_gen_mov_i32(cpu_cr[crfD(ctx->opcode) * 4 + i],
> +                        cpu_cr[crfS(ctx->opcode) * 4 + i]);
> +    }
>  }
>  
>  /***                           System linkage                              ***/
> @@ -4133,20 +4124,12 @@ static void gen_write_xer(TCGv src)
>  /* mcrxr */
>  static void gen_mcrxr(DisasContext *ctx)
>  {
> -    TCGv_i32 t0 = tcg_temp_new_i32();
> -    TCGv_i32 t1 = tcg_temp_new_i32();
> -    TCGv_i32 dst = cpu_crf[crfD(ctx->opcode)];
> -
> -    tcg_gen_trunc_tl_i32(t0, cpu_so);
> -    tcg_gen_trunc_tl_i32(t1, cpu_ov);
> -    tcg_gen_trunc_tl_i32(dst, cpu_ca);
> -    tcg_gen_shli_i32(t0, t0, 3);
> -    tcg_gen_shli_i32(t1, t1, 2);
> -    tcg_gen_shli_i32(dst, dst, 1);
> -    tcg_gen_or_i32(dst, dst, t0);
> -    tcg_gen_or_i32(dst, dst, t1);
> -    tcg_temp_free_i32(t0);
> -    tcg_temp_free_i32(t1);
> +    int crf = crfD(ctx->opcode);
> +
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_LT], cpu_so);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_GT], cpu_ov);
> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_EQ], cpu_ca);
> +    tcg_gen_movi_i32(cpu_cr[crf * 4 + CRF_SO], 0);
>  
>      tcg_gen_movi_tl(cpu_so, 0);
>      tcg_gen_movi_tl(cpu_ov, 0);
> @@ -6320,11 +6303,13 @@ static void gen_tlbsx_40x(DisasContext *ctx)
>      gen_helper_4xx_tlbsx(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
>      tcg_temp_free(t0);
>      if (Rc(ctx->opcode)) {
> -        int l1 = gen_new_label();
> -        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> -        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[rD(ctx->opcode)], -1, l1);
> -        tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 0x02);
> -        gen_set_label(l1);
> +        t0 = tcg_temp_new();
> +        tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
> +        tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
> +        tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
> +        tcg_gen_setcondi_tl(TCG_COND_EQ, t0, cpu_gpr[rD(ctx->opcode)], -1);
> +        tcg_gen_trunc_tl_i32(cpu_cr[CRF_EQ], t0);
> +        tcg_temp_free(t0);
>      }
>  #endif
>  }
> @@ -6401,11 +6386,13 @@ static void gen_tlbsx_440(DisasContext *ctx)
>      gen_helper_440_tlbsx(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
>      tcg_temp_free(t0);
>      if (Rc(ctx->opcode)) {
> -        int l1 = gen_new_label();
> -        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> -        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[rD(ctx->opcode)], -1, l1);
> -        tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 0x02);
> -        gen_set_label(l1);
> +        t0 = tcg_temp_new();
> +        tcg_gen_trunc_tl_i32(cpu_cr[CRF_SO], cpu_so);
> +        tcg_gen_movi_i32(cpu_cr[CRF_LT], 0);
> +        tcg_gen_movi_i32(cpu_cr[CRF_GT], 0);
> +        tcg_gen_setcondi_tl(TCG_COND_EQ, t0, cpu_gpr[rD(ctx->opcode)], -1);
> +        tcg_gen_trunc_tl_i32(cpu_cr[CRF_EQ], t0);
> +        tcg_temp_free(t0);
>      }
>  #endif
>  }
> @@ -7371,7 +7358,7 @@ GEN_VXFORM(vpmsumd, 4, 19)
>  static void gen_##op(DisasContext *ctx)             \
>  {                                                   \
>      TCGv_ptr ra, rb, rd;                            \
> -    TCGv_i32 ps;                                    \
> +    TCGv_i32 ps, crf;                               \
>                                                      \
>      if (unlikely(!ctx->altivec_enabled)) {          \
>          gen_exception(ctx, POWERPC_EXCP_VPU);       \
> @@ -7383,13 +7370,16 @@ static void gen_##op(DisasContext *ctx)             \
>      rd = gen_avr_ptr(rD(ctx->opcode));              \
>                                                      \
>      ps = tcg_const_i32((ctx->opcode & 0x200) != 0); \
> +    crf = tcg_temp_new_i32();                       \
>                                                      \
> -    gen_helper_##op(cpu_crf[6], rd, ra, rb, ps);    \
> +    gen_helper_##op(crf, rd, ra, rb, ps);           \
> +    gen_op_mtcr(6 << 2, crf, 0);                    \
>                                                      \
>      tcg_temp_free_ptr(ra);                          \
>      tcg_temp_free_ptr(rb);                          \
>      tcg_temp_free_ptr(rd);                          \
>      tcg_temp_free_i32(ps);                          \
> +    tcg_temp_free_ptr(crf);                         \

tcg_temp_free_i32() ?

>  }
>  
>  GEN_BCD(bcdadd)
> @@ -8217,6 +8207,7 @@ static void gen_##name(DisasContext *ctx)        \
>  static void gen_##name(DisasContext *ctx)         \
>  {                                                 \
>      TCGv_ptr ra, rb;                              \
> +    TCGv_i32 tmp;                                 \
>      if (unlikely(!ctx->fpu_enabled)) {            \
>          gen_exception(ctx, POWERPC_EXCP_FPU);     \
>          return;                                   \
> @@ -8224,8 +8215,10 @@ static void gen_##name(DisasContext *ctx)         \
>      gen_update_nip(ctx, ctx->nip - 4);            \
>      ra = gen_fprp_ptr(rA(ctx->opcode));           \
>      rb = gen_fprp_ptr(rB(ctx->opcode));           \
> -    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], \
> -                      cpu_env, ra, rb);           \
> +    tmp = tcg_temp_new_i32();                     \
> +    gen_helper_##name(tmp, cpu_env, ra, rb);      \
> +    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp, 0);  \
> +    tcg_temp_free_i32(tmp);                       \
>      tcg_temp_free_ptr(ra);                        \
>      tcg_temp_free_ptr(rb);                        \
>  }
> @@ -8234,7 +8227,7 @@ static void gen_##name(DisasContext *ctx)         \
>  static void gen_##name(DisasContext *ctx)         \
>  {                                                 \
>      TCGv_ptr ra;                                  \
> -    TCGv_i32 dcm;                                 \
> +    TCGv_i32 dcm, tmp;                            \
>      if (unlikely(!ctx->fpu_enabled)) {            \
>          gen_exception(ctx, POWERPC_EXCP_FPU);     \
>          return;                                   \
> @@ -8242,8 +8235,10 @@ static void gen_##name(DisasContext *ctx)         \
>      gen_update_nip(ctx, ctx->nip - 4);            \
>      ra = gen_fprp_ptr(rA(ctx->opcode));           \
>      dcm = tcg_const_i32(DCM(ctx->opcode));        \
> -    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], \
> -                      cpu_env, ra, dcm);          \
> +    tmp = tcg_temp_new_i32();                     \
> +    gen_helper_##name(tmp, cpu_env, ra, dcm);     \
> +    gen_op_mtcr(crfD(ctx->opcode) << 2, tmp, 0);  \
> +    tcg_temp_free_i32(tmp);                       \
>      tcg_temp_free_ptr(ra);                        \
>      tcg_temp_free_i32(dcm);                       \
>  }
> @@ -8668,37 +8663,32 @@ GEN_SPEOP_ARITH_IMM2(evsubifw, tcg_gen_subi_i32);
>  #define GEN_SPEOP_COMP(name, tcg_cond)                                        \
>  static inline void gen_##name(DisasContext *ctx)                              \
>  {                                                                             \
> +    TCGv tmp = tcg_temp_new();                                                \
> +                                                                              \
>      if (unlikely(!ctx->spe_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_SPEU);                                \
>          return;                                                               \
>      }                                                                         \
> -    int l1 = gen_new_label();                                                 \
> -    int l2 = gen_new_label();                                                 \
> -    int l3 = gen_new_label();                                                 \
> -    int l4 = gen_new_label();                                                 \
>                                                                                \
>      tcg_gen_ext32s_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);    \
>      tcg_gen_ext32s_tl(cpu_gpr[rB(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);    \
>      tcg_gen_ext32s_tl(cpu_gprh[rA(ctx->opcode)], cpu_gprh[rA(ctx->opcode)]);  \
>      tcg_gen_ext32s_tl(cpu_gprh[rB(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);  \
>                                                                                \
> -    tcg_gen_brcond_tl(tcg_cond, cpu_gpr[rA(ctx->opcode)],                     \
> -                       cpu_gpr[rB(ctx->opcode)], l1);                         \
> -    tcg_gen_movi_i32(cpu_crf[crfD(ctx->opcode)], 0);                          \
> -    tcg_gen_br(l2);                                                           \
> -    gen_set_label(l1);                                                        \
> -    tcg_gen_movi_i32(cpu_crf[crfD(ctx->opcode)],                              \
> -                     CRF_CL | CRF_CH_OR_CL | CRF_CH_AND_CL);                  \
> -    gen_set_label(l2);                                                        \
> -    tcg_gen_brcond_tl(tcg_cond, cpu_gprh[rA(ctx->opcode)],                    \
> -                       cpu_gprh[rB(ctx->opcode)], l3);                        \
> -    tcg_gen_andi_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)],  \
> -                     ~(CRF_CH | CRF_CH_AND_CL));                              \
> -    tcg_gen_br(l4);                                                           \
> -    gen_set_label(l3);                                                        \
> -    tcg_gen_ori_i32(cpu_crf[crfD(ctx->opcode)], cpu_crf[crfD(ctx->opcode)],   \
> -                    CRF_CH | CRF_CH_OR_CL);                                   \
> -    gen_set_label(l4);                                                        \
> +    tcg_gen_setcond_tl(tcg_cond, tmp,                                         \
> +                       cpu_gpr[rA(ctx->opcode)],                              \
> +                       cpu_gpr[rB(ctx->opcode)]);                             \
> +    tcg_gen_trunc_tl_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL], tmp);        \
> +    tcg_gen_setcond_tl(tcg_cond, tmp,                                         \
> +                       cpu_gprh[rA(ctx->opcode)],                             \
> +                       cpu_gprh[rB(ctx->opcode)]);                            \
> +    tcg_gen_trunc_tl_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH], tmp);        \
> +    tcg_gen_or_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_OR_CL],              \
> +                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                    \
> +                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                   \
> +    tcg_gen_and_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_AND_CL],            \
> +                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                   \
> +                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                  \
>  }
>  GEN_SPEOP_COMP(evcmpgtu, TCG_COND_GTU);
>  GEN_SPEOP_COMP(evcmpgts, TCG_COND_GT);
> @@ -8769,22 +8759,20 @@ static inline void gen_evsel(DisasContext *ctx)
>      int l2 = gen_new_label();
>      int l3 = gen_new_label();
>      int l4 = gen_new_label();
> -    TCGv_i32 t0 = tcg_temp_local_new_i32();
> -    tcg_gen_andi_i32(t0, cpu_crf[ctx->opcode & 0x07], 1 << 3);
> -    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l1);
> +
> +    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[(ctx->opcode & 0x07) * 4], 0, l1);
>      tcg_gen_mov_tl(cpu_gprh[rD(ctx->opcode)], cpu_gprh[rA(ctx->opcode)]);
>      tcg_gen_br(l2);
>      gen_set_label(l1);
>      tcg_gen_mov_tl(cpu_gprh[rD(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);
>      gen_set_label(l2);
> -    tcg_gen_andi_i32(t0, cpu_crf[ctx->opcode & 0x07], 1 << 2);
> -    tcg_gen_brcondi_i32(TCG_COND_EQ, t0, 0, l3);
> +
> +    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cr[(ctx->opcode & 0x07) * 4 + 1], 0, l3);
>      tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
>      tcg_gen_br(l4);
>      gen_set_label(l3);
>      tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
>      gen_set_label(l4);
> -    tcg_temp_free_i32(t0);
>  }
>  
>  static void gen_evsel0(DisasContext *ctx)
> @@ -9366,9 +9354,12 @@ static inline void gen_##name(DisasContext *ctx)                              \
>      t0 = tcg_temp_new_i32();                                                  \
>      t1 = tcg_temp_new_i32();                                                  \
>                                                                                \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_LT], 0);              \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_GT], 0);              \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_SO], 0);              \
>      tcg_gen_trunc_tl_i32(t0, cpu_gpr[rA(ctx->opcode)]);                       \
>      tcg_gen_trunc_tl_i32(t1, cpu_gpr[rB(ctx->opcode)]);                       \
> -    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], cpu_env, t0, t1);           \
> +    gen_helper_##name(cpu_cr[crfD(ctx->opcode) * 4 + CRF_EQ], cpu_env, t0, t1); \
>                                                                                \
>      tcg_temp_free_i32(t0);                                                    \
>      tcg_temp_free_i32(t1);                                                    \
> @@ -9385,10 +9376,32 @@ static inline void gen_##name(DisasContext *ctx)                              \
>      t1 = tcg_temp_new_i64();                                                  \
>      gen_load_gpr64(t0, rA(ctx->opcode));                                      \
>      gen_load_gpr64(t1, rB(ctx->opcode));                                      \
> -    gen_helper_##name(cpu_crf[crfD(ctx->opcode)], cpu_env, t0, t1);           \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_LT], 0);              \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_GT], 0);              \
> +    tcg_gen_movi_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_SO], 0);              \
> +    gen_helper_##name(cpu_cr[crfD(ctx->opcode) * 4 + CRF_EQ], cpu_env,        \
> +                      t0, t1);                                                \
>      tcg_temp_free_i64(t0);                                                    \
>      tcg_temp_free_i64(t1);                                                    \
>  }
> +#define GEN_SPEFPUOP_COMP_V64(name, helper)                                   \
> +static inline void gen_##name(DisasContext *ctx)                              \
> +{                                                                             \
> +    if (unlikely(!ctx->spe_enabled)) {                                        \
> +        gen_exception(ctx, POWERPC_EXCP_SPEU);                                \
> +        return;                                                               \
> +    }                                                                         \
> +    gen_helper_##helper(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL], cpu_env,      \
> +                        cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);  \
> +    gen_helper_##helper(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH], cpu_env,      \
> +                        cpu_gprh[rA(ctx->opcode)], cpu_gprh[rB(ctx->opcode)]);\

This doesn't compile for 64-bit targets because the helpers declare i32 types for the GPR arguments.

> +    tcg_gen_or_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_OR_CL],              \
> +                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                    \
> +                   cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                   \
> +    tcg_gen_and_i32(cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH_AND_CL],            \
> +                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CH],                   \
> +                    cpu_cr[crfD(ctx->opcode) * 4 + CRF_CL]);                  \
> +}
>  
>  /* Single precision floating-point vectors operations */
>  /* Arithmetic */
> @@ -9443,12 +9456,12 @@ GEN_SPEFPUOP_CONV_64_64(evfsctuiz);
>  GEN_SPEFPUOP_CONV_64_64(evfsctsiz);
>  
>  /* Comparison */
> -GEN_SPEFPUOP_COMP_64(evfscmpgt);
> -GEN_SPEFPUOP_COMP_64(evfscmplt);
> -GEN_SPEFPUOP_COMP_64(evfscmpeq);
> -GEN_SPEFPUOP_COMP_64(evfststgt);
> -GEN_SPEFPUOP_COMP_64(evfststlt);
> -GEN_SPEFPUOP_COMP_64(evfststeq);
> +GEN_SPEFPUOP_COMP_V64(evfscmpgt, efscmpgt);
> +GEN_SPEFPUOP_COMP_V64(evfscmplt, efscmplt);
> +GEN_SPEFPUOP_COMP_V64(evfscmpeq, efscmpeq);
> +GEN_SPEFPUOP_COMP_V64(evfststgt, efststgt);
> +GEN_SPEFPUOP_COMP_V64(evfststlt, efststlt);
> +GEN_SPEFPUOP_COMP_V64(evfststeq, efststeq);
>  
>  /* Opcodes definitions */
>  GEN_SPE(evfsadd,   evfssub,   0x00, 0x0A, 0x00000000, 0x00000000, PPC_SPE_SINGLE); //
> 

* Re: [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-08-28 19:35     ` Paolo Bonzini
@ 2014-09-05  6:00       ` David Gibson
  0 siblings, 0 replies; 50+ messages in thread
From: David Gibson @ 2014-09-05  6:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tom Musta, Peter Maydell, qemu-ppc, QEMU Developers, dgibson

On Thu, Aug 28, 2014 at 09:35:27PM +0200, Paolo Bonzini wrote:
> On 28/08/2014 19:30, Peter Maydell wrote:
> > On 28 August 2014 18:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
[snip]
> > Does PPC hardware do lots of TLB flushes on user-kernel
> > transitions, or does it have some sort of info in the TLB
> > entry about whether it should match or not?
> 
> The IR and DR bits simply disable paging for respectively instructions
> and data.  I suppose real hardware simply does not use the TLB when
> paging is disabled.

That's right for the most part, although IR and DR transitions are
still pretty horribly slow, due to the various synchronizations that
need to happen.

There are some other complications though.  At least POWER7 and POWER8
support a "virtual real mode area" (VRMA), which is where a guest OS
sees itself as having translation off, but in fact translations
(managed by the hypervisor) are still in use.  In the (hashed) page
table the VRMA co-exists with the guest's normal translations.  I'm
not up to date with the TLB architecture and how it's impacted.  Note
that POWER chips (since POWER4?) have both the ERAT (which is what
most CPUs would think of as a fast but smallish TLB) and the TLB, which
is a sort of L1.5 cache, which needs to be combined with the SLB to
form full translations.  I strongly suspect the ERAT does need to be
flushed on real-mode/virtual-mode transitions, but I'm less sure about
the SLB and TLB.

As yet another case, BookE (embedded) powerpc CPUs have IS/DS bits
instead of IR/DR, reflecting that instead of a true "translation off"
mode, they have two different "address spaces".  In this case,
however, the address space is tagged into the TLB, so the whole thing
doesn't need to be flushed on address space transitions.
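
For illustration only (not taken from any patch in this thread): a
minimal model of a TLB whose entries are tagged with an address-space
bit, so that switching address space changes which entries match
instead of forcing a flush.  The structure, field names and table size
are hypothetical and much simpler than a real BookE TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* hypothetical size, for illustration only */

/* Hypothetical TLB entry: the address-space bit is part of the match tag. */
struct tlb_entry {
    bool     valid;
    uint8_t  as;      /* address space the entry belongs to (0 or 1) */
    uint32_t vpage;   /* virtual page number */
    uint32_t ppage;   /* physical page number */
};

/* Install a translation in slot `slot`. */
static void tlb_install(struct tlb_entry *tlb, size_t slot,
                        uint8_t as, uint32_t vpage, uint32_t ppage)
{
    assert(slot < TLB_ENTRIES);
    tlb[slot] = (struct tlb_entry){ true, as, vpage, ppage };
}

/* Return true and set *ppage on a hit.  An entry hits only if both the
 * virtual page and the current address space match, so entries of the
 * other address space can stay resident across a switch. */
static bool tlb_lookup(const struct tlb_entry *tlb, uint8_t cur_as,
                       uint32_t vpage, uint32_t *ppage)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].as == cur_as && tlb[i].vpage == vpage) {
            *ppage = tlb[i].ppage;
            return true;
        }
    }
    return false;
}
```

The same virtual page can be resident once per address space; the
lookup picks the right entry from the current address-space bit alone.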

> IIRC each user->kernel transition disables paging, and then the kernel
> can re-enable it (optionally only on data).  So the transition is
> user->kernel unpaged->kernel paged, and the kernel unpaged->kernel paged
> part is what triggers the TLB flush.  (Something like this---Alex
> explained it to me a year ago when I asked why tlb_flush was always the
> top function in the profile of qemu-system-ppc*).
> 
> Paolo
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-08-28 17:14 ` [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing Paolo Bonzini
  2014-08-28 17:30   ` Peter Maydell
@ 2014-09-05  7:10   ` Alexander Graf
  2014-09-05 12:11     ` Paolo Bonzini
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:10 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc, tommusta



On 28.08.14 19:14, Paolo Bonzini wrote:
> PowerPC TCG flushes the TLB on every IR/DR change, which basically
> means on every user<->kernel context switch.  Use the 6-element
> TLB array as a cache, where each MMU index is mapped to a different
> state of the IR/DR/PR/HV bits.
> 
> This brings the number of TLB flushes down from ~900000 to ~50000
> for starting up the Debian installer, which is in line with x86
> and gives a ~10% performance improvement.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cputlb.c                    | 19 +++++++++++++++++
>  hw/ppc/spapr_hcall.c        |  6 +++++-
>  include/exec/exec-all.h     |  5 +++++
>  target-ppc/cpu.h            |  4 +++-
>  target-ppc/excp_helper.c    |  6 +-----
>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>  target-ppc/translate_init.c |  5 +++++
>  7 files changed, 74 insertions(+), 23 deletions(-)
> 
> diff --git a/cputlb.c b/cputlb.c
> index afd3705..17e1b03 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>      tlb_flush_count++;
>  }
>  
> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
> +{
> +    CPUArchState *env = cpu->env_ptr;
> +
> +#if defined(DEBUG_TLB)
> +    printf("tlb_flush_idx %d:\n", mmu_idx);
> +#endif
> +    /* must reset current TB so that interrupts cannot modify the
> +       links while we are modifying them */
> +    cpu->current_tb = NULL;
> +
> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> +
> +    env->tlb_flush_addr = -1;
> +    env->tlb_flush_mask = 0;
> +    tlb_flush_count++;
> +}
> +
>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
>  {
>      if (addr == (tlb_entry->addr_read &
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 467858c..b95961c 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>  {
>      CPUPPCState *env = &cpu->env;
>      CPUState *cs = CPU(cpu);
> +    bool flush;
>  
>      env->msr |= (1ULL << MSR_EE);
> -    hreg_compute_hflags(env);
> +    flush = hreg_compute_hflags(env);
>      if (!cpu_has_work(cs)) {
>          cs->halted = 1;
>          cs->exception_index = EXCP_HLT;
>          cs->exit_request = 1;
> +    } else if (flush) {
> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +        cs->exit_request = 1;

Can this ever happen?

>      }
>      return H_SUCCESS;
>  }
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 5e5d86e..629a550 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -100,6 +100,7 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
>  /* cputlb.c */
>  void tlb_flush_page(CPUState *cpu, target_ulong addr);
>  void tlb_flush(CPUState *cpu, int flush_global);
> +void tlb_flush_idx(CPUState *cpu, int mmu_idx);
>  void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>                    hwaddr paddr, int prot,
>                    int mmu_idx, target_ulong size);
> @@ -112,6 +113,10 @@ static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
>  static inline void tlb_flush(CPUState *cpu, int flush_global)
>  {
>  }
> +
> +static inline void tlb_flush_idx(CPUState *cpu, int mmu_idx)
> +{
> +}
>  #endif
>  
>  #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index b64c652..c1cb27f 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -922,7 +922,7 @@ struct ppc_segment_page_sizes {
>  
>  /*****************************************************************************/
>  /* The whole PowerPC CPU context */
> -#define NB_MMU_MODES 3
> +#define NB_MMU_MODES 6
>  
>  #define PPC_CPU_OPCODES_LEN 0x40
>  
> @@ -1085,6 +1085,8 @@ struct CPUPPCState {
>      target_ulong hflags;      /* hflags is a MSR & HFLAGS_MASK         */
>      target_ulong hflags_nmsr; /* specific hflags, not coming from MSR */
>      int mmu_idx;         /* precomputed MMU index to speed up mem accesses */
> +    uint32_t mmu_msr[NB_MMU_MODES];  /* ir/dr/hv/pr values for TLBs */
> +    int mmu_fifo;  /* for replacement in mmu_msr */
>  
>      /* Power management */
>      int (*check_pow)(CPUPPCState *env);
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index be71590..bf25d44 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -623,9 +623,6 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>  
>      if (env->spr[SPR_LPCR] & LPCR_AIL) {
>          new_msr |= (1 << MSR_IR) | (1 << MSR_DR);
> -    } else if (msr & ((1 << MSR_IR) | (1 << MSR_DR))) {
> -        /* If we disactivated any translation, flush TLBs */
> -        tlb_flush(cs, 1);
>      }
>  
>  #ifdef TARGET_PPC64
> @@ -678,8 +675,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>      if ((env->mmu_model == POWERPC_MMU_BOOKE) ||
>          (env->mmu_model == POWERPC_MMU_BOOKE206)) {
>          /* XXX: The BookE changes address space when switching modes,
> -                we should probably implement that as different MMU indexes,
> -                but for the moment we do it the slow way and flush all.  */
> +                TODO: still needed?!?  */
>          tlb_flush(cs, 1);
>      }
>  }
> diff --git a/target-ppc/helper_regs.h b/target-ppc/helper_regs.h
> index 271fddf..291f9c1 100644
> --- a/target-ppc/helper_regs.h
> +++ b/target-ppc/helper_regs.h
> @@ -39,17 +39,38 @@ static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
>      env->tgpr[3] = tmp;
>  }
>  
> -static inline void hreg_compute_mem_idx(CPUPPCState *env)
> +static inline bool hreg_compute_mem_idx(CPUPPCState *env)
>  {
> -    /* Precompute MMU index */
> -    if (msr_pr == 0 && msr_hv != 0) {
> -        env->mmu_idx = 2;
> -    } else {
> -        env->mmu_idx = 1 - msr_pr;
> +    CPUState *cs = CPU(ppc_env_get_cpu(env));
> +    int msr = env->msr;
> +    int i;
> +
> +    if (!tcg_enabled()) {
> +        return false;
> +    }
> +
> +    msr &= (1 << MSR_IR) | (1 << MSR_DR) | (1 << MSR_PR) | MSR_HVB;
> +    if (msr_pr == 1) {
> +        msr &= ~MSR_HVB;
>      }
> +
> +    for (i = 0; i < NB_MMU_MODES; i++) {
> +        if (env->mmu_msr[i] == msr) {
> +            env->mmu_idx = i;
> +            return false;
> +        }
> +    }
> +
> +    /* Use a new index with FIFO replacement.  */
> +    i = (env->mmu_fifo == NB_MMU_MODES - 1 ? 0 : env->mmu_fifo + 1);
> +    env->mmu_fifo = i;
> +    env->mmu_msr[i] = msr;
> +    env->mmu_idx = i;
> +    tlb_flush_idx(cs, i);
> +    return true;
>  }

Ok, so this basically changes the semantics of mmu_idx from a static
array with predefined meanings to a dynamic array with runtime changing
semantics.
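
To make the "dynamic array" reading concrete, here is a standalone
sketch of the lookup-then-FIFO-replace logic from the hunk above,
stripped of all QEMU types.  The struct, the function names, the flush
counter and the initial state are assumptions for illustration; the
patch's own initialisation lives in translate_init.c, which is not
quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_MODES 6            /* same count as NB_MMU_MODES in the patch */

struct mmu_cache {
    uint32_t key[NB_MODES];   /* MSR bit combination each slot serves */
    int      fifo;            /* slot that was filled most recently */
    int      cur;             /* slot selected for the current state */
    unsigned flushes;         /* number of per-slot flushes performed */
};

static void mmu_cache_init(struct mmu_cache *c)
{
    /* Assumed initial state: all-ones keys never equal a masked MSR
     * value, so every slot starts out unassigned. */
    memset(c->key, 0xff, sizeof(c->key));
    c->fifo = NB_MODES - 1;   /* the first miss wraps around to slot 0 */
    c->cur = 0;
    c->flushes = 0;
}

/* Select the slot for `key`; return true if a slot had to be recycled. */
static bool mmu_cache_select(struct mmu_cache *c, uint32_t key)
{
    assert(key != UINT32_MAX);          /* reserved for "unassigned" */

    for (int i = 0; i < NB_MODES; i++) {
        if (c->key[i] == key) {
            c->cur = i;                 /* hit: reuse the slot, no flush */
            return false;
        }
    }

    /* Miss: take the next slot in FIFO order and flush only that slot. */
    int i = (c->fifo == NB_MODES - 1) ? 0 : c->fifo + 1;
    c->fifo = i;
    c->key[i] = key;
    c->cur = i;
    c->flushes++;                       /* stands in for tlb_flush_idx() */
    return true;
}
```

Switching back and forth between a handful of MSR states costs one
flush per state the first time and nothing afterwards, until a seventh
distinct state evicts the oldest slot.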

The first thing that comes to mind here is why we're not just extending
the existing array? After all, we have 4 bits -> 16 states minus one for
PR+HV. Can our existing logic not deal with this?
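
A sketch of what such a static mapping could look like, purely as an
illustration of the question.  The function name and the index layout
are hypothetical.  It follows the patch in ignoring HV whenever PR is
set, which leaves twelve distinct combinations rather than fifteen;
whether the softmmu code can take that many MMU modes is not something
this sketch answers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical static index: one slot per distinct combination of
 * IR, DR, PR and HV.  Indices 0-7 cover PR=0 (HV selects the upper
 * half), indices 8-11 cover PR=1, where HV is ignored. */
static int static_mmu_idx(bool ir, bool dr, bool pr, bool hv)
{
    int idx = (ir ? 1 : 0) | (dr ? 2 : 0);

    if (pr) {
        idx += 8;     /* HV is masked out when PR is set */
    } else if (hv) {
        idx += 4;
    }
    assert(idx >= 0 && idx < 12);
    return idx;
}
```

With a fixed mapping like this no slot is ever recycled, so a mode
switch never needs a flush by itself; the cost is a larger TLB array
per CPU.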

The second thing I'm still failing to grasp is that in the previous patch
you're changing ctx.mem_idx to different static semantics. But that
mem_idx gets passed to our ld/st helpers, which again boils down to the
mem_idx above, no? So aren't we accessing random unrelated MMU contexts now?

There's a good chance I'm not fully grasping something here :).


Alex

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 03/17] ppc: fix monitor access to CR
  2014-09-03 18:21   ` Tom Musta
@ 2014-09-05  7:10     ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:10 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc



On 03.09.14 20:21, Tom Musta wrote:
> On 8/28/2014 12:14 PM, Paolo Bonzini wrote:
>> This was off-by-one.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  monitor.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index 34cee74..ec73dd4 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -2968,7 +2968,7 @@ static target_long monitor_get_ccr (const struct MonitorDef *md, int val)
>>  
>>      u = 0;
>>      for (i = 0; i < 8; i++)
>> -        u |= env->crf[i] << (32 - (4 * i));
>> +        u |= env->crf[i] << (32 - (4 * (i + 1)));
>>  
>>      return u;
>>  }
>>
> 
> Reviewed-by: Tom Musta <tommusta@gmail.com>
> 

Thanks, applied to ppc-next.
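
For reference, a standalone version of the corrected loop.  The
function name is made up, and the fields are assumed to be 32-bit
values holding one nibble each.  With the old expression the first
iteration shifts by 32, which is undefined for a 32-bit operand in C,
and every later field lands one nibble too high.

```c
#include <assert.h>
#include <stdint.h>

/* Pack eight 4-bit CR fields into one 32-bit CR image.
 * crf[0] (CR0) lands in the most significant nibble. */
static uint32_t pack_cr(const uint32_t crf[8])
{
    uint32_t u = 0;

    for (int i = 0; i < 8; i++) {
        assert(crf[i] <= 0xF);
        u |= crf[i] << (32 - 4 * (i + 1));   /* shifts 28, 24, ..., 0 */
    }
    return u;
}
```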


Alex

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 06/17] ppc: use CRF_* in int_helper.c
  2014-09-03 18:28   ` Tom Musta
@ 2014-09-05  7:12     ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:12 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc



On 03.09.14 20:28, Tom Musta wrote:
> On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target-ppc/int_helper.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
>> index f6e8846..9c1c5cd 100644
>> --- a/target-ppc/int_helper.c
>> +++ b/target-ppc/int_helper.c
>> @@ -2303,25 +2303,25 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
>>          if (sgna == sgnb) {
>>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
>>              zero = bcd_add_mag(&result, a, b, &invalid, &overflow);
>> -            cr = (sgna > 0) ? 4 : 8;
>> +            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>>          } else if (bcd_cmp_mag(a, b) > 0) {
>>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgna, ps);
>>              zero = bcd_sub_mag(&result, a, b, &invalid, &overflow);
>> -            cr = (sgna > 0) ? 4 : 8;
>> +            cr = (sgna > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>>          } else {
>>              result.u8[BCD_DIG_BYTE(0)] = bcd_preferred_sgn(sgnb, ps);
>>              zero = bcd_sub_mag(&result, b, a, &invalid, &overflow);
>> -            cr = (sgnb > 0) ? 4 : 8;
>> +            cr = (sgnb > 0) ? 1 << CRF_GT : 1 << CRF_LT;
>>          }
>>      }
>>  
>>      if (unlikely(invalid)) {
>>          result.u64[HI_IDX] = result.u64[LO_IDX] = -1;
>> -        cr = 1;
>> +        cr = 1 << CRF_SO;
>>      } else if (overflow) {
>> -        cr |= 1;
>> +        cr |= 1 << CRF_SO;
>>      } else if (zero) {
>> -        cr = 2;
>> +        cr = 1 << CRF_EQ;
>>      }
>>  
>>      *r = result;
>>
> 
> Reviewed-by: Tom Musta <tommusta@gmail.com>
> Tested-by: Tom Musta <tommusta@gmail.com>

Thanks, applied to ppc-next.


Alex

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found
  2014-09-03 18:28   ` Tom Musta
@ 2014-09-05  7:26     ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:26 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc



On 03.09.14 20:28, Tom Musta wrote:
> On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
>> It must return 8 and place 8 in XER, but the current code uses
>> i directly which is 9 at this point of the code.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target-ppc/int_helper.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
>> index 9c1c5cd..7955bf7 100644
>> --- a/target-ppc/int_helper.c
>> +++ b/target-ppc/int_helper.c
>> @@ -2573,6 +2573,7 @@ target_ulong helper_dlmzb(CPUPPCState *env, target_ulong high,
>>          }
>>          i++;
>>      }
>> +    i = 8;
>>      if (update_Rc) {
>>          env->crf[0] = 0x2;
>>      }
>>
> 
> Reviewed-by: Tom Musta <tommusta@gmail.com>

Thanks, applied to ppc-next.
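
A simplified model of the counting part of the helper, to show where
the 9 comes from.  The function name is hypothetical and the CR0/XER
updates are left out; only the returned count is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 1-based position of the first zero byte in the 8-byte
 * string high:low (most significant byte first), or 8 if no byte is
 * zero. */
static int dlmzb_count(uint32_t high, uint32_t low)
{
    int i = 1;

    for (uint32_t mask = 0xFF000000u; mask != 0; mask >>= 8, i++) {
        if ((high & mask) == 0) {
            return i;
        }
    }
    for (uint32_t mask = 0xFF000000u; mask != 0; mask >>= 8, i++) {
        if ((low & mask) == 0) {
            return i;
        }
    }
    assert(i == 9);   /* both loops ran to completion */
    return 8;         /* clamp, as the one-line fix does */
}
```

Each loop advances the counter four times, so falling through both
leaves it at 9; the added assignment brings it back to the byte count.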


Alex

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr
  2014-09-03 19:41   ` Tom Musta
@ 2014-09-05  7:27     ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:27 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc



On 03.09.14 21:41, Tom Musta wrote:
> On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
>> It sets CR1, not CR6 (and the spec agrees).
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target-ppc/translate.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 8def0ae..67f13f7 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -8179,7 +8179,7 @@ static inline TCGv_ptr gen_fprp_ptr(int reg)
>>  }
>>  
>>  #if defined(TARGET_PPC64)
>> -static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>> +static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>>  {
>>      TCGv_i32 tmp = tcg_temp_new_i32();
>>      tcg_gen_trunc_tl_i32(tmp, cpu_fpscr);
>> @@ -8187,7 +8187,7 @@ static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>>      tcg_temp_free_i32(tmp);
>>  }
>>  #else
>> -static void gen_set_cr6_from_fpscr(DisasContext *ctx)
>> +static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>>  {
>>      gen_op_mtcr(4, cpu_fpscr, 28);
>>  }
>> @@ -8207,7 +8207,7 @@ static void gen_##name(DisasContext *ctx)        \
>>      rb = gen_fprp_ptr(rB(ctx->opcode));          \
>>      gen_helper_##name(cpu_env, rd, ra, rb);      \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
>> -        gen_set_cr6_from_fpscr(ctx);             \
>> +        gen_set_cr1_from_fpscr(ctx);             \
>>      }                                            \
>>      tcg_temp_free_ptr(rd);                       \
>>      tcg_temp_free_ptr(ra);                       \
>> @@ -8265,7 +8265,7 @@ static void gen_##name(DisasContext *ctx)             \
>>      u32_2 = tcg_const_i32(u32f2(ctx->opcode));        \
>>      gen_helper_##name(cpu_env, rt, rb, u32_1, u32_2); \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {             \
>> -        gen_set_cr6_from_fpscr(ctx);                  \
>> +        gen_set_cr1_from_fpscr(ctx);                  \
>>      }                                                 \
>>      tcg_temp_free_ptr(rt);                            \
>>      tcg_temp_free_ptr(rb);                            \
>> @@ -8289,7 +8289,7 @@ static void gen_##name(DisasContext *ctx)        \
>>      i32 = tcg_const_i32(i32fld(ctx->opcode));    \
>>      gen_helper_##name(cpu_env, rt, ra, rb, i32); \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
>> -        gen_set_cr6_from_fpscr(ctx);             \
>> +        gen_set_cr1_from_fpscr(ctx);             \
>>      }                                            \
>>      tcg_temp_free_ptr(rt);                       \
>>      tcg_temp_free_ptr(rb);                       \
>> @@ -8310,7 +8310,7 @@ static void gen_##name(DisasContext *ctx)        \
>>      rb = gen_fprp_ptr(rB(ctx->opcode));          \
>>      gen_helper_##name(cpu_env, rt, rb);          \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {        \
>> -        gen_set_cr6_from_fpscr(ctx);             \
>> +        gen_set_cr1_from_fpscr(ctx);             \
>>      }                                            \
>>      tcg_temp_free_ptr(rt);                       \
>>      tcg_temp_free_ptr(rb);                       \
>> @@ -8331,7 +8331,7 @@ static void gen_##name(DisasContext *ctx)          \
>>      i32 = tcg_const_i32(i32fld(ctx->opcode));      \
>>      gen_helper_##name(cpu_env, rt, rs, i32);       \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {          \
>> -        gen_set_cr6_from_fpscr(ctx);               \
>> +        gen_set_cr1_from_fpscr(ctx);               \
>>      }                                              \
>>      tcg_temp_free_ptr(rt);                         \
>>      tcg_temp_free_ptr(rs);                         \
>>
> 
> Reviewed-by: Tom Musta <tommusta@gmail.com>
> Tested-by: Tom Musta <tommusta@gmail.com>

Thanks, applied to ppc-next.
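
A small model of why the call in the quoted hunk targets CR1.  It
assumes gen_op_mtcr(4, cpu_fpscr, 28) means "write the CR field that
starts at CR bit 4, taking the source shifted right by 28"; that
reading is an interpretation, since the helper's definition is not
quoted here.  Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The 4-bit value taken from a 32-bit FPSCR image: its top nibble. */
static uint32_t cr_field_from_fpscr(uint32_t fpscr)
{
    uint32_t field = fpscr >> 28;

    assert(field <= 0xF);
    return field;
}

/* Write `field` into CR field number `crf` (0 = CR0, most significant). */
static uint32_t cr_insert_field(uint32_t cr, int crf, uint32_t field)
{
    int shift = 4 * (7 - crf);

    assert(crf >= 0 && crf < 8);
    return (cr & ~(0xFu << shift)) | ((field & 0xFu) << shift);
}
```

First CR bit 4 divided by four bits per field gives field 1, so the
value lands in CR1; CR6 would start at bit 24.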


Alex

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 13/17] ppc: compute mask from BI using right shift
  2014-09-03 20:59   ` Tom Musta
@ 2014-09-05  7:29     ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-05  7:29 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc



On 03.09.14 22:59, Tom Musta wrote:
> On 8/28/2014 12:15 PM, Paolo Bonzini wrote:
>> This will match the code we use in fpu_helper.c when we flip
>> CRF_* bit-endianness.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  target-ppc/translate.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 48c7b66..4ce7af4 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -794,7 +794,7 @@ static void gen_isel(DisasContext *ctx)
>>      TCGv_i32 t0;
>>      TCGv t1, true_op, zero;
>>  
>> -    mask = 1 << (3 - (bi & 0x03));
>> +    mask = 0x08 >> (bi & 0x03);
>>      t0 = tcg_temp_new_i32();
>>      tcg_gen_andi_i32(t0, cpu_crf[bi >> 2], mask);
>>      t1 = tcg_temp_new();
>> @@ -3870,7 +3870,7 @@ static inline void gen_bcond(DisasContext *ctx, int type)
>>      if ((bo & 0x10) == 0) {
>>          /* Test CR */
>>          uint32_t bi = BI(ctx->opcode);
>> -        uint32_t mask = 1 << (3 - (bi & 0x03));
>> +        uint32_t mask = 0x08 >> (bi & 0x03);
>>          TCGv_i32 temp = tcg_temp_new_i32();
>>  
>>          if (bo & 0x8) {
>> @@ -3952,7 +3952,7 @@ static void glue(gen_, name)(DisasContext *ctx)
>>      else                                                                      \
>>          tcg_gen_mov_i32(t1, cpu_crf[crbB(ctx->opcode) >> 2]);                 \
>>      tcg_op(t0, t0, t1);                                                       \
>> -    bitmask = 1 << (3 - (crbD(ctx->opcode) & 0x03));                          \
>> +    bitmask = 0x08 >> (crbD(ctx->opcode) & 0x03);                             \
>>      tcg_gen_andi_i32(t0, t0, bitmask);                                        \
>>      tcg_gen_andi_i32(t1, cpu_crf[crbD(ctx->opcode) >> 2], ~bitmask);          \
>>      tcg_gen_or_i32(cpu_crf[crbD(ctx->opcode) >> 2], t0, t1);                  \
>>
> 
> Reviewed-by: Tom Musta <tommusta@gmail.com>
> Tested-by: Tom Musta <tommusta@gmail.com>

Thanks, applied to ppc-next.
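
The patch relies on the two mask expressions being interchangeable. A minimal sketch that checks this for every possible BI value follows; the function names are made up for illustration and are not part of target-ppc/translate.c.

```c
#include <stdint.h>

/* Old form: compute the shift count from the left of the 4-bit field. */
static uint32_t crf_mask_left(uint32_t bi)
{
    return 1u << (3 - (bi & 0x03));
}

/* New form from the patch: start at the leftmost bit and shift right. */
static uint32_t crf_mask_right(uint32_t bi)
{
    return 0x08u >> (bi & 0x03);
}

/* Returns 1 if both forms agree for every 5-bit BI value (0..31). */
static int crf_masks_agree(void)
{
    for (uint32_t bi = 0; bi < 32; bi++) {
        if (crf_mask_left(bi) != crf_mask_right(bi)) {
            return 0;
        }
    }
    return 1;
}
```

Both forms map the low two bits of BI to LT=8, GT=4, EQ=2, SO=1 within the 4-bit field, so the change is purely cosmetic at this point in the series.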


Alex


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-09-05  7:10   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
@ 2014-09-05 12:11     ` Paolo Bonzini
  2014-09-09 16:42       ` Paolo Bonzini
  0 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-05 12:11 UTC (permalink / raw)
  To: Alexander Graf; +Cc: dgibson, qemu-ppc, qemu-devel, tommusta



----- Original message -----
> From: "Alexander Graf" <agraf@suse.de>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
> Sent: Friday, 5 September 2014 9:10:01
> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
> 
> 
> 
> On 28.08.14 19:14, Paolo Bonzini wrote:
> > PowerPC TCG flushes the TLB on every IR/DR change, which basically
> > means on every user<->kernel context switch.  Use the 6-element
> > TLB array as a cache, where each MMU index is mapped to a different
> > state of the IR/DR/PR/HV bits.
> > 
> > This brings the number of TLB flushes down from ~900000 to ~50000
> > for starting up the Debian installer, which is in line with x86
> > and gives a ~10% performance improvement.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  cputlb.c                    | 19 +++++++++++++++++
> >  hw/ppc/spapr_hcall.c        |  6 +++++-
> >  include/exec/exec-all.h     |  5 +++++
> >  target-ppc/cpu.h            |  4 +++-
> >  target-ppc/excp_helper.c    |  6 +-----
> >  target-ppc/helper_regs.h    | 52
> >  +++++++++++++++++++++++++++++++--------------
> >  target-ppc/translate_init.c |  5 +++++
> >  7 files changed, 74 insertions(+), 23 deletions(-)
> > 
> > diff --git a/cputlb.c b/cputlb.c
> > index afd3705..17e1b03 100644
> > --- a/cputlb.c
> > +++ b/cputlb.c
> > @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
> >      tlb_flush_count++;
> >  }
> >  
> > +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
> > +{
> > +    CPUArchState *env = cpu->env_ptr;
> > +
> > +#if defined(DEBUG_TLB)
> > +    printf("tlb_flush_idx %d:\n", mmu_idx);
> > +#endif
> > +    /* must reset current TB so that interrupts cannot modify the
> > +       links while we are modifying them */
> > +    cpu->current_tb = NULL;
> > +
> > +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
> > +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> > +
> > +    env->tlb_flush_addr = -1;
> > +    env->tlb_flush_mask = 0;
> > +    tlb_flush_count++;
> > +}
> > +
> >  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong
> >  addr)
> >  {
> >      if (addr == (tlb_entry->addr_read &
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index 467858c..b95961c 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu,
> > sPAPREnvironment *spapr,
> >  {
> >      CPUPPCState *env = &cpu->env;
> >      CPUState *cs = CPU(cpu);
> > +    bool flush;
> >  
> >      env->msr |= (1ULL << MSR_EE);
> > -    hreg_compute_hflags(env);
> > +    flush = hreg_compute_hflags(env);
> >      if (!cpu_has_work(cs)) {
> >          cs->halted = 1;
> >          cs->exception_index = EXCP_HLT;
> >          cs->exit_request = 1;
> > +    } else if (flush) {
> > +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> > +        cs->exit_request = 1;
> 
> Can this ever happen?

No, I think it can't.

> Ok, so this basically changes the semantics of mmu_idx from a static
> array with predefined meanings to a dynamic array with runtime changing
> semantics.
> 
> The first thing that comes to mind here is why we're not just extending
> the existing array? After all, we have 4 bits -> 16 states minus one for
> PR+HV. Can our existing logic not deal with this?

Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
only supports 6 but that's easy to extend.
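
The figure of 12 can be reproduced by enumerating the four bits and discarding every combination that has both PR and HV set. The sketch below does that; the bit positions are an arbitrary assumption for illustration and are not QEMU's encoding.

```c
#include <stdbool.h>

/* Hypothetical bit positions inside a 4-bit state index. */
enum { ST_IR = 1 << 0, ST_DR = 1 << 1, ST_PR = 1 << 2, ST_HV = 1 << 3 };

/* A state is usable unless problem state (PR) and hypervisor (HV)
   are both set at the same time. */
static bool state_is_valid(unsigned state)
{
    return !((state & ST_PR) && (state & ST_HV));
}

/* Count the usable states among all 16 bit combinations. */
static int count_valid_states(void)
{
    int n = 0;
    for (unsigned state = 0; state < 16; state++) {
        if (state_is_valid(state)) {
            n++;
        }
    }
    return n;
}
```

PR+HV rules out four combinations (one for each IR/DR setting), which leaves 16 - 4 = 12.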

tlb_flush becomes progressively more expensive as you add more MMU modes,
but it may work.  This patch removes 98.8% of the TLB flushes, makes the
remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
up QEMU by 10%.  You can solve this:

    0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
    tlb_time = 0.1 / 0.98 = 0.102

to compute that the time spent in TLB flushes before the patch is 10.2% of the
whole emulation time.

Doubling the NB_MMU_MODES further from 6 to 12 would still save 98.8% of the TLB
flushes, while making the remaining ones even more expensive.  The savings will be
smaller, but actually not by much:

    0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 = 0.903

i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed like a
waste, since only 4 or 5 are used in practice...
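
The arithmetic above can be replayed with a small helper. This is only a sketch of the same model (run time normalized to 1 before the patch, a fraction "kept" of the flushes surviving, each costing "cost" times as much); the function and parameter names are invented here.

```c
/* Solve  after = kept * cost * t + (1 - t)  for t, the fraction of
   run time spent in TLB flushes before the patch. */
static double tlb_time_before(double after, double kept, double cost)
{
    return (1.0 - after) / (1.0 - kept * cost);
}

/* Predicted normalized run time for a given flush fraction t. */
static double runtime_after(double t, double kept, double cost)
{
    return kept * cost * t + (1.0 - t);
}
```

With after = 0.9, kept = 0.012 and cost = 2 the denominator is 1 - 0.024 = 0.976, so t comes out at roughly 0.102. Feeding that t back in with cost = 4 gives a run time of about 0.902, in line with the 9.7% saving quoted above.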

On top of this patch it is possible to do another optimization: instead of
doing a full flush, tlb_flush could clear the TLB for the current index only
and invalidate the mapping.  The TLBs for other indices will be invalidated
lazily as they are populated again.  This would cut the cost of the TLB
flushes further, though the above math suggests that the actual speedup
will likely be smallish.
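
A toy model of that lazy scheme, to make the idea concrete. Everything here (ToyTLB, the stale flags, the sizes) is hypothetical and much simpler than cputlb.c; in particular it uses a per-index "stale" flag in place of invalidating the state-to-index mapping.

```c
#include <stdbool.h>
#include <string.h>

#define N_MODES   12  /* hypothetical number of MMU indices */
#define N_ENTRIES 8   /* toy TLB size per index */

typedef struct {
    int  entries[N_MODES][N_ENTRIES]; /* -1 marks an empty slot */
    bool stale[N_MODES];              /* must be cleared before next use */
    int  cleared;                     /* per-index clears actually performed */
} ToyTLB;

static void toy_clear_index(ToyTLB *tlb, int idx)
{
    /* All-ones bytes make every int in the row equal to -1. */
    memset(tlb->entries[idx], -1, sizeof(tlb->entries[idx]));
    tlb->stale[idx] = false;
    tlb->cleared++;
}

/* Flush: clear only the index in use now, mark every other one stale. */
static void toy_flush(ToyTLB *tlb, int current_idx)
{
    for (int i = 0; i < N_MODES; i++) {
        tlb->stale[i] = true;
    }
    toy_clear_index(tlb, current_idx);
}

/* Switching to an index pays the deferred clear, if one is pending. */
static void toy_switch_to(ToyTLB *tlb, int idx)
{
    if (tlb->stale[idx]) {
        toy_clear_index(tlb, idx);
    }
}
```

In this model a flush costs one row clear up front regardless of N_MODES, and the remaining rows are only cleared if they are ever used again.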

> The second thing I still fail to grasp is that in the previous patch
> you're changing ctx.mem_idx to different static semantics. But that
> mem_idx gets passed to our ld/st helpers, which again boils down to the
> mem_idx above, no? So aren't we accessing random unrelated mmu contexts now?

Yeah, that's br0ken.  In most cases mem_idx is used to check for privilege,
so we'd need to split the field in two (mem_idx and priv_level).

Paolo


* Re: [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-09-04 18:27   ` Tom Musta
@ 2014-09-09 15:44     ` Paolo Bonzini
  2014-09-09 16:41       ` Paolo Bonzini
  2014-09-09 16:03     ` Richard Henderson
  1 sibling, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-09 15:44 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: dgibson, qemu-ppc, Richard Henderson

On 04/09/2014 20:27, Tom Musta wrote:
>> > -    tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>> > +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
> This looks correct to me but is causing problems.  The above statement seems to get dropped in the generated asm ... at least on a PPC host:
> 
> IN:
> 0x00000000100005b4:  cmpw    cr3,r30,r29
> 
> OUT: [size=160]
> 0x6041ad30:  lwz     r14,-4(r27)
> 0x6041ad34:  cmpwi   cr7,r14,0
> 0x6041ad38:  bne-    cr7,0x6041adbc
> 0x6041ad3c:  ld      r14,240(r27)   <<< r30
> 0x6041ad40:  ld      r15,232(r27)   <<< r29
> 0x6041ad44:  cmpw    cr7,r14,r15    <<< this is the TCG_COND_LTx code
> 0x6041ad48:  li      r16,1
> 0x6041ad4c:  li      r0,0
> 0x6041ad50:  isel    r16,r16,r0,28
> 0x6041ad54:  stw     r16,576(r27)   <<< store cpu_cr[LT]
> 0x6041ad58:  cmpw    cr7,r14,r15
> 0x6041ad5c:  li      r16,1
> 0x6041ad60:  li      r0,0
> 0x6041ad64:  isel    r16,r16,r0,29
> 0x6041ad68:  stw     r16,580(r27)   <<< store cpu_cr[GT]
> 0x6041ad6c:  cmplw   cr7,r14,r15
> 0x6041ad70:  li      r14,1
> 0x6041ad74:  li      r0,0
> 0x6041ad78:  isel    r14,r14,r0,30
> 0x6041ad7c:  stw     r14,584(r27)   <<< store cpu_cr[EQ]
> 0x6041ad80:  .long 0x0
> 0x6041ad84:  .long 0x0

If this is 32-bit, the problem is simply that the trunc is missing in
gen_op_cmp32.  I still see a bunch of failures with the patches though,
I'll look into them as I have time.

Thanks for the other remarks.

Paolo

> 
> Richard:  any ideas or hints on how to proceed?


* Re: [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-09-04 18:27   ` Tom Musta
  2014-09-09 15:44     ` Paolo Bonzini
@ 2014-09-09 16:03     ` Richard Henderson
  2014-09-09 16:26       ` Paolo Bonzini
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Henderson @ 2014-09-09 16:03 UTC (permalink / raw)
  To: Tom Musta, Paolo Bonzini, qemu-devel; +Cc: dgibson, qemu-ppc

On 09/04/2014 11:27 AM, Tom Musta wrote:
>> -    tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
> 
> This looks correct to me but is causing problems.  The above statement seems to get dropped in the generated asm ... at least on a PPC host:
> 
> IN:
> 0x00000000100005b4:  cmpw    cr3,r30,r29
> 
> OUT: [size=160]
> 0x6041ad30:  lwz     r14,-4(r27)
> 0x6041ad34:  cmpwi   cr7,r14,0
> 0x6041ad38:  bne-    cr7,0x6041adbc
> 0x6041ad3c:  ld      r14,240(r27)   <<< r30
> 0x6041ad40:  ld      r15,232(r27)   <<< r29
> 0x6041ad44:  cmpw    cr7,r14,r15    <<< this is the TCG_COND_LTx code
> 0x6041ad48:  li      r16,1
> 0x6041ad4c:  li      r0,0
> 0x6041ad50:  isel    r16,r16,r0,28
> 0x6041ad54:  stw     r16,576(r27)   <<< store cpu_cr[LT]
> 0x6041ad58:  cmpw    cr7,r14,r15
> 0x6041ad5c:  li      r16,1
> 0x6041ad60:  li      r0,0
> 0x6041ad64:  isel    r16,r16,r0,29
> 0x6041ad68:  stw     r16,580(r27)   <<< store cpu_cr[GT]
> 0x6041ad6c:  cmplw   cr7,r14,r15
> 0x6041ad70:  li      r14,1
> 0x6041ad74:  li      r0,0
> 0x6041ad78:  isel    r14,r14,r0,30
> 0x6041ad7c:  stw     r14,584(r27)   <<< store cpu_cr[EQ]
> 0x6041ad80:  .long 0x0
> 0x6041ad84:  .long 0x0
> 
> Richard:  any ideas or hints on how to proceed?

Check the op dumps and make sure it's there.  If it is, but is getting
discarded somewhere further down the pipeline, then try and get me a testcase.


> This is a very nice cleanup ... but it oversteers just a little.  For some CR logical instructions, the generated code can produce non-zero bits in the i32 cr variable in places other than the LSB.
> For example, consider crnand, which produces the following on a PPC host:
> 
> IN:
> 0x0000000010000578:  crnand  4*cr7+so,4*cr7+lt,4*cr7+eq
> 
> OUT: [size=112]
> 0x6041a630:  lwz     r14,-4(r27)
> 0x6041a634:  cmpwi   cr7,r14,0
> 0x6041a638:  bne-    cr7,0x6041a68c
> 0x6041a63c:  lwz     r14,640(r27)
> 0x6041a640:  lwz     r15,648(r27)
> 0x6041a644:  nand    r14,r14,r15
> 0x6041a648:  andi.   r14,r14,1
> 0x6041a64c:  stw     r14,652(r27)
> 0x6041a650:  .long 0x0
> 0x6041a654:  .long 0x0
> 0x6041a658:  .long 0x0
> 0x6041a65c:  .long 0x0
> 
> The host nand operation will always produce an i32 value that has 1s in bits 0-30, since those bits are presumably zero in both inputs.  A brute-force fix would be to add a tcg_gen_andi_i32(D,D,1) to your macro.  But I think this is required only for a subset of the instructions (crnand, crnor, creqv, crorc).

Note that since most hosts don't have nand, the combination

  nand x,y,z
  and  x,x,1

would be better represented with

  and  x,y,z
  xor  x,x,1
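
A quick way to convince oneself that the two sequences agree, assuming each CR bit is stored as 0 or 1. The function names are illustrative only and do not correspond to TCG ops.

```c
#include <stdint.h>

/* nand followed by a mask with 1. */
static uint32_t crnand_nand_and(uint32_t y, uint32_t z)
{
    return ~(y & z) & 1;
}

/* and followed by xor with 1: same result when y and z are 0 or 1. */
static uint32_t crnand_and_xor(uint32_t y, uint32_t z)
{
    return (y & z) ^ 1;
}

/* nand without the mask: the upper 31 bits come out as ones. */
static uint32_t crnand_unmasked(uint32_t y, uint32_t z)
{
    return ~(y & z);
}
```

The unmasked variant shows the problem Tom describes: for inputs 1 and 1 it yields 0xfffffffe instead of 0, so a later test against zero would see the bit as set.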


r~


* Re: [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-09-09 16:03     ` Richard Henderson
@ 2014-09-09 16:26       ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-09 16:26 UTC (permalink / raw)
  To: Richard Henderson, Tom Musta, qemu-devel; +Cc: dgibson, qemu-ppc

On 09/09/2014 18:03, Richard Henderson wrote:
> Note that since most hosts don't have nand, the combination
> 
>   nand x,y,z
>   and  x,x,1
> 
> would be better represented with
> 
>   and  x,y,z
>   xor  x,x,1

True (and even for crorc a,b,c you can change it to crandc a,c,b
followed by xor).  But this is quite a borderline case.  You'll find in
practice only "creqv a,a,a", which the optimizer can handle fine.
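
Both remarks can be checked on single-bit values. The sketch below assumes every CR bit is held as 0 or 1; the helper names are invented for the example.

```c
#include <stdint.h>

/* crorc a,b,c computed directly: a = b | ~c, masked to one bit. */
static uint32_t crorc_direct(uint32_t b, uint32_t c)
{
    return (b | ~c) & 1;
}

/* The rewrite: crandc with swapped operands (c & ~b), then xor with 1. */
static uint32_t crorc_via_andc(uint32_t b, uint32_t c)
{
    return (c & ~b) ^ 1;
}

/* creqv a,b,c: a = ~(b ^ c), masked to one bit. */
static uint32_t creqv_bit(uint32_t b, uint32_t c)
{
    return ~(b ^ c) & 1;
}
```

By De Morgan, ~(c & ~b) equals b | ~c, which is why the swap works. With identical operands creqv always yields 1, so "creqv a,a,a" is simply an idiom for setting a bit and can be folded to a constant.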

Paolo


* Re: [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers
  2014-09-09 15:44     ` Paolo Bonzini
@ 2014-09-09 16:41       ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-09 16:41 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: dgibson, qemu-ppc, Richard Henderson

On 09/09/2014 17:44, Paolo Bonzini wrote:
> On 04/09/2014 20:27, Tom Musta wrote:
>>>> -    tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>>>> +    tcg_gen_trunc_tl_i32(cpu_cr[crf * 4 + CRF_SO], cpu_so);
>> This looks correct to me but is causing problems.  The above statement seems to get dropped in the generated asm ... at least on a PPC host:
>>
>> IN:
>> 0x00000000100005b4:  cmpw    cr3,r30,r29
>>
>> OUT: [size=160]
>> 0x6041ad30:  lwz     r14,-4(r27)
>> 0x6041ad34:  cmpwi   cr7,r14,0
>> 0x6041ad38:  bne-    cr7,0x6041adbc
>> 0x6041ad3c:  ld      r14,240(r27)   <<< r30
>> 0x6041ad40:  ld      r15,232(r27)   <<< r29
>> 0x6041ad44:  cmpw    cr7,r14,r15    <<< this is the TCG_COND_LTx code
>> 0x6041ad48:  li      r16,1
>> 0x6041ad4c:  li      r0,0
>> 0x6041ad50:  isel    r16,r16,r0,28
>> 0x6041ad54:  stw     r16,576(r27)   <<< store cpu_cr[LT]
>> 0x6041ad58:  cmpw    cr7,r14,r15
>> 0x6041ad5c:  li      r16,1
>> 0x6041ad60:  li      r0,0
>> 0x6041ad64:  isel    r16,r16,r0,29
>> 0x6041ad68:  stw     r16,580(r27)   <<< store cpu_cr[GT]
>> 0x6041ad6c:  cmplw   cr7,r14,r15
>> 0x6041ad70:  li      r14,1
>> 0x6041ad74:  li      r0,0
>> 0x6041ad78:  isel    r14,r14,r0,30
>> 0x6041ad7c:  stw     r14,584(r27)   <<< store cpu_cr[EQ]
>> 0x6041ad80:  .long 0x0
>> 0x6041ad84:  .long 0x0
> 
> If this is 32-bit, the problem is simply that the trunc is missing in
> gen_op_cmp32.  I still see a bunch of failures with the patches though,
> I'll look into them as I have time.

Nah, the failure was a bug in the new function I introduced to get all
32 CR bits as a uint32_t (as you suggested in the reply to patch 4).
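
The function itself is not shown in this thread. As a rough sketch of what such a helper has to do, assuming 32 one-bit registers with CR bit 0 (LT of cr0) as the most significant bit of the packed word, something along these lines would work; pack_cr and unpack_cr are hypothetical names.

```c
#include <stdint.h>

/* cr[i] holds CR bit i as 0 or 1.  In PowerPC bit numbering bit 0 is
   the most significant, so it lands at position 31 of the result. */
static uint32_t pack_cr(const uint32_t cr[32])
{
    uint32_t value = 0;
    for (int i = 0; i < 32; i++) {
        value |= (cr[i] & 1) << (31 - i);
    }
    return value;
}

/* Inverse operation: spread a packed CR word over 32 one-bit slots. */
static void unpack_cr(uint32_t value, uint32_t cr[32])
{
    for (int i = 0; i < 32; i++) {
        cr[i] = (value >> (31 - i)) & 1;
    }
}
```

An off-by-one or a reversed shift direction in a helper like this scrambles every CR field at once, which would fit the "bunch of failures" mentioned earlier.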

Paolo


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-09-05 12:11     ` Paolo Bonzini
@ 2014-09-09 16:42       ` Paolo Bonzini
  2014-09-09 20:51         ` Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-09 16:42 UTC (permalink / raw)
  To: Alexander Graf; +Cc: dgibson, qemu-ppc, qemu-devel, tommusta

On 05/09/2014 14:11, Paolo Bonzini wrote:
> 
> 
> ----- Original message -----
>> From: "Alexander Graf" <agraf@suse.de>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
>> Sent: Friday, 5 September 2014 9:10:01
>> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
>>
>>
>>
>> On 28.08.14 19:14, Paolo Bonzini wrote:
>>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>>> means on every user<->kernel context switch.  Use the 6-element
>>> TLB array as a cache, where each MMU index is mapped to a different
>>> state of the IR/DR/PR/HV bits.
>>>
>>> This brings the number of TLB flushes down from ~900000 to ~50000
>>> for starting up the Debian installer, which is in line with x86
>>> and gives a ~10% performance improvement.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  cputlb.c                    | 19 +++++++++++++++++
>>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>>  include/exec/exec-all.h     |  5 +++++
>>>  target-ppc/cpu.h            |  4 +++-
>>>  target-ppc/excp_helper.c    |  6 +-----
>>>  target-ppc/helper_regs.h    | 52
>>>  +++++++++++++++++++++++++++++++--------------
>>>  target-ppc/translate_init.c |  5 +++++
>>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index afd3705..17e1b03 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>      tlb_flush_count++;
>>>  }
>>>  
>>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>>> +{
>>> +    CPUArchState *env = cpu->env_ptr;
>>> +
>>> +#if defined(DEBUG_TLB)
>>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>>> +#endif
>>> +    /* must reset current TB so that interrupts cannot modify the
>>> +       links while we are modifying them */
>>> +    cpu->current_tb = NULL;
>>> +
>>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>>> +
>>> +    env->tlb_flush_addr = -1;
>>> +    env->tlb_flush_mask = 0;
>>> +    tlb_flush_count++;
>>> +}
>>> +
>>>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong
>>>  addr)
>>>  {
>>>      if (addr == (tlb_entry->addr_read &
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 467858c..b95961c 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu,
>>> sPAPREnvironment *spapr,
>>>  {
>>>      CPUPPCState *env = &cpu->env;
>>>      CPUState *cs = CPU(cpu);
>>> +    bool flush;
>>>  
>>>      env->msr |= (1ULL << MSR_EE);
>>> -    hreg_compute_hflags(env);
>>> +    flush = hreg_compute_hflags(env);
>>>      if (!cpu_has_work(cs)) {
>>>          cs->halted = 1;
>>>          cs->exception_index = EXCP_HLT;
>>>          cs->exit_request = 1;
>>> +    } else if (flush) {
>>> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
>>> +        cs->exit_request = 1;
>>
>> Can this ever happen?
> 
> No, I think it can't.
> 
>> Ok, so this basically changes the semantics of mmu_idx from a static
>> array with predefined meanings to a dynamic array with runtime changing
>> semantics.
>>
>> The first thing that comes to mind here is why we're not just extending
>> the existing array? After all, we have 4 bits -> 16 states minus one for
>> PR+HV. Can our existing logic not deal with this?
> 
> Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
> only supports 6 but that's easy to extend.
> 
> tlb_flush becomes progressively more expensive as you add more MMU modes,
> but it may work.  This patch removes 98.8% of the TLB flushes, makes the
> remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
> up QEMU by 10%.  You can solve this:
> 
>     0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
>     tlb_time = 0.1 / 0.98 = 0.102
> 
> to compute that the time spent in TLB flushes before the patch is 10.2% of the
> whole emulation time.
> 
> Doubling the NB_MMU_MODES further from 6 to 12 would still save 98.8% of the TLB
> flushes, while making the remaining ones even more expensive.  The savings will be
> smaller, but actually not by much:
> 
>     0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 = 0.903
> 
> i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed like a
> waste, since only 4 or 5 are used in practice...

The 12 MMU modes work just fine.  Thanks for the suggestion!

Paolo


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
  2014-09-09 16:42       ` Paolo Bonzini
@ 2014-09-09 20:51         ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-09-09 20:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: dgibson, qemu-ppc, qemu-devel, tommusta



> On 09.09.2014 at 18:42, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 05/09/2014 14:11, Paolo Bonzini wrote:
>> 
>> 
>> ----- Original message -----
>>> From: "Alexander Graf" <agraf@suse.de>
>>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>>> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
>>> Sent: Friday, 5 September 2014 9:10:01
>>> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
>>> 
>>> 
>>> 
>>>> On 28.08.14 19:14, Paolo Bonzini wrote:
>>>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>>>> means on every user<->kernel context switch.  Use the 6-element
>>>> TLB array as a cache, where each MMU index is mapped to a different
>>>> state of the IR/DR/PR/HV bits.
>>>> 
>>>> This brings the number of TLB flushes down from ~900000 to ~50000
>>>> for starting up the Debian installer, which is in line with x86
>>>> and gives a ~10% performance improvement.
>>>> 
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> ---
>>>> cputlb.c                    | 19 +++++++++++++++++
>>>> hw/ppc/spapr_hcall.c        |  6 +++++-
>>>> include/exec/exec-all.h     |  5 +++++
>>>> target-ppc/cpu.h            |  4 +++-
>>>> target-ppc/excp_helper.c    |  6 +-----
>>>> target-ppc/helper_regs.h    | 52
>>>> +++++++++++++++++++++++++++++++--------------
>>>> target-ppc/translate_init.c |  5 +++++
>>>> 7 files changed, 74 insertions(+), 23 deletions(-)
>>>> 
>>>> diff --git a/cputlb.c b/cputlb.c
>>>> index afd3705..17e1b03 100644
>>>> --- a/cputlb.c
>>>> +++ b/cputlb.c
>>>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>>     tlb_flush_count++;
>>>> }
>>>> 
>>>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>>>> +{
>>>> +    CPUArchState *env = cpu->env_ptr;
>>>> +
>>>> +#if defined(DEBUG_TLB)
>>>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>>>> +#endif
>>>> +    /* must reset current TB so that interrupts cannot modify the
>>>> +       links while we are modifying them */
>>>> +    cpu->current_tb = NULL;
>>>> +
>>>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>>>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>>>> +
>>>> +    env->tlb_flush_addr = -1;
>>>> +    env->tlb_flush_mask = 0;
>>>> +    tlb_flush_count++;
>>>> +}
>>>> +
>>>> static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong
>>>> addr)
>>>> {
>>>>     if (addr == (tlb_entry->addr_read &
>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>>> index 467858c..b95961c 100644
>>>> --- a/hw/ppc/spapr_hcall.c
>>>> +++ b/hw/ppc/spapr_hcall.c
>>>> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> {
>>>>     CPUPPCState *env = &cpu->env;
>>>>     CPUState *cs = CPU(cpu);
>>>> +    bool flush;
>>>> 
>>>>     env->msr |= (1ULL << MSR_EE);
>>>> -    hreg_compute_hflags(env);
>>>> +    flush = hreg_compute_hflags(env);
>>>>     if (!cpu_has_work(cs)) {
>>>>         cs->halted = 1;
>>>>         cs->exception_index = EXCP_HLT;
>>>>         cs->exit_request = 1;
>>>> +    } else if (flush) {
>>>> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
>>>> +        cs->exit_request = 1;
>>> 
>>> Can this ever happen?
>> 
>> No, I think it can't.
>> 
>>> Ok, so this basically changes the semantics of mmu_idx from a static
>>> array with predefined meanings to a dynamic array with runtime changing
>>> semantics.
>>> 
>>> The first thing that comes to mind here is why we're not just extending
>>> the existing array? After all, we have 4 bits -> 16 states minus one for
>>> PR+HV. Can our existing logic not deal with this?
>> 
>> Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
>> only supports 6 but that's easy to extend.
>> 
>> tlb_flush becomes progressively more expensive as you add more MMU modes,
>> but it may work.  This patch removes 98.8% of the TLB flushes, makes the
>> remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
>> up QEMU by 10%.  You can solve this:
>> 
>>    0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
>>    tlb_time = 0.1 / 0.98 = 0.102
>> 
>> to compute that the time spent in TLB flushes before the patch is 10.2% of the
>> whole emulation time.
>> 
>> Doubling the NB_MMU_MODES further from 6 to 12 would still save 98.8% of the TLB
>> flushes, while making the remaining ones even more expensive.  The savings will be
>> smaller, but actually not by much:
>> 
>>    0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 = 0.903
>> 
>> i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed like a
>> waste, since only 4 or 5 are used in practice...
> 
> The 12 MMU modes work just fine.  Thanks for the suggestion!

Awesome! While slightly more wasteful than the dynamic approach, I'm sure it's a lot easier to understand and debug :).


Alex

> 
> Paolo


* Re: [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel
  2014-09-03 19:41   ` Tom Musta
@ 2014-09-15 13:39     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-09-15 13:39 UTC (permalink / raw)
  To: Tom Musta, qemu-devel; +Cc: dgibson, qemu-ppc

On 03/09/2014 21:41, Tom Musta wrote:
>> > +    tcg_gen_movcond_tl(cpu_gpr[rD(ctx->opcode)], t1, zero,
>> > +                       true_op, cpu_gpr[rB(ctx->opcode)], TCG_COND_NE);
> This doesn't compile for me ... the order of the arguments does not match what is defined in tcg-op.h.
> 

It compiles by chance without DEBUG_TCGV.
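
The underlying reason, as far as I understand it, is that TCG value handles are plain integers unless DEBUG_TCGV is defined, in which case they become distinct struct types. The toy model below is not QEMU code and every name in it is made up; it only illustrates how a misplaced condition argument slips through when handles and enums are all integer types.

```c
/* Toy model: Cond stands in for TCGCond, Handle for a TCGv value. */
typedef enum { COND_EQ, COND_NE } Cond;

#ifdef STRICT_HANDLES
typedef struct { int id; } Handle;  /* distinct type: an enum is rejected */
#define HANDLE(n)    ((Handle){ n })
#define HANDLE_ID(h) ((h).id)
#else
typedef int Handle;                 /* plain int: an enum converts silently */
#define HANDLE(n)    (n)
#define HANDLE_ID(h) (h)
#endif

/* Expected argument order: condition first, then the two operands. */
static int pick(Cond cond, Handle a, Handle b)
{
    return cond == COND_NE ? HANDLE_ID(a) : HANDLE_ID(b);
}

static int misordered_call(void)
{
#ifdef STRICT_HANDLES
    /* Only the correct order type-checks in this configuration. */
    return pick(COND_NE, HANDLE(7), HANDLE(9));
#else
    /* Condition passed last by mistake: accepted, but the result is wrong. */
    return pick(7, 9, COND_NE);
#endif
}
```

Built without STRICT_HANDLES, the misordered call compiles and returns the numeric value of COND_NE rather than either operand. Building with -DSTRICT_HANDLES makes the compiler reject that same misordered call, which is the kind of check DEBUG_TCGV is meant to provide.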

Paolo


end of thread, other threads:[~2014-09-15 13:39 UTC | newest]

Thread overview: 50+ messages
2014-08-28 17:14 [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Paolo Bonzini
2014-08-28 17:14 ` [Qemu-devel] [PATCH 01/17] ppc: do not look at the MMU index Paolo Bonzini
2014-08-28 17:14 ` [Qemu-devel] [PATCH 02/17] ppc: avoid excessive TLB flushing Paolo Bonzini
2014-08-28 17:30   ` Peter Maydell
2014-08-28 19:35     ` Paolo Bonzini
2014-09-05  6:00       ` David Gibson
2014-09-05  7:10   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-09-05 12:11     ` Paolo Bonzini
2014-09-09 16:42       ` Paolo Bonzini
2014-09-09 20:51         ` Alexander Graf
2014-08-28 17:14 ` [Qemu-devel] [PATCH 03/17] ppc: fix monitor access to CR Paolo Bonzini
2014-09-03 18:21   ` Tom Musta
2014-09-05  7:10     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-08-28 17:15 ` [Qemu-devel] [PATCH 04/17] ppc: use ARRAY_SIZE in gdbstub.c Paolo Bonzini
2014-09-03 18:21   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 05/17] ppc: use CRF_* in fpu_helper.c Paolo Bonzini
2014-09-03 18:21   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 06/17] ppc: use CRF_* in int_helper.c Paolo Bonzini
2014-09-03 18:28   ` Tom Musta
2014-09-05  7:12     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-08-28 17:15 ` [Qemu-devel] [PATCH 07/17] ppc: fix result of DLMZB when no zero bytes are found Paolo Bonzini
2014-09-03 18:28   ` Tom Musta
2014-09-05  7:26     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-08-28 17:15 ` [Qemu-devel] [PATCH 08/17] ppc: introduce helpers for mfocrf/mtocrf Paolo Bonzini
2014-09-03 18:28   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 09/17] ppc: reorganize gen_compute_fprf Paolo Bonzini
2014-09-03 18:29   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 10/17] ppc: introduce gen_op_mfcr/gen_op_mtcr Paolo Bonzini
2014-09-03 18:58   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 11/17] ppc: rename gen_set_cr6_from_fpscr Paolo Bonzini
2014-09-03 19:41   ` Tom Musta
2014-09-05  7:27     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-08-28 17:15 ` [Qemu-devel] [PATCH 12/17] ppc: use movcond for isel Paolo Bonzini
2014-08-29 18:30   ` Richard Henderson
2014-09-03 19:41   ` Tom Musta
2014-09-15 13:39     ` Paolo Bonzini
2014-08-28 17:15 ` [Qemu-devel] [PATCH 13/17] ppc: compute mask from BI using right shift Paolo Bonzini
2014-09-03 20:59   ` Tom Musta
2014-09-05  7:29     ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-08-28 17:15 ` [Qemu-devel] [PATCH 14/17] ppc: introduce ppc_get_crf and ppc_set_crf Paolo Bonzini
2014-09-04 18:26   ` Tom Musta
2014-08-28 17:15 ` [Qemu-devel] [PATCH 15/17] ppc: store CR registers in 32 1-bit registers Paolo Bonzini
2014-09-04 18:27   ` Tom Musta
2014-09-09 15:44     ` Paolo Bonzini
2014-09-09 16:41       ` Paolo Bonzini
2014-09-09 16:03     ` Richard Henderson
2014-09-09 16:26       ` Paolo Bonzini
2014-08-28 17:15 ` [Qemu-devel] [PATCH 16/17] ppc: inline ppc_get_crf/ppc_set_crf when clearer Paolo Bonzini
2014-08-28 17:15 ` [Qemu-devel] [PATCH 17/17] ppc: dump all 32 CR bits Paolo Bonzini
2014-08-28 18:05 ` [Qemu-devel] [RFT/RFH PATCH 00/16] PPC speedup patches for TCG Tom Musta
