* [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC
  To: qemu-devel, qemu-ppc, richard.henderson, david

This patchset aims to improve VMX (Altivec) instruction performance by laying
the groundwork for use of the new TCG vector operations.

Patches 1 and 2 fix a sign-extension error in EXTRACT_SHELPER and an associated
typo in the SIMM5 macro, both discovered whilst testing Richard's follow-on TCG
vector improvements patchset.

In order to use TCG vector operations, the registers must be accessible from cpu_env,
whereas currently they are accessed via arrays of static TCG globals. Patches 3-5
are therefore mechanical patches which introduce access helpers for the FPR, AVR and
VSR registers, moving values to/from the supplied TCGv_i64 parameter.

Once this is done, patch 6 removes the static TCG global arrays and updates the
access helpers to read/write the relevant fields in cpu_env directly.
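
For illustration, the helpers then take roughly the following shape (a sketch
of the idea rather than the exact hunk; the offset changes again once patches
7 and 8 move the FP registers into the vsr array):

/* Sketch: access the FPR state in cpu_env directly.  Using offsetof
 * with a variable array index relies on __builtin_offsetof, as done
 * elsewhere in QEMU. */
static inline void get_fpr(TCGv_i64 dst, int regno)
{
    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
}

static inline void set_fpr(int regno, TCGv_i64 src)
{
    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
}

Callers are unaffected by this change, which is what allows patches 3-5 to
remain purely mechanical.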

Patches 7 and 8 perform the legwork required to enable VSX instructions to be converted
to use TCG vector operations in future by rearranging the FP, VMX and VSX registers into
a single aligned VSR register array (the scope of this patchset is VMX only).
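
Once the registers live at fixed offsets in an aligned array inside cpu_env,
instructions can be emitted via the TCG gvec API, which takes byte offsets
into cpu_env rather than TCGv temporaries. As a hypothetical sketch of where
this leads (avr_full_offset() is an assumed helper returning the byte offset
of AVR n within the merged vsr array; the actual conversions happen in
Richard's follow-on series):

/* Hypothetical sketch only: vand emitted as a host-vector AND over the
 * 16-byte AVR held in cpu_env.  avr_full_offset() is assumed to return
 * offsetof(CPUPPCState, vsr[32 + n]) once patch 8 is applied. */
static void gen_vand(DisasContext *ctx)
{
    if (unlikely(!ctx->altivec_enabled)) {
        gen_exception(ctx, POWERPC_EXCP_VPU);
        return;
    }
    tcg_gen_gvec_and(MO_64, avr_full_offset(rD(ctx->opcode)),
                     avr_full_offset(rA(ctx->opcode)),
                     avr_full_offset(rB(ctx->opcode)), 16, 16);
}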

Patch 9 removes the AVR* macros and replaces them with the corresponding Vsr* macros
since they are equivalent.
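
For reference, the Vsr* macros index the register union in fixed big-endian
element order regardless of host byte order, roughly as follows (reproduced
here for illustration; see target/ppc/internal.h for the canonical
definitions):

#if defined(HOST_WORDS_BIGENDIAN)
#define VsrB(i) u8[i]
#define VsrH(i) u16[i]
#define VsrW(i) u32[i]
#define VsrD(i) u64[i]
#else
#define VsrB(i) u8[15 - (i)]
#define VsrH(i) u16[7 - (i)]
#define VsrW(i) u32[3 - (i)]
#define VsrD(i) u64[1 - (i)]
#endif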

Finally, thanks to Richard for taking the time to answer some of my (mostly beginner)
questions related to TCG.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

v5:
- Fix up KVM-enabled builds on PPC host due to missing conversion of target/ppc/kvm.c

v4:
- Rebase onto master
- Add extra R-B tags from Richard
- Leave HI_IDX/LO_IDX in int_helper.c in patch 9 (similarly named macros are also
  used in other files so let's ensure there is no confusion)
- Add cpu_fpr_ptr(), cpu_vsrl_ptr() and cpu_avr_ptr() as suggested by Richard in
  patch 8

v3:
- Rebase onto master, drop RFC prefix, alter subject line
- Add A-B tags from David
- Add SIMM5/EXTRACT_HELPER macro fix patches to the start of the series
- Drop patch 4 from previous patchset (delay AVR register writeback) as it should
  not be required.
- Remove extra get_fpr() accidentally added to GEN_FLOAT macros in patch 3
- Fix temporary leak when VMX/VSX not enabled in patches 4 and 5
- Add patch to remove AVR* macros, replacing them with Vsr* macros
- Drop patches converting logical, add and sub instructions to TCG vector ops (let
  Richard incorporate this into his TCG vector improvements patchset)

v2:
- Rebase onto master
- Add comment explaining rationale for FPR helpers in description for patch 1
- Add R-B tags from Richard
- Add patch 3 to delay AVR register writeback as spotted by Richard
- Add patches 6 and 7 to merge FPR, VMX and VSX registers into the vsr array
  to facilitate conversion of VSX instructions to vector operations later
- Fix accidental bug whereby the conversion of get_vsr()/set_vsr() to access
  data from cpu_env was incorrectly squashed into patch 3
- Move set_fpr() further down in gen_fsqrts() and gen_frsqrtes() in patch 1

Mark Cave-Ayland (9):
  target/ppc: fix typo in SIMM5 extraction helper
  target/ppc: switch EXTRACT_HELPER macros over to use
    sextract32/extract32
  target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
    access
  target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
    register access
  target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
    helpers for VSR register access
  target/ppc: switch FPR, VMX and VSX helpers to access data directly
    from cpu_env
  target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  target/ppc: move FP and VMX registers into aligned vsr register array
  target/ppc: replace AVR* macros with Vsr* macros

 linux-user/ppc/signal.c             |  28 +-
 target/ppc/arch_dump.c              |  15 +-
 target/ppc/cpu.h                    |  42 +-
 target/ppc/gdbstub.c                |   8 +-
 target/ppc/int_helper.c             |  86 ++--
 target/ppc/internal.h               |  39 +-
 target/ppc/kvm.c                    |  24 +-
 target/ppc/machine.c                |  72 ++-
 target/ppc/monitor.c                |   4 +-
 target/ppc/translate.c              |  73 ++-
 target/ppc/translate/dfp-impl.inc.c |   2 +-
 target/ppc/translate/fp-impl.inc.c  | 486 +++++++++++++++-----
 target/ppc/translate/vmx-impl.inc.c | 154 +++++--
 target/ppc/translate/vsx-impl.inc.c | 862 ++++++++++++++++++++++++++----------
 target/ppc/translate_init.inc.c     |  26 +-
 15 files changed, 1374 insertions(+), 547 deletions(-)

-- 
2.11.0

* [Qemu-devel] [PATCH v5 1/9] target/ppc: fix typo in SIMM5 extraction helper
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC
  To: qemu-devel, qemu-ppc, richard.henderson, david

As the macro name suggests, the argument should be signed and not unsigned.
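
For example, vspltisb treats SIMM5 as a two's-complement immediate, so an
all-ones field must decode to -1 rather than 31. A standalone illustration
(not part of the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t opcode = 0x1fu << 16;  /* SIMM5 field (shift 16, 5 bits) = 0b11111 */

    /* Unsigned extraction, as EXTRACT_HELPER performed it: yields 31. */
    printf("unsigned: %u\n", (opcode >> 16) & 0x1f);
    /* Sign-extending the 5-bit field yields the intended -1. */
    printf("signed:   %d\n", (int32_t)(opcode << 11) >> 27);
    return 0;
}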

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index a9bcadff42..8b35863549 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -124,7 +124,7 @@ EXTRACT_SHELPER(SIMM, 0, 16);
 /* 16 bits unsigned immediate value */
 EXTRACT_HELPER(UIMM, 0, 16);
 /* 5 bits signed immediate value */
-EXTRACT_HELPER(SIMM5, 16, 5);
+EXTRACT_SHELPER(SIMM5, 16, 5);
 /* 5 bits signed immediate value */
 EXTRACT_HELPER(UIMM5, 16, 5);
 /* 4 bits unsigned immediate value */
-- 
2.11.0

* [Qemu-devel] [PATCH v5 2/9] target/ppc: switch EXTRACT_HELPER macros over to use sextract32/extract32
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC
  To: qemu-devel, qemu-ppc, richard.henderson, david

These ensure that we consistently handle sign and zero extension correctly
when decoding immediates from instruction opcodes.
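
The old EXTRACT_SHELPER relied on a cast to int16_t, which only sign-extends
fields that are exactly 16 bits wide; sextract32 handles any width. A
standalone sketch using stand-ins with the same semantics as
extract32/sextract32 from include/qemu/bitops.h:

#include <stdint.h>
#include <stdio.h>

static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

static int32_t sextract32(uint32_t value, int start, int length)
{
    /* Shift the field up to bit 31, then arithmetic-shift back down. */
    return (int32_t)(value << (32 - length - start)) >> (32 - length);
}

int main(void)
{
    uint32_t opcode = 0x1fu << 16;   /* all-ones 5-bit field at shift 16 */

    /* Old style: mask, then cast to int16_t -- still 31 for a 5-bit field. */
    int32_t old_way = (int16_t)((opcode >> 16) & ((1 << 5) - 1));

    printf("int16_t cast: %d\n", old_way);                    /* 31 */
    printf("sextract32:   %d\n", sextract32(opcode, 16, 5));  /* -1 */
    printf("extract32:    %u\n", extract32(opcode, 16, 5));   /* 31 */
    return 0;
}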

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/internal.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 8b35863549..5d460247e2 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -52,20 +52,20 @@ FUNC_MASK(mask_u64, uint64_t, 64, UINT64_MAX);
 #define EXTRACT_HELPER(name, shift, nb)                                       \
 static inline uint32_t name(uint32_t opcode)                                  \
 {                                                                             \
-    return (opcode >> (shift)) & ((1 << (nb)) - 1);                           \
+    return extract32(opcode, shift, nb);                                      \
 }
 
 #define EXTRACT_SHELPER(name, shift, nb)                                      \
 static inline int32_t name(uint32_t opcode)                                   \
 {                                                                             \
-    return (int16_t)((opcode >> (shift)) & ((1 << (nb)) - 1));                \
+    return sextract32(opcode, shift, nb);                                     \
 }
 
 #define EXTRACT_HELPER_SPLIT(name, shift1, nb1, shift2, nb2)                  \
 static inline uint32_t name(uint32_t opcode)                                  \
 {                                                                             \
-    return (((opcode >> (shift1)) & ((1 << (nb1)) - 1)) << nb2) |             \
-            ((opcode >> (shift2)) & ((1 << (nb2)) - 1));                      \
+    return extract32(opcode, shift1, nb1) << nb2 |                            \
+               extract32(opcode, shift2, nb2);                                \
 }
 
 #define EXTRACT_HELPER_SPLIT_3(name,                                          \
-- 
2.11.0

* [Qemu-devel] [PATCH v5 3/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC
  To: qemu-devel, qemu-ppc, richard.henderson, david

These helpers allow us to move FP register values to/from the specified TCGv_i64
argument, a pattern that will be shared by the VSR helpers to be introduced shortly.

To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 target/ppc/translate.c             |  10 +
 target/ppc/translate/fp-impl.inc.c | 486 ++++++++++++++++++++++++++++---------
 2 files changed, 386 insertions(+), 110 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 96894ab9a8..9cecab42f3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6699,6 +6699,16 @@ static inline void gen_##name(DisasContext *ctx)               \
 GEN_TM_PRIV_NOOP(treclaim);
 GEN_TM_PRIV_NOOP(trechkpt);
 
+static inline void get_fpr(TCGv_i64 dst, int regno)
+{
+    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+}
+
+static inline void set_fpr(int regno, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_fpr[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/fp-impl.inc.c b/target/ppc/translate/fp-impl.inc.c
index 08770ba9f5..0f21a4e477 100644
--- a/target/ppc/translate/fp-impl.inc.c
+++ b/target/ppc/translate/fp-impl.inc.c
@@ -34,24 +34,37 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
+    TCGv_i64 t3;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
+    t3 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);     \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    get_fpr(t2, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t3, cpu_env, t0, t1, t2);                                \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        gen_helper_frsp(t3, cpu_env, t3);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t3);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t3);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
+    tcg_temp_free_i64(t3);                                                    \
 }
 
 #define GEN_FLOAT_ACB(name, op2, set_fprf, type)                              \
@@ -61,24 +74,33 @@ _GEN_FLOAT_ACB(name##s, name, 0x3B, op2, 1, set_fprf, type);
 #define _GEN_FLOAT_AB(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rB(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        gen_helper_frsp(t2, cpu_env, t2);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AB(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AB(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -87,24 +109,33 @@ _GEN_FLOAT_AB(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define _GEN_FLOAT_AC(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        gen_helper_frsp(t2, cpu_env, t2);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AC(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AC(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -113,37 +144,51 @@ _GEN_FLOAT_AC(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define GEN_FLOAT_B(name, op2, op3, set_fprf, type)                           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 #define GEN_FLOAT_BS(name, op1, op2, set_fprf, type)                          \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 /* fadd - fadds */
@@ -165,19 +210,25 @@ GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE);
 /* frsqrtes */
 static void gen_frsqrtes(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_frsqrte(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                       cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_frsqrte(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fsel */
@@ -189,34 +240,47 @@ GEN_FLOAT_AB(sub, 0x14, 0x000007C0, 1, PPC_FLOAT);
 /* fsqrt */
 static void gen_fsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fsqrts(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                     Floating-Point multiply-and-add                   ***/
@@ -268,21 +332,32 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
 
 static void gen_ftdiv(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], t0, t1);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_ftsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 
@@ -293,32 +368,46 @@ static void gen_ftsqrt(DisasContext *ctx)
 static void gen_fcmpo(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpo(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpo(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcmpu */
 static void gen_fcmpu(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpu(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpu(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                         Floating-point move                           ***/
@@ -326,100 +415,153 @@ static void gen_fcmpu(DisasContext *ctx)
 /* XXX: beware that fabs never checks for NaNs nor update FPSCR */
 static void gen_fabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_andi_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     ~(1ULL << 63));
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_andi_i64(t1, t0, ~(1ULL << 63));
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fmr  - fmr. */
 /* XXX: beware that fmr never checks for NaNs nor update FPSCR */
 static void gen_fmr(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_mov_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* fnabs */
 /* XXX: beware that fnabs never checks for NaNs nor update FPSCR */
 static void gen_fnabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_ori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                    1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_ori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fneg */
 /* XXX: beware that fneg never checks for NaNs nor update FPSCR */
 static void gen_fneg(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_xori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_xori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcpsgn: PowerPC 2.05 specification */
 /* XXX: beware that fcpsgn never checks for NaNs nor update FPSCR */
 static void gen_fcpsgn(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)], 0, 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 0, 63);
+    set_fpr(rD(ctx->opcode), t2);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 static void gen_fmrgew(DisasContext *ctx)
 {
     TCGv_i64 b0;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     b0 = tcg_temp_new_i64();
-    tcg_gen_shri_i64(b0, cpu_fpr[rB(ctx->opcode)], 32);
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        b0, 0, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_shri_i64(b0, t0, 32);
+    get_fpr(t0, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t1, t0, b0, 0, 32);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free_i64(b0);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)],
-                        cpu_fpr[rA(ctx->opcode)],
-                        32, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    get_fpr(t1, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 32, 32);
+    set_fpr(rD(ctx->opcode), t2);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /***                  Floating-Point status & ctrl register                ***/
@@ -458,15 +600,19 @@ static void gen_mcrfs(DisasContext *ctx)
 /* mffs */
 static void gen_mffs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    tcg_gen_extu_tl_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpscr);
+    tcg_gen_extu_tl_i64(t0, cpu_fpscr);
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* mtfsb0 */
@@ -522,6 +668,7 @@ static void gen_mtfsb1(DisasContext *ctx)
 static void gen_mtfsf(DisasContext *ctx)
 {
     TCGv_i32 t0;
+    TCGv_i64 t1;
     int flm, l, w;
 
     if (unlikely(!ctx->fpu_enabled)) {
@@ -541,7 +688,9 @@ static void gen_mtfsf(DisasContext *ctx)
     } else {
         t0 = tcg_const_i32(flm << (w * 8));
     }
-    gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
+    t1 = tcg_temp_new_i64();
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_store_fpscr(cpu_env, t1, t0);
     tcg_temp_free_i32(t0);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
@@ -549,6 +698,7 @@ static void gen_mtfsf(DisasContext *ctx)
     }
     /* We can raise a differed exception */
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t1);
 }
 
 /* mtfsfi */
@@ -588,21 +738,26 @@ static void gen_mtfsfi(DisasContext *ctx)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUF(name, ldop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -613,20 +768,25 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUXF(name, ldop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
     if (unlikely(rA(ctx->opcode) == 0)) {                                     \
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);                   \
         return;                                                               \
@@ -634,24 +794,30 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDXF(name, ldop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDFS(name, ldop, op, type)                                        \
@@ -677,6 +843,7 @@ GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
 static void gen_lfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -684,16 +851,19 @@ static void gen_lfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_ld_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_LOAD,
-        DEF_MEMOP(MO_Q));
+    tcg_gen_qemu_ld_i64(t0, EA, PPC_TLB_EPID_LOAD, DEF_MEMOP(MO_Q));
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdp */
 static void gen_lfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -701,24 +871,31 @@ static void gen_lfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdpx */
 static void gen_lfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -726,18 +903,24 @@ static void gen_lfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfiwax */
@@ -745,6 +928,7 @@ static void gen_lfiwax(DisasContext *ctx)
 {
     TCGv EA;
     TCGv t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -752,47 +936,59 @@ static void gen_lfiwax(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     gen_qemu_ld32s(ctx, t0, EA);
-    tcg_gen_ext_tl_i64(cpu_fpr[rD(ctx->opcode)], t0);
+    tcg_gen_ext_tl_i64(t1, t0);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free(EA);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfiwzx */
 static void gen_lfiwzx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+    gen_qemu_ld32u_i64(ctx, t0, EA);
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 /***                         Floating-point store                          ***/
 #define GEN_STF(name, stop, opc, type)                                        \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUF(name, stop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -803,16 +999,20 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUXF(name, stop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -823,25 +1023,32 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STXF(name, stop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STFS(name, stop, op, type)                                        \
@@ -867,6 +1074,7 @@ GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
 static void gen_stfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -874,60 +1082,76 @@ static void gen_stfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_st_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_STORE,
-                       DEF_MEMOP(MO_Q));
+    get_fpr(t0, rD(ctx->opcode));
+    tcg_gen_qemu_st_i64(t0, EA, PPC_TLB_EPID_STORE, DEF_MEMOP(MO_Q));
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdp */
 static void gen_stfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, EA, 0);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdpx */
 static void gen_stfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* Optional: */
@@ -949,13 +1173,18 @@ static void gen_lfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfqu */
@@ -964,17 +1193,22 @@ static void gen_lfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     t1 = tcg_temp_new();
+    t2 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqux */
@@ -984,16 +1218,21 @@ static void gen_lfqux(DisasContext *ctx)
     int rd = rD(ctx->opcode);
     gen_set_access_type(ctx, ACCESS_FLOAT);
     TCGv t0, t1;
+    TCGv_i64 t2;
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqx */
@@ -1001,13 +1240,18 @@ static void gen_lfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfq */
@@ -1015,13 +1259,18 @@ static void gen_stfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfqu */
@@ -1030,17 +1279,23 @@ static void gen_stfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqux */
@@ -1049,17 +1304,23 @@ static void gen_stfqux(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqx */
@@ -1067,13 +1328,18 @@ static void gen_stfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t1 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 #undef _GEN_FLOAT_ACB
-- 
2.11.0

* [Qemu-devel] [PATCH v5 4/9] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX register access
@ 2019-01-02  9:14 ` Mark Cave-Ayland
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

These helpers allow us to move AVR register values to/from the specified TCGv_i64
argument.

To prevent the VMX helpers from accessing the cpu_avr{l,h} arrays directly,
add extra TCG temporaries as required.
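
As a usage sketch (assuming the TCG API used by this series, with explicit
temporary allocation and freeing), a direct access such as
tcg_gen_mov_i64(dst, cpu_avrh[rB(ctx->opcode)]) becomes:

    TCGv_i64 avr = tcg_temp_new_i64();

    get_avr64(avr, rB(ctx->opcode), true);    /* read high half of vB */
    /* ... operate on avr ... */
    set_avr64(rD(ctx->opcode), avr, true);    /* write high half of vD */

    tcg_temp_free_i64(avr);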

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 target/ppc/translate.c              |  10 +++
 target/ppc/translate/vmx-impl.inc.c | 147 ++++++++++++++++++++++++++++--------
 2 files changed, 124 insertions(+), 33 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 9cecab42f3..3bb24e7310 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6709,6 +6709,16 @@ static inline void set_fpr(int regno, TCGv_i64 src)
     tcg_gen_mov_i64(cpu_fpr[regno], src);
 }
 
+static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
+{
+    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+}
+
+static inline void set_avr64(int regno, TCGv_i64 src, bool high)
+{
+    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 3cb6fc2926..5e8327e9a3 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -18,52 +18,66 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
     } else {                                                                  \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_STX(name, opc2, opc3)                                          \
 static void gen_st##name(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     } else {                                                                  \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_LVE(name, opc2, opc3, size)                              \
@@ -159,15 +173,20 @@ static void gen_lvsr(DisasContext *ctx)
 static void gen_mfvscr(DisasContext *ctx)
 {
     TCGv_i32 t;
+    TCGv_i64 avr;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
         return;
     }
-    tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);
+    avr = tcg_temp_new_i64();
+    tcg_gen_movi_i64(avr, 0);
+    set_avr64(rD(ctx->opcode), avr, true);
     t = tcg_temp_new_i32();
     tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
-    tcg_gen_extu_i32_i64(cpu_avrl[rD(ctx->opcode)], t);
+    tcg_gen_extu_i32_i64(avr, t);
+    set_avr64(rD(ctx->opcode), avr, false);
     tcg_temp_free_i32(t);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_mtvscr(DisasContext *ctx)
@@ -185,9 +204,10 @@ static void gen_mtvscr(DisasContext *ctx)
 #define GEN_VX_VMUL10(name, add_cin, ret_carry)                         \
 static void glue(gen_, name)(DisasContext *ctx)                         \
 {                                                                       \
-    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
-    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
-    TCGv_i64 t2 = tcg_temp_new_i64();                                   \
+    TCGv_i64 t0;                                                        \
+    TCGv_i64 t1;                                                        \
+    TCGv_i64 t2;                                                        \
+    TCGv_i64 avr;                                                       \
     TCGv_i64 ten, z;                                                    \
                                                                         \
     if (unlikely(!ctx->altivec_enabled)) {                              \
@@ -195,30 +215,43 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
         return;                                                         \
     }                                                                   \
                                                                         \
+    t0 = tcg_temp_new_i64();                                            \
+    t1 = tcg_temp_new_i64();                                            \
+    t2 = tcg_temp_new_i64();                                            \
+    avr = tcg_temp_new_i64();                                           \
     ten = tcg_const_i64(10);                                            \
     z = tcg_const_i64(0);                                               \
                                                                         \
     if (add_cin) {                                                      \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrl[rA(ctx->opcode)], ten);      \
-        tcg_gen_andi_i64(t2, cpu_avrl[rB(ctx->opcode)], 0xF);           \
-        tcg_gen_add2_i64(cpu_avrl[rD(ctx->opcode)], t2, t0, t1, t2, z); \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        get_avr64(avr, rB(ctx->opcode), false);                         \
+        tcg_gen_andi_i64(t2, avr, 0xF);                                 \
+        tcg_gen_add2_i64(avr, t2, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     } else {                                                            \
-        tcg_gen_mulu2_i64(cpu_avrl[rD(ctx->opcode)], t2,                \
-                          cpu_avrl[rA(ctx->opcode)], ten);              \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(avr, t2, avr, ten);                           \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     }                                                                   \
                                                                         \
     if (ret_carry) {                                                    \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrh[rA(ctx->opcode)], ten);      \
-        tcg_gen_add2_i64(t0, cpu_avrl[rD(ctx->opcode)], t0, t1, t2, z); \
-        tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);                 \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        tcg_gen_add2_i64(t0, avr, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
+        set_avr64(rD(ctx->opcode), z, true);                            \
     } else {                                                            \
-        tcg_gen_mul_i64(t0, cpu_avrh[rA(ctx->opcode)], ten);            \
-        tcg_gen_add_i64(cpu_avrh[rD(ctx->opcode)], t0, t2);             \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mul_i64(t0, avr, ten);                                  \
+        tcg_gen_add_i64(avr, t0, t2);                                   \
+        set_avr64(rD(ctx->opcode), avr, true);                          \
     }                                                                   \
                                                                         \
     tcg_temp_free_i64(t0);                                              \
     tcg_temp_free_i64(t1);                                              \
     tcg_temp_free_i64(t2);                                              \
+    tcg_temp_free_i64(avr);                                             \
     tcg_temp_free_i64(ten);                                             \
     tcg_temp_free_i64(z);                                               \
 }                                                                       \
@@ -232,12 +265,31 @@ GEN_VX_VMUL10(vmul10ecuq, 1, 1);
 #define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
 {                                                                       \
+    TCGv_i64 t0;                                                        \
+    TCGv_i64 t1;                                                        \
+    TCGv_i64 avr;                                                       \
+                                                                        \
     if (unlikely(!ctx->altivec_enabled)) {                              \
         gen_exception(ctx, POWERPC_EXCP_VPU);                           \
         return;                                                         \
     }                                                                   \
-    tcg_op(cpu_avrh[rD(ctx->opcode)], cpu_avrh[rA(ctx->opcode)], cpu_avrh[rB(ctx->opcode)]); \
-    tcg_op(cpu_avrl[rD(ctx->opcode)], cpu_avrl[rA(ctx->opcode)], cpu_avrl[rB(ctx->opcode)]); \
+    t0 = tcg_temp_new_i64();                                            \
+    t1 = tcg_temp_new_i64();                                            \
+    avr = tcg_temp_new_i64();                                           \
+                                                                        \
+    get_avr64(t0, rA(ctx->opcode), true);                               \
+    get_avr64(t1, rB(ctx->opcode), true);                               \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, true);                              \
+                                                                        \
+    get_avr64(t0, rA(ctx->opcode), false);                              \
+    get_avr64(t1, rB(ctx->opcode), false);                              \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, false);                             \
+                                                                        \
+    tcg_temp_free_i64(t0);                                              \
+    tcg_temp_free_i64(t1);                                              \
+    tcg_temp_free_i64(avr);                                             \
 }
 
 GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
@@ -406,6 +458,7 @@ GEN_VXFORM(vmrglw, 6, 6);
 static void gen_vmrgew(DisasContext *ctx)
 {
     TCGv_i64 tmp;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -415,15 +468,28 @@ static void gen_vmrgew(DisasContext *ctx)
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
     tmp = tcg_temp_new_i64();
-    tcg_gen_shri_i64(tmp, cpu_avrh[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VA], tmp, 0, 32);
-    tcg_gen_shri_i64(tmp, cpu_avrl[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VA], tmp, 0, 32);
+    avr = tcg_temp_new_i64();
+
+    get_avr64(avr, VB, true);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, true);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(avr, VB, false);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, false);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, false);
+
     tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_vmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0, t1;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -432,9 +498,23 @@ static void gen_vmrgow(DisasContext *ctx)
     VT = rD(ctx->opcode);
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
-
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VB], cpu_avrh[VA], 32, 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VB], cpu_avrl[VA], 32, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    avr = tcg_temp_new_i64();
+
+    get_avr64(t0, VB, true);
+    get_avr64(t1, VA, true);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(t0, VB, false);
+    get_avr64(t1, VA, false);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, false);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(avr);
 }
 
 GEN_VXFORM(vmuloub, 4, 0);
@@ -790,7 +870,7 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     {                                                                   \
         TCGv_ptr rb, rd;                                                \
         uint8_t uimm = UIMM4(ctx->opcode);                              \
-        TCGv_i32 t0 = tcg_temp_new_i32();                               \
+        TCGv_i32 t0;                                                    \
         if (unlikely(!ctx->altivec_enabled)) {                          \
             gen_exception(ctx, POWERPC_EXCP_VPU);                       \
             return;                                                     \
@@ -798,6 +878,7 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
         if (uimm > splat_max) {                                         \
             uimm = 0;                                                   \
         }                                                               \
+        t0 = tcg_temp_new_i32();                                        \
         tcg_gen_movi_i32(t0, uimm);                                     \
         rb = gen_avr_ptr(rB(ctx->opcode));                              \
         rd = gen_avr_ptr(rD(ctx->opcode));                              \
-- 
2.11.0

* [Qemu-devel] [PATCH v5 5/9] target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access
@ 2019-01-02  9:14 ` Mark Cave-Ayland
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

These helpers allow us to move VSR register values to/from the specified TCGv_i64
argument.

To prevent the VSX helpers from accessing the cpu_vsr array directly, add
extra TCG temporaries as required.
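
As a sketch of the layout these helpers encode (per the definitions below):
for n < 32 the high doubleword of VSR[n] lives in the FPRs and the low
doubleword in the vsr array, while for n >= 32 both halves alias AVR[n - 32].
A typical caller therefore looks like:

    TCGv_i64 t0 = tcg_temp_new_i64();

    get_cpu_vsrh(t0, xB(ctx->opcode));    /* read high dword of VSR[xB] */
    set_cpu_vsrh(xT(ctx->opcode), t0);    /* write high dword of VSR[xT] */

    tcg_temp_free_i64(t0);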

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 target/ppc/translate/vsx-impl.inc.c | 862 ++++++++++++++++++++++++++----------
 1 file changed, 638 insertions(+), 224 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 85ed135d44..f0665df1a5 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1,20 +1,48 @@
 /***                           VSX extension                               ***/
 
-static inline TCGv_i64 cpu_vsrh(int n)
+static inline void get_vsr(TCGv_i64 dst, int n)
+{
+    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+}
+
+static inline void set_vsr(int n, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_vsr[n], src);
+}
+
+static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
+{
+    if (n < 32) {
+        get_fpr(dst, n);
+    } else {
+        get_avr64(dst, n - 32, true);
+    }
+}
+
+static inline void get_cpu_vsrl(TCGv_i64 dst, int n)
+{
+    if (n < 32) {
+        get_vsr(dst, n);
+    } else {
+        get_avr64(dst, n - 32, false);
+    }
+}
+
+static inline void set_cpu_vsrh(int n, TCGv_i64 src)
 {
     if (n < 32) {
-        return cpu_fpr[n];
+        set_fpr(n, src);
     } else {
-        return cpu_avrh[n-32];
+        set_avr64(n - 32, src, true);
     }
 }
 
-static inline TCGv_i64 cpu_vsrl(int n)
+static inline void set_cpu_vsrl(int n, TCGv_i64 src)
 {
     if (n < 32) {
-        return cpu_vsr[n];
+        set_vsr(n, src);
     } else {
-        return cpu_avrl[n-32];
+        set_avr64(n - 32, src, false);
     }
 }
 
@@ -22,16 +50,20 @@ static inline TCGv_i64 cpu_vsrl(int n)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xT(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xT(ctx->opcode), t0);                        \
     /* NOTE: cpu_vsrl is undefined */                         \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_LOAD_SCALAR(lxsdx, ld64_i64)
@@ -44,43 +76,60 @@ VSX_LOAD_SCALAR(lxsspx, ld32fs)
 static void gen_lxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_ld64_i64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_lxvdsx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_gen_mov_i64(t1, t0);
+    set_cpu_vsrl(xT(ctx->opcode), t1);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrl(xtl, xT(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
 
@@ -104,6 +153,8 @@ static void gen_lxvw4x(DisasContext *ctx)
         tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
@@ -151,13 +202,17 @@ static void gen_bswap32x4(TCGv_i64 outh, TCGv_i64 outl,
 static void gen_lxvh8x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrl(xtl, xT(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
 
     EA = tcg_temp_new();
@@ -169,18 +224,24 @@ static void gen_lxvh8x(DisasContext *ctx)
         gen_bswap16x8(xth, xtl, xth, xtl);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_lxvb16x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrl(xtl, xT(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
@@ -188,6 +249,8 @@ static void gen_lxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 #define VSX_VECTOR_LOAD_STORE(name, op, indexed)            \
@@ -195,15 +258,14 @@ static void gen_##name(DisasContext *ctx)                   \
 {                                                           \
     int xt;                                                 \
     TCGv EA;                                                \
-    TCGv_i64 xth, xtl;                                      \
+    TCGv_i64 xth;                                           \
+    TCGv_i64 xtl;                                           \
                                                             \
     if (indexed) {                                          \
         xt = xT(ctx->opcode);                               \
     } else {                                                \
         xt = DQxT(ctx->opcode);                             \
     }                                                       \
-    xth = cpu_vsrh(xt);                                     \
-    xtl = cpu_vsrl(xt);                                     \
                                                             \
     if (xt < 32) {                                          \
         if (unlikely(!ctx->vsx_enabled)) {                  \
@@ -216,6 +278,10 @@ static void gen_##name(DisasContext *ctx)                   \
             return;                                         \
         }                                                   \
     }                                                       \
+    xth = tcg_temp_new_i64();                               \
+    xtl = tcg_temp_new_i64();                               \
+    get_cpu_vsrh(xth, xt);                                  \
+    get_cpu_vsrl(xtl, xt);                                  \
     gen_set_access_type(ctx, ACCESS_INT);                   \
     EA = tcg_temp_new();                                    \
     if (indexed) {                                          \
@@ -225,14 +291,20 @@ static void gen_##name(DisasContext *ctx)                   \
     }                                                       \
     if (ctx->le_mode) {                                     \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
     } else {                                                \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
     }                                                       \
     tcg_temp_free(EA);                                      \
+    tcg_temp_free_i64(xth);                                 \
+    tcg_temp_free_i64(xtl);                                 \
 }
 
 VSX_VECTOR_LOAD_STORE(lxv, ld_i64, 0)
@@ -276,18 +348,22 @@ VSX_VECTOR_LOAD_STORE_LENGTH(stxvll)
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth;                                                 \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
         return;                                                   \
     }                                                             \
+    xth = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
     gen_set_access_type(ctx, ACCESS_INT);                         \
     EA = tcg_temp_new();                                          \
     gen_addr_imm_index(ctx, EA, 0x03);                            \
     gen_qemu_##operation(ctx, xth, EA);                           \
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);                      \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_LOAD_SCALAR_DS(lxsd, ld64_i64)
@@ -297,15 +373,19 @@ VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xS(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xS(ctx->opcode), t0);                        \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_STORE_SCALAR(stxsdx, st64_i64)
@@ -318,28 +398,38 @@ VSX_STORE_SCALAR(stxsspx, st32fs)
 static void gen_stxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_st64_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_st64_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh;
+    TCGv_i64 xsl;
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xsh = tcg_temp_new_i64();
+    xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
@@ -362,18 +452,24 @@ static void gen_stxvw4x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvh8x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh;
+    TCGv_i64 xsl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xsh = tcg_temp_new_i64();
+    xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
@@ -393,18 +489,24 @@ static void gen_stxvh8x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvb16x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh;
+    TCGv_i64 xsl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xsh = tcg_temp_new_i64();
+    xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
@@ -412,80 +514,144 @@ static void gen_stxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 #define VSX_STORE_SCALAR_DS(name, operation)                      \
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth;                                                 \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
         return;                                                   \
     }                                                             \
+    xth = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
     gen_set_access_type(ctx, ACCESS_INT);                         \
     EA = tcg_temp_new();                                          \
     gen_addr_imm_index(ctx, EA, 0x03);                            \
     gen_qemu_##operation(ctx, xth, EA);                           \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_LOAD_SCALAR_DS(stxsd, st64_i64)
 VSX_LOAD_SCALAR_DS(stxssp, st32fs)
 
-#define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    TCGv_i64 tmp = tcg_temp_new_i64();                          \
-    tcg_gen_##tcgop1(tmp, source);                              \
-    tcg_gen_##tcgop2(target, tmp);                              \
-    tcg_temp_free_i64(tmp);                                     \
+static void gen_mfvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    tcg_gen_ext32u_i64(tmp, xsh);
+    tcg_gen_trunc_i64_tl(cpu_gpr[rA(ctx->opcode)], tmp);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
 }
 
+static void gen_mtvsrwa(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32s_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
-MV_VSRW(mfvsrwz, ext32u_i64, trunc_i64_tl, cpu_gpr[rA(ctx->opcode)], \
-        cpu_vsrh(xS(ctx->opcode)))
-MV_VSRW(mtvsrwa, extu_tl_i64, ext32s_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
-MV_VSRW(mtvsrwz, extu_tl_i64, ext32u_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32u_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
 #if defined(TARGET_PPC64)
-#define MV_VSRD(name, target, source)                           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    tcg_gen_mov_i64(target, source);                            \
+static void gen_mfvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0;
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    t0 = tcg_temp_new_i64();
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
-MV_VSRD(mfvsrd, cpu_gpr[rA(ctx->opcode)], cpu_vsrh(xS(ctx->opcode)))
-MV_VSRD(mtvsrd, cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0;
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    t0 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
+}
 
 static void gen_mfvsrld(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -497,12 +663,15 @@ static void gen_mfvsrld(DisasContext *ctx)
             return;
         }
     }
-
-    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
+    t0 = tcg_temp_new_i64();
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrdd(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -515,17 +684,22 @@ static void gen_mtvsrdd(DisasContext *ctx)
         }
     }
 
+    t0 = tcg_temp_new_i64();
     if (!rA(ctx->opcode)) {
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);
+        tcg_gen_movi_i64(t0, 0);
     } else {
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)]);
+        tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
     }
+    set_cpu_vsrh(xT(ctx->opcode), t0);
 
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
+    tcg_gen_mov_i64(t0, cpu_gpr[rB(ctx->opcode)]);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrws(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -538,55 +712,61 @@ static void gen_mtvsrws(DisasContext *ctx)
         }
     }
 
-    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)],
+    t0 = tcg_temp_new_i64();
+    tcg_gen_deposit_i64(t0, cpu_gpr[rA(ctx->opcode)],
                         cpu_gpr[rA(ctx->opcode)], 32, 32);
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xT(ctx->opcode)));
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
 {
+    TCGv_i64 xh, xl;
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
 
+    xh = tcg_temp_new_i64();
+    xl = tcg_temp_new_i64();
+
     if (unlikely((xT(ctx->opcode) == xA(ctx->opcode)) ||
                  (xT(ctx->opcode) == xB(ctx->opcode)))) {
-        TCGv_i64 xh, xl;
-
-        xh = tcg_temp_new_i64();
-        xl = tcg_temp_new_i64();
-
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(xh, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xh, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(xl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xl, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
         }
 
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xh);
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xl);
-
-        tcg_temp_free_i64(xh);
-        tcg_temp_free_i64(xl);
+        set_cpu_vsrh(xT(ctx->opcode), xh);
+        set_cpu_vsrl(xT(ctx->opcode), xl);
     } else {
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         } else {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         } else {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         }
     }
+    tcg_temp_free_i64(xh);
+    tcg_temp_free_i64(xl);
 }
 
 #define OP_ABS 1
@@ -606,7 +786,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
         }                                                         \
         xb = tcg_temp_new_i64();                                  \
         sgm = tcg_temp_new_i64();                                 \
-        tcg_gen_mov_i64(xb, cpu_vsrh(xB(ctx->opcode)));           \
+        get_cpu_vsrh(xb, xB(ctx->opcode));                        \
         tcg_gen_movi_i64(sgm, sgn_mask);                          \
         switch (op) {                                             \
             case OP_ABS: {                                        \
@@ -623,7 +803,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
             }                                                     \
             case OP_CPSGN: {                                      \
                 TCGv_i64 xa = tcg_temp_new_i64();                 \
-                tcg_gen_mov_i64(xa, cpu_vsrh(xA(ctx->opcode)));   \
+                get_cpu_vsrh(xa, xA(ctx->opcode));                \
                 tcg_gen_and_i64(xa, xa, sgm);                     \
                 tcg_gen_andc_i64(xb, xb, sgm);                    \
                 tcg_gen_or_i64(xb, xb, xa);                       \
@@ -631,7 +811,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
                 break;                                            \
             }                                                     \
         }                                                         \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xb);           \
+        set_cpu_vsrh(xT(ctx->opcode), xb);                        \
         tcg_temp_free_i64(xb);                                    \
         tcg_temp_free_i64(sgm);                                   \
     }
@@ -647,7 +827,7 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     int xa;                                                       \
     int xt = rD(ctx->opcode) + 32;                                \
     int xb = rB(ctx->opcode) + 32;                                \
-    TCGv_i64 xah, xbh, xbl, sgm;                                  \
+    TCGv_i64 xah, xbh, xbl, sgm, tmp;                             \
                                                                   \
     if (unlikely(!ctx->vsx_enabled)) {                            \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                    \
@@ -656,8 +836,9 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     xbh = tcg_temp_new_i64();                                     \
     xbl = tcg_temp_new_i64();                                     \
     sgm = tcg_temp_new_i64();                                     \
-    tcg_gen_mov_i64(xbh, cpu_vsrh(xb));                           \
-    tcg_gen_mov_i64(xbl, cpu_vsrl(xb));                           \
+    tmp = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xbh, xb);                                        \
+    get_cpu_vsrl(xbl, xb);                                        \
     tcg_gen_movi_i64(sgm, sgn_mask);                              \
     switch (op) {                                                 \
     case OP_ABS:                                                  \
@@ -672,17 +853,19 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     case OP_CPSGN:                                                \
         xah = tcg_temp_new_i64();                                 \
         xa = rA(ctx->opcode) + 32;                                \
-        tcg_gen_and_i64(xah, cpu_vsrh(xa), sgm);                  \
+        get_cpu_vsrh(tmp, xa);                                    \
+        tcg_gen_and_i64(xah, tmp, sgm);                           \
         tcg_gen_andc_i64(xbh, xbh, sgm);                          \
         tcg_gen_or_i64(xbh, xbh, xah);                            \
         tcg_temp_free_i64(xah);                                   \
         break;                                                    \
     }                                                             \
-    tcg_gen_mov_i64(cpu_vsrh(xt), xbh);                           \
-    tcg_gen_mov_i64(cpu_vsrl(xt), xbl);                           \
+    set_cpu_vsrh(xt, xbh);                                        \
+    set_cpu_vsrl(xt, xbl);                                        \
     tcg_temp_free_i64(xbl);                                       \
     tcg_temp_free_i64(xbh);                                       \
     tcg_temp_free_i64(sgm);                                       \
+    tcg_temp_free_i64(tmp);                                       \
 }
 
 VSX_SCALAR_MOVE_QP(xsabsqp, OP_ABS, SGN_MASK_DP)
@@ -701,8 +884,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
         xbh = tcg_temp_new_i64();                                \
         xbl = tcg_temp_new_i64();                                \
         sgm = tcg_temp_new_i64();                                \
-        tcg_gen_mov_i64(xbh, cpu_vsrh(xB(ctx->opcode)));         \
-        tcg_gen_mov_i64(xbl, cpu_vsrl(xB(ctx->opcode)));         \
+        get_cpu_vsrh(xbh, xB(ctx->opcode));                      \
+        get_cpu_vsrl(xbl, xB(ctx->opcode));                      \
         tcg_gen_movi_i64(sgm, sgn_mask);                         \
         switch (op) {                                            \
             case OP_ABS: {                                       \
@@ -723,8 +906,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
             case OP_CPSGN: {                                     \
                 TCGv_i64 xah = tcg_temp_new_i64();               \
                 TCGv_i64 xal = tcg_temp_new_i64();               \
-                tcg_gen_mov_i64(xah, cpu_vsrh(xA(ctx->opcode))); \
-                tcg_gen_mov_i64(xal, cpu_vsrl(xA(ctx->opcode))); \
+                get_cpu_vsrh(xah, xA(ctx->opcode));              \
+                get_cpu_vsrl(xal, xA(ctx->opcode));              \
                 tcg_gen_and_i64(xah, xah, sgm);                  \
                 tcg_gen_and_i64(xal, xal, sgm);                  \
                 tcg_gen_andc_i64(xbh, xbh, sgm);                 \
@@ -736,8 +919,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
                 break;                                           \
             }                                                    \
         }                                                        \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xbh);         \
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xbl);         \
+        set_cpu_vsrh(xT(ctx->opcode), xbh);                      \
+        set_cpu_vsrl(xT(ctx->opcode), xbl);                      \
         tcg_temp_free_i64(xbh);                                  \
         tcg_temp_free_i64(xbl);                                  \
         tcg_temp_free_i64(sgm);                                  \
@@ -768,12 +951,19 @@ static void gen_##name(DisasContext * ctx)                                    \
 #define GEN_VSX_HELPER_XT_XB_ENV(name, op1, op2, inval, type) \
 static void gen_##name(DisasContext * ctx)                    \
 {                                                             \
+    TCGv_i64 t0;                                              \
+    TCGv_i64 t1;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
-    gen_helper_##name(cpu_vsrh(xT(ctx->opcode)), cpu_env,     \
-                      cpu_vsrh(xB(ctx->opcode)));             \
+    t0 = tcg_temp_new_i64();                                  \
+    t1 = tcg_temp_new_i64();                                  \
+    get_cpu_vsrh(t0, xB(ctx->opcode));                        \
+    gen_helper_##name(t1, cpu_env, t0);                       \
+    set_cpu_vsrh(xT(ctx->opcode), t1);                        \
+    tcg_temp_free_i64(t0);                                    \
+    tcg_temp_free_i64(t1);                                    \
 }
 
 GEN_VSX_HELPER_2(xsadddp, 0x00, 0x04, 0, PPC2_VSX)
@@ -949,76 +1139,146 @@ GEN_VSX_HELPER_2(xxpermr, 0x08, 0x07, 0, PPC2_ISA300)
 
 static void gen_xxbrd(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     tcg_gen_bswap64_i64(xth, xbh);
     tcg_gen_bswap64_i64(xtl, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrh(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     gen_bswap16x8(xth, xtl, xbh, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrq(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
-    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
+    TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+    t0 = tcg_temp_new_i64();
+
     tcg_gen_bswap64_i64(t0, xbl);
     tcg_gen_bswap64_i64(xtl, xbh);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
     tcg_gen_mov_i64(xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrw(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     gen_bswap32x4(xth, xtl, xbh, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #define VSX_LOGICAL(name, tcg_op)                                    \
 static void glue(gen_, name)(DisasContext * ctx)                     \
     {                                                                \
+        TCGv_i64 t0;                                                 \
+        TCGv_i64 t1;                                                 \
+        TCGv_i64 t2;                                                 \
         if (unlikely(!ctx->vsx_enabled)) {                           \
             gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
             return;                                                  \
         }                                                            \
-        tcg_op(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)), \
-            cpu_vsrh(xB(ctx->opcode)));                              \
-        tcg_op(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)), \
-            cpu_vsrl(xB(ctx->opcode)));                              \
+        t0 = tcg_temp_new_i64();                                     \
+        t1 = tcg_temp_new_i64();                                     \
+        t2 = tcg_temp_new_i64();                                     \
+        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
+        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
+        tcg_temp_free_i64(t0);                                       \
+        tcg_temp_free_i64(t1);                                       \
+        tcg_temp_free_i64(t2);                                       \
     }
 
 VSX_LOGICAL(xxland, tcg_gen_and_i64)
@@ -1033,7 +1293,7 @@ VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
 #define VSX_XXMRG(name, high)                               \
 static void glue(gen_, name)(DisasContext * ctx)            \
     {                                                       \
-        TCGv_i64 a0, a1, b0, b1;                            \
+        TCGv_i64 a0, a1, b0, b1, tmp;                       \
         if (unlikely(!ctx->vsx_enabled)) {                  \
             gen_exception(ctx, POWERPC_EXCP_VSXU);          \
             return;                                         \
@@ -1042,27 +1302,29 @@ static void glue(gen_, name)(DisasContext * ctx)            \
         a1 = tcg_temp_new_i64();                            \
         b0 = tcg_temp_new_i64();                            \
         b1 = tcg_temp_new_i64();                            \
+        tmp = tcg_temp_new_i64();                           \
         if (high) {                                         \
-            tcg_gen_mov_i64(a0, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrh(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrh(xB(ctx->opcode))); \
+            get_cpu_vsrh(a0, xA(ctx->opcode));              \
+            get_cpu_vsrh(a1, xA(ctx->opcode));              \
+            get_cpu_vsrh(b0, xB(ctx->opcode));              \
+            get_cpu_vsrh(b1, xB(ctx->opcode));              \
         } else {                                            \
-            tcg_gen_mov_i64(a0, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrl(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrl(xB(ctx->opcode))); \
+            get_cpu_vsrl(a0, xA(ctx->opcode));              \
+            get_cpu_vsrl(a1, xA(ctx->opcode));              \
+            get_cpu_vsrl(b0, xB(ctx->opcode));              \
+            get_cpu_vsrl(b1, xB(ctx->opcode));              \
         }                                                   \
         tcg_gen_shri_i64(a0, a0, 32);                       \
         tcg_gen_shri_i64(b0, b0, 32);                       \
-        tcg_gen_deposit_i64(cpu_vsrh(xT(ctx->opcode)),      \
-                            b0, a0, 32, 32);                \
-        tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)),      \
-                            b1, a1, 32, 32);                \
+        tcg_gen_deposit_i64(tmp, b0, a0, 32, 32);           \
+        set_cpu_vsrh(xT(ctx->opcode), tmp);                 \
+        tcg_gen_deposit_i64(tmp, b1, a1, 32, 32);           \
+        set_cpu_vsrl(xT(ctx->opcode), tmp);                 \
         tcg_temp_free_i64(a0);                              \
         tcg_temp_free_i64(a1);                              \
         tcg_temp_free_i64(b0);                              \
         tcg_temp_free_i64(b1);                              \
+        tcg_temp_free_i64(tmp);                             \
     }
 
 VSX_XXMRG(xxmrghw, 1)
@@ -1070,7 +1332,7 @@ VSX_XXMRG(xxmrglw, 0)
 
 static void gen_xxsel(DisasContext * ctx)
 {
-    TCGv_i64 a, b, c;
+    TCGv_i64 a, b, c, tmp;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -1078,40 +1340,49 @@ static void gen_xxsel(DisasContext * ctx)
     a = tcg_temp_new_i64();
     b = tcg_temp_new_i64();
     c = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
 
-    tcg_gen_mov_i64(a, cpu_vsrh(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrh(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrh(xC(ctx->opcode)));
+    get_cpu_vsrh(a, xA(ctx->opcode));
+    get_cpu_vsrh(b, xB(ctx->opcode));
+    get_cpu_vsrh(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrh(xT(ctx->opcode), tmp);
 
-    tcg_gen_mov_i64(a, cpu_vsrl(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrl(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrl(xC(ctx->opcode)));
+    get_cpu_vsrl(a, xA(ctx->opcode));
+    get_cpu_vsrl(b, xB(ctx->opcode));
+    get_cpu_vsrl(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrl(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrl(xT(ctx->opcode), tmp);
 
     tcg_temp_free_i64(a);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(c);
+    tcg_temp_free_i64(tmp);
 }
 
 static void gen_xxspltw(DisasContext *ctx)
 {
     TCGv_i64 b, b2;
-    TCGv_i64 vsr = (UIM(ctx->opcode) & 2) ?
-                   cpu_vsrl(xB(ctx->opcode)) :
-                   cpu_vsrh(xB(ctx->opcode));
+    TCGv_i64 vsr;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
 
+    vsr = tcg_temp_new_i64();
+    if (UIM(ctx->opcode) & 2) {
+        get_cpu_vsrl(vsr, xB(ctx->opcode));
+    } else {
+        get_cpu_vsrh(vsr, xB(ctx->opcode));
+    }
+
     b = tcg_temp_new_i64();
     b2 = tcg_temp_new_i64();
 
@@ -1122,9 +1393,11 @@ static void gen_xxspltw(DisasContext *ctx)
     }
 
     tcg_gen_shli_i64(b2, b, 32);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), b, b2);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    tcg_gen_or_i64(vsr, b, b2);
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
 
+    tcg_temp_free_i64(vsr);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(b2);
 }
@@ -1134,6 +1407,7 @@ static void gen_xxspltw(DisasContext *ctx)
 static void gen_xxspltib(DisasContext *ctx)
 {
     unsigned char uim8 = IMM8(ctx->opcode);
+    TCGv_i64 vsr;
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->altivec_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -1145,8 +1419,11 @@ static void gen_xxspltib(DisasContext *ctx)
             return;
         }
     }
-    tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), pattern(uim8));
-    tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), pattern(uim8));
+    vsr = tcg_temp_new_i64();
+    tcg_gen_movi_i64(vsr, pattern(uim8));
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
+    tcg_temp_free_i64(vsr);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
@@ -1161,40 +1438,40 @@ static void gen_xxsldwi(DisasContext *ctx)
 
     switch (SHW(ctx->opcode)) {
         case 0: {
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             break;
         }
         case 1: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(t0, xA(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
             break;
         }
         case 2: {
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             break;
         }
         case 3: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
@@ -1202,8 +1479,8 @@ static void gen_xxsldwi(DisasContext *ctx)
         }
     }
 
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xth);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xtl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(xth);
     tcg_temp_free_i64(xtl);
@@ -1213,7 +1490,8 @@ static void gen_xxsldwi(DisasContext *ctx)
 static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
     TCGv xt, xb;                                                \
-    TCGv_i32 t0 = tcg_temp_new_i32();                           \
+    TCGv_i32 t0;                                                \
+    TCGv_i64 t1;                                                \
     uint8_t uimm = UIMM4(ctx->opcode);                          \
                                                                 \
     if (unlikely(!ctx->vsx_enabled)) {                          \
@@ -1222,12 +1500,15 @@ static void gen_##name(DisasContext *ctx)                       \
     }                                                           \
     xt = tcg_const_tl(xT(ctx->opcode));                         \
     xb = tcg_const_tl(xB(ctx->opcode));                         \
+    t0 = tcg_temp_new_i32();                                    \
+    t1 = tcg_temp_new_i64();                                    \
     /* uimm > 15 out of bound and for                           \
      * uimm > 12 handle as per hardware in helper               \
      */                                                         \
     if (uimm > 15) {                                            \
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);         \
-        tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), 0);         \
+        tcg_gen_movi_i64(t1, 0);                                \
+        set_cpu_vsrh(xT(ctx->opcode), t1);                      \
+        set_cpu_vsrl(xT(ctx->opcode), t1);                      \
+        tcg_temp_free(xb);                                      \
+        tcg_temp_free(xt);                                      \
+        tcg_temp_free_i32(t0);                                  \
+        tcg_temp_free_i64(t1);                                  \
         return;                                                 \
     }                                                           \
     tcg_gen_movi_i32(t0, uimm);                                 \
@@ -1235,6 +1516,7 @@ static void gen_##name(DisasContext *ctx)                       \
     tcg_temp_free(xb);                                          \
     tcg_temp_free(xt);                                          \
     tcg_temp_free_i32(t0);                                      \
+    tcg_temp_free_i64(t1);                                      \
 }
 
 VSX_EXTRACT_INSERT(xxextractuw)
@@ -1244,30 +1526,45 @@ VSX_EXTRACT_INSERT(xxinsertw)
 static void gen_xsxexpdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
+    TCGv_i64 t0;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    t0 = tcg_temp_new_i64();
+    get_cpu_vsrh(t0, xB(ctx->opcode));
+    tcg_gen_extract_i64(rt, t0, 52, 11);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_xsxexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+
     tcg_gen_extract_i64(xth, xbh, 48, 15);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_movi_i64(xtl, 0);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_xsiexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xth;
     TCGv ra = cpu_gpr[rA(ctx->opcode)];
     TCGv rb = cpu_gpr[rB(ctx->opcode)];
     TCGv_i64 t0;
@@ -1277,40 +1574,60 @@ static void gen_xsiexpdp(DisasContext *ctx)
         return;
     }
     t0 = tcg_temp_new_i64();
+    xth = tcg_temp_new_i64();
     tcg_gen_andi_i64(xth, ra, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, rb, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     /* dword[1] is undefined */
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
 }
 
 static void gen_xsiexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xah = cpu_vsrh(rA(ctx->opcode) + 32);
-    TCGv_i64 xal = cpu_vsrl(rA(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xah;
+    TCGv_i64 xal;
+    TCGv_i64 xbh;
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xah = tcg_temp_new_i64();
+    xal = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, rA(ctx->opcode) + 32);
+    get_cpu_vsrl(xal, rA(ctx->opcode) + 32);
+    xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
     t0 = tcg_temp_new_i64();
+
     tcg_gen_andi_i64(xth, xah, 0x8000FFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, xbh, 0x7FFF);
     tcg_gen_shli_i64(t0, t0, 48);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_mov_i64(xtl, xal);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
 }
 
 static void gen_xsxsigdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
-    TCGv_i64 t0, zr, nan, exp;
+    TCGv_i64 t0, t1, zr, nan, exp;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1318,17 +1635,21 @@ static void gen_xsxsigdp(DisasContext *ctx)
     }
     exp = tcg_temp_new_i64();
     t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(2047);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_extract_i64(exp, t1, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(rt, cpu_vsrh(xB(ctx->opcode)), 0x000FFFFFFFFFFFFF);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_andi_i64(rt, t1, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(rt, rt, t0);
 
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
@@ -1337,132 +1658,219 @@ static void gen_xsxsigdp(DisasContext *ctx)
 static void gen_xsxsigqp(DisasContext *ctx)
 {
     TCGv_i64 t0, zr, nan, exp;
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+    get_cpu_vsrl(xbl, rB(ctx->opcode) + 32);
     exp = tcg_temp_new_i64();
     t0 = tcg_temp_new_i64();
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(32767);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(rB(ctx->opcode) + 32), 48, 15);
+    tcg_gen_extract_i64(exp, xbh, 48, 15);
     tcg_gen_movi_i64(t0, 0x0001000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(xth, cpu_vsrh(rB(ctx->opcode) + 32), 0x0000FFFFFFFFFFFF);
+    tcg_gen_andi_i64(xth, xbh, 0x0000FFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
-    tcg_gen_mov_i64(xtl, cpu_vsrl(rB(ctx->opcode) + 32));
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
+    tcg_gen_mov_i64(xtl, xbl);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 #endif
 
 static void gen_xviexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xah;
+    TCGv_i64 xal;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xah = tcg_temp_new_i64();
+    xal = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
     t0 = tcg_temp_new_i64();
+
     tcg_gen_andi_i64(xth, xah, 0x807FFFFF807FFFFF);
     tcg_gen_andi_i64(t0, xbh, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x807FFFFF807FFFFF);
     tcg_gen_andi_i64(t0, xbl, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xviexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xah;
+    TCGv_i64 xal;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xah = tcg_temp_new_i64();
+    xal = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
     t0 = tcg_temp_new_i64();
+
     tcg_gen_andi_i64(xth, xah, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, xbh, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, xbl, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     tcg_gen_shri_i64(xth, xbh, 23);
     tcg_gen_andi_i64(xth, xth, 0xFF000000FF);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_shri_i64(xtl, xbl, 23);
     tcg_gen_andi_i64(xtl, xtl, 0xFF000000FF);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     tcg_gen_extract_i64(xth, xbh, 52, 11);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_extract_i64(xtl, xbl, 52, 11);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
 
 static void gen_xvxsigdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
-
+    TCGv_i64 xth;
+    TCGv_i64 xtl;
+    TCGv_i64 xbh;
+    TCGv_i64 xbl;
     TCGv_i64 t0, zr, nan, exp;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    xth = tcg_temp_new_i64();
+    xtl = tcg_temp_new_i64();
+    xbh = tcg_temp_new_i64();
+    xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
     exp = tcg_temp_new_i64();
     t0 = tcg_temp_new_i64();
     zr = tcg_const_i64(0);
@@ -1474,6 +1882,7 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
 
     tcg_gen_extract_i64(exp, xbl, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
@@ -1481,11 +1890,16 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xtl, xbl, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #undef GEN_XX2FORM
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 6/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env
  2019-01-02  9:14 [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations Mark Cave-Ayland
                   ` (4 preceding siblings ...)
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 5/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Mark Cave-Ayland
@ 2019-01-02  9:14 ` Mark Cave-Ayland
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 7/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

Instead of accessing the FPR, VMX and VSX registers through static arrays of
TCGv_i64 globals, remove those arrays and change the helpers to load/store
data directly from cpu_env.
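
As a minimal illustration (a sketch, not a quote from the series), a caller
converted in the earlier patches now moves data through an explicit temporary
instead of referencing a TCG global directly:

    /* Sketch: copy FPR[rB] into FPR[rD] via a temporary. */
    TCGv_i64 t0 = tcg_temp_new_i64();
    get_fpr(t0, rB(ctx->opcode));    /* tcg_gen_ld_i64 from cpu_env */
    set_fpr(rD(ctx->opcode), t0);    /* tcg_gen_st_i64 back into cpu_env */
    tcg_temp_free_i64(t0);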

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 target/ppc/translate.c              | 59 ++++++++++---------------------------
 target/ppc/translate/vsx-impl.inc.c |  4 +--
 2 files changed, 18 insertions(+), 45 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3bb24e7310..b18ded07b3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -55,15 +55,9 @@
 /* global register indexes */
 static char cpu_reg_names[10*3 + 22*4 /* GPR */
     + 10*4 + 22*5 /* SPE GPRh */
-    + 10*4 + 22*5 /* FPR */
-    + 2*(10*6 + 22*7) /* AVRh, AVRl */
-    + 10*5 + 22*6 /* VSR */
     + 8*5 /* CRF */];
 static TCGv cpu_gpr[32];
 static TCGv cpu_gprh[32];
-static TCGv_i64 cpu_fpr[32];
-static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
-static TCGv_i64 cpu_vsr[32];
 static TCGv_i32 cpu_crf[8];
 static TCGv cpu_nip;
 static TCGv cpu_msr;
@@ -108,39 +102,6 @@ void ppc_translate_init(void)
                                          offsetof(CPUPPCState, gprh[i]), p);
         p += (i < 10) ? 4 : 5;
         cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "fp%d", i);
-        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, fpr[i]), p);
-        p += (i < 10) ? 4 : 5;
-        cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "avr%dH", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#else
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-
-        snprintf(p, cpu_reg_names_size, "avr%dL", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#else
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-        snprintf(p, cpu_reg_names_size, "vsr%d", i);
-        cpu_vsr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, vsr[i]), p);
-        p += (i < 10) ? 5 : 6;
-        cpu_reg_names_size -= (i < 10) ? 5 : 6;
     }
 
     cpu_nip = tcg_global_mem_new(cpu_env,
@@ -6701,22 +6662,34 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_fpr[regno], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
-    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
-    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 #include "translate/fp-impl.inc.c"
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index f0665df1a5..7eaa36b4d5 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_vsr[n], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 7/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  2019-01-02  9:14 [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations Mark Cave-Ayland
                   ` (5 preceding siblings ...)
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 6/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Mark Cave-Ayland
@ 2019-01-02  9:14 ` Mark Cave-Ayland
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 8/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

Since the VSX registers are actually a superset of the VMX registers, they
can be represented by the same type. Merge ppc_avr_t into ppc_vsr_t and change
ppc_avr_t to be a simple typedef alias.

Note that due to a difference in the naming of the float32 member between
ppc_avr_t and ppc_vsr_t, references to the ppc_avr_t f member must be renamed
to f32.
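
As a quick illustration of the merged type (a sketch, not code from the
patch): all members alias the same 16 bytes of storage, and the old f member
is now spelled f32:

    ppc_vsr_t v = { .u64 = { 0, 0 } };
    v.f32[0] = float32_one;     /* previously v.f[0] on ppc_avr_t */
    uint64_t lo = v.u64[0];     /* the same storage viewed as dwords */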

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 target/ppc/cpu.h        | 17 ++++++++-------
 target/ppc/int_helper.c | 56 +++++++++++++++++++++++++------------------------
 target/ppc/internal.h   | 11 ----------
 3 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index d5f99f1fc7..578641ac20 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -218,7 +218,6 @@ typedef struct opc_handler_t opc_handler_t;
 /* Types used to describe some PowerPC registers etc. */
 typedef struct DisasContext DisasContext;
 typedef struct ppc_spr_t ppc_spr_t;
-typedef union ppc_avr_t ppc_avr_t;
 typedef union ppc_tlb_t ppc_tlb_t;
 typedef struct ppc_hash_pte64 ppc_hash_pte64_t;
 
@@ -242,22 +241,26 @@ struct ppc_spr_t {
 #endif
 };
 
-/* Altivec registers (128 bits) */
-union ppc_avr_t {
-    float32 f[4];
+/* VSX/Altivec registers (128 bits) */
+typedef union _ppc_vsr_t {
     uint8_t u8[16];
     uint16_t u16[8];
     uint32_t u32[4];
+    uint64_t u64[2];
     int8_t s8[16];
     int16_t s16[8];
     int32_t s32[4];
-    uint64_t u64[2];
     int64_t s64[2];
+    float32 f32[4];
+    float64 f64[2];
+    float128 f128;
 #ifdef CONFIG_INT128
     __uint128_t u128;
 #endif
-    Int128 s128;
-};
+    Int128  s128;
+} ppc_vsr_t;
+
+typedef ppc_vsr_t ppc_avr_t;
 
 #if !defined(CONFIG_USER_ONLY)
 /* Software TLB cache */
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index fcac90a4a9..9d715be25c 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -548,8 +548,8 @@ VARITH_DO(muluwm, *, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = func(a->f[i], b->f[i], &env->vec_status);         \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = func(a->f32[i], b->f32[i], &env->vec_status);   \
         }                                                               \
     }
 VARITHFP(addfp, float32_add)
@@ -563,9 +563,9 @@ VARITHFP(maxfp, float32_max)
                            ppc_avr_t *b, ppc_avr_t *c)                  \
     {                                                                   \
         int i;                                                          \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = float32_muladd(a->f[i], c->f[i], b->f[i],         \
-                                     type, &env->vec_status);           \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = float32_muladd(a->f32[i], c->f32[i], b->f32[i], \
+                                       type, &env->vec_status);         \
         }                                                               \
     }
 VARITHFPFMA(maddfp, 0);
@@ -670,9 +670,9 @@ VABSDU(w, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             float32 t = cvt(b->element[i], &env->vec_status);           \
-            r->f[i] = float32_scalbn(t, -uim, &env->vec_status);        \
+            r->f32[i] = float32_scalbn(t, -uim, &env->vec_status);      \
         }                                                               \
     }
 VCF(ux, uint32_to_float32, u32)
@@ -782,9 +782,9 @@ VCMPNE(w, u32, uint32_t, 0)
         uint32_t none = 0;                                              \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             uint32_t result;                                            \
-            int rel = float32_compare_quiet(a->f[i], b->f[i],           \
+            int rel = float32_compare_quiet(a->f32[i], b->f32[i],       \
                                             &env->vec_status);          \
             if (rel == float_relation_unordered) {                      \
                 result = 0;                                             \
@@ -816,14 +816,16 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
     int i;
     int all_in = 0;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        int le_rel = float32_compare_quiet(a->f[i], b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        int le_rel = float32_compare_quiet(a->f32[i], b->f32[i],
+                                           &env->vec_status);
         if (le_rel == float_relation_unordered) {
             r->u32[i] = 0xc0000000;
             all_in = 1;
         } else {
-            float32 bneg = float32_chs(b->f[i]);
-            int ge_rel = float32_compare_quiet(a->f[i], bneg, &env->vec_status);
+            float32 bneg = float32_chs(b->f32[i]);
+            int ge_rel = float32_compare_quiet(a->f32[i], bneg,
+                                               &env->vec_status);
             int le = le_rel != float_relation_greater;
             int ge = ge_rel != float_relation_less;
 
@@ -856,11 +858,11 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
         float_status s = env->vec_status;                               \
                                                                         \
         set_float_rounding_mode(float_round_to_zero, &s);               \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            if (float32_is_any_nan(b->f[i])) {                          \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            if (float32_is_any_nan(b->f32[i])) {                        \
                 r->element[i] = 0;                                      \
             } else {                                                    \
-                float64 t = float32_to_float64(b->f[i], &s);            \
+                float64 t = float32_to_float64(b->f32[i], &s);          \
                 int64_t j;                                              \
                                                                         \
                 t = float64_scalbn(t, uim, &s);                         \
@@ -1661,8 +1663,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_div(float32_one, b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_div(float32_one, b->f32[i], &env->vec_status);
     }
 }
 
@@ -1674,8 +1676,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
         float_status s = env->vec_status;                       \
                                                                 \
         set_float_rounding_mode(rounding, &s);                  \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                \
-            r->f[i] = float32_round_to_int (b->f[i], &s);       \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {              \
+            r->f32[i] = float32_round_to_int (b->f32[i], &s);   \
         }                                                       \
     }
 VRFI(n, float_round_nearest_even)
@@ -1705,10 +1707,10 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        float32 t = float32_sqrt(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        float32 t = float32_sqrt(b->f32[i], &env->vec_status);
 
-        r->f[i] = float32_div(float32_one, t, &env->vec_status);
+        r->f32[i] = float32_div(float32_one, t, &env->vec_status);
     }
 }
 
@@ -1751,8 +1753,8 @@ void helper_vexptefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_exp2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_exp2(b->f32[i], &env->vec_status);
     }
 }
 
@@ -1760,8 +1762,8 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_log2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_log2(b->f32[i], &env->vec_status);
     }
 }
 
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 5d460247e2..bd247f2504 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -204,17 +204,6 @@ EXTRACT_HELPER(IMM8, 11, 8);
 EXTRACT_HELPER(DCMX, 16, 7);
 EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
-typedef union _ppc_vsr_t {
-    uint8_t u8[16];
-    uint16_t u16[8];
-    uint32_t u32[4];
-    uint64_t u64[2];
-    float32 f32[4];
-    float64 f64[2];
-    float128 f128;
-    Int128  s128;
-} ppc_vsr_t;
-
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VsrB(i) u8[i]
 #define VsrH(i) u16[i]
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v5 8/9] target/ppc: move FP and VMX registers into aligned vsr register array
  2019-01-02  9:14 [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations Mark Cave-Ayland
                   ` (6 preceding siblings ...)
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 7/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
@ 2019-01-02  9:14 ` Mark Cave-Ayland
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 9/9] target/ppc: replace AVR* macros with Vsr* macros Mark Cave-Ayland
  2019-01-03  0:23 ` [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations David Gibson
  9 siblings, 0 replies; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

The VSX register array is a block of 64 128-bit registers, where the first 32
registers consist of the existing 64-bit FP registers extended to 128 bits
using new VSR registers, and the last 32 registers are the 128-bit VMX
registers, as shown below:

            64-bit               64-bit
    +--------------------+--------------------+
    |        FP0         |                    |  VSR0
    +--------------------+--------------------+
    |        FP1         |                    |  VSR1
    +--------------------+--------------------+
    |        ...         |        ...         |  ...
    +--------------------+--------------------+
    |        FP30        |                    |  VSR30
    +--------------------+--------------------+
    |        FP31        |                    |  VSR31
    +--------------------+--------------------+
    |                  VMX0                   |  VSR32
    +-----------------------------------------+
    |                  VMX1                   |  VSR33
    +-----------------------------------------+
    |                  ...                    |  ...
    +-----------------------------------------+
    |                  VMX30                  |  VSR62
    +-----------------------------------------+
    |                  VMX31                  |  VSR63
    +-----------------------------------------+

In order to allow for future conversion of VSX instructions to use TCG vector
operations, recreate the same layout using an aligned version of the existing
vsr register array.

Since the old fpr and avr register arrays are removed, the existing callers
must also be updated to use the correct offset in the vsr register array. This
also includes switching the relevant VMState fields over to using subarrays
to ensure that migration compatibility is preserved.
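
The callers below use the new cpu_fpr_ptr(), cpu_vsrl_ptr() and cpu_avr_ptr()
accessors. As a rough sketch of what these resolve to under the layout above
(the actual definitions are added to cpu.h as part of this patch):

    static inline uint64_t *cpu_fpr_ptr(CPUPPCState *env, int i)
    {
        return &env->vsr[i].u64[0];   /* FP0-31: first dword of VSR0-31 */
    }

    static inline uint64_t *cpu_vsrl_ptr(CPUPPCState *env, int i)
    {
        return &env->vsr[i].u64[1];   /* second dword of VSR0-31 */
    }

    static inline ppc_avr_t *cpu_avr_ptr(CPUPPCState *env, int i)
    {
        return &env->vsr[32 + i];     /* VMX0-31 occupy VSR32-63 */
    }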

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
---
 linux-user/ppc/signal.c             | 28 ++++++++-------
 target/ppc/arch_dump.c              | 15 ++++----
 target/ppc/cpu.h                    | 25 +++++++++----
 target/ppc/gdbstub.c                |  8 ++---
 target/ppc/internal.h               | 18 +++-------
 target/ppc/kvm.c                    | 24 +++++++------
 target/ppc/machine.c                | 72 ++++++++++++++++++++++++++++++++++---
 target/ppc/monitor.c                |  4 +--
 target/ppc/translate.c              | 14 ++++----
 target/ppc/translate/dfp-impl.inc.c |  2 +-
 target/ppc/translate/vmx-impl.inc.c |  7 +++-
 target/ppc/translate/vsx-impl.inc.c |  4 +--
 target/ppc/translate_init.inc.c     | 26 +++++++-------
 13 files changed, 165 insertions(+), 82 deletions(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index 2ae120a2bc..619a56950d 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save Altivec registers if necessary.  */
     if (env->insns_flags & PPC_ALTIVEC) {
         uint32_t *vrsave;
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = cpu_avr_ptr(env, i);
             ppc_avr_t *vreg = (ppc_avr_t *)&frame->mc_vregs.altivec[i];
 
             __put_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -281,15 +281,17 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __put_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            uint64_t *vsrl = cpu_vsrl_ptr(env, i);
+            __put_user(*vsrl, &vsregs[i]);
         }
     }
 
     /* Save floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __put_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            uint64_t *fpr = cpu_fpr_ptr(env, i);
+            __put_user(*fpr, &frame->mc_fregs[i]);
         }
         __put_user((uint64_t) env->fpscr, &frame->mc_fregs[32]);
     }
@@ -373,8 +375,8 @@ static void restore_user_regs(CPUPPCState *env,
 #else
         v_regs = (ppc_avr_t *)frame->mc_vregs.altivec;
 #endif
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = cpu_avr_ptr(env, i);
             ppc_avr_t *vreg = &v_regs[i];
 
             __get_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -393,16 +395,18 @@ static void restore_user_regs(CPUPPCState *env,
     /* Restore VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __get_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            uint64_t *vsrl = cpu_vsrl_ptr(env, i);
+            __get_user(*vsrl, &vsregs[i]);
         }
     }
 
     /* Restore floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
         uint64_t fpscr;
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __get_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            uint64_t *fpr = cpu_fpr_ptr(env, i);
+            __get_user(*fpr, &frame->mc_fregs[i]);
         }
         __get_user(fpscr, &frame->mc_fregs[32]);
         env->fpscr = (uint32_t) fpscr;
diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
index cc1460e4e3..3a00606d01 100644
--- a/target/ppc/arch_dump.c
+++ b/target/ppc/arch_dump.c
@@ -140,7 +140,8 @@ static void ppc_write_elf_fpregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(fpregset, 0, sizeof(*fpregset));
 
     for (i = 0; i < 32; i++) {
-        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.fpr[i]);
+        uint64_t *fpr = cpu_fpr_ptr(&cpu->env, i);
+        fpregset->fpr[i] = cpu_to_dump64(s, *fpr);
     }
     fpregset->fpscr = cpu_to_dump_reg(s, cpu->env.fpscr);
 }
@@ -158,6 +159,7 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
 
     for (i = 0; i < 32; i++) {
         bool needs_byteswap;
+        ppc_avr_t *avr = cpu_avr_ptr(&cpu->env, i);
 
 #ifdef HOST_WORDS_BIGENDIAN
         needs_byteswap = s->dump_info.d_endian == ELFDATA2LSB;
@@ -166,11 +168,11 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
 #endif
 
         if (needs_byteswap) {
-            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
-            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
+            vmxregset->avr[i].u64[0] = bswap64(avr->u64[1]);
+            vmxregset->avr[i].u64[1] = bswap64(avr->u64[0]);
         } else {
-            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
-            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
+            vmxregset->avr[i].u64[0] = avr->u64[0];
+            vmxregset->avr[i].u64[1] = avr->u64[1];
         }
     }
     vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
@@ -188,7 +190,8 @@ static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(vsxregset, 0, sizeof(*vsxregset));
 
     for (i = 0; i < 32; i++) {
-        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i]);
+        uint64_t *vsrl = cpu_vsrl_ptr(&cpu->env, i);
+        vsxregset->vsr[i] = cpu_to_dump64(s, *vsrl);
     }
 }
 
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 578641ac20..91951d7730 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1004,8 +1004,6 @@ struct CPUPPCState {
 
     /* Floating point execution context */
     float_status fp_status;
-    /* floating point registers */
-    float64 fpr[32];
     /* floating point status and control register */
     target_ulong fpscr;
 
@@ -1055,11 +1053,10 @@ struct CPUPPCState {
     /* Special purpose registers */
     target_ulong spr[1024];
     ppc_spr_t spr_cb[1024];
-    /* Altivec registers */
-    ppc_avr_t avr[32];
+    /* Vector status and control register */
     uint32_t vscr;
-    /* VSX registers */
-    uint64_t vsr[32];
+    /* VSX registers (including FP and AVR) */
+    ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;
@@ -2540,6 +2537,22 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
            (start + nregs > 32 && (rx >= start || rx < start + nregs - 32));
 }
 
+/* Accessors for FP, VMX and VSX registers */
+static inline uint64_t *cpu_fpr_ptr(CPUPPCState *env, int i)
+{
+    return &env->vsr[i].u64[0];
+}
+
+static inline uint64_t *cpu_vsrl_ptr(CPUPPCState *env, int i)
+{
+    return &env->vsr[i].u64[1];
+}
+
+static inline ppc_avr_t *cpu_avr_ptr(CPUPPCState *env, int i)
+{
+    return &env->vsr[32 + i];
+}
+
 void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env);
 
 void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index b6f6693583..19565b584d 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -126,7 +126,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_regl(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, *cpu_fpr_ptr(env, n - 32));
     } else {
         switch (n) {
         case 64:
@@ -178,7 +178,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_reg64(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, *cpu_fpr_ptr(env, n - 32));
     } else if (n < 96) {
         /* Altivec */
         stq_p(mem_buf, n - 64);
@@ -234,7 +234,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldtul_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        *cpu_fpr_ptr(env, n - 32) = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64:
@@ -284,7 +284,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldq_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        *cpu_fpr_ptr(env, n - 32) = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64 + 32:
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index bd247f2504..c7c0f77dd6 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -218,24 +218,14 @@ EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
 static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        vsr->VsrD(0) = env->fpr[n];
-        vsr->VsrD(1) = env->vsr[n];
-    } else {
-        vsr->u64[0] = env->avr[n - 32].u64[0];
-        vsr->u64[1] = env->avr[n - 32].u64[1];
-    }
+    vsr->VsrD(0) = env->vsr[n].u64[0];
+    vsr->VsrD(1) = env->vsr[n].u64[1];
 }
 
 static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        env->fpr[n] = vsr->VsrD(0);
-        env->vsr[n] = vsr->VsrD(1);
-    } else {
-        env->avr[n - 32].u64[0] = vsr->u64[0];
-        env->avr[n - 32].u64[1] = vsr->u64[1];
-    }
+    env->vsr[n].u64[0] = vsr->VsrD(0);
+    env->vsr[n].u64[1] = vsr->VsrD(1);
 }
 
 void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index f81327d6cd..ebbb48c42f 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -629,13 +629,15 @@ static int kvm_put_fp(CPUState *cs)
 
         for (i = 0; i < 32; i++) {
             uint64_t vsr[2];
+            uint64_t *fpr = cpu_fpr_ptr(&cpu->env, i);
+            uint64_t *vsrl = cpu_vsrl_ptr(&cpu->env, i);
 
 #ifdef HOST_WORDS_BIGENDIAN
-            vsr[0] = float64_val(env->fpr[i]);
-            vsr[1] = env->vsr[i];
+            vsr[0] = float64_val(*fpr);
+            vsr[1] = *vsrl;
 #else
-            vsr[0] = env->vsr[i];
-            vsr[1] = float64_val(env->fpr[i]);
+            vsr[0] = *vsrl;
+            vsr[1] = float64_val(*fpr);
 #endif
             reg.addr = (uintptr_t) &vsr;
             reg.id = vsx ? KVM_REG_PPC_VSR(i) : KVM_REG_PPC_FPR(i);
@@ -660,7 +662,7 @@ static int kvm_put_fp(CPUState *cs)
 
         for (i = 0; i < 32; i++) {
             reg.id = KVM_REG_PPC_VR(i);
-            reg.addr = (uintptr_t)&env->avr[i];
+            reg.addr = (uintptr_t)cpu_avr_ptr(env, i);
             ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
             if (ret < 0) {
                 DPRINTF("Unable to set VR%d to KVM: %s\n", i, strerror(errno));
@@ -696,6 +698,8 @@ static int kvm_get_fp(CPUState *cs)
 
         for (i = 0; i < 32; i++) {
             uint64_t vsr[2];
+            uint64_t *fpr = cpu_fpr_ptr(&cpu->env, i);
+            uint64_t *vsrl = cpu_vsrl_ptr(&cpu->env, i);
 
             reg.addr = (uintptr_t) &vsr;
             reg.id = vsx ? KVM_REG_PPC_VSR(i) : KVM_REG_PPC_FPR(i);
@@ -707,14 +711,14 @@ static int kvm_get_fp(CPUState *cs)
                 return ret;
             } else {
 #ifdef HOST_WORDS_BIGENDIAN
-                env->fpr[i] = vsr[0];
+                *fpr = vsr[0];
                 if (vsx) {
-                    env->vsr[i] = vsr[1];
+                    *vsrl = vsr[1];
                 }
 #else
-                env->fpr[i] = vsr[1];
+                *fpr = vsr[1];
                 if (vsx) {
-                    env->vsr[i] = vsr[0];
+                    *vsrl = vsr[0];
                 }
 #endif
             }
@@ -732,7 +736,7 @@ static int kvm_get_fp(CPUState *cs)
 
         for (i = 0; i < 32; i++) {
             reg.id = KVM_REG_PPC_VR(i);
-            reg.addr = (uintptr_t)&env->avr[i];
+            reg.addr = (uintptr_t)cpu_avr_ptr(env, i);
             ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
             if (ret < 0) {
                 DPRINTF("Unable to get VR%d from KVM: %s\n",
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index e7b3725273..eff30053b0 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -45,7 +45,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
             uint64_t l;
         } u;
         u.l = qemu_get_be64(f);
-        env->fpr[i] = u.d;
+        *cpu_fpr_ptr(env, i) = u.d;
     }
     qemu_get_be32s(f, &fpscr);
     env->fpscr = fpscr;
@@ -138,11 +138,73 @@ static const VMStateInfo vmstate_info_avr = {
 };
 
 #define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
-    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
+    VMSTATE_SUB_ARRAY(_f, _s, 32, _n, _v, vmstate_info_avr, ppc_avr_t)
 
 #define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
     VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
 
+static int get_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[0] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[0]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_fpr = {
+    .name = "fpr",
+    .get  = get_fpr,
+    .put  = put_fpr,
+};
+
+#define VMSTATE_FPR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_fpr, ppc_vsr_t)
+
+#define VMSTATE_FPR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_FPR_ARRAY_V(_f, _s, _n, 0)
+
+static int get_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[1] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[1]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_vsr = {
+    .name = "vsr",
+    .get  = get_vsr,
+    .put  = put_vsr,
+};
+
+#define VMSTATE_VSR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_vsr, ppc_vsr_t)
+
+#define VMSTATE_VSR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_VSR_ARRAY_V(_f, _s, _n, 0)
+
 static bool cpu_pre_2_8_migration(void *opaque, int version_id)
 {
     PowerPCCPU *cpu = opaque;
@@ -354,7 +416,7 @@ static const VMStateDescription vmstate_fpu = {
     .minimum_version_id = 1,
     .needed = fpu_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
+        VMSTATE_FPR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -373,7 +435,7 @@ static const VMStateDescription vmstate_altivec = {
     .minimum_version_id = 1,
     .needed = altivec_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
+        VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINT32(env.vscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -392,7 +454,7 @@ static const VMStateDescription vmstate_vsx = {
     .minimum_version_id = 1,
     .needed = vsx_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
+        VMSTATE_VSR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/ppc/monitor.c b/target/ppc/monitor.c
index 14915119fc..04deec8030 100644
--- a/target/ppc/monitor.c
+++ b/target/ppc/monitor.c
@@ -123,8 +123,8 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
 
     /* Floating point registers */
     if ((qemu_tolower(name[0]) == 'f') &&
-        ppc_cpu_get_reg_num(name + 1, ARRAY_SIZE(env->fpr), &regnum)) {
-        *pval = env->fpr[regnum];
+        ppc_cpu_get_reg_num(name + 1, 32, &regnum)) {
+        *pval = *cpu_fpr_ptr(env, regnum);
         return 0;
     }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b18ded07b3..e169c43643 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6662,22 +6662,22 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -6685,10 +6685,10 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -7440,7 +7440,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
             if ((i & (RFPL - 1)) == 0) {
                 cpu_fprintf(f, "FPR%02d", i);
             }
-            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->fpr[i]));
+            cpu_fprintf(f, " %016" PRIx64, *cpu_fpr_ptr(env, i));
             if ((i & (RFPL - 1)) == (RFPL - 1)) {
                 cpu_fprintf(f, "\n");
             }
diff --git a/target/ppc/translate/dfp-impl.inc.c b/target/ppc/translate/dfp-impl.inc.c
index 634ef73b8a..6c556dc2e1 100644
--- a/target/ppc/translate/dfp-impl.inc.c
+++ b/target/ppc/translate/dfp-impl.inc.c
@@ -3,7 +3,7 @@
 static inline TCGv_ptr gen_fprp_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, fpr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[reg].u64[0]));
     return r;
 }
 
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 5e8327e9a3..f99d0284c2 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -10,10 +10,15 @@
 static inline TCGv_ptr gen_avr_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, avr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[32 + reg].u64[0]));
     return r;
 }
 
+static inline long avr64_offset(int reg, bool high)
+{
+    return offsetof(CPUPPCState, vsr[32 + reg].u64[(high ? 0 : 1)]);
+}
+
 #define GEN_VR_LDX(name, opc2, opc3)                                          \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 7eaa36b4d5..ed4fdceacf 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 03f1d34a97..ade06cc773 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9486,7 +9486,7 @@ static bool avr_need_swap(CPUPPCState *env)
 static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stfq_p(mem_buf, env->fpr[n]);
+        stfq_p(mem_buf, *cpu_fpr_ptr(env, n));
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9502,7 +9502,7 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->fpr[n] = ldfq_p(mem_buf);
+        *cpu_fpr_ptr(env, n) = ldfq_p(mem_buf);
         return 8;
     }
     if (n == 32) {
@@ -9516,12 +9516,13 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
+        ppc_avr_t *avr = cpu_avr_ptr(env, n);
         if (!avr_need_swap(env)) {
-            stq_p(mem_buf, env->avr[n].u64[0]);
-            stq_p(mem_buf+8, env->avr[n].u64[1]);
+            stq_p(mem_buf, avr->u64[0]);
+            stq_p(mem_buf + 8, avr->u64[1]);
         } else {
-            stq_p(mem_buf, env->avr[n].u64[1]);
-            stq_p(mem_buf+8, env->avr[n].u64[0]);
+            stq_p(mem_buf, avr->u64[1]);
+            stq_p(mem_buf + 8, avr->u64[0]);
         }
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
@@ -9543,14 +9544,15 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
+        ppc_avr_t *avr = cpu_avr_ptr(env, n);
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
         if (!avr_need_swap(env)) {
-            env->avr[n].u64[0] = ldq_p(mem_buf);
-            env->avr[n].u64[1] = ldq_p(mem_buf+8);
+            avr->u64[0] = ldq_p(mem_buf);
+            avr->u64[1] = ldq_p(mem_buf + 8);
         } else {
-            env->avr[n].u64[1] = ldq_p(mem_buf);
-            env->avr[n].u64[0] = ldq_p(mem_buf+8);
+            avr->u64[1] = ldq_p(mem_buf);
+            avr->u64[0] = ldq_p(mem_buf + 8);
         }
         return 16;
     }
@@ -9623,7 +9625,7 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stq_p(mem_buf, env->vsr[n]);
+        stq_p(mem_buf, *cpu_vsrl_ptr(env, n));
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9634,7 +9636,7 @@ static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->vsr[n] = ldq_p(mem_buf);
+        *cpu_vsrl_ptr(env, n) = ldq_p(mem_buf);
         return 8;
     }
     return 0;
-- 
2.11.0

* [Qemu-devel] [PATCH v5 9/9] target/ppc: replace AVR* macros with Vsr* macros
  2019-01-02  9:14 [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations Mark Cave-Ayland
                   ` (7 preceding siblings ...)
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 8/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
@ 2019-01-02  9:14 ` Mark Cave-Ayland
  2019-01-03  0:23 ` [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations David Gibson
  9 siblings, 0 replies; 11+ messages in thread
From: Mark Cave-Ayland @ 2019-01-02  9:14 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

Now that the VMX and VSX register sets have been combined into a single
register array, the same Vsr* macros can be used to access both AVR and VSR
field members.
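
As a sketch of the equivalence (the VsrB/VsrW bodies below are assumed to
match the host-endian-dependent definitions in target/ppc/cpu.h; the
AVRB/AVRW forms are the ones removed by this patch):

    /* Sketch: Vsr* accessors number elements in PPC (big-endian) order
     * regardless of host endianness, exactly as the AVR* macros did. */
    #include <assert.h>
    #include <stdint.h>

    typedef union {
        uint8_t  u8[16];
        uint32_t u32[4];
        uint64_t u64[2];
    } ppc_vsr_t;

    #if defined(HOST_WORDS_BIGENDIAN)
    #define VsrB(i) u8[i]                /* assumed, per cpu.h */
    #define VsrW(i) u32[i]
    #define AVRB(i) u8[i]                /* as removed below */
    #define AVRW(i) u32[i]
    #else
    #define VsrB(i) u8[15 - (i)]
    #define VsrW(i) u32[3 - (i)]
    #define AVRB(i) u8[15 - (i)]
    #define AVRW(i) u32[3 - (i)]
    #endif

    int main(void)
    {
        ppc_vsr_t v = { 0 };

        v.VsrB(0) = 0xff;                 /* element 0 in PPC order */
        assert(v.AVRB(0) == 0xff);        /* AVRB(0) names the same byte */
        assert(&v.VsrW(3) == &v.AVRW(3)); /* likewise for word elements */
        return 0;
    }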

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9d715be25c..598731d47a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -391,13 +391,9 @@ target_ulong helper_602_mfrom(target_ulong arg)
 #if defined(HOST_WORDS_BIGENDIAN)
 #define HI_IDX 0
 #define LO_IDX 1
-#define AVRB(i) u8[i]
-#define AVRW(i) u32[i]
 #else
 #define HI_IDX 1
 #define LO_IDX 0
-#define AVRB(i) u8[15-(i)]
-#define AVRW(i) u32[3-(i)]
 #endif
 
 #if defined(HOST_WORDS_BIGENDIAN)
@@ -3277,11 +3273,11 @@ void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     int i;
 
     VECTOR_FOR_INORDER_I(i, u32) {
-        result.AVRW(i) = b->AVRW(i) ^
-            (AES_Te0[a->AVRB(AES_shifts[4*i + 0])] ^
-             AES_Te1[a->AVRB(AES_shifts[4*i + 1])] ^
-             AES_Te2[a->AVRB(AES_shifts[4*i + 2])] ^
-             AES_Te3[a->AVRB(AES_shifts[4*i + 3])]);
+        result.VsrW(i) = b->VsrW(i) ^
+            (AES_Te0[a->VsrB(AES_shifts[4 * i + 0])] ^
+             AES_Te1[a->VsrB(AES_shifts[4 * i + 1])] ^
+             AES_Te2[a->VsrB(AES_shifts[4 * i + 2])] ^
+             AES_Te3[a->VsrB(AES_shifts[4 * i + 3])]);
     }
     *r = result;
 }
@@ -3292,7 +3288,7 @@ void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     int i;
 
     VECTOR_FOR_INORDER_I(i, u8) {
-        result.AVRB(i) = b->AVRB(i) ^ (AES_sbox[a->AVRB(AES_shifts[i])]);
+        result.VsrB(i) = b->VsrB(i) ^ (AES_sbox[a->VsrB(AES_shifts[i])]);
     }
     *r = result;
 }
@@ -3305,15 +3301,15 @@ void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t tmp;
 
     VECTOR_FOR_INORDER_I(i, u8) {
-        tmp.AVRB(i) = b->AVRB(i) ^ AES_isbox[a->AVRB(AES_ishifts[i])];
+        tmp.VsrB(i) = b->VsrB(i) ^ AES_isbox[a->VsrB(AES_ishifts[i])];
     }
 
     VECTOR_FOR_INORDER_I(i, u32) {
-        r->AVRW(i) =
-            AES_imc[tmp.AVRB(4*i + 0)][0] ^
-            AES_imc[tmp.AVRB(4*i + 1)][1] ^
-            AES_imc[tmp.AVRB(4*i + 2)][2] ^
-            AES_imc[tmp.AVRB(4*i + 3)][3];
+        r->VsrW(i) =
+            AES_imc[tmp.VsrB(4 * i + 0)][0] ^
+            AES_imc[tmp.VsrB(4 * i + 1)][1] ^
+            AES_imc[tmp.VsrB(4 * i + 2)][2] ^
+            AES_imc[tmp.VsrB(4 * i + 3)][3];
     }
 }
 
@@ -3323,7 +3319,7 @@ void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     int i;
 
     VECTOR_FOR_INORDER_I(i, u8) {
-        result.AVRB(i) = b->AVRB(i) ^ (AES_isbox[a->AVRB(AES_ishifts[i])]);
+        result.VsrB(i) = b->VsrB(i) ^ (AES_isbox[a->VsrB(AES_ishifts[i])]);
     }
     *r = result;
 }
-- 
2.11.0

* Re: [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations
  2019-01-02  9:14 [Qemu-devel] [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations Mark Cave-Ayland
                   ` (8 preceding siblings ...)
  2019-01-02  9:14 ` [Qemu-devel] [PATCH v5 9/9] target/ppc: replace AVR* macros with Vsr* macros Mark Cave-Ayland
@ 2019-01-03  0:23 ` David Gibson
  9 siblings, 0 replies; 11+ messages in thread
From: David Gibson @ 2019-01-03  0:23 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: qemu-devel, qemu-ppc, richard.henderson

On Wed, Jan 02, 2019 at 09:14:14AM +0000, Mark Cave-Ayland wrote:
> This patchset is an attempt at trying to improve the VMX (Altivec) instruction
> performance by laying the groundwork for use of the new TCG vector operations.
> 
> Patches 1 and 2 fix a sign-extension error discovered in EXTRACT_SHELPER and an
> associated typo in the SIMM5 macro which were discovered whilst testing Richard's
> follow-on TCG vector improvements patchset.
> 
> In order to use TCG vector operations, the registers must be accessible from cpu_env
> whilst currently they are accessed via arrays of static TCG globals. Patches 3-5
> are therefore mechanical patches which introduce access helpers for FPR, AVR and VSR
> registers using the supplied TCGv_i64 parameter.
> 
> Once this is done, patch 6 enables us to remove the static TCG global arrays and updates
> the access helpers to read/write to the relevant fields in cpu_env directly.
> 
> Patches 7 and 8 perform the legwork required to enable VSX instructions to be converted
> to use TCG vector operations in future by rearranging the FP, VMX and VSX registers into
> a single aligned VSR register array (the scope of this patchset is VMX only).
> 
> Patch 9 removes the AVR* macros and replaces them with the corresponding Vsr* macros
> since they are equivalent.
> 
> Finally thanks to Richard for taking the time to answer some of my (mostly beginner)
> questions related to TCG.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Applied to ppc-for-4.0, thanks.

> 
> v5:
> - Fix up KVM-enabled builds on PPC host due to missing conversion of target/ppc/kvm.c
> 
> v4:
> - Rebase onto master
> - Add extra R-B tags from Richard
> - Leave HI_IDX/LO_IDX in int_helper.c in patch 9 (similarly named macros are also
>   used in other files so let's ensure there is no confusion)
> - Add cpu_fpr_ptr(), cpu_vsrl_ptr() and cpu_avr_ptr() as suggested by Richard in
>   patch 8
> 
> v3:
> - Rebase onto master, drop RFC prefix, alter subject line
> - Add A-B tags from David
> - Add SIMM5/EXTRACT_HELPER macro fix patches to the start of the series
> - Drop patch 4 from previous patchset (delay AVR register writeback) as it should
>   not be required.
> - Remove extra get_fpr() accidentally added to GEN_FLOAT macros in patch 3
> - Fix temporary leak when VMX/VSX not enabled in patches 4 and 5
> - Add patch to remove AVR* macros, replacing them with Vsr* macros
> - Drop patches converting logical, add and sub instructions to TCG vector ops (let
>   Richard incorporate this into his TCG vector improvements patchset)
> 
> v2:
> - Rebase onto master
> - Add comment explaining rationale for FPR helpers in description for patch 1
> - Add R-B tags from Richard
> - Add patch 3 to delay AVR register writeback as spotted by Richard
> - Add patches 6 and 7 to merge FPR, VMX and VSX registers into the vsr array
>   to facilitate conversion of VSX instructions to vector operations later
> - Fix accidental bug whereby the conversion of get_vsr()/set_vsr() to access
>   data from cpu_env was incorrectly squashed into patch 3
> - Move set_fpr() further down in gen_fsqrts() and gen_frsqrtes() in patch 1
> 
> Mark Cave-Ayland (9):
>   target/ppc: fix typo in SIMM5 extraction helper
>   target/ppc: switch EXTRACT_HELPER macros over to use
>     sextract32/extract32
>   target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
>     access
>   target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
>     register access
>   target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
>     helpers for VSR register access
>   target/ppc: switch FPR, VMX and VSX helpers to access data directly
>     from cpu_env
>   target/ppc: merge ppc_vsr_t and ppc_avr_t union types
>   target/ppc: move FP and VMX registers into aligned vsr register array
>   target/ppc: replace AVR* macros with Vsr* macros
> 
>  linux-user/ppc/signal.c             |  28 +-
>  target/ppc/arch_dump.c              |  15 +-
>  target/ppc/cpu.h                    |  42 +-
>  target/ppc/gdbstub.c                |   8 +-
>  target/ppc/int_helper.c             |  86 ++--
>  target/ppc/internal.h               |  39 +-
>  target/ppc/kvm.c                    |  24 +-
>  target/ppc/machine.c                |  72 ++-
>  target/ppc/monitor.c                |   4 +-
>  target/ppc/translate.c              |  73 ++-
>  target/ppc/translate/dfp-impl.inc.c |   2 +-
>  target/ppc/translate/fp-impl.inc.c  | 486 +++++++++++++++-----
>  target/ppc/translate/vmx-impl.inc.c | 154 +++++--
>  target/ppc/translate/vsx-impl.inc.c | 862 ++++++++++++++++++++++++++----------
>  target/ppc/translate_init.inc.c     |  26 +-
>  15 files changed, 1374 insertions(+), 547 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

