* [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
@ 2018-12-18  6:38 Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This implements some of the things that I talked about with Mark
yesterday and this morning.  In particular:

(0) Implement expanders for nand, nor, eqv logical operations.

(1) Implement saturating arithmetic for the tcg backend.

    While I had expanders for these, they always went to helpers.
    It's easy enough to expand byte and half-word operations inline
    for x86.  Beyond that, 32- and 64-bit operations can be expanded
    with plain integer operations.
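    As a sketch of how 32- and 64-bit lanes can be done with integer
    operations: unsigned saturation is detected by the wrapped sum, and
    signed saturation by a sign test.  (Plain C for illustration; the
    function names are mine, not the actual TCG expanders.)

```c
#include <stdint.h>

/* Unsigned saturating add: an overflowed sum wraps below either input,
 * so clamp to the maximum representable value. */
static uint32_t usadd32(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return r < a ? UINT32_MAX : r;
}

/* Signed saturating add: overflow occurred iff the inputs share a sign
 * and the result's sign differs from theirs. */
static int32_t ssadd32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua + ub;

    if (!((ua ^ ub) & 0x80000000u) && ((r ^ ua) & 0x80000000u)) {
        return a < 0 ? INT32_MIN : INT32_MAX;
    }
    return (int32_t)r;
}
```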

(2) Implement minmax arithmetic for the tcg backend.

    While I had integral minmax operations, I had not yet added
    any vector expanders for this.  (The integral stuff came in
    for atomic minmax.)

(3) Trivial conversions to minmax for target/arm.

(4) Patches 11-18 are identical to Mark's.

(5) Patches 19-25 implement splat and logicals for VMX and VSX.

    VSX is no more difficult than VMX for these.  It does seem to be
    just about everything that we can do for VSX at the moment.

(6) Patches 26-33 implement saturating arithmetic for VMX.

(7) Patch 34 implements minmax arithmetic for VMX.

I've tested the new operations via an aarch64 guest, as that's the set
of risu test cases I've got handy.  The rest is untested so far.


r~


Mark Cave-Ayland (8):
  target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
    access
  target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
    register access
  target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
    helpers for VSR register access
  target/ppc: switch FPR, VMX and VSX helpers to access data directly
    from cpu_env
  target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  target/ppc: move FP and VMX registers into aligned vsr register array
  target/ppc: convert VMX logical instructions to use vector operations
  target/ppc: convert vaddu[b,h,w,d] and vsubu[b,h,w,d] over to use
    vector operations

Richard Henderson (26):
  tcg: Add logical simplifications during gvec expand
  target/arm: Rely on optimization within tcg_gen_gvec_or
  tcg: Add gvec expanders for nand, nor, eqv
  tcg: Add write_aofs to GVecGen4
  tcg: Add opcodes for vector saturated arithmetic
  tcg/i386: Implement vector saturating arithmetic
  tcg: Add opcodes for vector minmax arithmetic
  tcg/i386: Implement vector minmax arithmetic
  target/arm: Use vector minmax expanders for aarch64
  target/arm: Use vector minmax expanders for aarch32
  target/ppc: convert vspltis[bhw] to use vector operations
  target/ppc: convert vsplt[bhw] to use vector operations
  target/ppc: nand, nor, eqv are now generic vector operations
  target/ppc: convert VSX logical operations to vector operations
  target/ppc: convert xxspltib to vector operations
  target/ppc: convert xxspltw to vector operations
  target/ppc: convert xxsel to vector operations
  target/ppc: Pass integer to helper_mtvscr
  target/ppc: Use helper_mtvscr for reset and gdb
  target/ppc: Remove vscr_nj and vscr_sat
  target/ppc: Add helper_mfvscr
  target/ppc: Use mtvscr/mfvscr for vmstate
  target/ppc: Add set_vscr_sat
  target/ppc: Split out VSCR_SAT to a vector field
  target/ppc: convert vadd*s and vsub*s to vector operations
  target/ppc: convert vmin* and vmax* to vector operations

 accel/tcg/tcg-runtime.h             |  23 +
 target/ppc/cpu.h                    |  30 +-
 target/ppc/helper.h                 |  57 +-
 target/ppc/internal.h               |  29 +-
 tcg/aarch64/tcg-target.h            |   2 +
 tcg/i386/tcg-target.h               |   2 +
 tcg/tcg-op-gvec.h                   |  18 +
 tcg/tcg-op.h                        |  11 +
 tcg/tcg-opc.h                       |   8 +
 tcg/tcg.h                           |   2 +
 accel/tcg/tcg-runtime-gvec.c        | 257 +++++++++
 linux-user/ppc/signal.c             |  24 +-
 target/arm/translate-a64.c          |  41 +-
 target/arm/translate-sve.c          |   6 +-
 target/arm/translate.c              |  37 +-
 target/ppc/arch_dump.c              |  15 +-
 target/ppc/gdbstub.c                |   8 +-
 target/ppc/int_helper.c             | 194 +++----
 target/ppc/machine.c                | 116 +++-
 target/ppc/monitor.c                |   4 +-
 target/ppc/translate.c              |  74 ++-
 target/ppc/translate/dfp-impl.inc.c |   2 +-
 target/ppc/translate/fp-impl.inc.c  | 490 ++++++++++++----
 target/ppc/translate/vmx-impl.inc.c | 349 +++++++-----
 target/ppc/translate/vsx-impl.inc.c | 834 +++++++++++++++++++---------
 target/ppc/translate_init.inc.c     |  31 +-
 tcg/i386/tcg-target.inc.c           | 106 ++++
 tcg/tcg-op-gvec.c                   | 305 ++++++++--
 tcg/tcg-op-vec.c                    |  75 ++-
 tcg/tcg.c                           |  10 +
 30 files changed, 2275 insertions(+), 885 deletions(-)

-- 
2.17.2


* [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  5:36   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

We handle many of these during integer expansion, and the
rest of them during integer optimization.
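Concretely, the element-wise identities exploited when both source
operands alias (aofs == bofs) are, in plain C for illustration only:

```c
#include <stdint.h>

/* When both sources alias, each expander reduces to a simpler op. */
static uint64_t and_same(uint64_t a)  { return a & a;  } /* == a,  mov    */
static uint64_t or_same(uint64_t a)   { return a | a;  } /* == a,  mov    */
static uint64_t xor_same(uint64_t a)  { return a ^ a;  } /* == 0,  dup 0  */
static uint64_t andc_same(uint64_t a) { return a & ~a; } /* == 0,  dup 0  */
static uint64_t orc_same(uint64_t a)  { return a | ~a; } /* == -1, dup -1 */
```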

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 61c25f5784..ec231b78fb 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1840,7 +1840,12 @@ void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
         .opc = INDEX_op_and_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
 }
 
 void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -1853,7 +1858,12 @@ void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
         .opc = INDEX_op_or_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
 }
 
 void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -1866,7 +1876,12 @@ void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
         .opc = INDEX_op_xor_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
 }
 
 void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -1879,7 +1894,12 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
         .opc = INDEX_op_andc_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
 }
 
 void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
@@ -1892,7 +1912,12 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
         .opc = INDEX_op_orc_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
 }
 
 static const GVecGen2s gop_ands = {
-- 
2.17.2


* [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  5:37   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Since we're now handling a == b generically, we no longer need
to do it by hand within target/arm/.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c |  6 +-----
 target/arm/translate-sve.c |  6 +-----
 target/arm/translate.c     | 12 +++---------
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e1da1e4d6f..2d6f8c1b4f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10152,11 +10152,7 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_andc, 0);
         return;
     case 2: /* ORR */
-        if (rn == rm) { /* MOV */
-            gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_mov, 0);
-        } else {
-            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
-        }
+        gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
         return;
     case 3: /* ORN */
         gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_orc, 0);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b15b615ceb..3a2eb51566 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -280,11 +280,7 @@ static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a)
 
 static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    if (a->rn == a->rm) { /* MOV */
-        return do_mov_z(s, a->rd, a->rn);
-    } else {
-        return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
-    }
+    return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
 }
 
 static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 7c4675ffd8..33b1860148 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6294,15 +6294,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
                                   vec_size, vec_size);
                 break;
-            case 2:
-                if (rn == rm) {
-                    /* VMOV */
-                    tcg_gen_gvec_mov(0, rd_ofs, rn_ofs, vec_size, vec_size);
-                } else {
-                    /* VORR */
-                    tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                                    vec_size, vec_size);
-                }
+            case 2: /* VORR */
+                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
+                                vec_size, vec_size);
                 break;
             case 3: /* VORN */
                 tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-- 
2.17.2


* [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  5:39   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 04/34] tcg: Add write_aofs to GVecGen4 Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  3 +++
 tcg/tcg-op-gvec.h            |  6 +++++
 tcg/tcg-op.h                 |  3 +++
 accel/tcg/tcg-runtime-gvec.c | 33 +++++++++++++++++++++++
 tcg/tcg-op-gvec.c            | 51 ++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c             | 21 +++++++++++++++
 6 files changed, 117 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 1bd39d136d..835ddfebb2 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -211,6 +211,9 @@ DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_nand, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_nor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_eqv, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(gvec_ands, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_xors, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index ff43a29a0b..d65b9d9d4c 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -242,6 +242,12 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_nand(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_nor(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_andi(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t c, uint32_t oprsz, uint32_t maxsz);
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index db4e9188f4..1974bf1cae 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -961,6 +961,9 @@ void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_nand_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 90340e56e0..d1802467d5 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -512,6 +512,39 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_nand)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) & *(vec64 *)(b + i));
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_nor)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) | *(vec64 *)(b + i));
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_eqv)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec64)) {
+        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) ^ *(vec64 *)(b + i));
+    }
+    clear_high(d, oprsz, desc);
+}
+
 void HELPER(gvec_ands)(void *d, void *a, uint64_t b, uint32_t desc)
 {
     intptr_t oprsz = simd_oprsz(desc);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index ec231b78fb..81689d02f7 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1920,6 +1920,57 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
+void tcg_gen_gvec_nand(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_nand_i64,
+        .fniv = tcg_gen_nand_vec,
+        .fno = gen_helper_gvec_nand,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_not(vece, dofs, aofs, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
+}
+
+void tcg_gen_gvec_nor(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_nor_i64,
+        .fniv = tcg_gen_nor_vec,
+        .fno = gen_helper_gvec_nor,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_not(vece, dofs, aofs, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
+}
+
+void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_eqv_i64,
+        .fniv = tcg_gen_eqv_vec,
+        .fno = gen_helper_gvec_eqv,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (aofs == bofs) {
+        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+    } else {
+        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
+    }
+}
+
 static const GVecGen2s gop_ands = {
     .fni8 = tcg_gen_and_i64,
     .fniv = tcg_gen_and_vec,
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index cefba3d185..d77fdf7c1d 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -275,6 +275,27 @@ void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     }
 }
 
+void tcg_gen_nand_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    /* TODO: Add TCG_TARGET_HAS_nand_vec when adding a backend that supports it. */
+    tcg_gen_and_vec(0, r, a, b);
+    tcg_gen_not_vec(0, r, r);
+}
+
+void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    /* TODO: Add TCG_TARGET_HAS_nor_vec when adding a backend that supports it. */
+    tcg_gen_or_vec(0, r, a, b);
+    tcg_gen_not_vec(0, r, r);
+}
+
+void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    /* TODO: Add TCG_TARGET_HAS_eqv_vec when adding a backend that supports it. */
+    tcg_gen_xor_vec(0, r, a, b);
+    tcg_gen_not_vec(0, r, r);
+}
+
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
     if (TCG_TARGET_HAS_not_vec) {
-- 
2.17.2


* [Qemu-devel] [PATCH 04/34] tcg: Add write_aofs to GVecGen4
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 05/34] tcg: Add opcodes for vector saturated arithmetic Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This allows writing 2-output, 3-input operations, with the second
output written back through aofs.
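As a rough illustration of why a second destination is useful (plain C,
not the TCG API; the element loop and names below are mine): a
saturating operation can accumulate a saturation flag through its first
source operand alongside the normal result.

```c
#include <stdint.h>

/* Illustrative only: a four-operand element loop where the first source
 * (the 'aofs' operand) doubles as a second destination -- here a
 * saturation flag written back alongside the saturated sum. */
static void ssadd8_sat(int8_t *d, int8_t *sat, const int8_t *b,
                       const int8_t *c, int n)
{
    for (int i = 0; i < n; i++) {
        int r = b[i] + c[i];
        if (r != (int8_t)r) {            /* out of range: clamp ... */
            r = r < 0 ? INT8_MIN : INT8_MAX;
            sat[i] = 1;                  /* ... and record saturation */
        }
        d[i] = (int8_t)r;
    }
}
```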

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.h |  2 ++
 tcg/tcg-op-gvec.c | 27 +++++++++++++++++++--------
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index d65b9d9d4c..2cb447112e 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -181,6 +181,8 @@ typedef struct {
     uint8_t vece;
     /* Prefer i64 to v64.  */
     bool prefer_i64;
+    /* Write aofs as a 2nd dest operand.  */
+    bool write_aofs;
 } GVecGen4;
 
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 81689d02f7..c10d3d7b26 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -665,7 +665,7 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
 
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
-                         uint32_t cofs, uint32_t oprsz,
+                         uint32_t cofs, uint32_t oprsz, bool write_aofs,
                          void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32))
 {
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -680,6 +680,9 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         tcg_gen_ld_i32(t3, cpu_env, cofs + i);
         fni(t0, t1, t2, t3);
         tcg_gen_st_i32(t0, cpu_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_i32(t1, cpu_env, aofs + i);
+        }
     }
     tcg_temp_free_i32(t3);
     tcg_temp_free_i32(t2);
@@ -769,7 +772,7 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
 
 /* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */
 static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
-                         uint32_t cofs, uint32_t oprsz,
+                         uint32_t cofs, uint32_t oprsz, bool write_aofs,
                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
 {
     TCGv_i64 t0 = tcg_temp_new_i64();
@@ -784,6 +787,9 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         tcg_gen_ld_i64(t3, cpu_env, cofs + i);
         fni(t0, t1, t2, t3);
         tcg_gen_st_i64(t0, cpu_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_i64(t1, cpu_env, aofs + i);
+        }
     }
     tcg_temp_free_i64(t3);
     tcg_temp_free_i64(t2);
@@ -880,7 +886,7 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
 /* Expand OPSZ bytes worth of four-operand operations using host vectors.  */
 static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t bofs, uint32_t cofs, uint32_t oprsz,
-                         uint32_t tysz, TCGType type,
+                         uint32_t tysz, TCGType type, bool write_aofs,
                          void (*fni)(unsigned, TCGv_vec, TCGv_vec,
                                      TCGv_vec, TCGv_vec))
 {
@@ -896,6 +902,9 @@ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
         tcg_gen_ld_vec(t3, cpu_env, cofs + i);
         fni(vece, t0, t1, t2, t3);
         tcg_gen_st_vec(t0, cpu_env, dofs + i);
+        if (write_aofs) {
+            tcg_gen_st_vec(t1, cpu_env, aofs + i);
+        }
     }
     tcg_temp_free_vec(t3);
     tcg_temp_free_vec(t2);
@@ -1187,7 +1196,7 @@ void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
          */
         some = QEMU_ALIGN_DOWN(oprsz, 32);
         expand_4_vec(g->vece, dofs, aofs, bofs, cofs, some,
-                     32, TCG_TYPE_V256, g->fniv);
+                     32, TCG_TYPE_V256, g->write_aofs, g->fniv);
         if (some == oprsz) {
             break;
         }
@@ -1200,18 +1209,20 @@ void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
         /* fallthru */
     case TCG_TYPE_V128:
         expand_4_vec(g->vece, dofs, aofs, bofs, cofs, oprsz,
-                     16, TCG_TYPE_V128, g->fniv);
+                     16, TCG_TYPE_V128, g->write_aofs, g->fniv);
         break;
     case TCG_TYPE_V64:
         expand_4_vec(g->vece, dofs, aofs, bofs, cofs, oprsz,
-                     8, TCG_TYPE_V64, g->fniv);
+                     8, TCG_TYPE_V64, g->write_aofs, g->fniv);
         break;
 
     case 0:
         if (g->fni8 && check_size_impl(oprsz, 8)) {
-            expand_4_i64(dofs, aofs, bofs, cofs, oprsz, g->fni8);
+            expand_4_i64(dofs, aofs, bofs, cofs, oprsz,
+                         g->write_aofs, g->fni8);
         } else if (g->fni4 && check_size_impl(oprsz, 4)) {
-            expand_4_i32(dofs, aofs, bofs, cofs, oprsz, g->fni4);
+            expand_4_i32(dofs, aofs, bofs, cofs, oprsz,
+                         g->write_aofs, g->fni4);
         } else {
             assert(g->fno != NULL);
             tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs,
-- 
2.17.2


* [Qemu-devel] [PATCH 05/34] tcg: Add opcodes for vector saturated arithmetic
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 04/34] tcg: Add write_aofs to GVecGen4 Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 06/34] tcg/i386: Implement vector saturating arithmetic Richard Henderson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h    |  1 +
 tcg/tcg-op.h             |  4 ++
 tcg/tcg-opc.h            |  4 ++
 tcg/tcg.h                |  1 +
 tcg/tcg-op-gvec.c        | 84 ++++++++++++++++++++++++++++++----------
 tcg/tcg-op-vec.c         | 34 ++++++++++++++--
 tcg/tcg.c                |  5 +++
 8 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index f966a4fcb3..98556bcf22 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -135,6 +135,7 @@ typedef enum {
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
+#define TCG_TARGET_HAS_sat_vec          0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f378d29568..44381062e6 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -185,6 +185,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
+#define TCG_TARGET_HAS_sat_vec          0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 1974bf1cae..90b3193bf3 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -966,6 +966,10 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
+void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 
 void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index e3a43aabb6..94691e849b 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -221,6 +221,10 @@ DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
 DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec))
 DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(ssadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
+DEF(usadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
+DEF(sssub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
+DEF(ussub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 
 DEF(and_vec, 1, 2, 0, IMPLVEC)
 DEF(or_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index ade692fdf5..c90f65a387 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -183,6 +183,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_mul_vec          0
+#define TCG_TARGET_HAS_sat_vec          0
 #else
 #define TCG_TARGET_MAYBE_vec            1
 #endif
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index c10d3d7b26..0a33f51065 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1678,10 +1678,22 @@ void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
     static const GVecGen3 g[4] = {
-        { .fno = gen_helper_gvec_ssadd8, .vece = MO_8 },
-        { .fno = gen_helper_gvec_ssadd16, .vece = MO_16 },
-        { .fno = gen_helper_gvec_ssadd32, .vece = MO_32 },
-        { .fno = gen_helper_gvec_ssadd64, .vece = MO_64 }
+        { .fniv = tcg_gen_ssadd_vec,
+          .fno = gen_helper_gvec_ssadd8,
+          .opc = INDEX_op_ssadd_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_ssadd_vec,
+          .fno = gen_helper_gvec_ssadd16,
+          .opc = INDEX_op_ssadd_vec,
+          .vece = MO_16 },
+        { .fniv = tcg_gen_ssadd_vec,
+          .fno = gen_helper_gvec_ssadd32,
+          .opc = INDEX_op_ssadd_vec,
+          .vece = MO_32 },
+        { .fniv = tcg_gen_ssadd_vec,
+          .fno = gen_helper_gvec_ssadd64,
+          .opc = INDEX_op_ssadd_vec,
+          .vece = MO_64 },
     };
     tcg_debug_assert(vece <= MO_64);
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
@@ -1691,16 +1703,28 @@ void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
     static const GVecGen3 g[4] = {
-        { .fno = gen_helper_gvec_sssub8, .vece = MO_8 },
-        { .fno = gen_helper_gvec_sssub16, .vece = MO_16 },
-        { .fno = gen_helper_gvec_sssub32, .vece = MO_32 },
-        { .fno = gen_helper_gvec_sssub64, .vece = MO_64 }
+        { .fniv = tcg_gen_sssub_vec,
+          .fno = gen_helper_gvec_sssub8,
+          .opc = INDEX_op_sssub_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_sssub_vec,
+          .fno = gen_helper_gvec_sssub16,
+          .opc = INDEX_op_sssub_vec,
+          .vece = MO_16 },
+        { .fniv = tcg_gen_sssub_vec,
+          .fno = gen_helper_gvec_sssub32,
+          .opc = INDEX_op_sssub_vec,
+          .vece = MO_32 },
+        { .fniv = tcg_gen_sssub_vec,
+          .fno = gen_helper_gvec_sssub64,
+          .opc = INDEX_op_sssub_vec,
+          .vece = MO_64 },
     };
     tcg_debug_assert(vece <= MO_64);
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
 }
 
-static void tcg_gen_vec_usadd32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+static void tcg_gen_usadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
     TCGv_i32 max = tcg_const_i32(-1);
     tcg_gen_add_i32(d, a, b);
@@ -1708,7 +1732,7 @@ static void tcg_gen_vec_usadd32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
     tcg_temp_free_i32(max);
 }
 
-static void tcg_gen_vec_usadd32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+static void tcg_gen_usadd_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 max = tcg_const_i64(-1);
     tcg_gen_add_i64(d, a, b);
@@ -1720,20 +1744,30 @@ void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
     static const GVecGen3 g[4] = {
-        { .fno = gen_helper_gvec_usadd8, .vece = MO_8 },
-        { .fno = gen_helper_gvec_usadd16, .vece = MO_16 },
-        { .fni4 = tcg_gen_vec_usadd32_i32,
+        { .fniv = tcg_gen_usadd_vec,
+          .fno = gen_helper_gvec_usadd8,
+          .opc = INDEX_op_usadd_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_usadd_vec,
+          .fno = gen_helper_gvec_usadd16,
+          .opc = INDEX_op_usadd_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_usadd_i32,
+          .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd32,
+          .opc = INDEX_op_usadd_vec,
           .vece = MO_32 },
-        { .fni8 = tcg_gen_vec_usadd32_i64,
+        { .fni8 = tcg_gen_usadd_i64,
+          .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd64,
+          .opc = INDEX_op_usadd_vec,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
 }
 
-static void tcg_gen_vec_ussub32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+static void tcg_gen_ussub_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
     TCGv_i32 min = tcg_const_i32(0);
     tcg_gen_sub_i32(d, a, b);
@@ -1741,7 +1775,7 @@ static void tcg_gen_vec_ussub32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
     tcg_temp_free_i32(min);
 }
 
-static void tcg_gen_vec_ussub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+static void tcg_gen_ussub_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
     TCGv_i64 min = tcg_const_i64(0);
     tcg_gen_sub_i64(d, a, b);
@@ -1753,13 +1787,23 @@ void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
     static const GVecGen3 g[4] = {
-        { .fno = gen_helper_gvec_ussub8, .vece = MO_8 },
-        { .fno = gen_helper_gvec_ussub16, .vece = MO_16 },
-        { .fni4 = tcg_gen_vec_ussub32_i32,
+        { .fniv = tcg_gen_ussub_vec,
+          .fno = gen_helper_gvec_ussub8,
+          .opc = INDEX_op_ussub_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_ussub_vec,
+          .fno = gen_helper_gvec_ussub16,
+          .opc = INDEX_op_ussub_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_ussub_i32,
+          .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub32,
+          .opc = INDEX_op_ussub_vec,
           .vece = MO_32 },
-        { .fni8 = tcg_gen_vec_ussub32_i64,
+        { .fni8 = tcg_gen_ussub_i64,
+          .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub64,
+          .opc = INDEX_op_ussub_vec,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index d77fdf7c1d..675aa09258 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -386,7 +386,8 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
     }
 }
 
-void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
+                   TCGv_vec b, TCGOpcode opc)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
     TCGTemp *at = tcgv_vec_temp(a);
@@ -399,11 +400,36 @@ void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 
     tcg_debug_assert(at->base_type >= type);
     tcg_debug_assert(bt->base_type >= type);
-    can = tcg_can_emit_vec_op(INDEX_op_mul_vec, type, vece);
+    can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
-        vec_gen_3(INDEX_op_mul_vec, type, vece, ri, ai, bi);
+        vec_gen_3(opc, type, vece, ri, ai, bi);
     } else {
         tcg_debug_assert(can < 0);
-        tcg_expand_vec_op(INDEX_op_mul_vec, type, vece, ri, ai, bi);
+        tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
     }
 }
+
+void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_mul_vec);
+}
+
+void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_ssadd_vec);
+}
+
+void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_usadd_vec);
+}
+
+void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_sssub_vec);
+}
+
+void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_ussub_vec);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 963cb37892..f2cf60425b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1607,6 +1607,11 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
         return have_vec && TCG_TARGET_HAS_shv_vec;
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_ussub_vec:
+        return have_vec && TCG_TARGET_HAS_sat_vec;
 
     default:
         tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [PATCH 06/34] tcg/i386: Implement vector saturating arithmetic
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 05/34] tcg: Add opcodes for vector saturated arithmetic Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 07/34] tcg: Add opcodes for vector minmax arithmetic Richard Henderson
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.
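For reference, a minimal standalone C sketch (not QEMU code; the function names here are made up for illustration) of the per-element semantics that ssadd_vec/usadd_vec provide, shown for the MO_8 element size:

```c
#include <stdint.h>
#include <assert.h>

/* Signed saturating add: compute in a wider type, then clamp. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int r = a + b;
    if (r > INT8_MAX) {
        r = INT8_MAX;
    } else if (r < INT8_MIN) {
        r = INT8_MIN;
    }
    return r;
}

/* Unsigned saturating add: clamp overflow to the type maximum. */
static uint8_t usadd8(uint8_t a, uint8_t b)
{
    unsigned r = (unsigned)a + b;
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}
```

This is exactly what PADDSB/PADDUSB compute per byte lane, which is why MO_8 and MO_16 map directly onto single instructions here.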

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 42 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 44381062e6..f50234d97b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -185,7 +185,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
-#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_sat_vec          1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index c21c3272f2..3571483bae 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -377,6 +377,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_PADDW       (0xfd | P_EXT | P_DATA16)
 #define OPC_PADDD       (0xfe | P_EXT | P_DATA16)
 #define OPC_PADDQ       (0xd4 | P_EXT | P_DATA16)
+#define OPC_PADDSB      (0xec | P_EXT | P_DATA16)
+#define OPC_PADDSW      (0xed | P_EXT | P_DATA16)
+#define OPC_PADDUB      (0xdc | P_EXT | P_DATA16)
+#define OPC_PADDUW      (0xdd | P_EXT | P_DATA16)
 #define OPC_PAND        (0xdb | P_EXT | P_DATA16)
 #define OPC_PANDN       (0xdf | P_EXT | P_DATA16)
 #define OPC_PBLENDW     (0x0e | P_EXT3A | P_DATA16)
@@ -408,6 +412,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_PSUBW       (0xf9 | P_EXT | P_DATA16)
 #define OPC_PSUBD       (0xfa | P_EXT | P_DATA16)
 #define OPC_PSUBQ       (0xfb | P_EXT | P_DATA16)
+#define OPC_PSUBSB      (0xe8 | P_EXT | P_DATA16)
+#define OPC_PSUBSW      (0xe9 | P_EXT | P_DATA16)
+#define OPC_PSUBUB      (0xd8 | P_EXT | P_DATA16)
+#define OPC_PSUBUW      (0xd9 | P_EXT | P_DATA16)
 #define OPC_PUNPCKLBW   (0x60 | P_EXT | P_DATA16)
 #define OPC_PUNPCKLWD   (0x61 | P_EXT | P_DATA16)
 #define OPC_PUNPCKLDQ   (0x62 | P_EXT | P_DATA16)
@@ -2591,9 +2599,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const add_insn[4] = {
         OPC_PADDB, OPC_PADDW, OPC_PADDD, OPC_PADDQ
     };
+    static int const ssadd_insn[4] = {
+        OPC_PADDSB, OPC_PADDSW, OPC_UD2, OPC_UD2
+    };
+    static int const usadd_insn[4] = {
+        OPC_PADDUB, OPC_PADDUW, OPC_UD2, OPC_UD2
+    };
     static int const sub_insn[4] = {
         OPC_PSUBB, OPC_PSUBW, OPC_PSUBD, OPC_PSUBQ
     };
+    static int const sssub_insn[4] = {
+        OPC_PSUBSB, OPC_PSUBSW, OPC_UD2, OPC_UD2
+    };
+    static int const ussub_insn[4] = {
+        OPC_PSUBUB, OPC_PSUBUW, OPC_UD2, OPC_UD2
+    };
     static int const mul_insn[4] = {
         OPC_UD2, OPC_PMULLW, OPC_PMULLD, OPC_UD2
     };
@@ -2631,9 +2651,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_add_vec:
         insn = add_insn[vece];
         goto gen_simd;
+    case INDEX_op_ssadd_vec:
+        insn = ssadd_insn[vece];
+        goto gen_simd;
+    case INDEX_op_usadd_vec:
+        insn = usadd_insn[vece];
+        goto gen_simd;
     case INDEX_op_sub_vec:
         insn = sub_insn[vece];
         goto gen_simd;
+    case INDEX_op_sssub_vec:
+        insn = sssub_insn[vece];
+        goto gen_simd;
+    case INDEX_op_ussub_vec:
+        insn = ussub_insn[vece];
+        goto gen_simd;
     case INDEX_op_mul_vec:
         insn = mul_insn[vece];
         goto gen_simd;
@@ -3007,6 +3039,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
     case INDEX_op_andc_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_ussub_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_x86_shufps_vec:
     case INDEX_op_x86_blend_vec:
@@ -3074,6 +3110,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
         }
         return 1;
 
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_ussub_vec:
+        return vece <= MO_16;
+
     default:
         return 0;
     }
-- 
2.17.2


* [Qemu-devel] [PATCH 07/34] tcg: Add opcodes for vector minmax arithmetic
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 06/34] tcg/i386: Implement vector saturating arithmetic Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 08/34] tcg/i386: Implement " Richard Henderson
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  20 ++++
 tcg/aarch64/tcg-target.h     |   1 +
 tcg/i386/tcg-target.h        |   1 +
 tcg/tcg-op-gvec.h            |  10 ++
 tcg/tcg-op.h                 |   4 +
 tcg/tcg-opc.h                |   4 +
 tcg/tcg.h                    |   1 +
 accel/tcg/tcg-runtime-gvec.c | 224 +++++++++++++++++++++++++++++++++++
 tcg/tcg-op-gvec.c            | 108 +++++++++++++++++
 tcg/tcg-op-vec.c             |  20 ++++
 tcg/tcg.c                    |   5 +
 11 files changed, 398 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 835ddfebb2..dfe325625c 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -200,6 +200,26 @@ DEF_HELPER_FLAGS_4(gvec_ussub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_ussub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_ussub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_smin8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smin16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smin32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smin64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_smax8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smax16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smax32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smax64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_umin8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umin16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umin32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umin64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_umax8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umax16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umax32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umax64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 98556bcf22..545a6eec75 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -136,6 +136,7 @@ typedef enum {
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_minmax_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f50234d97b..efbd5a6fc9 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -186,6 +186,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
+#define TCG_TARGET_HAS_minmax_vec       0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 2cb447112e..4734eef7de 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -234,6 +234,16 @@ void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 
+/* Min/max.  */
+void tcg_gen_gvec_smin(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_umin(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_smax(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_umax(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+
 void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 90b3193bf3..042c45e807 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -970,6 +970,10 @@ void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 
 void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 94691e849b..691eddebdf 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -225,6 +225,10 @@ DEF(ssadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(usadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(sssub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(ussub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
+DEF(smin_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_minmax_vec))
+DEF(umin_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_minmax_vec))
+DEF(smax_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_minmax_vec))
+DEF(umax_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_minmax_vec))
 
 DEF(and_vec, 1, 2, 0, IMPLVEC)
 DEF(or_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c90f65a387..b5bec3abf8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -184,6 +184,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_minmax_vec       0
 #else
 #define TCG_TARGET_MAYBE_vec            1
 #endif
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index d1802467d5..9358749741 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1028,3 +1028,227 @@ void HELPER(gvec_ussub64)(void *d, void *a, void *b, uint32_t desc)
     }
     clear_high(d, oprsz, desc);
 }
+
+void HELPER(gvec_smin8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
+        int8_t aa = *(int8_t *)(a + i);
+        int8_t bb = *(int8_t *)(b + i);
+        int8_t dd = aa < bb ? aa : bb;
+        *(int8_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smin16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        int16_t aa = *(int16_t *)(a + i);
+        int16_t bb = *(int16_t *)(b + i);
+        int16_t dd = aa < bb ? aa : bb;
+        *(int16_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smin32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
+        int32_t aa = *(int32_t *)(a + i);
+        int32_t bb = *(int32_t *)(b + i);
+        int32_t dd = aa < bb ? aa : bb;
+        *(int32_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smin64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
+        int64_t aa = *(int64_t *)(a + i);
+        int64_t bb = *(int64_t *)(b + i);
+        int64_t dd = aa < bb ? aa : bb;
+        *(int64_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smax8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
+        int8_t aa = *(int8_t *)(a + i);
+        int8_t bb = *(int8_t *)(b + i);
+        int8_t dd = aa > bb ? aa : bb;
+        *(int8_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smax16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        int16_t aa = *(int16_t *)(a + i);
+        int16_t bb = *(int16_t *)(b + i);
+        int16_t dd = aa > bb ? aa : bb;
+        *(int16_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smax32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
+        int32_t aa = *(int32_t *)(a + i);
+        int32_t bb = *(int32_t *)(b + i);
+        int32_t dd = aa > bb ? aa : bb;
+        *(int32_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_smax64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
+        int64_t aa = *(int64_t *)(a + i);
+        int64_t bb = *(int64_t *)(b + i);
+        int64_t dd = aa > bb ? aa : bb;
+        *(int64_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umin8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        uint8_t aa = *(uint8_t *)(a + i);
+        uint8_t bb = *(uint8_t *)(b + i);
+        uint8_t dd = aa < bb ? aa : bb;
+        *(uint8_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umin16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        uint16_t aa = *(uint16_t *)(a + i);
+        uint16_t bb = *(uint16_t *)(b + i);
+        uint16_t dd = aa < bb ? aa : bb;
+        *(uint16_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umin32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        uint32_t aa = *(uint32_t *)(a + i);
+        uint32_t bb = *(uint32_t *)(b + i);
+        uint32_t dd = aa < bb ? aa : bb;
+        *(uint32_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umin64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint64_t aa = *(uint64_t *)(a + i);
+        uint64_t bb = *(uint64_t *)(b + i);
+        uint64_t dd = aa < bb ? aa : bb;
+        *(uint64_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umax8)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        uint8_t aa = *(uint8_t *)(a + i);
+        uint8_t bb = *(uint8_t *)(b + i);
+        uint8_t dd = aa > bb ? aa : bb;
+        *(uint8_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umax16)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        uint16_t aa = *(uint16_t *)(a + i);
+        uint16_t bb = *(uint16_t *)(b + i);
+        uint16_t dd = aa > bb ? aa : bb;
+        *(uint16_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umax32)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        uint32_t aa = *(uint32_t *)(a + i);
+        uint32_t bb = *(uint32_t *)(b + i);
+        uint32_t dd = aa > bb ? aa : bb;
+        *(uint32_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_umax64)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint64_t aa = *(uint64_t *)(a + i);
+        uint64_t bb = *(uint64_t *)(b + i);
+        uint64_t dd = aa > bb ? aa : bb;
+        *(uint64_t *)(d + i) = dd;
+    }
+    clear_high(d, oprsz, desc);
+}
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0a33f51065..3ee44fcb75 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1810,6 +1810,114 @@ void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
 }
 
+void tcg_gen_gvec_smin(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_smin_vec,
+          .fno = gen_helper_gvec_smin8,
+          .opc = INDEX_op_smin_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_smin_vec,
+          .fno = gen_helper_gvec_smin16,
+          .opc = INDEX_op_smin_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_smin_i32,
+          .fniv = tcg_gen_smin_vec,
+          .fno = gen_helper_gvec_smin32,
+          .opc = INDEX_op_smin_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_smin_i64,
+          .fniv = tcg_gen_smin_vec,
+          .fno = gen_helper_gvec_smin64,
+          .opc = INDEX_op_smin_vec,
+          .vece = MO_64 }
+    };
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+void tcg_gen_gvec_umin(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_umin_vec,
+          .fno = gen_helper_gvec_umin8,
+          .opc = INDEX_op_umin_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_umin_vec,
+          .fno = gen_helper_gvec_umin16,
+          .opc = INDEX_op_umin_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_umin_i32,
+          .fniv = tcg_gen_umin_vec,
+          .fno = gen_helper_gvec_umin32,
+          .opc = INDEX_op_umin_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_umin_i64,
+          .fniv = tcg_gen_umin_vec,
+          .fno = gen_helper_gvec_umin64,
+          .opc = INDEX_op_umin_vec,
+          .vece = MO_64 }
+    };
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+void tcg_gen_gvec_smax(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_smax_vec,
+          .fno = gen_helper_gvec_smax8,
+          .opc = INDEX_op_smax_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_smax_vec,
+          .fno = gen_helper_gvec_smax16,
+          .opc = INDEX_op_smax_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_smax_i32,
+          .fniv = tcg_gen_smax_vec,
+          .fno = gen_helper_gvec_smax32,
+          .opc = INDEX_op_smax_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_smax_i64,
+          .fniv = tcg_gen_smax_vec,
+          .fno = gen_helper_gvec_smax64,
+          .opc = INDEX_op_smax_vec,
+          .vece = MO_64 }
+    };
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+void tcg_gen_gvec_umax(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_umax_vec,
+          .fno = gen_helper_gvec_umax8,
+          .opc = INDEX_op_umax_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_umax_vec,
+          .fno = gen_helper_gvec_umax16,
+          .opc = INDEX_op_umax_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_umax_i32,
+          .fniv = tcg_gen_umax_vec,
+          .fno = gen_helper_gvec_umax32,
+          .opc = INDEX_op_umax_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_umax_i64,
+          .fniv = tcg_gen_umax_vec,
+          .fno = gen_helper_gvec_umax64,
+          .opc = INDEX_op_umax_vec,
+          .vece = MO_64 }
+    };
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
 /* Perform a vector negation using normal negation and a mask.
    Compare gen_subv_mask above.  */
 static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m)
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 675aa09258..36f35022ac 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -433,3 +433,23 @@ void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_ussub_vec);
 }
+
+void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_smin_vec);
+}
+
+void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_umin_vec);
+}
+
+void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_smax_vec);
+}
+
+void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_umax_vec);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f2cf60425b..2ee031fcf7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1612,6 +1612,11 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_sssub_vec:
     case INDEX_op_ussub_vec:
         return have_vec && TCG_TARGET_HAS_sat_vec;
+    case INDEX_op_smin_vec:
+    case INDEX_op_umin_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_umax_vec:
+        return have_vec && TCG_TARGET_HAS_minmax_vec;
 
     default:
         tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS);
-- 
2.17.2


* [Qemu-devel] [PATCH 08/34] tcg/i386: Implement vector minmax arithmetic
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 07/34] tcg: Add opcodes for vector minmax arithmetic Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 09/34] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

The instruction set does not directly provide MO_64.  We can still
implement signed 64-bit min/max with a comparison plus vpblendvb.
Since the ISA has no unsigned 64-bit comparison, unsigned 64-bit
would take 4 insns, which is probably quicker done as integers.
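The compare-plus-blend idea above can be sketched in scalar C (an illustration, not the actual tcg expansion; the function name is invented). pcmpgtq produces an all-ones or all-zeroes 64-bit mask, and vpblendvb then selects bytes from one source or the other based on that mask:

```c
#include <stdint.h>
#include <assert.h>

/* Signed 64-bit min built from a full-width compare mask plus a
 * bytewise blend, mirroring pcmpgtq + vpblendvb per element. */
static int64_t smin64_via_blend(int64_t a, int64_t b)
{
    /* pcmpgtq: mask = (a > b) ? all-ones : all-zeroes */
    uint64_t mask = -(uint64_t)(a > b);
    /* vpblendvb: take b where the mask is set, a elsewhere */
    return (int64_t)(((uint64_t)b & mask) | ((uint64_t)a & ~mask));
}
```

smax falls out of the same pattern by swapping which operand the mask selects.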

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 64 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index efbd5a6fc9..7995fe3eab 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -186,7 +186,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
-#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_minmax_vec       1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3571483bae..c56753763a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -392,6 +392,18 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_PCMPGTW     (0x65 | P_EXT | P_DATA16)
 #define OPC_PCMPGTD     (0x66 | P_EXT | P_DATA16)
 #define OPC_PCMPGTQ     (0x37 | P_EXT38 | P_DATA16)
+#define OPC_PMAXSB      (0x3c | P_EXT38 | P_DATA16)
+#define OPC_PMAXSW      (0xee | P_EXT | P_DATA16)
+#define OPC_PMAXSD      (0x3d | P_EXT38 | P_DATA16)
+#define OPC_PMAXUB      (0xde | P_EXT | P_DATA16)
+#define OPC_PMAXUW      (0x3e | P_EXT38 | P_DATA16)
+#define OPC_PMAXUD      (0x3f | P_EXT38 | P_DATA16)
+#define OPC_PMINSB      (0x38 | P_EXT38 | P_DATA16)
+#define OPC_PMINSW      (0xea | P_EXT | P_DATA16)
+#define OPC_PMINSD      (0x39 | P_EXT38 | P_DATA16)
+#define OPC_PMINUB      (0xda | P_EXT | P_DATA16)
+#define OPC_PMINUW      (0x3a | P_EXT38 | P_DATA16)
+#define OPC_PMINUD      (0x3b | P_EXT38 | P_DATA16)
 #define OPC_PMOVSXBW    (0x20 | P_EXT38 | P_DATA16)
 #define OPC_PMOVSXWD    (0x23 | P_EXT38 | P_DATA16)
 #define OPC_PMOVSXDQ    (0x25 | P_EXT38 | P_DATA16)
@@ -2638,6 +2650,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const packus_insn[4] = {
         OPC_PACKUSWB, OPC_PACKUSDW, OPC_UD2, OPC_UD2
     };
+    static int const smin_insn[4] = {
+        OPC_PMINSB, OPC_PMINSW, OPC_PMINSD, OPC_UD2
+    };
+    static int const smax_insn[4] = {
+        OPC_PMAXSB, OPC_PMAXSW, OPC_PMAXSD, OPC_UD2
+    };
+    static int const umin_insn[4] = {
+        OPC_PMINUB, OPC_PMINUW, OPC_PMINUD, OPC_UD2
+    };
+    static int const umax_insn[4] = {
+        OPC_PMAXUB, OPC_PMAXUW, OPC_PMAXUD, OPC_UD2
+    };
 
     TCGType type = vecl + TCG_TYPE_V64;
     int insn, sub;
@@ -2678,6 +2702,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_xor_vec:
         insn = OPC_PXOR;
         goto gen_simd;
+    case INDEX_op_smin_vec:
+        insn = smin_insn[vece];
+        goto gen_simd;
+    case INDEX_op_umin_vec:
+        insn = umin_insn[vece];
+        goto gen_simd;
+    case INDEX_op_smax_vec:
+        insn = smax_insn[vece];
+        goto gen_simd;
+    case INDEX_op_umax_vec:
+        insn = umax_insn[vece];
+        goto gen_simd;
     case INDEX_op_x86_punpckl_vec:
         insn = punpckl_insn[vece];
         goto gen_simd;
@@ -3043,6 +3079,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_usadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umin_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_umax_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_x86_shufps_vec:
     case INDEX_op_x86_blend_vec:
@@ -3115,6 +3155,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sssub_vec:
     case INDEX_op_ussub_vec:
         return vece <= MO_16;
+    case INDEX_op_smin_vec:
+    case INDEX_op_smax_vec:
+        return vece <= MO_32 ? 1 : -1;
+    case INDEX_op_umin_vec:
+    case INDEX_op_umax_vec:
+        return vece <= MO_32;
 
     default:
         return 0;
@@ -3370,6 +3416,24 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
         }
         break;
 
+    case INDEX_op_smin_vec:
+    case INDEX_op_smax_vec:
+        tcg_debug_assert(vece == MO_64);
+        a1 = va_arg(va, TCGArg);
+        a2 = va_arg(va, TCGArg);
+        t1 = tcg_temp_new_vec(type);
+        vec_gen_4(INDEX_op_cmp_vec, type, MO_64,
+                  tcgv_vec_arg(t1), a1, a2, TCG_COND_GT);
+        if (opc == INDEX_op_smin_vec) {
+            vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_64,
+                      tcgv_vec_arg(v0), a2, a1, tcgv_vec_arg(t1));
+        } else {
+            vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_64,
+                      tcgv_vec_arg(v0), a1, a2, tcgv_vec_arg(t1));
+        }
+        tcg_temp_free_vec(t1);
+        break;
+
     default:
         break;
     }
-- 
2.17.2


* [Qemu-devel] [PATCH 09/34] target/arm: Use vector minmax expanders for aarch64
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 08/34] tcg/i386: Implement " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 10/34] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2d6f8c1b4f..bef21ada71 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10452,6 +10452,20 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     }
 
     switch (opcode) {
+    case 0x0c: /* SMAX, UMAX */
+        if (u) {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
+        } else {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smax, size);
+        }
+        return;
+    case 0x0d: /* SMIN, UMIN */
+        if (u) {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umin, size);
+        } else {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smin, size);
+        }
+        return;
     case 0x10: /* ADD, SUB */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_sub, size);
@@ -10613,27 +10627,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genenvfn = fns[size][u];
                 break;
             }
-            case 0xc: /* SMAX, UMAX */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_max_s8, gen_helper_neon_max_u8 },
-                    { gen_helper_neon_max_s16, gen_helper_neon_max_u16 },
-                    { tcg_gen_smax_i32, tcg_gen_umax_i32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
-
-            case 0xd: /* SMIN, UMIN */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
-                    { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
-                    { tcg_gen_smin_i32, tcg_gen_umin_i32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0xe: /* SABD, UABD */
             case 0xf: /* SABA, UABA */
             {
-- 
2.17.2


* [Qemu-devel] [PATCH 10/34] target/arm: Use vector minmax expanders for aarch32
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 09/34] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Richard Henderson
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 33b1860148..f3f172f384 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6368,6 +6368,25 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
                              rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
             return 0;
+
+        case NEON_3R_VMAX:
+            if (u) {
+                tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            } else {
+                tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            }
+            return 0;
+        case NEON_3R_VMIN:
+            if (u) {
+                tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            } else {
+                tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            }
+            return 0;
         }
 
         if (size == 3) {
@@ -6533,12 +6552,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQRSHL:
             GEN_NEON_INTEGER_OP_ENV(qrshl);
             break;
-        case NEON_3R_VMAX:
-            GEN_NEON_INTEGER_OP(max);
-            break;
-        case NEON_3R_VMIN:
-            GEN_NEON_INTEGER_OP(min);
-            break;
         case NEON_3R_VABD:
             GEN_NEON_INTEGER_OP(abd);
             break;
-- 
2.17.2


* [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 10/34] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:15   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Richard Henderson
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

These helpers allow us to move FP register values to/from a specified TCGv_i64
argument, as will be required by the VSR helpers to be introduced shortly.

To prevent the FP helpers from accessing the cpu_fpr array directly, add extra
TCG temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20181217122405.18732-2-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate.c             |  10 +
 target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++-------
 2 files changed, 390 insertions(+), 110 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 2b37910248..1d4bf624a3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6694,6 +6694,16 @@ static inline void gen_##name(DisasContext *ctx)               \
 GEN_TM_PRIV_NOOP(treclaim);
 GEN_TM_PRIV_NOOP(trechkpt);
 
+static inline void get_fpr(TCGv_i64 dst, int regno)
+{
+    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+}
+
+static inline void set_fpr(int regno, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_fpr[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/fp-impl.inc.c b/target/ppc/translate/fp-impl.inc.c
index 08770ba9f5..04b8733055 100644
--- a/target/ppc/translate/fp-impl.inc.c
+++ b/target/ppc/translate/fp-impl.inc.c
@@ -34,24 +34,38 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
+    TCGv_i64 t3;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
+    t3 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);     \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    get_fpr(t2, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t3, cpu_env, t0, t1, t2);                                \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t3, cpu_env, t0);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t3);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t3);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
+    tcg_temp_free_i64(t3);                                                    \
 }
 
 #define GEN_FLOAT_ACB(name, op2, set_fprf, type)                              \
@@ -61,24 +75,34 @@ _GEN_FLOAT_ACB(name##s, name, 0x3B, op2, 1, set_fprf, type);
 #define _GEN_FLOAT_AB(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rB(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t2, cpu_env, t0);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AB(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AB(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -87,24 +111,35 @@ _GEN_FLOAT_AB(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define _GEN_FLOAT_AC(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t2, cpu_env, t0);                                     \
+        set_fpr(rD(ctx->opcode), t2);                                         \
     }                                                                         \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AC(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AC(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -113,37 +148,51 @@ _GEN_FLOAT_AC(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define GEN_FLOAT_B(name, op2, op3, set_fprf, type)                           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 #define GEN_FLOAT_BS(name, op1, op2, set_fprf, type)                          \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 /* fadd - fadds */
@@ -165,19 +214,25 @@ GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE);
 /* frsqrtes */
 static void gen_frsqrtes(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_frsqrte(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                       cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_frsqrte(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fsel */
@@ -189,34 +244,47 @@ GEN_FLOAT_AB(sub, 0x14, 0x000007C0, 1, PPC_FLOAT);
 /* fsqrt */
 static void gen_fsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fsqrts(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                     Floating-Point multiply-and-add                   ***/
@@ -268,21 +336,32 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
 
 static void gen_ftdiv(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], t0, t1);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_ftsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 
@@ -293,32 +372,46 @@ static void gen_ftsqrt(DisasContext *ctx)
 static void gen_fcmpo(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpo(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpo(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcmpu */
 static void gen_fcmpu(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpu(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpu(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                         Floating-point move                           ***/
@@ -326,100 +419,153 @@ static void gen_fcmpu(DisasContext *ctx)
 /* XXX: beware that fabs never checks for NaNs nor update FPSCR */
 static void gen_fabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_andi_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     ~(1ULL << 63));
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_andi_i64(t1, t0, ~(1ULL << 63));
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fmr  - fmr. */
 /* XXX: beware that fmr never checks for NaNs nor update FPSCR */
 static void gen_fmr(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_mov_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* fnabs */
 /* XXX: beware that fnabs never checks for NaNs nor update FPSCR */
 static void gen_fnabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_ori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                    1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_ori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fneg */
 /* XXX: beware that fneg never checks for NaNs nor update FPSCR */
 static void gen_fneg(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_xori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_xori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcpsgn: PowerPC 2.05 specification */
 /* XXX: beware that fcpsgn never checks for NaNs nor update FPSCR */
 static void gen_fcpsgn(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)], 0, 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 0, 63);
+    set_fpr(rD(ctx->opcode), t2);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 static void gen_fmrgew(DisasContext *ctx)
 {
     TCGv_i64 b0;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     b0 = tcg_temp_new_i64();
-    tcg_gen_shri_i64(b0, cpu_fpr[rB(ctx->opcode)], 32);
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        b0, 0, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_shri_i64(b0, t0, 32);
+    get_fpr(t0, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t1, t0, b0, 0, 32);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free_i64(b0);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)],
-                        cpu_fpr[rA(ctx->opcode)],
-                        32, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    get_fpr(t1, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 32, 32);
+    set_fpr(rD(ctx->opcode), t2);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /***                  Floating-Point status & ctrl register                ***/
@@ -458,15 +604,19 @@ static void gen_mcrfs(DisasContext *ctx)
 /* mffs */
 static void gen_mffs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    tcg_gen_extu_tl_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpscr);
+    tcg_gen_extu_tl_i64(t0, cpu_fpscr);
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* mtfsb0 */
@@ -522,6 +672,7 @@ static void gen_mtfsb1(DisasContext *ctx)
 static void gen_mtfsf(DisasContext *ctx)
 {
     TCGv_i32 t0;
+    TCGv_i64 t1;
     int flm, l, w;
 
     if (unlikely(!ctx->fpu_enabled)) {
@@ -541,7 +692,9 @@ static void gen_mtfsf(DisasContext *ctx)
     } else {
         t0 = tcg_const_i32(flm << (w * 8));
     }
-    gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
+    t1 = tcg_temp_new_i64();
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_store_fpscr(cpu_env, t1, t0);
     tcg_temp_free_i32(t0);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
@@ -549,6 +702,7 @@ static void gen_mtfsf(DisasContext *ctx)
     }
     /* We can raise a deferred exception */
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t1);
 }
 
 /* mtfsfi */
@@ -588,21 +742,26 @@ static void gen_mtfsfi(DisasContext *ctx)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUF(name, ldop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -613,20 +772,25 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUXF(name, ldop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
     if (unlikely(rA(ctx->opcode) == 0)) {                                     \
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);                   \
         return;                                                               \
@@ -634,24 +798,30 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDXF(name, ldop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDFS(name, ldop, op, type)                                        \
@@ -677,6 +847,7 @@ GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
 static void gen_lfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -684,16 +855,19 @@ static void gen_lfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_ld_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_LOAD,
-        DEF_MEMOP(MO_Q));
+    tcg_gen_qemu_ld_i64(t0, EA, PPC_TLB_EPID_LOAD, DEF_MEMOP(MO_Q));
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdp */
 static void gen_lfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -701,24 +875,31 @@ static void gen_lfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdpx */
 static void gen_lfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -726,18 +907,24 @@ static void gen_lfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfiwax */
@@ -745,6 +932,7 @@ static void gen_lfiwax(DisasContext *ctx)
 {
     TCGv EA;
     TCGv t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -752,47 +940,59 @@ static void gen_lfiwax(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     gen_qemu_ld32s(ctx, t0, EA);
-    tcg_gen_ext_tl_i64(cpu_fpr[rD(ctx->opcode)], t0);
+    tcg_gen_ext_tl_i64(t1, t0);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free(EA);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfiwzx */
 static void gen_lfiwzx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+    gen_qemu_ld32u_i64(ctx, t0, EA);
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 /***                         Floating-point store                          ***/
 #define GEN_STF(name, stop, opc, type)                                        \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUF(name, stop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -803,16 +1003,20 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUXF(name, stop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -823,25 +1027,32 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STXF(name, stop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STFS(name, stop, op, type)                                        \
@@ -867,6 +1078,7 @@ GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
 static void gen_stfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -874,60 +1086,76 @@ static void gen_stfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_st_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_STORE,
-                       DEF_MEMOP(MO_Q));
+    get_fpr(t0, rD(ctx->opcode));
+    tcg_gen_qemu_st_i64(t0, EA, PPC_TLB_EPID_STORE, DEF_MEMOP(MO_Q));
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdp */
 static void gen_stfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, EA, 0);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdpx */
 static void gen_stfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* Optional: */
@@ -949,13 +1177,18 @@ static void gen_lfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfqu */
@@ -964,17 +1197,22 @@ static void gen_lfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     t1 = tcg_temp_new();
+    t2 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqux */
@@ -984,16 +1222,21 @@ static void gen_lfqux(DisasContext *ctx)
     int rd = rD(ctx->opcode);
     gen_set_access_type(ctx, ACCESS_FLOAT);
     TCGv t0, t1;
+    TCGv_i64 t2;
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqx */
@@ -1001,13 +1244,18 @@ static void gen_lfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfq */
@@ -1015,13 +1263,18 @@ static void gen_stfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfqu */
@@ -1030,17 +1283,23 @@ static void gen_stfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqux */
@@ -1049,17 +1308,23 @@ static void gen_stfqux(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqx */
@@ -1067,13 +1332,18 @@ static void gen_stfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t1 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 #undef _GEN_FLOAT_ACB
-- 
2.17.2


* [Qemu-devel] [PATCH 12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX register access
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:15   ` David Gibson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

These helpers allow us to move AVR register values to/from the specified TCGv_i64
argument.

To prevent the VMX helpers from accessing the cpu_avr{l,h} arrays directly, add
extra TCG temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20181217122405.18732-3-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate.c              |  10 +++
 target/ppc/translate/vmx-impl.inc.c | 128 ++++++++++++++++++++++------
 2 files changed, 110 insertions(+), 28 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 1d4bf624a3..fa3e8dc114 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6704,6 +6704,16 @@ static inline void set_fpr(int regno, TCGv_i64 src)
     tcg_gen_mov_i64(cpu_fpr[regno], src);
 }
 
+static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
+{
+    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+}
+
+static inline void set_avr64(int regno, TCGv_i64 src, bool high)
+{
+    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 3cb6fc2926..30046c6e31 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -18,52 +18,66 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
     } else {                                                                  \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_STX(name, opc2, opc3)                                          \
 static void gen_st##name(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     } else {                                                                  \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_LVE(name, opc2, opc3, size)                              \
@@ -159,15 +173,20 @@ static void gen_lvsr(DisasContext *ctx)
 static void gen_mfvscr(DisasContext *ctx)
 {
     TCGv_i32 t;
+    TCGv_i64 avr;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
         return;
     }
-    tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);
+    avr = tcg_temp_new_i64();
+    tcg_gen_movi_i64(avr, 0);
+    set_avr64(rD(ctx->opcode), avr, true);
     t = tcg_temp_new_i32();
     tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
-    tcg_gen_extu_i32_i64(cpu_avrl[rD(ctx->opcode)], t);
+    tcg_gen_extu_i32_i64(avr, t);
+    set_avr64(rD(ctx->opcode), avr, false);
     tcg_temp_free_i32(t);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_mtvscr(DisasContext *ctx)
@@ -188,6 +207,7 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     TCGv_i64 t0 = tcg_temp_new_i64();                                   \
     TCGv_i64 t1 = tcg_temp_new_i64();                                   \
     TCGv_i64 t2 = tcg_temp_new_i64();                                   \
+    TCGv_i64 avr = tcg_temp_new_i64();                                  \
     TCGv_i64 ten, z;                                                    \
                                                                         \
     if (unlikely(!ctx->altivec_enabled)) {                              \
@@ -199,26 +219,35 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     z = tcg_const_i64(0);                                               \
                                                                         \
     if (add_cin) {                                                      \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrl[rA(ctx->opcode)], ten);      \
-        tcg_gen_andi_i64(t2, cpu_avrl[rB(ctx->opcode)], 0xF);           \
-        tcg_gen_add2_i64(cpu_avrl[rD(ctx->opcode)], t2, t0, t1, t2, z); \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        get_avr64(avr, rB(ctx->opcode), false);                         \
+        tcg_gen_andi_i64(t2, avr, 0xF);                                 \
+        tcg_gen_add2_i64(avr, t2, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     } else {                                                            \
-        tcg_gen_mulu2_i64(cpu_avrl[rD(ctx->opcode)], t2,                \
-                          cpu_avrl[rA(ctx->opcode)], ten);              \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(avr, t2, avr, ten);                           \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     }                                                                   \
                                                                         \
     if (ret_carry) {                                                    \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrh[rA(ctx->opcode)], ten);      \
-        tcg_gen_add2_i64(t0, cpu_avrl[rD(ctx->opcode)], t0, t1, t2, z); \
-        tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);                 \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        tcg_gen_add2_i64(t0, avr, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
+        set_avr64(rD(ctx->opcode), z, true);                            \
     } else {                                                            \
-        tcg_gen_mul_i64(t0, cpu_avrh[rA(ctx->opcode)], ten);            \
-        tcg_gen_add_i64(cpu_avrh[rD(ctx->opcode)], t0, t2);             \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mul_i64(t0, avr, ten);                                  \
+        tcg_gen_add_i64(avr, t0, t2);                                   \
+        set_avr64(rD(ctx->opcode), avr, true);                          \
     }                                                                   \
                                                                         \
     tcg_temp_free_i64(t0);                                              \
     tcg_temp_free_i64(t1);                                              \
     tcg_temp_free_i64(t2);                                              \
+    tcg_temp_free_i64(avr);                                             \
     tcg_temp_free_i64(ten);                                             \
     tcg_temp_free_i64(z);                                               \
 }                                                                       \
@@ -232,12 +261,27 @@ GEN_VX_VMUL10(vmul10ecuq, 1, 1);
 #define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
 {                                                                       \
+    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
+    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
+    TCGv_i64 avr = tcg_temp_new_i64();                                  \
+                                                                        \
     if (unlikely(!ctx->altivec_enabled)) {                              \
         gen_exception(ctx, POWERPC_EXCP_VPU);                           \
         return;                                                         \
     }                                                                   \
-    tcg_op(cpu_avrh[rD(ctx->opcode)], cpu_avrh[rA(ctx->opcode)], cpu_avrh[rB(ctx->opcode)]); \
-    tcg_op(cpu_avrl[rD(ctx->opcode)], cpu_avrl[rA(ctx->opcode)], cpu_avrl[rB(ctx->opcode)]); \
+    get_avr64(t0, rA(ctx->opcode), true);                               \
+    get_avr64(t1, rB(ctx->opcode), true);                               \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, true);                              \
+                                                                        \
+    get_avr64(t0, rA(ctx->opcode), false);                              \
+    get_avr64(t1, rB(ctx->opcode), false);                              \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, false);                             \
+                                                                        \
+    tcg_temp_free_i64(t0);                                              \
+    tcg_temp_free_i64(t1);                                              \
+    tcg_temp_free_i64(avr);                                             \
 }
 
 GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
@@ -406,6 +450,7 @@ GEN_VXFORM(vmrglw, 6, 6);
 static void gen_vmrgew(DisasContext *ctx)
 {
     TCGv_i64 tmp;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -415,15 +460,28 @@ static void gen_vmrgew(DisasContext *ctx)
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
     tmp = tcg_temp_new_i64();
-    tcg_gen_shri_i64(tmp, cpu_avrh[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VA], tmp, 0, 32);
-    tcg_gen_shri_i64(tmp, cpu_avrl[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VA], tmp, 0, 32);
+    avr = tcg_temp_new_i64();
+
+    get_avr64(avr, VB, true);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, true);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(avr, VB, false);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, false);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, false);
+
     tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_vmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0, t1;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -432,9 +490,23 @@ static void gen_vmrgow(DisasContext *ctx)
     VT = rD(ctx->opcode);
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    avr = tcg_temp_new_i64();
 
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VB], cpu_avrh[VA], 32, 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VB], cpu_avrl[VA], 32, 32);
+    get_avr64(t0, VB, true);
+    get_avr64(t1, VA, true);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(t0, VB, false);
+    get_avr64(t1, VA, false);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, false);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(avr);
 }
 
 GEN_VXFORM(vmuloub, 4, 0);
-- 
2.17.2


* [Qemu-devel] [PATCH 13/34] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR register access
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:17   ` David Gibson
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

These helpers allow us to move VSR register values to/from the specified TCGv_i64
argument.

To prevent the VSX helpers from accessing the cpu_vsr array directly, add extra
TCG temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20181217122405.18732-4-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate/vsx-impl.inc.c | 782 ++++++++++++++++++++--------
 1 file changed, 561 insertions(+), 221 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 85ed135d44..e9a05d66f7 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1,20 +1,48 @@
 /***                           VSX extension                               ***/
 
-static inline TCGv_i64 cpu_vsrh(int n)
+static inline void get_vsr(TCGv_i64 dst, int n)
+{
+    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+}
+
+static inline void set_vsr(int n, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_vsr[n], src);
+}
+
+static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
 {
     if (n < 32) {
-        return cpu_fpr[n];
+        get_fpr(dst, n);
     } else {
-        return cpu_avrh[n-32];
+        get_avr64(dst, n - 32, true);
     }
 }
 
-static inline TCGv_i64 cpu_vsrl(int n)
+static inline void get_cpu_vsrl(TCGv_i64 dst, int n)
 {
     if (n < 32) {
-        return cpu_vsr[n];
+        get_vsr(dst, n);
     } else {
-        return cpu_avrl[n-32];
+        get_avr64(dst, n - 32, false);
+    }
+}
+
+static inline void set_cpu_vsrh(int n, TCGv_i64 src)
+{
+    if (n < 32) {
+        set_fpr(n, src);
+    } else {
+        set_avr64(n - 32, src, true);
+    }
+}
+
+static inline void set_cpu_vsrl(int n, TCGv_i64 src)
+{
+    if (n < 32) {
+        set_vsr(n, src);
+    } else {
+        set_avr64(n - 32, src, false);
     }
 }
 
@@ -22,16 +50,20 @@ static inline TCGv_i64 cpu_vsrl(int n)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xT(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xT(ctx->opcode), t0);                        \
     /* NOTE: cpu_vsrl is undefined */                         \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_LOAD_SCALAR(lxsdx, ld64_i64)
@@ -44,39 +76,54 @@ VSX_LOAD_SCALAR(lxsspx, ld32fs)
 static void gen_lxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_ld64_i64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_lxvdsx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_gen_mov_i64(t1, t0);
+    set_cpu_vsrl(xT(ctx->opcode), t1);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -104,6 +151,8 @@ static void gen_lxvw4x(DisasContext *ctx)
         tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
@@ -151,8 +200,10 @@ static void gen_bswap32x4(TCGv_i64 outh, TCGv_i64 outl,
 static void gen_lxvh8x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -169,13 +220,17 @@ static void gen_lxvh8x(DisasContext *ctx)
         gen_bswap16x8(xth, xtl, xth, xtl);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_lxvb16x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -188,6 +243,8 @@ static void gen_lxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 #define VSX_VECTOR_LOAD_STORE(name, op, indexed)            \
@@ -195,15 +252,16 @@ static void gen_##name(DisasContext *ctx)                   \
 {                                                           \
     int xt;                                                 \
     TCGv EA;                                                \
-    TCGv_i64 xth, xtl;                                      \
+    TCGv_i64 xth = tcg_temp_new_i64();                      \
+    TCGv_i64 xtl = tcg_temp_new_i64();                      \
                                                             \
     if (indexed) {                                          \
         xt = xT(ctx->opcode);                               \
     } else {                                                \
         xt = DQxT(ctx->opcode);                             \
     }                                                       \
-    xth = cpu_vsrh(xt);                                     \
-    xtl = cpu_vsrl(xt);                                     \
+    get_cpu_vsrh(xth, xt);                                  \
+    get_cpu_vsrl(xtl, xt);                                  \
                                                             \
     if (xt < 32) {                                          \
         if (unlikely(!ctx->vsx_enabled)) {                  \
@@ -225,14 +283,20 @@ static void gen_##name(DisasContext *ctx)                   \
     }                                                       \
     if (ctx->le_mode) {                                     \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
     } else {                                                \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
     }                                                       \
     tcg_temp_free(EA);                                      \
+    tcg_temp_free_i64(xth);                                 \
+    tcg_temp_free_i64(xtl);                                 \
 }
 
 VSX_VECTOR_LOAD_STORE(lxv, ld_i64, 0)
@@ -276,7 +340,8 @@ VSX_VECTOR_LOAD_STORE_LENGTH(stxvll)
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth = tcg_temp_new_i64();                            \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
@@ -286,8 +351,10 @@ static void gen_##name(DisasContext *ctx)                         \
     EA = tcg_temp_new();                                          \
     gen_addr_imm_index(ctx, EA, 0x03);                            \
     gen_qemu_##operation(ctx, xth, EA);                           \
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);                      \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_LOAD_SCALAR_DS(lxsd, ld64_i64)
@@ -297,15 +364,19 @@ VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xS(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xS(ctx->opcode), t0);                        \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_STORE_SCALAR(stxsdx, st64_i64)
@@ -318,6 +389,7 @@ VSX_STORE_SCALAR(stxsspx, st32fs)
 static void gen_stxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -325,17 +397,23 @@ static void gen_stxvd2x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_st64_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_st64_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -362,13 +440,17 @@ static void gen_stxvw4x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvh8x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -393,13 +475,17 @@ static void gen_stxvh8x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvb16x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -412,13 +498,16 @@ static void gen_stxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 #define VSX_STORE_SCALAR_DS(name, operation)                      \
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth;                                                 \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
@@ -430,62 +519,119 @@ static void gen_##name(DisasContext *ctx)                         \
+    xth = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
     gen_qemu_##operation(ctx, xth, EA);                           \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_STORE_SCALAR_DS(stxsd, st64_i64)
 VSX_STORE_SCALAR_DS(stxssp, st32fs)
 
-#define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    TCGv_i64 tmp = tcg_temp_new_i64();                          \
-    tcg_gen_##tcgop1(tmp, source);                              \
-    tcg_gen_##tcgop2(target, tmp);                              \
-    tcg_temp_free_i64(tmp);                                     \
+static void gen_mfvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    tcg_gen_ext32u_i64(tmp, xsh);
+    tcg_gen_trunc_i64_tl(cpu_gpr[rA(ctx->opcode)], tmp);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
 }
 
+static void gen_mtvsrwa(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32s_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
-MV_VSRW(mfvsrwz, ext32u_i64, trunc_i64_tl, cpu_gpr[rA(ctx->opcode)], \
-        cpu_vsrh(xS(ctx->opcode)))
-MV_VSRW(mtvsrwa, extu_tl_i64, ext32s_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
-MV_VSRW(mtvsrwz, extu_tl_i64, ext32u_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32u_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
 #if defined(TARGET_PPC64)
-#define MV_VSRD(name, target, source)                           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    tcg_gen_mov_i64(target, source);                            \
+static void gen_mfvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0;
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    t0 = tcg_temp_new_i64();
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
-MV_VSRD(mfvsrd, cpu_gpr[rA(ctx->opcode)], cpu_vsrh(xS(ctx->opcode)))
-MV_VSRD(mtvsrd, cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0;
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    t0 = tcg_temp_new_i64();
+    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
+}
 
 static void gen_mfvsrld(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -497,12 +643,14 @@ static void gen_mfvsrld(DisasContext *ctx)
             return;
         }
     }
-
-    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
+    t0 = tcg_temp_new_i64();
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrdd(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -516,16 +664,20 @@ static void gen_mtvsrdd(DisasContext *ctx)
     }
 
+    t0 = tcg_temp_new_i64();
     if (!rA(ctx->opcode)) {
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);
+        tcg_gen_movi_i64(t0, 0);
     } else {
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)]);
+        tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
     }
+    set_cpu_vsrh(xT(ctx->opcode), t0);
 
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
+    tcg_gen_mov_i64(t0, cpu_gpr[rB(ctx->opcode)]);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrws(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -538,55 +690,60 @@ static void gen_mtvsrws(DisasContext *ctx)
         }
     }
 
-    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)],
+    t0 = tcg_temp_new_i64();
+    tcg_gen_deposit_i64(t0, cpu_gpr[rA(ctx->opcode)],
                         cpu_gpr[rA(ctx->opcode)], 32, 32);
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xT(ctx->opcode)));
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
 {
+    TCGv_i64 xh, xl;
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
 
+    xh = tcg_temp_new_i64();
+    xl = tcg_temp_new_i64();
+
     if (unlikely((xT(ctx->opcode) == xA(ctx->opcode)) ||
                  (xT(ctx->opcode) == xB(ctx->opcode)))) {
-        TCGv_i64 xh, xl;
-
-        xh = tcg_temp_new_i64();
-        xl = tcg_temp_new_i64();
-
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(xh, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xh, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(xl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xl, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
         }
 
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xh);
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xl);
-
-        tcg_temp_free_i64(xh);
-        tcg_temp_free_i64(xl);
+        set_cpu_vsrh(xT(ctx->opcode), xh);
+        set_cpu_vsrl(xT(ctx->opcode), xl);
     } else {
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         } else {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         } else {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         }
     }
+    tcg_temp_free_i64(xh);
+    tcg_temp_free_i64(xl);
 }
 
 #define OP_ABS 1
@@ -606,7 +763,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
         }                                                         \
         xb = tcg_temp_new_i64();                                  \
         sgm = tcg_temp_new_i64();                                 \
-        tcg_gen_mov_i64(xb, cpu_vsrh(xB(ctx->opcode)));           \
+        get_cpu_vsrh(xb, xB(ctx->opcode));                        \
         tcg_gen_movi_i64(sgm, sgn_mask);                          \
         switch (op) {                                             \
             case OP_ABS: {                                        \
@@ -623,7 +780,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
             }                                                     \
             case OP_CPSGN: {                                      \
                 TCGv_i64 xa = tcg_temp_new_i64();                 \
-                tcg_gen_mov_i64(xa, cpu_vsrh(xA(ctx->opcode)));   \
+                get_cpu_vsrh(xa, xA(ctx->opcode));                \
                 tcg_gen_and_i64(xa, xa, sgm);                     \
                 tcg_gen_andc_i64(xb, xb, sgm);                    \
                 tcg_gen_or_i64(xb, xb, xa);                       \
@@ -631,7 +788,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
                 break;                                            \
             }                                                     \
         }                                                         \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xb);           \
+        set_cpu_vsrh(xT(ctx->opcode), xb);                        \
         tcg_temp_free_i64(xb);                                    \
         tcg_temp_free_i64(sgm);                                   \
     }
@@ -647,7 +804,7 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     int xa;                                                       \
     int xt = rD(ctx->opcode) + 32;                                \
     int xb = rB(ctx->opcode) + 32;                                \
-    TCGv_i64 xah, xbh, xbl, sgm;                                  \
+    TCGv_i64 xah, xbh, xbl, sgm, tmp;                             \
                                                                   \
     if (unlikely(!ctx->vsx_enabled)) {                            \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                    \
@@ -656,8 +813,9 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     xbh = tcg_temp_new_i64();                                     \
     xbl = tcg_temp_new_i64();                                     \
     sgm = tcg_temp_new_i64();                                     \
-    tcg_gen_mov_i64(xbh, cpu_vsrh(xb));                           \
-    tcg_gen_mov_i64(xbl, cpu_vsrl(xb));                           \
+    tmp = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xbh, xb);                                        \
+    get_cpu_vsrl(xbl, xb);                                        \
     tcg_gen_movi_i64(sgm, sgn_mask);                              \
     switch (op) {                                                 \
     case OP_ABS:                                                  \
@@ -672,17 +830,19 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     case OP_CPSGN:                                                \
         xah = tcg_temp_new_i64();                                 \
         xa = rA(ctx->opcode) + 32;                                \
-        tcg_gen_and_i64(xah, cpu_vsrh(xa), sgm);                  \
+        get_cpu_vsrh(tmp, xa);                                    \
+        tcg_gen_and_i64(xah, tmp, sgm);                           \
         tcg_gen_andc_i64(xbh, xbh, sgm);                          \
         tcg_gen_or_i64(xbh, xbh, xah);                            \
         tcg_temp_free_i64(xah);                                   \
         break;                                                    \
     }                                                             \
-    tcg_gen_mov_i64(cpu_vsrh(xt), xbh);                           \
-    tcg_gen_mov_i64(cpu_vsrl(xt), xbl);                           \
+    set_cpu_vsrh(xt, xbh);                                        \
+    set_cpu_vsrl(xt, xbl);                                        \
     tcg_temp_free_i64(xbl);                                       \
     tcg_temp_free_i64(xbh);                                       \
     tcg_temp_free_i64(sgm);                                       \
+    tcg_temp_free_i64(tmp);                                       \
 }
 
 VSX_SCALAR_MOVE_QP(xsabsqp, OP_ABS, SGN_MASK_DP)
@@ -701,8 +861,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
         xbh = tcg_temp_new_i64();                                \
         xbl = tcg_temp_new_i64();                                \
         sgm = tcg_temp_new_i64();                                \
-        tcg_gen_mov_i64(xbh, cpu_vsrh(xB(ctx->opcode)));         \
-        tcg_gen_mov_i64(xbl, cpu_vsrl(xB(ctx->opcode)));         \
+        get_cpu_vsrh(xbh, xB(ctx->opcode));                      \
+        get_cpu_vsrl(xbl, xB(ctx->opcode));                      \
         tcg_gen_movi_i64(sgm, sgn_mask);                         \
         switch (op) {                                            \
             case OP_ABS: {                                       \
@@ -723,8 +883,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
             case OP_CPSGN: {                                     \
                 TCGv_i64 xah = tcg_temp_new_i64();               \
                 TCGv_i64 xal = tcg_temp_new_i64();               \
-                tcg_gen_mov_i64(xah, cpu_vsrh(xA(ctx->opcode))); \
-                tcg_gen_mov_i64(xal, cpu_vsrl(xA(ctx->opcode))); \
+                get_cpu_vsrh(xah, xA(ctx->opcode));              \
+                get_cpu_vsrl(xal, xA(ctx->opcode));              \
                 tcg_gen_and_i64(xah, xah, sgm);                  \
                 tcg_gen_and_i64(xal, xal, sgm);                  \
                 tcg_gen_andc_i64(xbh, xbh, sgm);                 \
@@ -736,8 +896,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
                 break;                                           \
             }                                                    \
         }                                                        \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xbh);         \
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xbl);         \
+        set_cpu_vsrh(xT(ctx->opcode), xbh);                      \
+        set_cpu_vsrl(xT(ctx->opcode), xbl);                      \
         tcg_temp_free_i64(xbh);                                  \
         tcg_temp_free_i64(xbl);                                  \
         tcg_temp_free_i64(sgm);                                  \
@@ -768,12 +928,17 @@ static void gen_##name(DisasContext * ctx)                                    \
 #define GEN_VSX_HELPER_XT_XB_ENV(name, op1, op2, inval, type) \
 static void gen_##name(DisasContext * ctx)                    \
 {                                                             \
+    TCGv_i64 t0 = tcg_temp_new_i64();                         \
+    TCGv_i64 t1 = tcg_temp_new_i64();                         \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
-    gen_helper_##name(cpu_vsrh(xT(ctx->opcode)), cpu_env,     \
-                      cpu_vsrh(xB(ctx->opcode)));             \
+    get_cpu_vsrh(t0, xB(ctx->opcode));                        \
+    gen_helper_##name(t1, cpu_env, t0);                       \
+    set_cpu_vsrh(xT(ctx->opcode), t1);                        \
+    tcg_temp_free_i64(t0);                                    \
+    tcg_temp_free_i64(t1);                                    \
 }
 
 GEN_VSX_HELPER_2(xsadddp, 0x00, 0x04, 0, PPC2_VSX)
@@ -949,10 +1114,13 @@ GEN_VSX_HELPER_2(xxpermr, 0x08, 0x07, 0, PPC2_ISA300)
 
 static void gen_xxbrd(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -960,28 +1128,49 @@ static void gen_xxbrd(DisasContext *ctx)
     }
     tcg_gen_bswap64_i64(xth, xbh);
     tcg_gen_bswap64_i64(xtl, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrh(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     gen_bswap16x8(xth, xtl, xbh, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrq(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0 = tcg_temp_new_i64();
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -990,35 +1179,65 @@ static void gen_xxbrq(DisasContext *ctx)
     }
     tcg_gen_bswap64_i64(t0, xbl);
     tcg_gen_bswap64_i64(xtl, xbh);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
     tcg_gen_mov_i64(xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrw(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     gen_bswap32x4(xth, xtl, xbh, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #define VSX_LOGICAL(name, tcg_op)                                    \
 static void glue(gen_, name)(DisasContext * ctx)                     \
     {                                                                \
+        TCGv_i64 t0;                                                 \
+        TCGv_i64 t1;                                                 \
+        TCGv_i64 t2;                                                 \
         if (unlikely(!ctx->vsx_enabled)) {                           \
             gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
             return;                                                  \
         }                                                            \
-        tcg_op(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)), \
-            cpu_vsrh(xB(ctx->opcode)));                              \
-        tcg_op(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)), \
-            cpu_vsrl(xB(ctx->opcode)));                              \
+        t0 = tcg_temp_new_i64();                                     \
+        t1 = tcg_temp_new_i64();                                     \
+        t2 = tcg_temp_new_i64();                                     \
+        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
+        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
+        tcg_temp_free_i64(t0);                                       \
+        tcg_temp_free_i64(t1);                                       \
+        tcg_temp_free_i64(t2);                                       \
     }
 
 VSX_LOGICAL(xxland, tcg_gen_and_i64)
@@ -1033,7 +1252,7 @@ VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
 #define VSX_XXMRG(name, high)                               \
 static void glue(gen_, name)(DisasContext * ctx)            \
     {                                                       \
-        TCGv_i64 a0, a1, b0, b1;                            \
+        TCGv_i64 a0, a1, b0, b1, tmp;                       \
         if (unlikely(!ctx->vsx_enabled)) {                  \
             gen_exception(ctx, POWERPC_EXCP_VSXU);          \
             return;                                         \
@@ -1042,27 +1261,29 @@ static void glue(gen_, name)(DisasContext * ctx)            \
         a1 = tcg_temp_new_i64();                            \
         b0 = tcg_temp_new_i64();                            \
         b1 = tcg_temp_new_i64();                            \
+        tmp = tcg_temp_new_i64();                           \
         if (high) {                                         \
-            tcg_gen_mov_i64(a0, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrh(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrh(xB(ctx->opcode))); \
+            get_cpu_vsrh(a0, xA(ctx->opcode));              \
+            get_cpu_vsrh(a1, xA(ctx->opcode));              \
+            get_cpu_vsrh(b0, xB(ctx->opcode));              \
+            get_cpu_vsrh(b1, xB(ctx->opcode));              \
         } else {                                            \
-            tcg_gen_mov_i64(a0, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrl(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrl(xB(ctx->opcode))); \
+            get_cpu_vsrl(a0, xA(ctx->opcode));              \
+            get_cpu_vsrl(a1, xA(ctx->opcode));              \
+            get_cpu_vsrl(b0, xB(ctx->opcode));              \
+            get_cpu_vsrl(b1, xB(ctx->opcode));              \
         }                                                   \
         tcg_gen_shri_i64(a0, a0, 32);                       \
         tcg_gen_shri_i64(b0, b0, 32);                       \
-        tcg_gen_deposit_i64(cpu_vsrh(xT(ctx->opcode)),      \
-                            b0, a0, 32, 32);                \
-        tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)),      \
-                            b1, a1, 32, 32);                \
+        tcg_gen_deposit_i64(tmp, b0, a0, 32, 32);           \
+        set_cpu_vsrh(xT(ctx->opcode), tmp);                 \
+        tcg_gen_deposit_i64(tmp, b1, a1, 32, 32);           \
+        set_cpu_vsrl(xT(ctx->opcode), tmp);                 \
         tcg_temp_free_i64(a0);                              \
         tcg_temp_free_i64(a1);                              \
         tcg_temp_free_i64(b0);                              \
         tcg_temp_free_i64(b1);                              \
+        tcg_temp_free_i64(tmp);                             \
     }
 
 VSX_XXMRG(xxmrghw, 1)
@@ -1070,7 +1291,7 @@ VSX_XXMRG(xxmrglw, 0)
 
 static void gen_xxsel(DisasContext * ctx)
 {
-    TCGv_i64 a, b, c;
+    TCGv_i64 a, b, c, tmp;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -1078,34 +1299,43 @@ static void gen_xxsel(DisasContext * ctx)
     a = tcg_temp_new_i64();
     b = tcg_temp_new_i64();
     c = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
 
-    tcg_gen_mov_i64(a, cpu_vsrh(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrh(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrh(xC(ctx->opcode)));
+    get_cpu_vsrh(a, xA(ctx->opcode));
+    get_cpu_vsrh(b, xB(ctx->opcode));
+    get_cpu_vsrh(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrh(xT(ctx->opcode), tmp);
 
-    tcg_gen_mov_i64(a, cpu_vsrl(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrl(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrl(xC(ctx->opcode)));
+    get_cpu_vsrl(a, xA(ctx->opcode));
+    get_cpu_vsrl(b, xB(ctx->opcode));
+    get_cpu_vsrl(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrl(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrl(xT(ctx->opcode), tmp);
 
     tcg_temp_free_i64(a);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(c);
+    tcg_temp_free_i64(tmp);
 }
 
 static void gen_xxspltw(DisasContext *ctx)
 {
     TCGv_i64 b, b2;
-    TCGv_i64 vsr = (UIM(ctx->opcode) & 2) ?
-                   cpu_vsrl(xB(ctx->opcode)) :
-                   cpu_vsrh(xB(ctx->opcode));
+    TCGv_i64 vsr;
+
+    vsr = tcg_temp_new_i64();
+    if (UIM(ctx->opcode) & 2) {
+        get_cpu_vsrl(vsr, xB(ctx->opcode));
+    } else {
+        get_cpu_vsrh(vsr, xB(ctx->opcode));
+    }
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1122,9 +1352,11 @@ static void gen_xxspltw(DisasContext *ctx)
     }
 
     tcg_gen_shli_i64(b2, b, 32);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), b, b2);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    tcg_gen_or_i64(vsr, b, b2);
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
 
+    tcg_temp_free_i64(vsr);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(b2);
 }
@@ -1134,6 +1366,7 @@ static void gen_xxspltw(DisasContext *ctx)
 static void gen_xxspltib(DisasContext *ctx)
 {
     unsigned char uim8 = IMM8(ctx->opcode);
+    TCGv_i64 vsr;
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->altivec_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -1145,8 +1378,10 @@ static void gen_xxspltib(DisasContext *ctx)
             return;
         }
     }
-    tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), pattern(uim8));
-    tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), pattern(uim8));
+    vsr = tcg_temp_new_i64();
+    tcg_gen_movi_i64(vsr, pattern(uim8));
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
+    tcg_temp_free_i64(vsr);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
@@ -1161,40 +1396,40 @@ static void gen_xxsldwi(DisasContext *ctx)
 
     switch (SHW(ctx->opcode)) {
         case 0: {
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             break;
         }
         case 1: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(t0, xA(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
             break;
         }
         case 2: {
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             break;
         }
         case 3: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
@@ -1202,8 +1437,8 @@ static void gen_xxsldwi(DisasContext *ctx)
         }
     }
 
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xth);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xtl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(xth);
     tcg_temp_free_i64(xtl);
@@ -1214,6 +1449,7 @@ static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
     TCGv xt, xb;                                                \
     TCGv_i32 t0 = tcg_temp_new_i32();                           \
+    TCGv_i64 t1 = tcg_temp_new_i64();                           \
     uint8_t uimm = UIMM4(ctx->opcode);                          \
                                                                 \
     if (unlikely(!ctx->vsx_enabled)) {                          \
@@ -1226,8 +1462,9 @@ static void gen_##name(DisasContext *ctx)                       \
      * uimm > 12 handle as per hardware in helper               \
      */                                                         \
     if (uimm > 15) {                                            \
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);         \
-        tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), 0);         \
+        tcg_gen_movi_i64(t1, 0);                                \
+        set_cpu_vsrh(xT(ctx->opcode), t1);                      \
+        set_cpu_vsrl(xT(ctx->opcode), t1);                      \
         return;                                                 \
     }                                                           \
     tcg_gen_movi_i32(t0, uimm);                                 \
@@ -1235,6 +1472,7 @@ static void gen_##name(DisasContext *ctx)                       \
     tcg_temp_free(xb);                                          \
     tcg_temp_free(xt);                                          \
     tcg_temp_free_i32(t0);                                      \
+    tcg_temp_free_i64(t1);                                      \
 }
 
 VSX_EXTRACT_INSERT(xxextractuw)
@@ -1244,30 +1482,41 @@ VSX_EXTRACT_INSERT(xxinsertw)
 static void gen_xsxexpdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    get_cpu_vsrh(t0, xB(ctx->opcode));
+    tcg_gen_extract_i64(rt, t0, 52, 11);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_xsxexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     tcg_gen_extract_i64(xth, xbh, 48, 15);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_movi_i64(xtl, 0);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_xsiexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xth;
     TCGv ra = cpu_gpr[rA(ctx->opcode)];
     TCGv rb = cpu_gpr[rB(ctx->opcode)];
     TCGv_i64 t0;
@@ -1277,21 +1526,30 @@ static void gen_xsiexpdp(DisasContext *ctx)
         return;
     }
     t0 = tcg_temp_new_i64();
+    xth = tcg_temp_new_i64();
     tcg_gen_andi_i64(xth, ra, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, rb, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     /* dword[1] is undefined */
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
 }
 
 static void gen_xsiexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xah = cpu_vsrh(rA(ctx->opcode) + 32);
-    TCGv_i64 xal = cpu_vsrl(rA(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, rA(ctx->opcode) + 32);
+    get_cpu_vsrl(xal, rA(ctx->opcode) + 32);
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1303,14 +1561,22 @@ static void gen_xsiexpqp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0x7FFF);
     tcg_gen_shli_i64(t0, t0, 48);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_mov_i64(xtl, xal);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
 }
 
 static void gen_xsxsigdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
-    TCGv_i64 t0, zr, nan, exp;
+    TCGv_i64 t0, t1, zr, nan, exp;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1318,17 +1584,21 @@ static void gen_xsxsigdp(DisasContext *ctx)
     }
     exp = tcg_temp_new_i64();
     t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(2047);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_extract_i64(exp, t1, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(rt, cpu_vsrh(xB(ctx->opcode)), 0x000FFFFFFFFFFFFF);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_andi_i64(rt, t1, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(rt, rt, t0);
 
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
@@ -1337,8 +1607,13 @@ static void gen_xsxsigdp(DisasContext *ctx)
 static void gen_xsxsigqp(DisasContext *ctx)
 {
     TCGv_i64 t0, zr, nan, exp;
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+    get_cpu_vsrl(xbl, rB(ctx->opcode) + 32);
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1349,29 +1624,41 @@ static void gen_xsxsigqp(DisasContext *ctx)
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(32767);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(rB(ctx->opcode) + 32), 48, 15);
+    tcg_gen_extract_i64(exp, xbh, 48, 15);
     tcg_gen_movi_i64(t0, 0x0001000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(xth, cpu_vsrh(rB(ctx->opcode) + 32), 0x0000FFFFFFFFFFFF);
+    tcg_gen_andi_i64(xth, xbh, 0x0000FFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
-    tcg_gen_mov_i64(xtl, cpu_vsrl(rB(ctx->opcode) + 32));
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
+    tcg_gen_mov_i64(xtl, xbl);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 #endif
 
 static void gen_xviexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1383,21 +1670,36 @@ static void gen_xviexpsp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x807FFFFF807FFFFF);
     tcg_gen_andi_i64(t0, xbl, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xviexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1409,19 +1711,31 @@ static void gen_xviexpdp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, xbl, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1429,33 +1743,53 @@ static void gen_xvxexpsp(DisasContext *ctx)
     }
     tcg_gen_shri_i64(xth, xbh, 23);
     tcg_gen_andi_i64(xth, xth, 0xFF000000FF);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_shri_i64(xtl, xbl, 23);
     tcg_gen_andi_i64(xtl, xtl, 0xFF000000FF);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     tcg_gen_extract_i64(xth, xbh, 52, 11);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_extract_i64(xtl, xbl, 52, 11);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
 
 static void gen_xvxsigdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     TCGv_i64 t0, zr, nan, exp;
 
@@ -1474,6 +1808,7 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
 
     tcg_gen_extract_i64(exp, xbl, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
@@ -1481,11 +1816,16 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xtl, xbl, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #undef GEN_XX2FORM
-- 
2.17.2
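For readers skimming the hunk above, the recurring pattern is: stop handing out TCG values that alias the CPU state (the old `cpu_vsrh()`/`cpu_vsrl()` globals) and instead move data through explicit local temporaries via get/set accessors, freeing the temporaries afterwards. A minimal stand-alone C sketch of that accessor style — the `Toy*` names and two-array layout are simplified stand-ins, not QEMU's real types:

```c
#include <stdint.h>
#include <assert.h>

/* Simplified model: state is only reachable through accessors that
 * copy to/from caller-owned temporaries, never through aliases. */
typedef struct {
    uint64_t vsr_h[32];   /* high doubleword of each VSR */
    uint64_t vsr_l[32];   /* low doubleword of each VSR  */
} ToyCPUState;

static ToyCPUState toy_env;

static void toy_get_cpu_vsrh(uint64_t *dst, int n) { *dst = toy_env.vsr_h[n]; }
static void toy_set_cpu_vsrh(int n, uint64_t src)  { toy_env.vsr_h[n] = src; }
static void toy_get_cpu_vsrl(uint64_t *dst, int n) { *dst = toy_env.vsr_l[n]; }
static void toy_set_cpu_vsrl(int n, uint64_t src)  { toy_env.vsr_l[n] = src; }

/* xxsel-style combine mirroring the rewritten hunk: read operands into
 * temporaries, compute (a & ~c) | (b & c), write back via the setter. */
static uint64_t toy_sel(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & ~c) | (b & c);
}
```

The payoff of this style, as the later patches in the series show, is that once nothing aliases the register file, the backing TCG globals can be deleted entirely and replaced by direct loads/stores from `cpu_env`.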


* [Qemu-devel] [PATCH 14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 13/34] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:20   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Richard Henderson
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Instead of accessing the FPR, VMX and VSX registers through static arrays of
TCGv_i64 globals, remove them and change the helpers to load/store data
directly from cpu_env.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20181217122405.18732-6-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate.c              | 59 ++++++++---------------------
 target/ppc/translate/vsx-impl.inc.c |  4 +-
 2 files changed, 18 insertions(+), 45 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa3e8dc114..5923c688cd 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -55,15 +55,9 @@
 /* global register indexes */
 static char cpu_reg_names[10*3 + 22*4 /* GPR */
     + 10*4 + 22*5 /* SPE GPRh */
-    + 10*4 + 22*5 /* FPR */
-    + 2*(10*6 + 22*7) /* AVRh, AVRl */
-    + 10*5 + 22*6 /* VSR */
     + 8*5 /* CRF */];
 static TCGv cpu_gpr[32];
 static TCGv cpu_gprh[32];
-static TCGv_i64 cpu_fpr[32];
-static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
-static TCGv_i64 cpu_vsr[32];
 static TCGv_i32 cpu_crf[8];
 static TCGv cpu_nip;
 static TCGv cpu_msr;
@@ -108,39 +102,6 @@ void ppc_translate_init(void)
                                          offsetof(CPUPPCState, gprh[i]), p);
         p += (i < 10) ? 4 : 5;
         cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "fp%d", i);
-        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, fpr[i]), p);
-        p += (i < 10) ? 4 : 5;
-        cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "avr%dH", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#else
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-
-        snprintf(p, cpu_reg_names_size, "avr%dL", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#else
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-        snprintf(p, cpu_reg_names_size, "vsr%d", i);
-        cpu_vsr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, vsr[i]), p);
-        p += (i < 10) ? 5 : 6;
-        cpu_reg_names_size -= (i < 10) ? 5 : 6;
     }
 
     cpu_nip = tcg_global_mem_new(cpu_env,
@@ -6696,22 +6657,34 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_fpr[regno], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
-    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
-    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 #include "translate/fp-impl.inc.c"
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index e9a05d66f7..20e1fd9324 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_vsr[n], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
-- 
2.17.2
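The only subtle part of the patch above is the endianness flip in `get_avr64()`/`set_avr64()`: a 128-bit Altivec register stored as `u64[2]` keeps its architecturally high doubleword in `u64[0]` on a big-endian host but in `u64[1]` on a little-endian one, so the helpers translate "high" into the right index. A hedged C sketch of just that index logic (`toy_*` names are illustrative, not QEMU code):

```c
#include <stdint.h>
#include <assert.h>

/* 128-bit register viewed as two host-order doublewords. */
typedef union {
    uint64_t u64[2];
    uint8_t  u8[16];
} toy_avr_t;

/* Runtime equivalent of the HOST_WORDS_BIGENDIAN compile-time check. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}

/* "high" always means the architecturally high doubleword, regardless
 * of where the host's memory layout happens to put it. */
static uint64_t toy_get_avr64(const toy_avr_t *r, int high)
{
    return r->u64[host_is_big_endian() ? !high : high];
}
```

With the accessor doing the translation, every caller can reason purely in architectural terms and the `#ifdef` lives in exactly one place per direction.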


* [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:21   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array Richard Henderson
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Since the VSX registers are a superset of the VMX registers, they can be
represented by the same type. Merge ppc_avr_t into ppc_vsr_t and change
ppc_avr_t to be a simple typedef alias.

Note that the float32 member is named "f" in ppc_avr_t but "f32" in
ppc_vsr_t, so all references to the ppc_avr_t f member must be updated to
use f32 instead.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20181217122405.18732-7-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/cpu.h        | 17 +++++++------
 target/ppc/internal.h   | 11 --------
 target/ppc/int_helper.c | 56 +++++++++++++++++++++--------------------
 3 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ab68abe8a2..5445d4c3c1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -230,7 +230,6 @@ typedef struct opc_handler_t opc_handler_t;
 /* Types used to describe some PowerPC registers etc. */
 typedef struct DisasContext DisasContext;
 typedef struct ppc_spr_t ppc_spr_t;
-typedef union ppc_avr_t ppc_avr_t;
 typedef union ppc_tlb_t ppc_tlb_t;
 typedef struct ppc_hash_pte64 ppc_hash_pte64_t;
 
@@ -254,22 +253,26 @@ struct ppc_spr_t {
 #endif
 };
 
-/* Altivec registers (128 bits) */
-union ppc_avr_t {
-    float32 f[4];
+/* VSX/Altivec registers (128 bits) */
+typedef union _ppc_vsr_t {
     uint8_t u8[16];
     uint16_t u16[8];
     uint32_t u32[4];
+    uint64_t u64[2];
     int8_t s8[16];
     int16_t s16[8];
     int32_t s32[4];
-    uint64_t u64[2];
     int64_t s64[2];
+    float32 f32[4];
+    float64 f64[2];
+    float128 f128;
 #ifdef CONFIG_INT128
     __uint128_t u128;
 #endif
-    Int128 s128;
-};
+    Int128  s128;
+} ppc_vsr_t;
+
+typedef ppc_vsr_t ppc_avr_t;
 
 #if !defined(CONFIG_USER_ONLY)
 /* Software TLB cache */
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index a9bcadff42..b4b1f7b3db 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -204,17 +204,6 @@ EXTRACT_HELPER(IMM8, 11, 8);
 EXTRACT_HELPER(DCMX, 16, 7);
 EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
-typedef union _ppc_vsr_t {
-    uint8_t u8[16];
-    uint16_t u16[8];
-    uint32_t u32[4];
-    uint64_t u64[2];
-    float32 f32[4];
-    float64 f64[2];
-    float128 f128;
-    Int128  s128;
-} ppc_vsr_t;
-
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VsrB(i) u8[i]
 #define VsrH(i) u16[i]
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index fcac90a4a9..9d715be25c 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -548,8 +548,8 @@ VARITH_DO(muluwm, *, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = func(a->f[i], b->f[i], &env->vec_status);         \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = func(a->f32[i], b->f32[i], &env->vec_status);   \
         }                                                               \
     }
 VARITHFP(addfp, float32_add)
@@ -563,9 +563,9 @@ VARITHFP(maxfp, float32_max)
                            ppc_avr_t *b, ppc_avr_t *c)                  \
     {                                                                   \
         int i;                                                          \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = float32_muladd(a->f[i], c->f[i], b->f[i],         \
-                                     type, &env->vec_status);           \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = float32_muladd(a->f32[i], c->f32[i], b->f32[i], \
+                                       type, &env->vec_status);         \
         }                                                               \
     }
 VARITHFPFMA(maddfp, 0);
@@ -670,9 +670,9 @@ VABSDU(w, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             float32 t = cvt(b->element[i], &env->vec_status);           \
-            r->f[i] = float32_scalbn(t, -uim, &env->vec_status);        \
+            r->f32[i] = float32_scalbn(t, -uim, &env->vec_status);      \
         }                                                               \
     }
 VCF(ux, uint32_to_float32, u32)
@@ -782,9 +782,9 @@ VCMPNE(w, u32, uint32_t, 0)
         uint32_t none = 0;                                              \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             uint32_t result;                                            \
-            int rel = float32_compare_quiet(a->f[i], b->f[i],           \
+            int rel = float32_compare_quiet(a->f32[i], b->f32[i],       \
                                             &env->vec_status);          \
             if (rel == float_relation_unordered) {                      \
                 result = 0;                                             \
@@ -816,14 +816,16 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
     int i;
     int all_in = 0;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        int le_rel = float32_compare_quiet(a->f[i], b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        int le_rel = float32_compare_quiet(a->f32[i], b->f32[i],
+                                           &env->vec_status);
         if (le_rel == float_relation_unordered) {
             r->u32[i] = 0xc0000000;
             all_in = 1;
         } else {
-            float32 bneg = float32_chs(b->f[i]);
-            int ge_rel = float32_compare_quiet(a->f[i], bneg, &env->vec_status);
+            float32 bneg = float32_chs(b->f32[i]);
+            int ge_rel = float32_compare_quiet(a->f32[i], bneg,
+                                               &env->vec_status);
             int le = le_rel != float_relation_greater;
             int ge = ge_rel != float_relation_less;
 
@@ -856,11 +858,11 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
         float_status s = env->vec_status;                               \
                                                                         \
         set_float_rounding_mode(float_round_to_zero, &s);               \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            if (float32_is_any_nan(b->f[i])) {                          \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            if (float32_is_any_nan(b->f32[i])) {                        \
                 r->element[i] = 0;                                      \
             } else {                                                    \
-                float64 t = float32_to_float64(b->f[i], &s);            \
+                float64 t = float32_to_float64(b->f32[i], &s);          \
                 int64_t j;                                              \
                                                                         \
                 t = float64_scalbn(t, uim, &s);                         \
@@ -1661,8 +1663,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_div(float32_one, b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_div(float32_one, b->f32[i], &env->vec_status);
     }
 }
 
@@ -1674,8 +1676,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
         float_status s = env->vec_status;                       \
                                                                 \
         set_float_rounding_mode(rounding, &s);                  \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                \
-            r->f[i] = float32_round_to_int (b->f[i], &s);       \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {              \
+            r->f32[i] = float32_round_to_int (b->f32[i], &s);   \
         }                                                       \
     }
 VRFI(n, float_round_nearest_even)
@@ -1705,10 +1707,10 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        float32 t = float32_sqrt(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        float32 t = float32_sqrt(b->f32[i], &env->vec_status);
 
-        r->f[i] = float32_div(float32_one, t, &env->vec_status);
+        r->f32[i] = float32_div(float32_one, t, &env->vec_status);
     }
 }
 
@@ -1751,8 +1753,8 @@ void helper_vexptefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_exp2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_exp2(b->f32[i], &env->vec_status);
     }
 }
 
@@ -1760,8 +1762,8 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_log2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_log2(b->f32[i], &env->vec_status);
     }
 }
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:27   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations Richard Henderson
                   ` (19 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

The VSX register array is a block of 64 128-bit registers where the first 32
registers consist of the existing 64-bit FP registers extended to 128-bit
using new VSR registers, and the last 32 registers are the VMX 128-bit
registers, as shown below:

            64-bit               64-bit
    +--------------------+--------------------+
    |        FP0         |                    |  VSR0
    +--------------------+--------------------+
    |        FP1         |                    |  VSR1
    +--------------------+--------------------+
    |        ...         |        ...         |  ...
    +--------------------+--------------------+
    |        FP30        |                    |  VSR30
    +--------------------+--------------------+
    |        FP31        |                    |  VSR31
    +--------------------+--------------------+
    |                  VMX0                   |  VSR32
    +-----------------------------------------+
    |                  VMX1                   |  VSR33
    +-----------------------------------------+
    |                  ...                    |  ...
    +-----------------------------------------+
    |                  VMX30                  |  VSR62
    +-----------------------------------------+
    |                  VMX31                  |  VSR63
    +-----------------------------------------+

In order to allow for future conversion of VSX instructions to use TCG vector
operations, recreate the same layout using an aligned version of the existing
vsr register array.

Since the old fpr and avr register arrays are removed, the existing callers
must also be updated to use the correct offsets into the vsr register array.
This also includes switching the relevant VMState fields over to using
subarrays, so that migration compatibility is preserved.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20181217122405.18732-8-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/cpu.h                    |  9 ++--
 target/ppc/internal.h               | 18 ++------
 linux-user/ppc/signal.c             | 24 +++++-----
 target/ppc/arch_dump.c              | 12 ++---
 target/ppc/gdbstub.c                |  8 ++--
 target/ppc/machine.c                | 72 +++++++++++++++++++++++++++--
 target/ppc/monitor.c                |  4 +-
 target/ppc/translate.c              | 14 +++---
 target/ppc/translate/dfp-impl.inc.c |  2 +-
 target/ppc/translate/vmx-impl.inc.c |  7 ++-
 target/ppc/translate/vsx-impl.inc.c |  4 +-
 target/ppc/translate_init.inc.c     | 24 +++++-----
 12 files changed, 126 insertions(+), 72 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 5445d4c3c1..c8f449081d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1016,8 +1016,6 @@ struct CPUPPCState {
 
     /* Floating point execution context */
     float_status fp_status;
-    /* floating point registers */
-    float64 fpr[32];
     /* floating point status and control register */
     target_ulong fpscr;
 
@@ -1067,11 +1065,10 @@ struct CPUPPCState {
     /* Special purpose registers */
     target_ulong spr[1024];
     ppc_spr_t spr_cb[1024];
-    /* Altivec registers */
-    ppc_avr_t avr[32];
+    /* Vector status and control register */
     uint32_t vscr;
-    /* VSX registers */
-    uint64_t vsr[32];
+    /* VSX registers (including FP and AVR) */
+    ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index b4b1f7b3db..b77d564a65 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -218,24 +218,14 @@ EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
 static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        vsr->VsrD(0) = env->fpr[n];
-        vsr->VsrD(1) = env->vsr[n];
-    } else {
-        vsr->u64[0] = env->avr[n - 32].u64[0];
-        vsr->u64[1] = env->avr[n - 32].u64[1];
-    }
+    vsr->VsrD(0) = env->vsr[n].u64[0];
+    vsr->VsrD(1) = env->vsr[n].u64[1];
 }
 
 static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        env->fpr[n] = vsr->VsrD(0);
-        env->vsr[n] = vsr->VsrD(1);
-    } else {
-        env->avr[n - 32].u64[0] = vsr->u64[0];
-        env->avr[n - 32].u64[1] = vsr->u64[1];
-    }
+    env->vsr[n].u64[0] = vsr->VsrD(0);
+    env->vsr[n].u64[1] = vsr->VsrD(1);
 }
 
 void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index 2ae120a2bc..a053dd5b84 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save Altivec registers if necessary.  */
     if (env->insns_flags & PPC_ALTIVEC) {
         uint32_t *vrsave;
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = &env->vsr[32 + i];
             ppc_avr_t *vreg = (ppc_avr_t *)&frame->mc_vregs.altivec[i];
 
             __put_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -281,15 +281,15 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __put_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            __put_user(env->vsr[i].u64[1], &vsregs[i]);
         }
     }
 
     /* Save floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __put_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            __put_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
         }
         __put_user((uint64_t) env->fpscr, &frame->mc_fregs[32]);
     }
@@ -373,8 +373,8 @@ static void restore_user_regs(CPUPPCState *env,
 #else
         v_regs = (ppc_avr_t *)frame->mc_vregs.altivec;
 #endif
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = &env->vsr[32 + i];
             ppc_avr_t *vreg = &v_regs[i];
 
             __get_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -393,16 +393,16 @@ static void restore_user_regs(CPUPPCState *env,
     /* Restore VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __get_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            __get_user(env->vsr[i].u64[1], &vsregs[i]);
         }
     }
 
     /* Restore floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
         uint64_t fpscr;
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __get_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            __get_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
         }
         __get_user(fpscr, &frame->mc_fregs[32]);
         env->fpscr = (uint32_t) fpscr;
diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
index cc1460e4e3..c272d0d3d4 100644
--- a/target/ppc/arch_dump.c
+++ b/target/ppc/arch_dump.c
@@ -140,7 +140,7 @@ static void ppc_write_elf_fpregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(fpregset, 0, sizeof(*fpregset));
 
     for (i = 0; i < 32; i++) {
-        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.fpr[i]);
+        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[0]);
     }
     fpregset->fpscr = cpu_to_dump_reg(s, cpu->env.fpscr);
 }
@@ -166,11 +166,11 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
 #endif
 
         if (needs_byteswap) {
-            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
-            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
+            vmxregset->avr[i].u64[0] = bswap64(cpu->env.vsr[32 + i].u64[1]);
+            vmxregset->avr[i].u64[1] = bswap64(cpu->env.vsr[32 + i].u64[0]);
         } else {
-            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
-            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
+            vmxregset->avr[i].u64[0] = cpu->env.vsr[32 + i].u64[0];
+            vmxregset->avr[i].u64[1] = cpu->env.vsr[32 + i].u64[1];
         }
     }
     vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
@@ -188,7 +188,7 @@ static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(vsxregset, 0, sizeof(*vsxregset));
 
     for (i = 0; i < 32; i++) {
-        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i]);
+        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[1]);
     }
 }
 
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index b6f6693583..8c9dc284c4 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -126,7 +126,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_regl(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
     } else {
         switch (n) {
         case 64:
@@ -178,7 +178,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_reg64(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
     } else if (n < 96) {
         /* Altivec */
         stq_p(mem_buf, n - 64);
@@ -234,7 +234,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldtul_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64:
@@ -284,7 +284,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldq_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64 + 32:
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index e7b3725273..451cf376b4 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -45,7 +45,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
             uint64_t l;
         } u;
         u.l = qemu_get_be64(f);
-        env->fpr[i] = u.d;
+        env->vsr[i].u64[0] = u.d;
     }
     qemu_get_be32s(f, &fpscr);
     env->fpscr = fpscr;
@@ -138,11 +138,73 @@ static const VMStateInfo vmstate_info_avr = {
 };
 
 #define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
-    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
+    VMSTATE_SUB_ARRAY(_f, _s, 32, _n, _v, vmstate_info_avr, ppc_avr_t)
 
 #define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
     VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
 
+static int get_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[0] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[0]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_fpr = {
+    .name = "fpr",
+    .get  = get_fpr,
+    .put  = put_fpr,
+};
+
+#define VMSTATE_FPR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_fpr, ppc_vsr_t)
+
+#define VMSTATE_FPR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_FPR_ARRAY_V(_f, _s, _n, 0)
+
+static int get_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[1] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[1]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_vsr = {
+    .name = "vsr",
+    .get  = get_vsr,
+    .put  = put_vsr,
+};
+
+#define VMSTATE_VSR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_vsr, ppc_vsr_t)
+
+#define VMSTATE_VSR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_VSR_ARRAY_V(_f, _s, _n, 0)
+
 static bool cpu_pre_2_8_migration(void *opaque, int version_id)
 {
     PowerPCCPU *cpu = opaque;
@@ -354,7 +416,7 @@ static const VMStateDescription vmstate_fpu = {
     .minimum_version_id = 1,
     .needed = fpu_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
+        VMSTATE_FPR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -373,7 +435,7 @@ static const VMStateDescription vmstate_altivec = {
     .minimum_version_id = 1,
     .needed = altivec_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
+        VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINT32(env.vscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -392,7 +454,7 @@ static const VMStateDescription vmstate_vsx = {
     .minimum_version_id = 1,
     .needed = vsx_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
+        VMSTATE_VSR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/ppc/monitor.c b/target/ppc/monitor.c
index 14915119fc..1db9396b2e 100644
--- a/target/ppc/monitor.c
+++ b/target/ppc/monitor.c
@@ -123,8 +123,8 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
 
     /* Floating point registers */
     if ((qemu_tolower(name[0]) == 'f') &&
-        ppc_cpu_get_reg_num(name + 1, ARRAY_SIZE(env->fpr), &regnum)) {
-        *pval = env->fpr[regnum];
+        ppc_cpu_get_reg_num(name + 1, 32, &regnum)) {
+        *pval = env->vsr[regnum].u64[0];
         return 0;
     }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 5923c688cd..8e89aec14d 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6657,22 +6657,22 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -6680,10 +6680,10 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -7434,7 +7434,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
             if ((i & (RFPL - 1)) == 0) {
                 cpu_fprintf(f, "FPR%02d", i);
             }
-            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->fpr[i]));
+            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->vsr[i].u64[0]));
             if ((i & (RFPL - 1)) == (RFPL - 1)) {
                 cpu_fprintf(f, "\n");
             }
diff --git a/target/ppc/translate/dfp-impl.inc.c b/target/ppc/translate/dfp-impl.inc.c
index 634ef73b8a..6c556dc2e1 100644
--- a/target/ppc/translate/dfp-impl.inc.c
+++ b/target/ppc/translate/dfp-impl.inc.c
@@ -3,7 +3,7 @@
 static inline TCGv_ptr gen_fprp_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, fpr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[reg].u64[0]));
     return r;
 }
 
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 30046c6e31..75d2b2280f 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -10,10 +10,15 @@
 static inline TCGv_ptr gen_avr_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, avr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[32 + reg].u64[0]));
     return r;
 }
 
+static inline long avr64_offset(int reg, bool high)
+{
+    return offsetof(CPUPPCState, vsr[32 + reg].u64[(high ? 0 : 1)]);
+}
+
 #define GEN_VR_LDX(name, opc2, opc3)                                          \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 20e1fd9324..1608ad48b1 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 168d0cec28..b83097141c 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9486,7 +9486,7 @@ static bool avr_need_swap(CPUPPCState *env)
 static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stfq_p(mem_buf, env->fpr[n]);
+        stfq_p(mem_buf, env->vsr[n].u64[0]);
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9502,7 +9502,7 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->fpr[n] = ldfq_p(mem_buf);
+        env->vsr[n].u64[0] = ldfq_p(mem_buf);
         return 8;
     }
     if (n == 32) {
@@ -9517,11 +9517,11 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         if (!avr_need_swap(env)) {
-            stq_p(mem_buf, env->avr[n].u64[0]);
-            stq_p(mem_buf+8, env->avr[n].u64[1]);
+            stq_p(mem_buf, env->vsr[32 + n].u64[0]);
+            stq_p(mem_buf + 8, env->vsr[32 + n].u64[1]);
         } else {
-            stq_p(mem_buf, env->avr[n].u64[1]);
-            stq_p(mem_buf+8, env->avr[n].u64[0]);
+            stq_p(mem_buf, env->vsr[32 + n].u64[1]);
+            stq_p(mem_buf + 8, env->vsr[32 + n].u64[0]);
         }
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
@@ -9546,11 +9546,11 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
         if (!avr_need_swap(env)) {
-            env->avr[n].u64[0] = ldq_p(mem_buf);
-            env->avr[n].u64[1] = ldq_p(mem_buf+8);
+            env->vsr[32 + n].u64[0] = ldq_p(mem_buf);
+            env->vsr[32 + n].u64[1] = ldq_p(mem_buf + 8);
         } else {
-            env->avr[n].u64[1] = ldq_p(mem_buf);
-            env->avr[n].u64[0] = ldq_p(mem_buf+8);
+            env->vsr[32 + n].u64[1] = ldq_p(mem_buf);
+            env->vsr[32 + n].u64[0] = ldq_p(mem_buf + 8);
         }
         return 16;
     }
@@ -9623,7 +9623,7 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stq_p(mem_buf, env->vsr[n]);
+        stq_p(mem_buf, env->vsr[n].u64[1]);
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9634,7 +9634,7 @@ static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->vsr[n] = ldq_p(mem_buf);
+        env->vsr[n].u64[1] = ldq_p(mem_buf);
         return 8;
     }
     return 0;
-- 
2.17.2


* [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:29   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Richard Henderson
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20181217122405.18732-9-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate.c              |  1 +
 target/ppc/translate/vmx-impl.inc.c | 63 ++++++++++++++++-------------
 2 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 8e89aec14d..1b61bfa093 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -24,6 +24,7 @@
 #include "disas/disas.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/host-utils.h"
 #include "exec/cpu_ldst.h"
 
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 75d2b2280f..c13828a09d 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -262,41 +262,50 @@ GEN_VX_VMUL10(vmul10euq, 1, 0);
 GEN_VX_VMUL10(vmul10cuq, 0, 1);
 GEN_VX_VMUL10(vmul10ecuq, 1, 1);
 
-/* Logical operations */
-#define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
-static void glue(gen_, name)(DisasContext *ctx)                                 \
+#define GEN_VXFORM_V(name, vece, tcg_op, opc2, opc3)                    \
+static void glue(gen_, name)(DisasContext *ctx)                         \
 {                                                                       \
-    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
-    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
-    TCGv_i64 avr = tcg_temp_new_i64();                                  \
-                                                                        \
     if (unlikely(!ctx->altivec_enabled)) {                              \
         gen_exception(ctx, POWERPC_EXCP_VPU);                           \
         return;                                                         \
     }                                                                   \
-    get_avr64(t0, rA(ctx->opcode), true);                               \
-    get_avr64(t1, rB(ctx->opcode), true);                               \
-    tcg_op(avr, t0, t1);                                                \
-    set_avr64(rD(ctx->opcode), avr, true);                              \
                                                                         \
-    get_avr64(t0, rA(ctx->opcode), false);                              \
-    get_avr64(t1, rB(ctx->opcode), false);                              \
-    tcg_op(avr, t0, t1);                                                \
-    set_avr64(rD(ctx->opcode), avr, false);                             \
-                                                                        \
-    tcg_temp_free_i64(t0);                                              \
-    tcg_temp_free_i64(t1);                                              \
-    tcg_temp_free_i64(avr);                                             \
+    tcg_op(vece,                                                        \
+           avr64_offset(rD(ctx->opcode), true),                         \
+           avr64_offset(rA(ctx->opcode), true),                         \
+           avr64_offset(rB(ctx->opcode), true),                         \
+           16, 16);                                                     \
 }
 
-GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
-GEN_VX_LOGICAL(vandc, tcg_gen_andc_i64, 2, 17);
-GEN_VX_LOGICAL(vor, tcg_gen_or_i64, 2, 18);
-GEN_VX_LOGICAL(vxor, tcg_gen_xor_i64, 2, 19);
-GEN_VX_LOGICAL(vnor, tcg_gen_nor_i64, 2, 20);
-GEN_VX_LOGICAL(veqv, tcg_gen_eqv_i64, 2, 26);
-GEN_VX_LOGICAL(vnand, tcg_gen_nand_i64, 2, 22);
-GEN_VX_LOGICAL(vorc, tcg_gen_orc_i64, 2, 21);
+#define GEN_VXFORM_VN(name, vece, tcg_op, opc2, opc3)                   \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+{                                                                       \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
+                                                                        \
+    tcg_op(vece,                                                        \
+           avr64_offset(rD(ctx->opcode), true),                         \
+           avr64_offset(rA(ctx->opcode), true),                         \
+           avr64_offset(rB(ctx->opcode), true),                         \
+           16, 16);                                                     \
+                                                                        \
+    tcg_gen_gvec_not(vece,                                              \
+                     avr64_offset(rD(ctx->opcode), true),               \
+                     avr64_offset(rD(ctx->opcode), true),               \
+                     16, 16);                                           \
+}
+
+/* Logical operations */
+GEN_VXFORM_V(vand, MO_64, tcg_gen_gvec_and, 2, 16);
+GEN_VXFORM_V(vandc, MO_64, tcg_gen_gvec_andc, 2, 17);
+GEN_VXFORM_V(vor, MO_64, tcg_gen_gvec_or, 2, 18);
+GEN_VXFORM_V(vxor, MO_64, tcg_gen_gvec_xor, 2, 19);
+GEN_VXFORM_VN(vnor, MO_64, tcg_gen_gvec_or, 2, 20);
+GEN_VXFORM_VN(veqv, MO_64, tcg_gen_gvec_xor, 2, 26);
+GEN_VXFORM_VN(vnand, MO_64, tcg_gen_gvec_and, 2, 22);
+GEN_VXFORM_V(vorc, MO_64, tcg_gen_gvec_orc, 2, 21);
 
 #define GEN_VXFORM(name, opc2, opc3)                                    \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
-- 
2.17.2


* [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over to use vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:29   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] " Richard Henderson
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20181217122405.18732-10-mark.cave-ayland@ilande.co.uk>
---
 target/ppc/helper.h                 |  8 --------
 target/ppc/int_helper.c             |  7 -------
 target/ppc/translate/vmx-impl.inc.c | 16 ++++++++--------
 3 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index c7de04e068..553ff500c8 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -108,14 +108,6 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_ctype_avr ppc_avr_t *
 #define dh_is_signed_avr dh_is_signed_ptr
 
-DEF_HELPER_3(vaddubm, void, avr, avr, avr)
-DEF_HELPER_3(vadduhm, void, avr, avr, avr)
-DEF_HELPER_3(vadduwm, void, avr, avr, avr)
-DEF_HELPER_3(vaddudm, void, avr, avr, avr)
-DEF_HELPER_3(vsububm, void, avr, avr, avr)
-DEF_HELPER_3(vsubuhm, void, avr, avr, avr)
-DEF_HELPER_3(vsubuwm, void, avr, avr, avr)
-DEF_HELPER_3(vsubudm, void, avr, avr, avr)
 DEF_HELPER_3(vavgub, void, avr, avr, avr)
 DEF_HELPER_3(vavguh, void, avr, avr, avr)
 DEF_HELPER_3(vavguw, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9d715be25c..4547453ef1 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -531,13 +531,6 @@ void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
             r->element[i] = a->element[i] op b->element[i];             \
         }                                                               \
     }
-#define VARITH(suffix, element)                 \
-    VARITH_DO(add##suffix, +, element)          \
-    VARITH_DO(sub##suffix, -, element)
-VARITH(ubm, u8)
-VARITH(uhm, u16)
-VARITH(uwm, u32)
-VARITH(udm, u64)
 VARITH_DO(muluwm, *, u32)
 #undef VARITH_DO
 #undef VARITH
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index c13828a09d..e353d3f174 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -411,18 +411,18 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     tcg_temp_free_ptr(rb);                                              \
 }
 
-GEN_VXFORM(vaddubm, 0, 0);
+GEN_VXFORM_V(vaddubm, MO_8, tcg_gen_gvec_add, 0, 0);
 GEN_VXFORM_DUAL_EXT(vaddubm, PPC_ALTIVEC, PPC_NONE, 0,       \
                     vmul10cuq, PPC_NONE, PPC2_ISA300, 0x0000F800)
-GEN_VXFORM(vadduhm, 0, 1);
+GEN_VXFORM_V(vadduhm, MO_16, tcg_gen_gvec_add, 0, 1);
 GEN_VXFORM_DUAL(vadduhm, PPC_ALTIVEC, PPC_NONE,  \
                 vmul10ecuq, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vadduwm, 0, 2);
-GEN_VXFORM(vaddudm, 0, 3);
-GEN_VXFORM(vsububm, 0, 16);
-GEN_VXFORM(vsubuhm, 0, 17);
-GEN_VXFORM(vsubuwm, 0, 18);
-GEN_VXFORM(vsubudm, 0, 19);
+GEN_VXFORM_V(vadduwm, MO_32, tcg_gen_gvec_add, 0, 2);
+GEN_VXFORM_V(vaddudm, MO_64, tcg_gen_gvec_add, 0, 3);
+GEN_VXFORM_V(vsububm, MO_8, tcg_gen_gvec_sub, 0, 16);
+GEN_VXFORM_V(vsubuhm, MO_16, tcg_gen_gvec_sub, 0, 17);
+GEN_VXFORM_V(vsubuwm, MO_32, tcg_gen_gvec_sub, 0, 18);
+GEN_VXFORM_V(vsubudm, MO_64, tcg_gen_gvec_sub, 0, 19);
 GEN_VXFORM(vmaxub, 1, 0);
 GEN_VXFORM(vmaxuh, 1, 1);
 GEN_VXFORM(vmaxuw, 1, 2);
-- 
2.17.2
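[Editorial note, not part of the posted patch: the vaddu*m/vsubu*m forms being converted above are plain lane-wise modular arithmetic, which is exactly what the generic gvec expander provides. A minimal self-contained sketch of what a MO_8 gvec add amounts to on one 16-byte vector — function names here are illustrative, not QEMU's:]

```c
#include <stdint.h>
#include <assert.h>

/* One byte lane of a MO_8 vector add: each lane wraps around
 * independently (mod 256), so no helper loop is needed. */
static uint8_t add_lane_mo8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Illustrative model of tcg_gen_gvec_add at MO_8 over 16 bytes. */
static void vadd_mo8(uint8_t t[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        t[i] = add_lane_mo8(a[i], b[i]);
    }
}
```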


* [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] to use vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:31   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] " Richard Henderson
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  3 ---
 target/ppc/int_helper.c             | 15 ------------
 target/ppc/translate/vmx-impl.inc.c | 36 +++++++----------------------
 3 files changed, 8 insertions(+), 46 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 553ff500c8..2aa60e5d36 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -246,9 +246,6 @@ DEF_HELPER_3(vrld, void, avr, avr, avr)
 DEF_HELPER_3(vsl, void, avr, avr, avr)
 DEF_HELPER_3(vsr, void, avr, avr, avr)
 DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
-DEF_HELPER_2(vspltisb, void, avr, i32)
-DEF_HELPER_2(vspltish, void, avr, i32)
-DEF_HELPER_2(vspltisw, void, avr, i32)
 DEF_HELPER_3(vspltb, void, avr, avr, i32)
 DEF_HELPER_3(vsplth, void, avr, avr, i32)
 DEF_HELPER_3(vspltw, void, avr, avr, i32)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 4547453ef1..e44c0d90ee 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2066,21 +2066,6 @@ VNEG(vnegw, s32)
 VNEG(vnegd, s64)
 #undef VNEG
 
-#define VSPLTI(suffix, element, splat_type)                     \
-    void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
-    {                                                           \
-        splat_type x = (int8_t)(splat << 3) >> 3;               \
-        int i;                                                  \
-                                                                \
-        for (i = 0; i < ARRAY_SIZE(r->element); i++) {          \
-            r->element[i] = x;                                  \
-        }                                                       \
-    }
-VSPLTI(b, s8, int8_t)
-VSPLTI(h, s16, int16_t)
-VSPLTI(w, s32, int32_t)
-#undef VSPLTI
-
 #define VSR(suffix, element, mask)                                      \
     void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
     {                                                                   \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index e353d3f174..be638cdb1a 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -720,25 +720,21 @@ GEN_VXRFORM_DUAL(vcmpbfp, PPC_ALTIVEC, PPC_NONE, \
 GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
                  vcmpgtud, PPC_NONE, PPC2_ALTIVEC_207)
 
-#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
+#define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
 static void glue(gen_, name)(DisasContext *ctx)                         \
     {                                                                   \
-        TCGv_ptr rd;                                                    \
-        TCGv_i32 simm;                                                  \
+        int simm;                                                       \
         if (unlikely(!ctx->altivec_enabled)) {                          \
             gen_exception(ctx, POWERPC_EXCP_VPU);                       \
             return;                                                     \
         }                                                               \
-        simm = tcg_const_i32(SIMM5(ctx->opcode));                       \
-        rd = gen_avr_ptr(rD(ctx->opcode));                              \
-        gen_helper_##name (rd, simm);                                   \
-        tcg_temp_free_i32(simm);                                        \
-        tcg_temp_free_ptr(rd);                                          \
+        simm = SIMM5(ctx->opcode);                                      \
+        tcg_op(avr64_offset(rD(ctx->opcode), true), 16, 16, simm);      \
     }
 
-GEN_VXFORM_SIMM(vspltisb, 6, 12);
-GEN_VXFORM_SIMM(vspltish, 6, 13);
-GEN_VXFORM_SIMM(vspltisw, 6, 14);
+GEN_VXFORM_DUPI(vspltisb, tcg_gen_gvec_dup8i, 6, 12);
+GEN_VXFORM_DUPI(vspltish, tcg_gen_gvec_dup16i, 6, 13);
+GEN_VXFORM_DUPI(vspltisw, tcg_gen_gvec_dup32i, 6, 14);
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
@@ -818,22 +814,6 @@ GEN_VXFORM_NOA(vprtybw, 1, 24);
 GEN_VXFORM_NOA(vprtybd, 1, 24);
 GEN_VXFORM_NOA(vprtybq, 1, 24);
 
-#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
-static void glue(gen_, name)(DisasContext *ctx)                                 \
-    {                                                                   \
-        TCGv_ptr rd;                                                    \
-        TCGv_i32 simm;                                                  \
-        if (unlikely(!ctx->altivec_enabled)) {                          \
-            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
-            return;                                                     \
-        }                                                               \
-        simm = tcg_const_i32(SIMM5(ctx->opcode));                       \
-        rd = gen_avr_ptr(rD(ctx->opcode));                              \
-        gen_helper_##name (rd, simm);                                   \
-        tcg_temp_free_i32(simm);                                        \
-        tcg_temp_free_ptr(rd);                                          \
-    }
-
 #define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
     {                                                                   \
@@ -1255,7 +1235,7 @@ GEN_VXFORM_DUAL(vsldoi, PPC_ALTIVEC, PPC_NONE,
 #undef GEN_VXRFORM_DUAL
 #undef GEN_VXRFORM1
 #undef GEN_VXRFORM
-#undef GEN_VXFORM_SIMM
+#undef GEN_VXFORM_DUPI
 #undef GEN_VXFORM_NOA
 #undef GEN_VXFORM_UIMM
 #undef GEN_VAFORM_PAIRED
-- 
2.17.2
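[Editorial note: the SIMM5 field that GEN_VXFORM_DUPI now passes straight to the dup expander is a sign-extended 5-bit immediate. A standalone sketch of the extraction the removed VSPLTI helper performed — the function name is hypothetical:]

```c
#include <stdint.h>
#include <assert.h>

/* Sign-extend a 5-bit splat immediate, as the removed helper did with
 * "(int8_t)(splat << 3) >> 3": shift the 5 payload bits to the top of
 * an int8_t, then arithmetic-shift them back down. */
static int32_t simm5_sext(uint32_t splat)
{
    return (int8_t)(splat << 3) >> 3;
}
```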


* [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] to use vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:32   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic " Richard Henderson
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  3 --
 target/ppc/int_helper.c             | 24 ---------------
 target/ppc/translate/vmx-impl.inc.c | 45 +++++++++++++++++------------
 3 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 2aa60e5d36..069daa9883 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -246,9 +246,6 @@ DEF_HELPER_3(vrld, void, avr, avr, avr)
 DEF_HELPER_3(vsl, void, avr, avr, avr)
 DEF_HELPER_3(vsr, void, avr, avr, avr)
 DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
-DEF_HELPER_3(vspltb, void, avr, avr, i32)
-DEF_HELPER_3(vsplth, void, avr, avr, i32)
-DEF_HELPER_3(vspltw, void, avr, avr, i32)
 DEF_HELPER_3(vextractub, void, avr, avr, i32)
 DEF_HELPER_3(vextractuh, void, avr, avr, i32)
 DEF_HELPER_3(vextractuw, void, avr, avr, i32)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index e44c0d90ee..3bf0fdb6c5 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1918,30 +1918,6 @@ void helper_vslo(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 #endif
 }
 
-/* Experimental testing shows that hardware masks the immediate.  */
-#define _SPLAT_MASKED(element) (splat & (ARRAY_SIZE(r->element) - 1))
-#if defined(HOST_WORDS_BIGENDIAN)
-#define SPLAT_ELEMENT(element) _SPLAT_MASKED(element)
-#else
-#define SPLAT_ELEMENT(element)                                  \
-    (ARRAY_SIZE(r->element) - 1 - _SPLAT_MASKED(element))
-#endif
-#define VSPLT(suffix, element)                                          \
-    void helper_vsplt##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
-    {                                                                   \
-        uint32_t s = b->element[SPLAT_ELEMENT(element)];                \
-        int i;                                                          \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            r->element[i] = s;                                          \
-        }                                                               \
-    }
-VSPLT(b, u8)
-VSPLT(h, u16)
-VSPLT(w, u32)
-#undef VSPLT
-#undef SPLAT_ELEMENT
-#undef _SPLAT_MASKED
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VINSERT(suffix, element)                                            \
     void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t index) \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index be638cdb1a..529ae0e5f5 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -814,24 +814,31 @@ GEN_VXFORM_NOA(vprtybw, 1, 24);
 GEN_VXFORM_NOA(vprtybd, 1, 24);
 GEN_VXFORM_NOA(vprtybq, 1, 24);
 
-#define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
-static void glue(gen_, name)(DisasContext *ctx)                                 \
-    {                                                                   \
-        TCGv_ptr rb, rd;                                                \
-        TCGv_i32 uimm;                                                  \
-        if (unlikely(!ctx->altivec_enabled)) {                          \
-            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
-            return;                                                     \
-        }                                                               \
-        uimm = tcg_const_i32(UIMM5(ctx->opcode));                       \
-        rb = gen_avr_ptr(rB(ctx->opcode));                              \
-        rd = gen_avr_ptr(rD(ctx->opcode));                              \
-        gen_helper_##name (rd, rb, uimm);                               \
-        tcg_temp_free_i32(uimm);                                        \
-        tcg_temp_free_ptr(rb);                                          \
-        tcg_temp_free_ptr(rd);                                          \
+static void gen_vsplt(DisasContext *ctx, int vece)
+{
+    int uimm, dofs, bofs;
+
+    if (unlikely(!ctx->altivec_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VPU);
+        return;
     }
 
+    uimm = UIMM5(ctx->opcode);
+    bofs = avr64_offset(rB(ctx->opcode), true);
+    dofs = avr64_offset(rD(ctx->opcode), true);
+
+    /* Experimental testing shows that hardware masks the immediate.  */
+    bofs += (uimm << vece) & 15;
+#ifndef HOST_WORDS_BIGENDIAN
+    bofs ^= 15;
+#endif
+
+    tcg_gen_gvec_dup_mem(vece, dofs, bofs, 16, 16);
+}
+
+#define GEN_VXFORM_VSPLT(name, vece, opc2, opc3) \
+static void glue(gen_, name)(DisasContext *ctx) { gen_vsplt(ctx, vece); }
+
 #define GEN_VXFORM_UIMM_ENV(name, opc2, opc3)                           \
 static void glue(gen_, name)(DisasContext *ctx)                         \
     {                                                                   \
@@ -873,9 +880,9 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
         tcg_temp_free_ptr(rd);                                          \
     }
 
-GEN_VXFORM_UIMM(vspltb, 6, 8);
-GEN_VXFORM_UIMM(vsplth, 6, 9);
-GEN_VXFORM_UIMM(vspltw, 6, 10);
+GEN_VXFORM_VSPLT(vspltb, MO_8, 6, 8);
+GEN_VXFORM_VSPLT(vsplth, MO_16, 6, 9);
+GEN_VXFORM_VSPLT(vspltw, MO_32, 6, 10);
 GEN_VXFORM_UIMM_SPLAT(vextractub, 6, 8, 15);
 GEN_VXFORM_UIMM_SPLAT(vextractuh, 6, 9, 14);
 GEN_VXFORM_UIMM_SPLAT(vextractuw, 6, 10, 12);
-- 
2.17.2
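[Editorial note: gen_vsplt above turns the masked immediate into a byte offset within the 16-byte register, flipping it on little-endian hosts where guest element 0 lives at the highest host address. A self-contained sketch of that arithmetic follows; the final re-alignment step is an editorial assumption not present in the quoted hunk — without it, elements narrower than a byte-reversed 16-byte block appear to land at a misaligned offset on little-endian hosts:]

```c
#include <assert.h>

/* Host byte offset of element 'uimm' of size (1 << vece) inside a
 * 16-byte vector register (sketch, not QEMU code). */
static int splat_elem_offset(int uimm, int vece, int host_be)
{
    int ofs = (uimm << vece) & 15;   /* hardware masks the immediate */
    if (!host_be) {
        ofs ^= 15;                   /* flip addressing on LE hosts */
        ofs &= ~((1 << vece) - 1);   /* re-align to the element size */
    }
    return ofs;
}
```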


* [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:32   ` David Gibson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to " Richard Henderson
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vmx-impl.inc.c | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 529ae0e5f5..329131d30b 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -277,34 +277,14 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
            16, 16);                                                     \
 }
 
-#define GEN_VXFORM_VN(name, vece, tcg_op, opc2, opc3)                   \
-static void glue(gen_, name)(DisasContext *ctx)                         \
-{                                                                       \
-    if (unlikely(!ctx->altivec_enabled)) {                              \
-        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
-        return;                                                         \
-    }                                                                   \
-                                                                        \
-    tcg_op(vece,                                                        \
-           avr64_offset(rD(ctx->opcode), true),                         \
-           avr64_offset(rA(ctx->opcode), true),                         \
-           avr64_offset(rB(ctx->opcode), true),                         \
-           16, 16);                                                     \
-                                                                        \
-    tcg_gen_gvec_not(vece,                                              \
-                     avr64_offset(rD(ctx->opcode), true),               \
-                     avr64_offset(rD(ctx->opcode), true),               \
-                     16, 16);                                           \
-}
-
 /* Logical operations */
 GEN_VXFORM_V(vand, MO_64, tcg_gen_gvec_and, 2, 16);
 GEN_VXFORM_V(vandc, MO_64, tcg_gen_gvec_andc, 2, 17);
 GEN_VXFORM_V(vor, MO_64, tcg_gen_gvec_or, 2, 18);
 GEN_VXFORM_V(vxor, MO_64, tcg_gen_gvec_xor, 2, 19);
-GEN_VXFORM_VN(vnor, MO_64, tcg_gen_gvec_or, 2, 20);
-GEN_VXFORM_VN(veqv, MO_64, tcg_gen_gvec_xor, 2, 26);
-GEN_VXFORM_VN(vnand, MO_64, tcg_gen_gvec_and, 2, 22);
+GEN_VXFORM_V(vnor, MO_64, tcg_gen_gvec_nor, 2, 20);
+GEN_VXFORM_V(veqv, MO_64, tcg_gen_gvec_eqv, 2, 26);
+GEN_VXFORM_V(vnand, MO_64, tcg_gen_gvec_nand, 2, 22);
 GEN_VXFORM_V(vorc, MO_64, tcg_gen_gvec_orc, 2, 21);
 
 #define GEN_VXFORM(name, opc2, opc3)                                    \
-- 
2.17.2
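[Editorial note: the deleted two-step expansion (bitwise op, then a gvec NOT on the destination) is equivalent to the fused gvec operations this patch switches to. The lane-wise identities, as a quick sketch:]

```c
#include <stdint.h>
#include <assert.h>

/* Lane-wise identities behind the conversion: each fused gvec op
 * equals the old "op, then invert the destination" sequence. */
static uint64_t nand64(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t nor64(uint64_t a, uint64_t b)  { return ~(a | b); }
static uint64_t eqv64(uint64_t a, uint64_t b)  { return ~(a ^ b); }
```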


* [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic " Richard Henderson
@ 2018-12-18  6:38 ` Richard Henderson
  2018-12-19  6:33   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib " Richard Henderson
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vsx-impl.inc.c | 43 ++++++++++++-----------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 1608ad48b1..8ab1290026 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -10,6 +10,11 @@ static inline void set_vsr(int n, TCGv_i64 src)
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
+static inline int vsr_full_offset(int n)
+{
+    return offsetof(CPUPPCState, vsr[n].u64[0]);
+}
+
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
 {
     if (n < 32) {
@@ -1214,40 +1219,26 @@ static void gen_xxbrw(DisasContext *ctx)
     tcg_temp_free_i64(xbl);
 }
 
-#define VSX_LOGICAL(name, tcg_op)                                    \
+#define VSX_LOGICAL(name, vece, tcg_op)                              \
 static void glue(gen_, name)(DisasContext * ctx)                     \
     {                                                                \
-        TCGv_i64 t0;                                                 \
-        TCGv_i64 t1;                                                 \
-        TCGv_i64 t2;                                                 \
         if (unlikely(!ctx->vsx_enabled)) {                           \
             gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
             return;                                                  \
         }                                                            \
-        t0 = tcg_temp_new_i64();                                     \
-        t1 = tcg_temp_new_i64();                                     \
-        t2 = tcg_temp_new_i64();                                     \
-        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
-        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
-        tcg_op(t2, t0, t1);                                          \
-        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
-        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
-        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
-        tcg_op(t2, t0, t1);                                          \
-        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
-        tcg_temp_free_i64(t0);                                       \
-        tcg_temp_free_i64(t1);                                       \
-        tcg_temp_free_i64(t2);                                       \
+        tcg_op(vece, vsr_full_offset(xT(ctx->opcode)),               \
+               vsr_full_offset(xA(ctx->opcode)),                     \
+               vsr_full_offset(xB(ctx->opcode)), 16, 16);            \
     }
 
-VSX_LOGICAL(xxland, tcg_gen_and_i64)
-VSX_LOGICAL(xxlandc, tcg_gen_andc_i64)
-VSX_LOGICAL(xxlor, tcg_gen_or_i64)
-VSX_LOGICAL(xxlxor, tcg_gen_xor_i64)
-VSX_LOGICAL(xxlnor, tcg_gen_nor_i64)
-VSX_LOGICAL(xxleqv, tcg_gen_eqv_i64)
-VSX_LOGICAL(xxlnand, tcg_gen_nand_i64)
-VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
+VSX_LOGICAL(xxland, MO_64, tcg_gen_gvec_and)
+VSX_LOGICAL(xxlandc, MO_64, tcg_gen_gvec_andc)
+VSX_LOGICAL(xxlor, MO_64, tcg_gen_gvec_or)
+VSX_LOGICAL(xxlxor, MO_64, tcg_gen_gvec_xor)
+VSX_LOGICAL(xxlnor, MO_64, tcg_gen_gvec_nor)
+VSX_LOGICAL(xxleqv, MO_64, tcg_gen_gvec_eqv)
+VSX_LOGICAL(xxlnand, MO_64, tcg_gen_gvec_nand)
+VSX_LOGICAL(xxlorc, MO_64, tcg_gen_gvec_orc)
 
 #define VSX_XXMRG(name, high)                               \
 static void glue(gen_, name)(DisasContext * ctx)            \
-- 
2.17.2
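[Editorial note: the new vsr_full_offset() works because each 128-bit VSR is stored as two contiguous uint64_t halves, so the offset of u64[0] addresses the whole 16-byte register for gvec. A toy model under that layout assumption — the struct names are illustrative, not QEMU's:]

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Toy layout: 64 VSRs, each two contiguous 64-bit halves. */
typedef struct { uint64_t u64[2]; } vsr_t;
typedef struct { vsr_t vsr[64]; } env_t;

/* Offset of the full 16-byte register n, as vsr_full_offset() does
 * with offsetof(CPUPPCState, vsr[n].u64[0]). */
static size_t vsr_full_offset(int n)
{
    return offsetof(env_t, vsr) + (size_t)n * sizeof(vsr_t);
}
```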


* [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to " Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:34   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw " Richard Henderson
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vsx-impl.inc.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 8ab1290026..d88d6bbd74 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1356,9 +1356,10 @@ static void gen_xxspltw(DisasContext *ctx)
 
 static void gen_xxspltib(DisasContext *ctx)
 {
-    unsigned char uim8 = IMM8(ctx->opcode);
-    TCGv_i64 vsr = tcg_temp_new_i64();
-    if (xS(ctx->opcode) < 32) {
+    uint8_t uim8 = IMM8(ctx->opcode);
+    int rt = xT(ctx->opcode);
+
+    if (rt < 32) {
         if (unlikely(!ctx->altivec_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VPU);
             return;
@@ -1369,10 +1370,7 @@ static void gen_xxspltib(DisasContext *ctx)
             return;
         }
     }
-    tcg_gen_movi_i64(vsr, pattern(uim8));
-    set_cpu_vsrh(xT(ctx->opcode), vsr);
-    set_cpu_vsrl(xT(ctx->opcode), vsr);
-    tcg_temp_free_i64(vsr);
+    tcg_gen_gvec_dup8i(vsr_full_offset(rt), 16, 16, uim8);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
-- 
2.17.2
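[Editorial note: the deleted xxspltib path built its 64-bit splat value with the file's pattern() macro. A standalone sketch of that byte-replication trick — multiplying the low byte by 0x0101010101010101:]

```c
#include <stdint.h>
#include <assert.h>

/* Replicate the low byte of x across all eight bytes of a uint64_t,
 * as pattern(x) does: ~(uint64_t)0 / 0xff == 0x0101010101010101. */
static uint64_t splat_byte(uint64_t x)
{
    return (x & 0xff) * (~(uint64_t)0 / 0xff);
}
```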


* [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib " Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:35   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel " Richard Henderson
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vsx-impl.inc.c | 36 +++++++++--------------------
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index d88d6bbd74..a040038ed4 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1318,38 +1318,24 @@ static void gen_xxsel(DisasContext * ctx)
 
 static void gen_xxspltw(DisasContext *ctx)
 {
-    TCGv_i64 b, b2;
-    TCGv_i64 vsr;
-
-    vsr = tcg_temp_new_i64();
-    if (UIM(ctx->opcode) & 2) {
-        get_cpu_vsrl(vsr, xB(ctx->opcode));
-    } else {
-        get_cpu_vsrh(vsr, xB(ctx->opcode));
-    }
+    int rt = xT(ctx->opcode);
+    int rb = xB(ctx->opcode);
+    int uim = UIM(ctx->opcode);
+    int tofs, bofs;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
 
-    b = tcg_temp_new_i64();
-    b2 = tcg_temp_new_i64();
+    tofs = vsr_full_offset(rt);
+    bofs = vsr_full_offset(rb);
+    bofs += uim << MO_32;
+#ifndef HOST_WORDS_BIGENDIAN
+    bofs ^= 8 | 4;
+#endif
 
-    if (UIM(ctx->opcode) & 1) {
-        tcg_gen_ext32u_i64(b, vsr);
-    } else {
-        tcg_gen_shri_i64(b, vsr, 32);
-    }
-
-    tcg_gen_shli_i64(b2, b, 32);
-    tcg_gen_or_i64(vsr, b, b2);
-    set_cpu_vsrh(xT(ctx->opcode), vsr);
-    set_cpu_vsrl(xT(ctx->opcode), vsr);
-
-    tcg_temp_free_i64(vsr);
-    tcg_temp_free_i64(b);
-    tcg_temp_free_i64(b2);
+    tcg_gen_gvec_dup_mem(MO_32, tofs, bofs, 16, 16);
 }
 
 #define pattern(x) (((x) & 0xff) * (~(uint64_t)0 / 0xff))
-- 
2.17.2


* [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw " Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:35   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr Richard Henderson
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vsx-impl.inc.c | 55 ++++++++++++++---------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index a040038ed4..dc32471cd7 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1280,40 +1280,39 @@ static void glue(gen_, name)(DisasContext * ctx)            \
 VSX_XXMRG(xxmrghw, 1)
 VSX_XXMRG(xxmrglw, 0)
 
+static void xxsel_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c)
+{
+    tcg_gen_and_i64(b, b, c);
+    tcg_gen_andc_i64(a, a, c);
+    tcg_gen_or_i64(t, a, b);
+}
+
+static void xxsel_vec(unsigned vece, TCGv_vec t, TCGv_vec a,
+                      TCGv_vec b, TCGv_vec c)
+{
+    tcg_gen_and_vec(vece, b, b, c);
+    tcg_gen_andc_vec(vece, a, a, c);
+    tcg_gen_or_vec(vece, t, a, b);
+}
+
 static void gen_xxsel(DisasContext * ctx)
 {
-    TCGv_i64 a, b, c, tmp;
+    static const GVecGen4 g = {
+        .fni8 = xxsel_i64,
+        .fniv = xxsel_vec,
+        .vece = MO_64,
+    };
+    int rt = xT(ctx->opcode);
+    int ra = xA(ctx->opcode);
+    int rb = xB(ctx->opcode);
+    int rc = xC(ctx->opcode);
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    a = tcg_temp_new_i64();
-    b = tcg_temp_new_i64();
-    c = tcg_temp_new_i64();
-    tmp = tcg_temp_new_i64();
-
-    get_cpu_vsrh(a, xA(ctx->opcode));
-    get_cpu_vsrh(b, xB(ctx->opcode));
-    get_cpu_vsrh(c, xC(ctx->opcode));
-
-    tcg_gen_and_i64(b, b, c);
-    tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(tmp, a, b);
-    set_cpu_vsrh(xT(ctx->opcode), tmp);
-
-    get_cpu_vsrl(a, xA(ctx->opcode));
-    get_cpu_vsrl(b, xB(ctx->opcode));
-    get_cpu_vsrl(c, xC(ctx->opcode));
-
-    tcg_gen_and_i64(b, b, c);
-    tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(tmp, a, b);
-    set_cpu_vsrl(xT(ctx->opcode), tmp);
-
-    tcg_temp_free_i64(a);
-    tcg_temp_free_i64(b);
-    tcg_temp_free_i64(c);
-    tcg_temp_free_i64(tmp);
+    tcg_gen_gvec_4(vsr_full_offset(rt), vsr_full_offset(ra),
+                   vsr_full_offset(rb), vsr_full_offset(rc), 16, 16, &g);
 }
 
 static void gen_xxspltw(DisasContext *ctx)
-- 
2.17.2
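[Editorial note: both xxsel expansions above compute the classic bit select — for each bit, take b where the mask c is set and a elsewhere (the i64 variant clobbers its a and b temporaries to do so). A pure lane-wise sketch:]

```c
#include <stdint.h>
#include <assert.h>

/* Bitwise select: (b & c) | (a & ~c) -- the per-lane operation that
 * gen_xxsel expands via fni8/fniv. */
static uint64_t bitsel64(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & c) | (a & ~c);
}
```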


* [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel " Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:37   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb Richard Henderson
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

We can re-use this helper elsewhere if we're not passing
in an entire vector register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  2 +-
 target/ppc/int_helper.c             | 10 +++-------
 target/ppc/translate/vmx-impl.inc.c | 17 +++++++++++++----
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 069daa9883..b3ffe28103 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -294,7 +294,7 @@ DEF_HELPER_5(vmsumuhs, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_4(vmladduhm, void, avr, avr, avr, avr)
-DEF_HELPER_2(mtvscr, void, env, avr)
+DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 3bf0fdb6c5..0443f33cd2 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -469,14 +469,10 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
     }
 }
 
-void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
+void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    env->vscr = r->u32[3];
-#else
-    env->vscr = r->u32[0];
-#endif
-    set_flush_to_zero(vscr_nj, &env->vec_status);
+    env->vscr = vscr;
+    set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
 }
 
 void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 329131d30b..ab6da3aa55 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -196,14 +196,23 @@ static void gen_mfvscr(DisasContext *ctx)
 
 static void gen_mtvscr(DisasContext *ctx)
 {
-    TCGv_ptr p;
+    TCGv_i32 val;
+    int bofs;
+
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
         return;
     }
-    p = gen_avr_ptr(rB(ctx->opcode));
-    gen_helper_mtvscr(cpu_env, p);
-    tcg_temp_free_ptr(p);
+
+    val = tcg_temp_new_i32();
+    bofs = avr64_offset(rB(ctx->opcode), true);
+#ifdef HOST_WORDS_BIGENDIAN
+    bofs += 3 * 4;
+#endif
+
+    tcg_gen_ld_i32(val, cpu_env, bofs);
+    gen_helper_mtvscr(cpu_env, val);
+    tcg_temp_free_i32(val);
 }
 
 #define GEN_VX_VMUL10(name, add_cin, ret_carry)                         \
-- 
2.17.2


* [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:38   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat Richard Henderson
                   ` (8 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Not setting flush_to_zero from gdb_set_avr_reg was a bug.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate_init.inc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index b83097141c..292b1df700 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -601,10 +601,9 @@ static void spr_write_excp_vector(DisasContext *ctx, int sprn, int gprn)
 
 static inline void vscr_init(CPUPPCState *env, uint32_t val)
 {
-    env->vscr = val;
     /* Altivec always uses round-to-nearest */
     set_float_rounding_mode(float_round_nearest_even, &env->vec_status);
-    set_flush_to_zero(vscr_nj, &env->vec_status);
+    helper_mtvscr(env, val);
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -9556,7 +9555,7 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
     }
     if (n == 32) {
         ppc_maybe_bswap_register(env, mem_buf, 4);
-        env->vscr = ldl_p(mem_buf);
+        helper_mtvscr(env, ldl_p(mem_buf));
         return 4;
     }
     if (n == 33) {
-- 
2.17.2


* [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:38   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr Richard Henderson
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

These macros are no longer used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c8f449081d..a2fe6058b1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -700,8 +700,6 @@ enum {
 /* Vector status and control register */
 #define VSCR_NJ		16 /* Vector non-java */
 #define VSCR_SAT	0 /* Vector saturation */
-#define vscr_nj		(((env->vscr) >> VSCR_NJ)	& 0x1)
-#define vscr_sat	(((env->vscr) >> VSCR_SAT)	& 0x1)
 
 /*****************************************************************************/
 /* BookE e500 MMU registers */
-- 
2.17.2


* [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:39   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate Richard Henderson
                   ` (6 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This is required before changing the representation of the register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 | 1 +
 target/ppc/arch_dump.c              | 3 ++-
 target/ppc/int_helper.c             | 5 +++++
 target/ppc/translate/vmx-impl.inc.c | 2 +-
 target/ppc/translate_init.inc.c     | 2 +-
 5 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index b3ffe28103..7dbb08b9dd 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -295,6 +295,7 @@ DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_4(vmladduhm, void, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
+DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env)
 DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
index c272d0d3d4..f753798789 100644
--- a/target/ppc/arch_dump.c
+++ b/target/ppc/arch_dump.c
@@ -17,6 +17,7 @@
 #include "elf.h"
 #include "sysemu/dump.h"
 #include "sysemu/kvm.h"
+#include "exec/helper-proto.h"
 
 #ifdef TARGET_PPC64
 #define ELFCLASS ELFCLASS64
@@ -173,7 +174,7 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
             vmxregset->avr[i].u64[1] = cpu->env.vsr[32 + i].u64[1];
         }
     }
-    vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
+    vmxregset->vscr.u32[3] = cpu_to_dump32(s, helper_mfvscr(&cpu->env));
 }
 
 static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 0443f33cd2..75201bbba6 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -475,6 +475,11 @@ void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
     set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
 }
 
+uint32_t helper_mfvscr(CPUPPCState *env)
+{
+    return env->vscr;
+}
+
 void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     int i;
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index ab6da3aa55..1c0c461241 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -187,7 +187,7 @@ static void gen_mfvscr(DisasContext *ctx)
     tcg_gen_movi_i64(avr, 0);
     set_avr64(rD(ctx->opcode), avr, true);
     t = tcg_temp_new_i32();
-    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
+    gen_helper_mfvscr(t, cpu_env);
     tcg_gen_extu_i32_i64(avr, t);
     set_avr64(rD(ctx->opcode), avr, false);
     tcg_temp_free_i32(t);
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 292b1df700..353285c6bd 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9527,7 +9527,7 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
         return 16;
     }
     if (n == 32) {
-        stl_p(mem_buf, env->vscr);
+        stl_p(mem_buf, helper_mfvscr(env));
         ppc_maybe_bswap_register(env, mem_buf, 4);
         return 4;
     }
-- 
2.17.2


* [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (28 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:40   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat Richard Henderson
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This is required before changing the representation of the register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/machine.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 451cf376b4..3c27a89166 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -10,6 +10,7 @@
 #include "migration/cpu.h"
 #include "qapi/error.h"
 #include "kvm_ppc.h"
+#include "exec/helper-proto.h"
 
 static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
@@ -17,7 +18,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
     CPUPPCState *env = &cpu->env;
     unsigned int i, j;
     target_ulong sdr1;
-    uint32_t fpscr;
+    uint32_t fpscr, vscr;
 #if defined(TARGET_PPC64)
     int32_t slb_nr;
 #endif
@@ -84,7 +85,8 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
     if (!cpu->vhyp) {
         ppc_store_sdr1(env, sdr1);
     }
-    qemu_get_be32s(f, &env->vscr);
+    qemu_get_be32s(f, &vscr);
+    helper_mtvscr(env, vscr);
     qemu_get_be64s(f, &env->spe_acc);
     qemu_get_be32s(f, &env->spe_fscr);
     qemu_get_betls(f, &env->msr_mask);
@@ -429,6 +431,28 @@ static bool altivec_needed(void *opaque)
     return (cpu->env.insns_flags & PPC_ALTIVEC);
 }
 
+static int get_vscr(QEMUFile *f, void *opaque, size_t size,
+                    const VMStateField *field)
+{
+    PowerPCCPU *cpu = opaque;
+    helper_mtvscr(&cpu->env, qemu_get_be32(f));
+    return 0;
+}
+
+static int put_vscr(QEMUFile *f, void *opaque, size_t size,
+                    const VMStateField *field, QJSON *vmdesc)
+{
+    PowerPCCPU *cpu = opaque;
+    qemu_put_be32(f, helper_mfvscr(&cpu->env));
+    return 0;
+}
+
+static const VMStateInfo vmstate_vscr = {
+    .name = "cpu/altivec/vscr",
+    .get = get_vscr,
+    .put = put_vscr,
+};
+
 static const VMStateDescription vmstate_altivec = {
     .name = "cpu/altivec",
     .version_id = 1,
@@ -436,7 +460,21 @@ static const VMStateDescription vmstate_altivec = {
     .needed = altivec_needed,
     .fields = (VMStateField[]) {
         VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
-        VMSTATE_UINT32(env.vscr, PowerPCCPU),
+        /*
+         * Save the architectural value of the vscr, not the internally
+         * expanded version.  Since this architectural value does not
+         * exist in memory to be stored, this requires a bit of hoop
+         * jumping.  We want OFFSET=0 so that we effectively pass CPU
+         * to the helper functions.
+         */
+        {
+            .name = "vscr",
+            .version_id = 0,
+            .size = sizeof(uint32_t),
+            .info = &vmstate_vscr,
+            .flags = VMS_SINGLE,
+            .offset = 0
+        },
         VMSTATE_END_OF_LIST()
     },
 };
-- 
2.17.2


* [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (29 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:40   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field Richard Henderson
                   ` (4 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This is required before changing the representation of the register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 75201bbba6..38aa3e85a6 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -480,6 +480,11 @@ uint32_t helper_mfvscr(CPUPPCState *env)
     return env->vscr;
 }
 
+static inline void set_vscr_sat(CPUPPCState *env)
+{
+    env->vscr |= 1 << VSCR_SAT;
+}
+
 void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     int i;
@@ -593,7 +598,7 @@ VARITHFPFMA(nmsubfp, float_muladd_negate_result | float_muladd_negate_c);
             }                                                           \
         }                                                               \
         if (sat) {                                                      \
-            env->vscr |= (1 << VSCR_SAT);                               \
+            set_vscr_sat(env);                                          \
         }                                                               \
     }
 #define VARITHSAT_SIGNED(suffix, element, optype, cvt)          \
@@ -865,7 +870,7 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
             }                                                           \
         }                                                               \
         if (sat) {                                                      \
-            env->vscr |= (1 << VSCR_SAT);                               \
+            set_vscr_sat(env);                                          \
         }                                                               \
     }
 VCT(uxs, cvtsduw, u32)
@@ -916,7 +921,7 @@ void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -933,7 +938,7 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -1061,7 +1066,7 @@ void helper_vmsumshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -1114,7 +1119,7 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -1633,7 +1638,7 @@ void helper_vpkpx(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
         }                                                               \
         *r = result;                                                    \
         if (dosat && sat) {                                             \
-            env->vscr |= (1 << VSCR_SAT);                               \
+            set_vscr_sat(env);                                          \
         }                                                               \
     }
 #define I(x, y) (x)
@@ -2106,7 +2111,7 @@ void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     *r = result;
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -2133,7 +2138,7 @@ void helper_vsum2sws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
     *r = result;
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -2152,7 +2157,7 @@ void helper_vsum4sbs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -2169,7 +2174,7 @@ void helper_vsum4shs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
@@ -2188,7 +2193,7 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     }
 
     if (sat) {
-        env->vscr |= (1 << VSCR_SAT);
+        set_vscr_sat(env);
     }
 }
 
-- 
2.17.2


* [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (30 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:41   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations Richard Henderson
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Change the representation of VSCR_SAT such that it is easy
to set from vector code.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h        |  4 +++-
 target/ppc/int_helper.c | 11 ++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index a2fe6058b1..26d2e16720 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1063,10 +1063,12 @@ struct CPUPPCState {
     /* Special purpose registers */
     target_ulong spr[1024];
     ppc_spr_t spr_cb[1024];
-    /* Vector status and control register */
+    /* Vector status and control register, minus VSCR_SAT.  */
     uint32_t vscr;
     /* VSX registers (including FP and AVR) */
     ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
+    /* Non-zero if and only if VSCR_SAT should be set.  */
+    ppc_vsr_t vscr_sat;
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 38aa3e85a6..9dbcbcd87a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -471,18 +471,23 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
 
 void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
 {
-    env->vscr = vscr;
+    env->vscr = vscr & ~(1u << VSCR_SAT);
+    /* Which bit we set is completely arbitrary, but clear the rest.  */
+    env->vscr_sat.u64[0] = vscr & (1u << VSCR_SAT);
+    env->vscr_sat.u64[1] = 0;
     set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
 }
 
 uint32_t helper_mfvscr(CPUPPCState *env)
 {
-    return env->vscr;
+    uint32_t sat = (env->vscr_sat.u64[0] | env->vscr_sat.u64[1]) != 0;
+    return env->vscr | (sat << VSCR_SAT);
 }
 
 static inline void set_vscr_sat(CPUPPCState *env)
 {
-    env->vscr |= 1 << VSCR_SAT;
+    /* The choice of non-zero value is arbitrary.  */
+    env->vscr_sat.u32[0] = 1;
 }
 
 void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.17.2


* [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (31 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:42   ` David Gibson
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* " Richard Henderson
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 | 24 ++++++------
 target/ppc/int_helper.c             | 18 ++-------
 target/ppc/translate/vmx-impl.inc.c | 57 +++++++++++++++++++++++------
 3 files changed, 61 insertions(+), 38 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 7dbb08b9dd..3daf6bf863 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -219,18 +219,18 @@ DEF_HELPER_2(vprtybq, void, avr, avr)
 DEF_HELPER_3(vsubcuw, void, avr, avr, avr)
 DEF_HELPER_2(lvsl, void, avr, tl)
 DEF_HELPER_2(lvsr, void, avr, tl)
-DEF_HELPER_4(vaddsbs, void, env, avr, avr, avr)
-DEF_HELPER_4(vaddshs, void, env, avr, avr, avr)
-DEF_HELPER_4(vaddsws, void, env, avr, avr, avr)
-DEF_HELPER_4(vsubsbs, void, env, avr, avr, avr)
-DEF_HELPER_4(vsubshs, void, env, avr, avr, avr)
-DEF_HELPER_4(vsubsws, void, env, avr, avr, avr)
-DEF_HELPER_4(vaddubs, void, env, avr, avr, avr)
-DEF_HELPER_4(vadduhs, void, env, avr, avr, avr)
-DEF_HELPER_4(vadduws, void, env, avr, avr, avr)
-DEF_HELPER_4(vsububs, void, env, avr, avr, avr)
-DEF_HELPER_4(vsubuhs, void, env, avr, avr, avr)
-DEF_HELPER_4(vsubuws, void, env, avr, avr, avr)
+DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsubsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsubshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsubsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vaddubs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vadduhs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vadduws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsububs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsubuhs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_5(vsubuws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_3(vadduqm, void, avr, avr, avr)
 DEF_HELPER_4(vaddecuq, void, avr, avr, avr, avr)
 DEF_HELPER_4(vaddeuqm, void, avr, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9dbcbcd87a..22671c71e5 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -583,27 +583,17 @@ VARITHFPFMA(nmsubfp, float_muladd_negate_result | float_muladd_negate_c);
     }
 
 #define VARITHSAT_DO(name, op, optype, cvt, element)                    \
-    void helper_v##name(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,   \
-                        ppc_avr_t *b)                                   \
+    void helper_v##name(ppc_avr_t *r, ppc_avr_t *vscr_sat,              \
+                        ppc_avr_t *a, ppc_avr_t *b, uint32_t desc)      \
     {                                                                   \
         int sat = 0;                                                    \
         int i;                                                          \
                                                                         \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            switch (sizeof(r->element[0])) {                            \
-            case 1:                                                     \
-                VARITHSAT_CASE(optype, op, cvt, element);               \
-                break;                                                  \
-            case 2:                                                     \
-                VARITHSAT_CASE(optype, op, cvt, element);               \
-                break;                                                  \
-            case 4:                                                     \
-                VARITHSAT_CASE(optype, op, cvt, element);               \
-                break;                                                  \
-            }                                                           \
+            VARITHSAT_CASE(optype, op, cvt, element);                   \
         }                                                               \
         if (sat) {                                                      \
-            set_vscr_sat(env);                                          \
+            vscr_sat->u32[0] = 1;                                       \
         }                                                               \
     }
 #define VARITHSAT_SIGNED(suffix, element, optype, cvt)          \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 1c0c461241..c6a53a9f63 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -548,22 +548,55 @@ GEN_VXFORM(vslo, 6, 16);
 GEN_VXFORM(vsro, 6, 17);
 GEN_VXFORM(vaddcuw, 0, 6);
 GEN_VXFORM(vsubcuw, 0, 22);
-GEN_VXFORM_ENV(vaddubs, 0, 8);
+
+#define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
+static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
+                                         TCGv_vec sat, TCGv_vec a,      \
+                                         TCGv_vec b)                    \
+{                                                                       \
+    TCGv_vec x = tcg_temp_new_vec_matching(t);                          \
+    glue(glue(tcg_gen_, NORM), _vec)(VECE, x, a, b);                    \
+    glue(glue(tcg_gen_, SAT), _vec)(VECE, t, a, b);                     \
+    tcg_gen_cmp_vec(TCG_COND_NE, VECE, x, x, t);                        \
+    tcg_gen_or_vec(VECE, sat, sat, x);                                  \
+    tcg_temp_free_vec(x);                                               \
+}                                                                       \
+static void glue(gen_, NAME)(DisasContext *ctx)                         \
+{                                                                       \
+    static const GVecGen4 g = {                                         \
+        .fniv = glue(glue(gen_, NAME), _vec),                           \
+        .fno = glue(gen_helper_, NAME),                                 \
+        .opc = glue(glue(INDEX_op_, NORM), _vec),                       \
+        .write_aofs = true,                                             \
+        .vece = VECE,                                                   \
+    };                                                                  \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
+    tcg_gen_gvec_4(avr64_offset(rD(ctx->opcode), true),                 \
+                   offsetof(CPUPPCState, vscr_sat),                     \
+                   avr64_offset(rA(ctx->opcode), true),                 \
+                   avr64_offset(rB(ctx->opcode), true),                 \
+                   16, 16, &g);                                         \
+}
+
+GEN_VXFORM_SAT(vaddubs, MO_8, add, usadd, 0, 8);
 GEN_VXFORM_DUAL_EXT(vaddubs, PPC_ALTIVEC, PPC_NONE, 0,       \
                     vmul10uq, PPC_NONE, PPC2_ISA300, 0x0000F800)
-GEN_VXFORM_ENV(vadduhs, 0, 9);
+GEN_VXFORM_SAT(vadduhs, MO_16, add, usadd, 0, 9);
 GEN_VXFORM_DUAL(vadduhs, PPC_ALTIVEC, PPC_NONE, \
                 vmul10euq, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM_ENV(vadduws, 0, 10);
-GEN_VXFORM_ENV(vaddsbs, 0, 12);
-GEN_VXFORM_ENV(vaddshs, 0, 13);
-GEN_VXFORM_ENV(vaddsws, 0, 14);
-GEN_VXFORM_ENV(vsububs, 0, 24);
-GEN_VXFORM_ENV(vsubuhs, 0, 25);
-GEN_VXFORM_ENV(vsubuws, 0, 26);
-GEN_VXFORM_ENV(vsubsbs, 0, 28);
-GEN_VXFORM_ENV(vsubshs, 0, 29);
-GEN_VXFORM_ENV(vsubsws, 0, 30);
+GEN_VXFORM_SAT(vadduws, MO_32, add, usadd, 0, 10);
+GEN_VXFORM_SAT(vaddsbs, MO_8, add, ssadd, 0, 12);
+GEN_VXFORM_SAT(vaddshs, MO_16, add, ssadd, 0, 13);
+GEN_VXFORM_SAT(vaddsws, MO_32, add, ssadd, 0, 14);
+GEN_VXFORM_SAT(vsububs, MO_8, sub, ussub, 0, 24);
+GEN_VXFORM_SAT(vsubuhs, MO_16, sub, ussub, 0, 25);
+GEN_VXFORM_SAT(vsubuws, MO_32, sub, ussub, 0, 26);
+GEN_VXFORM_SAT(vsubsbs, MO_8, sub, sssub, 0, 28);
+GEN_VXFORM_SAT(vsubshs, MO_16, sub, sssub, 0, 29);
+GEN_VXFORM_SAT(vsubsws, MO_32, sub, sssub, 0, 30);
 GEN_VXFORM(vadduqm, 0, 4);
 GEN_VXFORM(vaddcuq, 0, 5);
 GEN_VXFORM3(vaddeuqm, 30, 0);
-- 
2.17.2


* [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* to vector operations
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (32 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations Richard Henderson
@ 2018-12-18  6:39 ` Richard Henderson
  2018-12-19  6:42   ` David Gibson
  2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
  2019-01-03 18:31 ` Mark Cave-Ayland
  35 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 | 16 ---------------
 target/ppc/int_helper.c             | 27 ------------------------
 target/ppc/translate/vmx-impl.inc.c | 32 ++++++++++++++---------------
 3 files changed, 16 insertions(+), 59 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 3daf6bf863..18910d18a4 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -117,22 +117,6 @@ DEF_HELPER_3(vabsduw, void, avr, avr, avr)
 DEF_HELPER_3(vavgsb, void, avr, avr, avr)
 DEF_HELPER_3(vavgsh, void, avr, avr, avr)
 DEF_HELPER_3(vavgsw, void, avr, avr, avr)
-DEF_HELPER_3(vminsb, void, avr, avr, avr)
-DEF_HELPER_3(vminsh, void, avr, avr, avr)
-DEF_HELPER_3(vminsw, void, avr, avr, avr)
-DEF_HELPER_3(vminsd, void, avr, avr, avr)
-DEF_HELPER_3(vmaxsb, void, avr, avr, avr)
-DEF_HELPER_3(vmaxsh, void, avr, avr, avr)
-DEF_HELPER_3(vmaxsw, void, avr, avr, avr)
-DEF_HELPER_3(vmaxsd, void, avr, avr, avr)
-DEF_HELPER_3(vminub, void, avr, avr, avr)
-DEF_HELPER_3(vminuh, void, avr, avr, avr)
-DEF_HELPER_3(vminuw, void, avr, avr, avr)
-DEF_HELPER_3(vminud, void, avr, avr, avr)
-DEF_HELPER_3(vmaxub, void, avr, avr, avr)
-DEF_HELPER_3(vmaxuh, void, avr, avr, avr)
-DEF_HELPER_3(vmaxuw, void, avr, avr, avr)
-DEF_HELPER_3(vmaxud, void, avr, avr, avr)
 DEF_HELPER_4(vcmpequb, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpequh, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpequw, void, env, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 22671c71e5..b9793364fd 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -937,33 +937,6 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 }
 
-#define VMINMAX_DO(name, compare, element)                              \
-    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
-    {                                                                   \
-        int i;                                                          \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            if (a->element[i] compare b->element[i]) {                  \
-                r->element[i] = b->element[i];                          \
-            } else {                                                    \
-                r->element[i] = a->element[i];                          \
-            }                                                           \
-        }                                                               \
-    }
-#define VMINMAX(suffix, element)                \
-    VMINMAX_DO(min##suffix, >, element)         \
-    VMINMAX_DO(max##suffix, <, element)
-VMINMAX(sb, s8)
-VMINMAX(sh, s16)
-VMINMAX(sw, s32)
-VMINMAX(sd, s64)
-VMINMAX(ub, u8)
-VMINMAX(uh, u16)
-VMINMAX(uw, u32)
-VMINMAX(ud, u64)
-#undef VMINMAX_DO
-#undef VMINMAX
-
 void helper_vmladduhm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     int i;
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index c6a53a9f63..399d18707f 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -412,22 +412,22 @@ GEN_VXFORM_V(vsububm, MO_8, tcg_gen_gvec_sub, 0, 16);
 GEN_VXFORM_V(vsubuhm, MO_16, tcg_gen_gvec_sub, 0, 17);
 GEN_VXFORM_V(vsubuwm, MO_32, tcg_gen_gvec_sub, 0, 18);
 GEN_VXFORM_V(vsubudm, MO_64, tcg_gen_gvec_sub, 0, 19);
-GEN_VXFORM(vmaxub, 1, 0);
-GEN_VXFORM(vmaxuh, 1, 1);
-GEN_VXFORM(vmaxuw, 1, 2);
-GEN_VXFORM(vmaxud, 1, 3);
-GEN_VXFORM(vmaxsb, 1, 4);
-GEN_VXFORM(vmaxsh, 1, 5);
-GEN_VXFORM(vmaxsw, 1, 6);
-GEN_VXFORM(vmaxsd, 1, 7);
-GEN_VXFORM(vminub, 1, 8);
-GEN_VXFORM(vminuh, 1, 9);
-GEN_VXFORM(vminuw, 1, 10);
-GEN_VXFORM(vminud, 1, 11);
-GEN_VXFORM(vminsb, 1, 12);
-GEN_VXFORM(vminsh, 1, 13);
-GEN_VXFORM(vminsw, 1, 14);
-GEN_VXFORM(vminsd, 1, 15);
+GEN_VXFORM_V(vmaxub, MO_8, tcg_gen_gvec_umax, 1, 0);
+GEN_VXFORM_V(vmaxuh, MO_16, tcg_gen_gvec_umax, 1, 1);
+GEN_VXFORM_V(vmaxuw, MO_32, tcg_gen_gvec_umax, 1, 2);
+GEN_VXFORM_V(vmaxud, MO_64, tcg_gen_gvec_umax, 1, 3);
+GEN_VXFORM_V(vmaxsb, MO_8, tcg_gen_gvec_smax, 1, 4);
+GEN_VXFORM_V(vmaxsh, MO_16, tcg_gen_gvec_smax, 1, 5);
+GEN_VXFORM_V(vmaxsw, MO_32, tcg_gen_gvec_smax, 1, 6);
+GEN_VXFORM_V(vmaxsd, MO_64, tcg_gen_gvec_smax, 1, 7);
+GEN_VXFORM_V(vminub, MO_8, tcg_gen_gvec_umin, 1, 8);
+GEN_VXFORM_V(vminuh, MO_16, tcg_gen_gvec_umin, 1, 9);
+GEN_VXFORM_V(vminuw, MO_32, tcg_gen_gvec_umin, 1, 10);
+GEN_VXFORM_V(vminud, MO_64, tcg_gen_gvec_umin, 1, 11);
+GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12);
+GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13);
+GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14);
+GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15);
 GEN_VXFORM(vavgub, 1, 16);
 GEN_VXFORM(vabsdub, 1, 16);
 GEN_VXFORM_DUAL(vavgub, PPC_ALTIVEC, PPC_NONE, \
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (33 preceding siblings ...)
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* " Richard Henderson
@ 2018-12-18  9:49 ` Mark Cave-Ayland
  2018-12-18 14:51   ` Mark Cave-Ayland
                     ` (2 more replies)
  2019-01-03 18:31 ` Mark Cave-Ayland
  35 siblings, 3 replies; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-18  9:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 06:38, Richard Henderson wrote:

> This implements some of the things that I talked about with Mark
> this morning / yesterday.  In particular:
> 
> (0) Implement expanders for nand, nor, eqv logical operations.
> 
> (1) Implement saturating arithmetic for the tcg backend.
> 
>     While I had expanders for these, they always went to helpers.
>     It's easy enough to expand byte and half-word operations for x86.
>     Beyond that, 32 and 64-bit operations can be expanded with integers.
> 
> (2) Implement minmax arithmetic for the tcg backend.
> 
>     While I had integral minmax operations, I had not yet added
>     any vector expanders for this.  (The integral stuff came in
>     for atomic minmax.)
> 
> (3) Trivial conversions to minmax for target/arm.
> 
> (4) Patches 11-18 are identical to Mark's.
> 
> (5) Patches 19-25 implement splat and logicals for VMX and VSX.
> 
>     VSX is no more difficult than VMX for these.  It does seem to be
>     just about everything that we can do for VSX at the moment.

> 
> (6) Patches 26-33 implement saturating arithmetic for VMX.
> 
> (7) Patch 34 implements minmax arithmetic for VMX.
> 
> I've tested the new operations via aarch64 guest, as that's the set
> of risu test cases I've got handy.  The rest is untested so far.

Thank you for working on this! I've just given this patchset a spin on my test images
and here's what I found:


- The version of my target/ppc patchset you've used is the one that I posted to the
mailing list, which doesn't have the GEN_FLOAT macro fixes, the removal of the
uint64_t * cast that you requested, or the additional SoBs

I've taken this patchset, replaced my patches with the latest versions, and repushed
to github at https://github.com/mcayland/qemu/tree/ppc-altivec-rth.


- This patchset introduces visual artefacts on-screen for both OS X and OS 9

A quick bisection suggests that there could be 2 separate issues related to the
implementation of splat:

Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black
border to appear around the OS X splash screen
(https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an
overflow/alignment issue.

Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
operations" causes corruption of the OS X splash screen
(https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
an endian issue.


Having said that, the results look really promising, and I don't think it will take
too long to resolve any outstanding issues. I will be around on IRC later today if
that helps too.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
@ 2018-12-18 14:51   ` Mark Cave-Ayland
  2018-12-18 15:07     ` Richard Henderson
  2018-12-18 15:05   ` Mark Cave-Ayland
  2019-01-03 14:58   ` Mark Cave-Ayland
  2 siblings, 1 reply; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-18 14:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> A quick bisection suggests that there could be 2 separate issues related to the
> implementation of splat:
> 
> Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black
> border to appear around the OS X splash screen
> (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an
> overflow/alignment issue.

This one appears to be a sign extension issue - if I make use of the same technique
used by the previous helper then this problem goes away. Below is my experimental
diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index be638cdb1a..6cd25c8dc6 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
 #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
 static void glue(gen_, name)(DisasContext *ctx)                         \
     {                                                                   \
-        int simm;                                                       \
+        int8_t simm;                                                    \
         if (unlikely(!ctx->altivec_enabled)) {                          \
             gen_exception(ctx, POWERPC_EXCP_VPU);                       \
             return;                                                     \
         }                                                               \
-        simm = SIMM5(ctx->opcode);                                      \
+        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \
         tcg_op(avr64_offset(rD(ctx->opcode), true), 16, 16, simm);      \
     }


ATB,

Mark.

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
  2018-12-18 14:51   ` Mark Cave-Ayland
@ 2018-12-18 15:05   ` Mark Cave-Ayland
  2018-12-18 15:17     ` Richard Henderson
  2019-01-03 14:58   ` Mark Cave-Ayland
  2 siblings, 1 reply; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-18 15:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
> operations" causes corruption of the OS X splash screen
> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
> an endian issue.

Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this
patch helps a lot, but something still isn't quite right:
https://www.ilande.co.uk/tmp/qemu/badapple3.png.

Adding some more debugging seems to suggest that boffs is being handled correctly
based upon vece/uimm...

ins: vsplth  bofs before: 304e0
  bofs after: 304e0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e6  uimm: 3  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e6  uimm: 3  vece: 1
ins: vsplth  bofs before: 30560
  bofs after: 3056e  uimm: 7  vece: 1
ins: vsplth  bofs before: 30540
  bofs after: 3054e  uimm: 7  vece: 1
ins: vsplth  bofs before: 30490
  bofs after: 30492  uimm: 1  vece: 1
ins: vspltw  bofs before: 30580
  bofs after: 3058c  uimm: 3  vece: 2
ins: vsplth  bofs before: 30580
  bofs after: 30586  uimm: 3  vece: 1
ins: vspltw  bofs before: 30580
  bofs after: 30580  uimm: 0  vece: 2
ins: vspltb  bofs before: 30560
  bofs after: 30560  uimm: 0  vece: 0
ins: vsplth  bofs before: 304d0
  bofs after: 304d0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d4  uimm: 2  vece: 1


ATB,

Mark.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18 14:51   ` Mark Cave-Ayland
@ 2018-12-18 15:07     ` Richard Henderson
  2018-12-18 15:22       ` Mark Cave-Ayland
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18 15:07 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-ppc, david

On 12/18/18 6:51 AM, Mark Cave-Ayland wrote:
> On 18/12/2018 09:49, Mark Cave-Ayland wrote:
> 
>> A quick bisection suggests that there could be 2 separate issues related to the
>> implementation of splat:
>>
>> Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black
>> border to appear around the OS X splash screen
>> (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an
>> overflow/alignment issue.
> 
> This one appears to be a sign extension issue - if I make use of the same technique
> used by the previous helper then this problem goes away. Below is my experimental
> diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":
> 
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index be638cdb1a..6cd25c8dc6 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
>  #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
>  static void glue(gen_, name)(DisasContext *ctx)                         \
>      {                                                                   \
> -        int simm;                                                       \
> +        int8_t simm; 

This shouldn't matter.
                                                   \
>          if (unlikely(!ctx->altivec_enabled)) {                          \
>              gen_exception(ctx, POWERPC_EXCP_VPU);                       \
>              return;                                                     \
>          }                                                               \
> -        simm = SIMM5(ctx->opcode);                                      \
> +        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \

This suggests that SIMM5 should be using sextract32.


r~

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18 15:05   ` Mark Cave-Ayland
@ 2018-12-18 15:17     ` Richard Henderson
  2018-12-18 15:26       ` Mark Cave-Ayland
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2018-12-18 15:17 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-ppc, david

On 12/18/18 7:05 AM, Mark Cave-Ayland wrote:
> On 18/12/2018 09:49, Mark Cave-Ayland wrote:
> 
>> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
>> operations" causes corruption of the OS X splash screen
>> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
>> an endian issue.
> 
> Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this
> patch helps a lot, but something still isn't quite right:
> https://www.ilande.co.uk/tmp/qemu/badapple3.png.

I can't figure out what the host+guest endian rules for ppc_avr_t are at all.

Certainly there appear to be bugs wrt vscr and which end of the register we
pull the value from.  On the tcg side we take host endianness into account, and on
the gdb side we always use u32[3].


r~

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18 15:07     ` Richard Henderson
@ 2018-12-18 15:22       ` Mark Cave-Ayland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-18 15:22 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 15:07, Richard Henderson wrote:

>> This one appears to be a sign extension issue - if I make use of the same technique
>> used by the previous helper then this problem goes away. Below is my experimental
>> diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":
>>
>> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
>> index be638cdb1a..6cd25c8dc6 100644
>> --- a/target/ppc/translate/vmx-impl.inc.c
>> +++ b/target/ppc/translate/vmx-impl.inc.c
>> @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
>>  #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
>>  static void glue(gen_, name)(DisasContext *ctx)                         \
>>      {                                                                   \
>> -        int simm;                                                       \
>> +        int8_t simm; 
> 
> This shouldn't matter.
>                                                    \
>>          if (unlikely(!ctx->altivec_enabled)) {                          \
>>              gen_exception(ctx, POWERPC_EXCP_VPU);                       \
>>              return;                                                     \
>>          }                                                               \
>> -        simm = SIMM5(ctx->opcode);                                      \
>> +        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \
> 
> This suggests that SIMM5 should be using sextract32.

There's certainly an obvious typo here, but on its own it doesn't fix the issue:

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index b77d564a65..08eee1cd84 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -124,7 +124,7 @@ EXTRACT_SHELPER(SIMM, 0, 16);
 /* 16 bits unsigned immediate value */
 EXTRACT_HELPER(UIMM, 0, 16);
 /* 5 bits signed immediate value */
-EXTRACT_HELPER(SIMM5, 16, 5);
+EXTRACT_SHELPER(SIMM5, 16, 5);
 /* 5 bits signed immediate value */
 EXTRACT_HELPER(UIMM5, 16, 5);
 /* 4 bits unsigned immediate value */


ATB,

Mark.

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18 15:17     ` Richard Henderson
@ 2018-12-18 15:26       ` Mark Cave-Ayland
  2018-12-18 16:16         ` Richard Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-18 15:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 15:17, Richard Henderson wrote:

> On 12/18/18 7:05 AM, Mark Cave-Ayland wrote:
>> On 18/12/2018 09:49, Mark Cave-Ayland wrote:
>>
>>> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
>>> operations" causes corruption of the OS X splash screen
>>> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
>>> an endian issue.
>>
>> Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this
>> patch helps a lot, but something still isn't quite right:
>> https://www.ilande.co.uk/tmp/qemu/badapple3.png.
> 
> I can't figure out what the host+guest endian rules for ppc_avr_t are at all.
> 
> Certainly there appear to be bugs wrt vscr and which end of the register we
> pull the value.  On the tcg side we take host endianness into account, and on
> the gdb side we always use u32[3].

That seems wrong to me. Given that ppc_avr_t is a union, I'd expect it to be
in host order? Certainly in the VMX helper macros I've looked at, the members are set
directly with no byte swapping.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18 15:26       ` Mark Cave-Ayland
@ 2018-12-18 16:16         ` Richard Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18 16:16 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-ppc, david

On 12/18/18 7:26 AM, Mark Cave-Ayland wrote:
> That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect it to be
> in host order? Certainly in the VMX helper macros I've looked at, the members are set
> directly with no byte swapping.

"Host order"?  For both words of the vector?

That's certainly going to cause problems wrt VSX and FPU registers.  We're
hard-coding that as fpu == vsx.u64[0] (both before and after your patch set).

For vscr, on master we have

void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
{
#if defined(HOST_WORDS_BIGENDIAN)
    env->vscr = r->u32[3];
#else
    env->vscr = r->u32[0];
#endif

and

        if (needs_byteswap) {
            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
        } else {
            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
        }
    }
    vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);

For helper macros that apply the same operation to all lanes, it doesn't matter
which order in which the lanes are processed, so of course I would expect them
to be processed in host order.

It's cases that do not apply the same operation, such as merges, where the
problems would arise.

There are at least 3 schemes being employed to address this:

#if defined(HOST_WORDS_BIGENDIAN)
#define HI_IDX 0
#define LO_IDX 1
#define AVRB(i) u8[i]
#define AVRW(i) u32[i]
#else
#define HI_IDX 1
#define LO_IDX 0
#define AVRB(i) u8[15-(i)]
#define AVRW(i) u32[3-(i)]
#endif

...
#if defined(HOST_WORDS_BIGENDIAN)
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (3 - (i))
#endif

...
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (1 - (i))
#endif

...
#if defined(HOST_WORDS_BIGENDIAN)
        result.u8[i] = a->u8[indexA] ^ b->u8[indexB];
#else
        result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB];
#endif


r~

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
@ 2018-12-19  5:36   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  5:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]

On Mon, Dec 17, 2018 at 10:38:38PM -0800, Richard Henderson wrote:
> We handle many of these during integer expansion, and the
> rest of them during integer optimization.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tcg/tcg-op-gvec.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 61c25f5784..ec231b78fb 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -1840,7 +1840,12 @@ void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
>          .opc = INDEX_op_and_vec,
>          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>      };
> -    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
>  }
>  
>  void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
> @@ -1853,7 +1858,12 @@ void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
>          .opc = INDEX_op_or_vec,
>          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>      };
> -    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
>  }
>  
>  void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
> @@ -1866,7 +1876,12 @@ void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
>          .opc = INDEX_op_xor_vec,
>          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>      };
> -    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
>  }
>  
>  void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
> @@ -1879,7 +1894,12 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
>          .opc = INDEX_op_andc_vec,
>          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>      };
> -    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
>  }
>  
>  void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
> @@ -1892,7 +1912,12 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
>          .opc = INDEX_op_orc_vec,
>          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
>      };
> -    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
>  }
>  
>  static const GVecGen2s gop_ands = {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
@ 2018-12-19  5:37   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  5:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3244 bytes --]

On Mon, Dec 17, 2018 at 10:38:39PM -0800, Richard Henderson wrote:
> Since we're now handling a == b generically, we no longer need
> to do it by hand within target/arm/.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/arm/translate-a64.c |  6 +-----
>  target/arm/translate-sve.c |  6 +-----
>  target/arm/translate.c     | 12 +++---------
>  3 files changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index e1da1e4d6f..2d6f8c1b4f 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -10152,11 +10152,7 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
>          gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_andc, 0);
>          return;
>      case 2: /* ORR */
> -        if (rn == rm) { /* MOV */
> -            gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_mov, 0);
> -        } else {
> -            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
> -        }
> +        gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
>          return;
>      case 3: /* ORN */
>          gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_orc, 0);
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index b15b615ceb..3a2eb51566 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -280,11 +280,7 @@ static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a)
>  
>  static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a)
>  {
> -    if (a->rn == a->rm) { /* MOV */
> -        return do_mov_z(s, a->rd, a->rn);
> -    } else {
> -        return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
> -    }
> +    return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
>  }
>  
>  static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a)
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 7c4675ffd8..33b1860148 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -6294,15 +6294,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>                  tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
>                                    vec_size, vec_size);
>                  break;
> -            case 2:
> -                if (rn == rm) {
> -                    /* VMOV */
> -                    tcg_gen_gvec_mov(0, rd_ofs, rn_ofs, vec_size, vec_size);
> -                } else {
> -                    /* VORR */
> -                    tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
> -                                    vec_size, vec_size);
> -                }
> +            case 2: /* VORR */
> +                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
> +                                vec_size, vec_size);
>                  break;
>              case 3: /* VORN */
>                  tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv Richard Henderson
@ 2018-12-19  5:39   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  5:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 8090 bytes --]

On Mon, Dec 17, 2018 at 10:38:40PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  accel/tcg/tcg-runtime.h      |  3 +++
>  tcg/tcg-op-gvec.h            |  6 +++++
>  tcg/tcg-op.h                 |  3 +++
>  accel/tcg/tcg-runtime-gvec.c | 33 +++++++++++++++++++++++
>  tcg/tcg-op-gvec.c            | 51 ++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op-vec.c             | 21 +++++++++++++++
>  6 files changed, 117 insertions(+)
> 
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index 1bd39d136d..835ddfebb2 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -211,6 +211,9 @@ DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_nand, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_nor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_eqv, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  
>  DEF_HELPER_FLAGS_4(gvec_ands, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>  DEF_HELPER_FLAGS_4(gvec_xors, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
> diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
> index ff43a29a0b..d65b9d9d4c 100644
> --- a/tcg/tcg-op-gvec.h
> +++ b/tcg/tcg-op-gvec.h
> @@ -242,6 +242,12 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
>                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
>  void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
>                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_nand(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_nor(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
>  
>  void tcg_gen_gvec_andi(unsigned vece, uint32_t dofs, uint32_t aofs,
>                         int64_t c, uint32_t oprsz, uint32_t maxsz);
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index db4e9188f4..1974bf1cae 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -961,6 +961,9 @@ void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> +void tcg_gen_nand_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> +void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> +void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index 90340e56e0..d1802467d5 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -512,6 +512,39 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc)
>      clear_high(d, oprsz, desc);
>  }
>  
> +void HELPER(gvec_nand)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {
> +        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) & *(vec64 *)(b + i));
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_nor)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {
> +        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) | *(vec64 *)(b + i));
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_eqv)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {
> +        *(vec64 *)(d + i) = ~(*(vec64 *)(a + i) ^ *(vec64 *)(b + i));
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
>  void HELPER(gvec_ands)(void *d, void *a, uint64_t b, uint32_t desc)
>  {
>      intptr_t oprsz = simd_oprsz(desc);
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index ec231b78fb..81689d02f7 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -1920,6 +1920,57 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
>      }
>  }
>  
> +void tcg_gen_gvec_nand(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g = {
> +        .fni8 = tcg_gen_nand_i64,
> +        .fniv = tcg_gen_nand_vec,
> +        .fno = gen_helper_gvec_nand,
> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +    };
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_not(vece, dofs, aofs, oprsz, maxsz);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
> +}
> +
> +void tcg_gen_gvec_nor(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g = {
> +        .fni8 = tcg_gen_nor_i64,
> +        .fniv = tcg_gen_nor_vec,
> +        .fno = gen_helper_gvec_nor,
> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +    };
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_not(vece, dofs, aofs, oprsz, maxsz);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
> +}
> +
> +void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g = {
> +        .fni8 = tcg_gen_eqv_i64,
> +        .fniv = tcg_gen_eqv_vec,
> +        .fno = gen_helper_gvec_eqv,
> +        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +    };
> +
> +    if (aofs == bofs) {
> +        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
> +    } else {
> +        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
> +    }
> +}
> +
>  static const GVecGen2s gop_ands = {
>      .fni8 = tcg_gen_and_i64,
>      .fniv = tcg_gen_and_vec,
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index cefba3d185..d77fdf7c1d 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -275,6 +275,27 @@ void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
>      }
>  }
>  
> +void tcg_gen_nand_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    /* TODO: Add TCG_TARGET_HAS_nand_vec when a backend supports it. */
> +    tcg_gen_and_vec(0, r, a, b);
> +    tcg_gen_not_vec(0, r, r);
> +}
> +
> +void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    /* TODO: Add TCG_TARGET_HAS_nor_vec when a backend supports it. */
> +    tcg_gen_or_vec(0, r, a, b);
> +    tcg_gen_not_vec(0, r, r);
> +}
> +
> +void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    /* TODO: Add TCG_TARGET_HAS_eqv_vec when a backend supports it. */
> +    tcg_gen_xor_vec(0, r, a, b);
> +    tcg_gen_not_vec(0, r, r);
> +}
> +
>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
>  {
>      if (TCG_TARGET_HAS_not_vec) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Richard Henderson
@ 2018-12-19  6:15   ` David Gibson
  2018-12-19 12:29     ` Mark Cave-Ayland
  0 siblings, 1 reply; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 57721 bytes --]

On Mon, Dec 17, 2018 at 10:38:48PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> These helpers allow us to move FP register values to/from the specified TCGv_i64
> argument in the VSR helpers to be introduced shortly.
> 
> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
> temporaries as required.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Message-Id: <20181217122405.18732-2-mark.cave-ayland@ilande.co.uk>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

Do you want me to take these, or will you take them via your tree?

> ---
>  target/ppc/translate.c             |  10 +
>  target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++-------
>  2 files changed, 390 insertions(+), 110 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 2b37910248..1d4bf624a3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6694,6 +6694,16 @@ static inline void gen_##name(DisasContext *ctx)               \
>  GEN_TM_PRIV_NOOP(treclaim);
>  GEN_TM_PRIV_NOOP(trechkpt);
>  
> +static inline void get_fpr(TCGv_i64 dst, int regno)
> +{
> +    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
> +}
> +
> +static inline void set_fpr(int regno, TCGv_i64 src)
> +{
> +    tcg_gen_mov_i64(cpu_fpr[regno], src);
> +}
> +
>  #include "translate/fp-impl.inc.c"
>  
>  #include "translate/vmx-impl.inc.c"
> diff --git a/target/ppc/translate/fp-impl.inc.c b/target/ppc/translate/fp-impl.inc.c
> index 08770ba9f5..04b8733055 100644
> --- a/target/ppc/translate/fp-impl.inc.c
> +++ b/target/ppc/translate/fp-impl.inc.c
> @@ -34,24 +34,38 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>  #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)           \
>  static void gen_f##name(DisasContext *ctx)                                    \
>  {                                                                             \
> +    TCGv_i64 t0;                                                              \
> +    TCGv_i64 t1;                                                              \
> +    TCGv_i64 t2;                                                              \
> +    TCGv_i64 t3;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
> +    t1 = tcg_temp_new_i64();                                                  \
> +    t2 = tcg_temp_new_i64();                                                  \
> +    t3 = tcg_temp_new_i64();                                                  \
>      gen_reset_fpstatus();                                                     \
> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
> -                     cpu_fpr[rA(ctx->opcode)],                                \
> -                     cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);     \
> +    get_fpr(t0, rA(ctx->opcode));                                             \
> +    get_fpr(t1, rC(ctx->opcode));                                             \
> +    get_fpr(t2, rB(ctx->opcode));                                             \
> +    gen_helper_f##op(t3, cpu_env, t0, t1, t2);                                \
>      if (isfloat) {                                                            \
> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
> -                        cpu_fpr[rD(ctx->opcode)]);                            \
> +        get_fpr(t0, rD(ctx->opcode));                                         \
> +        gen_helper_frsp(t3, cpu_env, t0);                                     \
>      }                                                                         \
> +    set_fpr(rD(ctx->opcode), t3);                                             \
>      if (set_fprf) {                                                           \
> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
> +        gen_compute_fprf_float64(t3);                                         \
>      }                                                                         \
>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>          gen_set_cr1_from_fpscr(ctx);                                          \
>      }                                                                         \
> +    tcg_temp_free_i64(t0);                                                    \
> +    tcg_temp_free_i64(t1);                                                    \
> +    tcg_temp_free_i64(t2);                                                    \
> +    tcg_temp_free_i64(t3);                                                    \
>  }
>  
>  #define GEN_FLOAT_ACB(name, op2, set_fprf, type)                              \
> @@ -61,24 +75,34 @@ _GEN_FLOAT_ACB(name##s, name, 0x3B, op2, 1, set_fprf, type);
>  #define _GEN_FLOAT_AB(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
>  static void gen_f##name(DisasContext *ctx)                                    \
>  {                                                                             \
> +    TCGv_i64 t0;                                                              \
> +    TCGv_i64 t1;                                                              \
> +    TCGv_i64 t2;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
> +    t1 = tcg_temp_new_i64();                                                  \
> +    t2 = tcg_temp_new_i64();                                                  \
>      gen_reset_fpstatus();                                                     \
> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
> -                     cpu_fpr[rA(ctx->opcode)],                                \
> -                     cpu_fpr[rB(ctx->opcode)]);                               \
> +    get_fpr(t0, rA(ctx->opcode));                                             \
> +    get_fpr(t1, rB(ctx->opcode));                                             \
> +    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
>      if (isfloat) {                                                            \
> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
> -                        cpu_fpr[rD(ctx->opcode)]);                            \
> +        get_fpr(t0, rD(ctx->opcode));                                         \
> +        gen_helper_frsp(t2, cpu_env, t0);                                     \
>      }                                                                         \
> +    set_fpr(rD(ctx->opcode), t2);                                             \
>      if (set_fprf) {                                                           \
> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
> +        gen_compute_fprf_float64(t2);                                         \
>      }                                                                         \
>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>          gen_set_cr1_from_fpscr(ctx);                                          \
>      }                                                                         \
> +    tcg_temp_free_i64(t0);                                                    \
> +    tcg_temp_free_i64(t1);                                                    \
> +    tcg_temp_free_i64(t2);                                                    \
>  }
>  #define GEN_FLOAT_AB(name, op2, inval, set_fprf, type)                        \
>  _GEN_FLOAT_AB(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
> @@ -87,24 +111,35 @@ _GEN_FLOAT_AB(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
>  #define _GEN_FLOAT_AC(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
>  static void gen_f##name(DisasContext *ctx)                                    \
>  {                                                                             \
> +    TCGv_i64 t0;                                                              \
> +    TCGv_i64 t1;                                                              \
> +    TCGv_i64 t2;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
> +    t1 = tcg_temp_new_i64();                                                  \
> +    t2 = tcg_temp_new_i64();                                                  \
>      gen_reset_fpstatus();                                                     \
> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
> -                     cpu_fpr[rA(ctx->opcode)],                                \
> -                     cpu_fpr[rC(ctx->opcode)]);                               \
> +    get_fpr(t0, rA(ctx->opcode));                                             \
> +    get_fpr(t1, rC(ctx->opcode));                                             \
> +    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
> +    set_fpr(rD(ctx->opcode), t2);                                             \
>      if (isfloat) {                                                            \
> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
> -                        cpu_fpr[rD(ctx->opcode)]);                            \
> +        get_fpr(t0, rD(ctx->opcode));                                         \
> +        gen_helper_frsp(t2, cpu_env, t0);                                     \
> +        set_fpr(rD(ctx->opcode), t2);                                         \
>      }                                                                         \
>      if (set_fprf) {                                                           \
> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
> +        gen_compute_fprf_float64(t2);                                         \
>      }                                                                         \
>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>          gen_set_cr1_from_fpscr(ctx);                                          \
>      }                                                                         \
> +    tcg_temp_free_i64(t0);                                                    \
> +    tcg_temp_free_i64(t1);                                                    \
> +    tcg_temp_free_i64(t2);                                                    \
>  }
>  #define GEN_FLOAT_AC(name, op2, inval, set_fprf, type)                        \
>  _GEN_FLOAT_AC(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
> @@ -113,37 +148,51 @@ _GEN_FLOAT_AC(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
>  #define GEN_FLOAT_B(name, op2, op3, set_fprf, type)                           \
>  static void gen_f##name(DisasContext *ctx)                                    \
>  {                                                                             \
> +    TCGv_i64 t0;                                                              \
> +    TCGv_i64 t1;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
> +    t1 = tcg_temp_new_i64();                                                  \
>      gen_reset_fpstatus();                                                     \
> -    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
> -                       cpu_fpr[rB(ctx->opcode)]);                             \
> +    get_fpr(t0, rB(ctx->opcode));                                             \
> +    gen_helper_f##name(t1, cpu_env, t0);                                      \
> +    set_fpr(rD(ctx->opcode), t1);                                             \
>      if (set_fprf) {                                                           \
> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
> +        gen_compute_fprf_float64(t1);                                         \
>      }                                                                         \
>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>          gen_set_cr1_from_fpscr(ctx);                                          \
>      }                                                                         \
> +    tcg_temp_free_i64(t0);                                                    \
> +    tcg_temp_free_i64(t1);                                                    \
>  }
>  
>  #define GEN_FLOAT_BS(name, op1, op2, set_fprf, type)                          \
>  static void gen_f##name(DisasContext *ctx)                                    \
>  {                                                                             \
> +    TCGv_i64 t0;                                                              \
> +    TCGv_i64 t1;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
> +    t1 = tcg_temp_new_i64();                                                  \
>      gen_reset_fpstatus();                                                     \
> -    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
> -                       cpu_fpr[rB(ctx->opcode)]);                             \
> +    get_fpr(t0, rB(ctx->opcode));                                             \
> +    gen_helper_f##name(t1, cpu_env, t0);                                      \
> +    set_fpr(rD(ctx->opcode), t1);                                             \
>      if (set_fprf) {                                                           \
> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
> +        gen_compute_fprf_float64(t1);                                         \
>      }                                                                         \
>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>          gen_set_cr1_from_fpscr(ctx);                                          \
>      }                                                                         \
> +    tcg_temp_free_i64(t0);                                                    \
> +    tcg_temp_free_i64(t1);                                                    \
>  }
>  
>  /* fadd - fadds */
> @@ -165,19 +214,25 @@ GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE);
>  /* frsqrtes */
>  static void gen_frsqrtes(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
> -    gen_helper_frsqrte(cpu_fpr[rD(ctx->opcode)], cpu_env,
> -                       cpu_fpr[rB(ctx->opcode)]);
> -    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
> -                    cpu_fpr[rD(ctx->opcode)]);
> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
> +    get_fpr(t0, rB(ctx->opcode));
> +    gen_helper_frsqrte(t1, cpu_env, t0);
> +    gen_helper_frsp(t1, cpu_env, t1);
> +    set_fpr(rD(ctx->opcode), t1);
> +    gen_compute_fprf_float64(t1);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* fsel */
> @@ -189,34 +244,47 @@ GEN_FLOAT_AB(sub, 0x14, 0x000007C0, 1, PPC_FLOAT);
>  /* fsqrt */
>  static void gen_fsqrt(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
> -    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
> -                     cpu_fpr[rB(ctx->opcode)]);
> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
> +    get_fpr(t0, rB(ctx->opcode));
> +    gen_helper_fsqrt(t1, cpu_env, t0);
> +    set_fpr(rD(ctx->opcode), t1);
> +    gen_compute_fprf_float64(t1);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  static void gen_fsqrts(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
> -    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
> -                     cpu_fpr[rB(ctx->opcode)]);
> -    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
> -                    cpu_fpr[rD(ctx->opcode)]);
> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
> +    get_fpr(t0, rB(ctx->opcode));
> +    gen_helper_fsqrt(t1, cpu_env, t0);
> +    gen_helper_frsp(t1, cpu_env, t1);
> +    set_fpr(rD(ctx->opcode), t1);
> +    gen_compute_fprf_float64(t1);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /***                     Floating-Point multiply-and-add                   ***/
> @@ -268,21 +336,32 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
>  
>  static void gen_ftdiv(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
> -                     cpu_fpr[rB(ctx->opcode)]);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t0, rA(ctx->opcode));
> +    get_fpr(t1, rB(ctx->opcode));
> +    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], t0, t1);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  static void gen_ftsqrt(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
> +    t0 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], t0);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  
> @@ -293,32 +372,46 @@ static void gen_ftsqrt(DisasContext *ctx)
>  static void gen_fcmpo(DisasContext *ctx)
>  {
>      TCGv_i32 crf;
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
>      crf = tcg_const_i32(crfD(ctx->opcode));
> -    gen_helper_fcmpo(cpu_env, cpu_fpr[rA(ctx->opcode)],
> -                     cpu_fpr[rB(ctx->opcode)], crf);
> +    get_fpr(t0, rA(ctx->opcode));
> +    get_fpr(t1, rB(ctx->opcode));
> +    gen_helper_fcmpo(cpu_env, t0, t1, crf);
>      tcg_temp_free_i32(crf);
>      gen_helper_float_check_status(cpu_env);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* fcmpu */
>  static void gen_fcmpu(DisasContext *ctx)
>  {
>      TCGv_i32 crf;
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
>      crf = tcg_const_i32(crfD(ctx->opcode));
> -    gen_helper_fcmpu(cpu_env, cpu_fpr[rA(ctx->opcode)],
> -                     cpu_fpr[rB(ctx->opcode)], crf);
> +    get_fpr(t0, rA(ctx->opcode));
> +    get_fpr(t1, rB(ctx->opcode));
> +    gen_helper_fcmpu(cpu_env, t0, t1, crf);
>      tcg_temp_free_i32(crf);
>      gen_helper_float_check_status(cpu_env);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /***                         Floating-point move                           ***/
> @@ -326,100 +419,153 @@ static void gen_fcmpu(DisasContext *ctx)
>  /* XXX: beware that fabs never checks for NaNs nor update FPSCR */
>  static void gen_fabs(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_andi_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
> -                     ~(1ULL << 63));
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    tcg_gen_andi_i64(t1, t0, ~(1ULL << 63));
> +    set_fpr(rD(ctx->opcode), t1);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* fmr  - fmr. */
>  /* XXX: beware that fmr never checks for NaNs nor update FPSCR */
>  static void gen_fmr(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_mov_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
> +    t0 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    set_fpr(rD(ctx->opcode), t0);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* fnabs */
>  /* XXX: beware that fnabs never checks for NaNs nor update FPSCR */
>  static void gen_fnabs(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_ori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
> -                    1ULL << 63);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    tcg_gen_ori_i64(t1, t0, 1ULL << 63);
> +    set_fpr(rD(ctx->opcode), t1);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* fneg */
>  /* XXX: beware that fneg never checks for NaNs nor update FPSCR */
>  static void gen_fneg(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_xori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
> -                     1ULL << 63);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    tcg_gen_xori_i64(t1, t0, 1ULL << 63);
> +    set_fpr(rD(ctx->opcode), t1);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* fcpsgn: PowerPC 2.05 specification */
>  /* XXX: beware that fcpsgn never checks for NaNs nor update FPSCR */
>  static void gen_fcpsgn(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
> +    TCGv_i64 t2;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
> -                        cpu_fpr[rB(ctx->opcode)], 0, 63);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    t2 = tcg_temp_new_i64();
> +    get_fpr(t0, rA(ctx->opcode));
> +    get_fpr(t1, rB(ctx->opcode));
> +    tcg_gen_deposit_i64(t2, t0, t1, 0, 63);
> +    set_fpr(rD(ctx->opcode), t2);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  static void gen_fmrgew(DisasContext *ctx)
>  {
>      TCGv_i64 b0;
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
>      b0 = tcg_temp_new_i64();
> -    tcg_gen_shri_i64(b0, cpu_fpr[rB(ctx->opcode)], 32);
> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
> -                        b0, 0, 32);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    tcg_gen_shri_i64(b0, t0, 32);
> +    get_fpr(t0, rA(ctx->opcode));
> +    tcg_gen_deposit_i64(t1, t0, b0, 0, 32);
> +    set_fpr(rD(ctx->opcode), t1);
>      tcg_temp_free_i64(b0);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  static void gen_fmrgow(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
> +    TCGv_i64 t2;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)],
> -                        cpu_fpr[rB(ctx->opcode)],
> -                        cpu_fpr[rA(ctx->opcode)],
> -                        32, 32);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    t2 = tcg_temp_new_i64();
> +    get_fpr(t0, rB(ctx->opcode));
> +    get_fpr(t1, rA(ctx->opcode));
> +    tcg_gen_deposit_i64(t2, t0, t1, 32, 32);
> +    set_fpr(rD(ctx->opcode), t2);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  /***                  Floating-Point status & ctrl register                ***/
> @@ -458,15 +604,19 @@ static void gen_mcrfs(DisasContext *ctx)
>  /* mffs */
>  static void gen_mffs(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
>      gen_reset_fpstatus();
> -    tcg_gen_extu_tl_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpscr);
> +    tcg_gen_extu_tl_i64(t0, cpu_fpscr);
> +    set_fpr(rD(ctx->opcode), t0);
>      if (unlikely(Rc(ctx->opcode))) {
>          gen_set_cr1_from_fpscr(ctx);
>      }
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* mtfsb0 */
> @@ -522,6 +672,7 @@ static void gen_mtfsb1(DisasContext *ctx)
>  static void gen_mtfsf(DisasContext *ctx)
>  {
>      TCGv_i32 t0;
> +    TCGv_i64 t1;
>      int flm, l, w;
>  
>      if (unlikely(!ctx->fpu_enabled)) {
> @@ -541,7 +692,9 @@ static void gen_mtfsf(DisasContext *ctx)
>      } else {
>          t0 = tcg_const_i32(flm << (w * 8));
>      }
> -    gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
> +    t1 = tcg_temp_new_i64();
> +    get_fpr(t1, rB(ctx->opcode));
> +    gen_helper_store_fpscr(cpu_env, t1, t0);
>      tcg_temp_free_i32(t0);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
> @@ -549,6 +702,7 @@ static void gen_mtfsf(DisasContext *ctx)
>      }
>      /* We can raise a deferred exception */
>      gen_helper_float_check_status(cpu_env);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* mtfsfi */
> @@ -588,21 +742,26 @@ static void gen_mtfsfi(DisasContext *ctx)
>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_imm_index(ctx, EA, 0);                                           \
> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
> +    set_fpr(rD(ctx->opcode), t0);                                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_LDUF(name, ldop, opc, type)                                       \
>  static void glue(gen_, name##u)(DisasContext *ctx)                                    \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
> @@ -613,20 +772,25 @@ static void glue(gen_, name##u)(DisasContext *ctx)
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_imm_index(ctx, EA, 0);                                           \
> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
> +    set_fpr(rD(ctx->opcode), t0);                                             \
>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_LDUXF(name, ldop, opc, type)                                      \
>  static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
> +    t0 = tcg_temp_new_i64();                                                  \
>      if (unlikely(rA(ctx->opcode) == 0)) {                                     \
>          gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);                   \
>          return;                                                               \
> @@ -634,24 +798,30 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
>      gen_addr_reg_index(ctx, EA);                                              \
> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
> +    set_fpr(rD(ctx->opcode), t0);                                             \
>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_LDXF(name, ldop, opc2, opc3, type)                                \
>  static void glue(gen_, name##x)(DisasContext *ctx)                                    \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_reg_index(ctx, EA);                                              \
> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
> +    set_fpr(rD(ctx->opcode), t0);                                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_LDFS(name, ldop, op, type)                                        \
> @@ -677,6 +847,7 @@ GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
>  static void gen_lfdepx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      CHK_SV;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
> @@ -684,16 +855,19 @@ static void gen_lfdepx(DisasContext *ctx)
>      }
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
> +    t0 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, EA);
> -    tcg_gen_qemu_ld_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_LOAD,
> -        DEF_MEMOP(MO_Q));
> +    tcg_gen_qemu_ld_i64(t0, EA, PPC_TLB_EPID_LOAD, DEF_MEMOP(MO_Q));
> +    set_fpr(rD(ctx->opcode), t0);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* lfdp */
>  static void gen_lfdp(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
> @@ -701,24 +875,31 @@ static void gen_lfdp(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
>      gen_addr_imm_index(ctx, EA, 0);
> +    t0 = tcg_temp_new_i64();
>      /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
>         necessary 64-bit byteswap already. */
>      if (unlikely(ctx->le_mode)) {
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode) + 1, t0);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode), t0);
>      } else {
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode), t0);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode) + 1, t0);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* lfdpx */
>  static void gen_lfdpx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
> @@ -726,18 +907,24 @@ static void gen_lfdpx(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> +    t0 = tcg_temp_new_i64();
>      /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
>         necessary 64-bit byteswap already. */
>      if (unlikely(ctx->le_mode)) {
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode) + 1, t0);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode), t0);
>      } else {
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode), t0);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        gen_qemu_ld64_i64(ctx, t0, EA);
> +        set_fpr(rD(ctx->opcode) + 1, t0);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* lfiwax */
> @@ -745,6 +932,7 @@ static void gen_lfiwax(DisasContext *ctx)
>  {
>      TCGv EA;
>      TCGv t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
> @@ -752,47 +940,59 @@ static void gen_lfiwax(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
>      t0 = tcg_temp_new();
> +    t1 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, EA);
>      gen_qemu_ld32s(ctx, t0, EA);
> -    tcg_gen_ext_tl_i64(cpu_fpr[rD(ctx->opcode)], t0);
> +    tcg_gen_ext_tl_i64(t1, t0);
> +    set_fpr(rD(ctx->opcode), t1);
>      tcg_temp_free(EA);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* lfiwzx */
>  static void gen_lfiwzx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
> +    t0 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld32u_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +    gen_qemu_ld32u_i64(ctx, t0, EA);
> +    set_fpr(rD(ctx->opcode), t0);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  /***                         Floating-point store                          ***/
>  #define GEN_STF(name, stop, opc, type)                                        \
>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_imm_index(ctx, EA, 0);                                           \
> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
> +    get_fpr(t0, rS(ctx->opcode));                                             \
> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_STUF(name, stop, opc, type)                                       \
>  static void glue(gen_, name##u)(DisasContext *ctx)                                    \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
> @@ -803,16 +1003,20 @@ static void glue(gen_, name##u)(DisasContext *ctx)
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_imm_index(ctx, EA, 0);                                           \
> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
> +    get_fpr(t0, rS(ctx->opcode));                                             \
> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_STUXF(name, stop, opc, type)                                      \
>  static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
> @@ -823,25 +1027,32 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_reg_index(ctx, EA);                                              \
> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
> +    get_fpr(t0, rS(ctx->opcode));                                             \
> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_STXF(name, stop, opc2, opc3, type)                                \
>  static void glue(gen_, name##x)(DisasContext *ctx)                                    \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 t0;                                                              \
>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>      EA = tcg_temp_new();                                                      \
> +    t0 = tcg_temp_new_i64();                                                  \
>      gen_addr_reg_index(ctx, EA);                                              \
> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
> +    get_fpr(t0, rS(ctx->opcode));                                             \
> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(t0);                                                    \
>  }
>  
>  #define GEN_STFS(name, stop, op, type)                                        \
> @@ -867,6 +1078,7 @@ GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
>  static void gen_stfdepx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      CHK_SV;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
> @@ -874,60 +1086,76 @@ static void gen_stfdepx(DisasContext *ctx)
>      }
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
> +    t0 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, EA);
> -    tcg_gen_qemu_st_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_STORE,
> -                       DEF_MEMOP(MO_Q));
> +    get_fpr(t0, rD(ctx->opcode));
> +    tcg_gen_qemu_st_i64(t0, EA, PPC_TLB_EPID_STORE, DEF_MEMOP(MO_Q));
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* stfdp */
>  static void gen_stfdp(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
> +    t0 = tcg_temp_new_i64();
>      gen_addr_imm_index(ctx, EA, 0);
>      /* We only need to swap high and low halves. gen_qemu_st64_i64 does
>         necessary 64-bit byteswap already. */
>      if (unlikely(ctx->le_mode)) {
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        get_fpr(t0, rD(ctx->opcode) + 1);
> +        gen_qemu_st64_i64(ctx, t0, EA);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        get_fpr(t0, rD(ctx->opcode));
> +        gen_qemu_st64_i64(ctx, t0, EA);
>      } else {
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        get_fpr(t0, rD(ctx->opcode));
> +        gen_qemu_st64_i64(ctx, t0, EA);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        get_fpr(t0, rD(ctx->opcode) + 1);
> +        gen_qemu_st64_i64(ctx, t0, EA);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* stfdpx */
>  static void gen_stfdpx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->fpu_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_FPU);
>          return;
>      }
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      EA = tcg_temp_new();
> +    t0 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, EA);
>      /* We only need to swap high and low halves. gen_qemu_st64_i64 does
>         necessary 64-bit byteswap already. */
>      if (unlikely(ctx->le_mode)) {
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        get_fpr(t0, rD(ctx->opcode) + 1);
> +        gen_qemu_st64_i64(ctx, t0, EA);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        get_fpr(t0, rD(ctx->opcode));
> +        gen_qemu_st64_i64(ctx, t0, EA);
>      } else {
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
> +        get_fpr(t0, rD(ctx->opcode));
> +        gen_qemu_st64_i64(ctx, t0, EA);
>          tcg_gen_addi_tl(EA, EA, 8);
> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
> +        get_fpr(t0, rD(ctx->opcode) + 1);
> +        gen_qemu_st64_i64(ctx, t0, EA);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  /* Optional: */
> @@ -949,13 +1177,18 @@ static void gen_lfq(DisasContext *ctx)
>  {
>      int rd = rD(ctx->opcode);
>      TCGv t0;
> +    TCGv_i64 t1;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      t0 = tcg_temp_new();
> +    t1 = tcg_temp_new_i64();
>      gen_addr_imm_index(ctx, t0, 0);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
> +    gen_qemu_ld64_i64(ctx, t1, t0);
> +    set_fpr(rd, t1);
>      gen_addr_add(ctx, t0, t0, 8);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
> +    gen_qemu_ld64_i64(ctx, t1, t0);
> +    set_fpr((rd + 1) % 32, t1);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* lfqu */
> @@ -964,17 +1197,22 @@ static void gen_lfqu(DisasContext *ctx)
>      int ra = rA(ctx->opcode);
>      int rd = rD(ctx->opcode);
>      TCGv t0, t1;
> +    TCGv_i64 t2;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      t0 = tcg_temp_new();
>      t1 = tcg_temp_new();
> +    t2 = tcg_temp_new_i64();
>      gen_addr_imm_index(ctx, t0, 0);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
> +    gen_qemu_ld64_i64(ctx, t2, t0);
> +    set_fpr(rd, t2);
>      gen_addr_add(ctx, t1, t0, 8);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
> +    gen_qemu_ld64_i64(ctx, t2, t1);
> +    set_fpr((rd + 1) % 32, t2);
>      if (ra != 0)
>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  /* lfqux */
> @@ -984,16 +1222,21 @@ static void gen_lfqux(DisasContext *ctx)
>      int rd = rD(ctx->opcode);
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      TCGv t0, t1;
> +    TCGv_i64 t2;
> +    t2 = tcg_temp_new_i64();
>      t0 = tcg_temp_new();
>      gen_addr_reg_index(ctx, t0);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
> +    gen_qemu_ld64_i64(ctx, t2, t0);
> +    set_fpr(rd, t2);
>      t1 = tcg_temp_new();
>      gen_addr_add(ctx, t1, t0, 8);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
> +    gen_qemu_ld64_i64(ctx, t2, t1);
> +    set_fpr((rd + 1) % 32, t2);
>      tcg_temp_free(t1);
>      if (ra != 0)
>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  /* lfqx */
> @@ -1001,13 +1244,18 @@ static void gen_lfqx(DisasContext *ctx)
>  {
>      int rd = rD(ctx->opcode);
>      TCGv t0;
> +    TCGv_i64 t1;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      t0 = tcg_temp_new();
> +    t1 = tcg_temp_new_i64();
>      gen_addr_reg_index(ctx, t0);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
> +    gen_qemu_ld64_i64(ctx, t1, t0);
> +    set_fpr(rd, t1);
>      gen_addr_add(ctx, t0, t0, 8);
> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
> +    gen_qemu_ld64_i64(ctx, t1, t0);
> +    set_fpr((rd + 1) % 32, t1);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* stfq */
> @@ -1015,13 +1263,18 @@ static void gen_stfq(DisasContext *ctx)
>  {
>      int rd = rD(ctx->opcode);
>      TCGv t0;
> +    TCGv_i64 t1;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
>      t0 = tcg_temp_new();
> +    t1 = tcg_temp_new_i64();
>      gen_addr_imm_index(ctx, t0, 0);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
> +    get_fpr(t1, rd);
> +    gen_qemu_st64_i64(ctx, t1, t0);
>      gen_addr_add(ctx, t0, t0, 8);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
> +    get_fpr(t1, (rd + 1) % 32);
> +    gen_qemu_st64_i64(ctx, t1, t0);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  /* stfqu */
> @@ -1030,17 +1283,23 @@ static void gen_stfqu(DisasContext *ctx)
>      int ra = rA(ctx->opcode);
>      int rd = rD(ctx->opcode);
>      TCGv t0, t1;
> +    TCGv_i64 t2;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
> +    t2 = tcg_temp_new_i64();
>      t0 = tcg_temp_new();
>      gen_addr_imm_index(ctx, t0, 0);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
> +    get_fpr(t2, rd);
> +    gen_qemu_st64_i64(ctx, t2, t0);
>      t1 = tcg_temp_new();
>      gen_addr_add(ctx, t1, t0, 8);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
> +    get_fpr(t2, (rd + 1) % 32);
> +    gen_qemu_st64_i64(ctx, t2, t1);
>      tcg_temp_free(t1);
> -    if (ra != 0)
> +    if (ra != 0) {
>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
> +    }
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  /* stfqux */
> @@ -1049,17 +1308,23 @@ static void gen_stfqux(DisasContext *ctx)
>      int ra = rA(ctx->opcode);
>      int rd = rD(ctx->opcode);
>      TCGv t0, t1;
> +    TCGv_i64 t2;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
> +    t2 = tcg_temp_new_i64();
>      t0 = tcg_temp_new();
>      gen_addr_reg_index(ctx, t0);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
> +    get_fpr(t2, rd);
> +    gen_qemu_st64_i64(ctx, t2, t0);
>      t1 = tcg_temp_new();
>      gen_addr_add(ctx, t1, t0, 8);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
> +    get_fpr(t2, (rd + 1) % 32);
> +    gen_qemu_st64_i64(ctx, t2, t1);
>      tcg_temp_free(t1);
> -    if (ra != 0)
> +    if (ra != 0) {
>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
> +    }
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t2);
>  }
>  
>  /* stfqx */
> @@ -1067,13 +1332,18 @@ static void gen_stfqx(DisasContext *ctx)
>  {
>      int rd = rD(ctx->opcode);
>      TCGv t0;
> +    TCGv_i64 t1;
>      gen_set_access_type(ctx, ACCESS_FLOAT);
> +    t1 = tcg_temp_new_i64();
>      t0 = tcg_temp_new();
>      gen_addr_reg_index(ctx, t0);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
> +    get_fpr(t1, rd);
> +    gen_qemu_st64_i64(ctx, t1, t0);
>      gen_addr_add(ctx, t0, t0, 8);
> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
> +    get_fpr(t1, (rd + 1) % 32);
> +    gen_qemu_st64_i64(ctx, t1, t0);
>      tcg_temp_free(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  #undef _GEN_FLOAT_ACB

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX register access
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Richard Henderson
@ 2018-12-19  6:15   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 17410 bytes --]

On Mon, Dec 17, 2018 at 10:38:49PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> These helpers allow us to move AVR register values to/from the specified TCGv_i64
> argument.
> 
> To prevent VMX helpers accessing the cpu_avr{l,h} arrays directly, add extra TCG
> temporaries as required.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-3-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate.c              |  10 +++
>  target/ppc/translate/vmx-impl.inc.c | 128 ++++++++++++++++++++++------
>  2 files changed, 110 insertions(+), 28 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 1d4bf624a3..fa3e8dc114 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6704,6 +6704,16 @@ static inline void set_fpr(int regno, TCGv_i64 src)
>      tcg_gen_mov_i64(cpu_fpr[regno], src);
>  }
>  
> +static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
> +{
> +    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
> +}
> +
> +static inline void set_avr64(int regno, TCGv_i64 src, bool high)
> +{
> +    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
> +}
> +
>  #include "translate/fp-impl.inc.c"
>  
>  #include "translate/vmx-impl.inc.c"
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 3cb6fc2926..30046c6e31 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -18,52 +18,66 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 avr;                                                             \
>      if (unlikely(!ctx->altivec_enabled)) {                                    \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_INT);                                     \
> +    avr = tcg_temp_new_i64();                                                 \
>      EA = tcg_temp_new();                                                      \
>      gen_addr_reg_index(ctx, EA);                                              \
>      tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
>      /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
>         necessary 64-bit byteswap already. */                                  \
>      if (ctx->le_mode) {                                                       \
> -        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
> +        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
> +        set_avr64(rD(ctx->opcode), avr, false);                               \
>          tcg_gen_addi_tl(EA, EA, 8);                                           \
> -        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
> +        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
> +        set_avr64(rD(ctx->opcode), avr, true);                                \
>      } else {                                                                  \
> -        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
> +        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
> +        set_avr64(rD(ctx->opcode), avr, true);                                \
>          tcg_gen_addi_tl(EA, EA, 8);                                           \
> -        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
> +        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
> +        set_avr64(rD(ctx->opcode), avr, false);                               \
>      }                                                                         \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(avr);                                                   \
>  }
>  
>  #define GEN_VR_STX(name, opc2, opc3)                                          \
>  static void gen_st##name(DisasContext *ctx)                                   \
>  {                                                                             \
>      TCGv EA;                                                                  \
> +    TCGv_i64 avr;                                                             \
>      if (unlikely(!ctx->altivec_enabled)) {                                    \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
>          return;                                                               \
>      }                                                                         \
>      gen_set_access_type(ctx, ACCESS_INT);                                     \
> +    avr = tcg_temp_new_i64();                                                 \
>      EA = tcg_temp_new();                                                      \
>      gen_addr_reg_index(ctx, EA);                                              \
>      tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
>      /* We only need to swap high and low halves. gen_qemu_st64_i64 does       \
>         necessary 64-bit byteswap already. */                                  \
>      if (ctx->le_mode) {                                                       \
> -        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
> +        get_avr64(avr, rD(ctx->opcode), false);                               \
> +        gen_qemu_st64_i64(ctx, avr, EA);                                      \
>          tcg_gen_addi_tl(EA, EA, 8);                                           \
> -        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
> +        get_avr64(avr, rD(ctx->opcode), true);                                \
> +        gen_qemu_st64_i64(ctx, avr, EA);                                      \
>      } else {                                                                  \
> -        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
> +        get_avr64(avr, rD(ctx->opcode), true);                                \
> +        gen_qemu_st64_i64(ctx, avr, EA);                                      \
>          tcg_gen_addi_tl(EA, EA, 8);                                           \
> -        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
> +        get_avr64(avr, rD(ctx->opcode), false);                               \
> +        gen_qemu_st64_i64(ctx, avr, EA);                                      \
>      }                                                                         \
>      tcg_temp_free(EA);                                                        \
> +    tcg_temp_free_i64(avr);                                                   \
>  }
>  
>  #define GEN_VR_LVE(name, opc2, opc3, size)                              \
> @@ -159,15 +173,20 @@ static void gen_lvsr(DisasContext *ctx)
>  static void gen_mfvscr(DisasContext *ctx)
>  {
>      TCGv_i32 t;
> +    TCGv_i64 avr;
>      if (unlikely(!ctx->altivec_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VPU);
>          return;
>      }
> -    tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);
> +    avr = tcg_temp_new_i64();
> +    tcg_gen_movi_i64(avr, 0);
> +    set_avr64(rD(ctx->opcode), avr, true);
>      t = tcg_temp_new_i32();
>      tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
> -    tcg_gen_extu_i32_i64(cpu_avrl[rD(ctx->opcode)], t);
> +    tcg_gen_extu_i32_i64(avr, t);
> +    set_avr64(rD(ctx->opcode), avr, false);
>      tcg_temp_free_i32(t);
> +    tcg_temp_free_i64(avr);
>  }
>  
>  static void gen_mtvscr(DisasContext *ctx)
> @@ -188,6 +207,7 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>      TCGv_i64 t0 = tcg_temp_new_i64();                                   \
>      TCGv_i64 t1 = tcg_temp_new_i64();                                   \
>      TCGv_i64 t2 = tcg_temp_new_i64();                                   \
> +    TCGv_i64 avr = tcg_temp_new_i64();                                  \
>      TCGv_i64 ten, z;                                                    \
>                                                                          \
>      if (unlikely(!ctx->altivec_enabled)) {                              \
> @@ -199,26 +219,35 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>      z = tcg_const_i64(0);                                               \
>                                                                          \
>      if (add_cin) {                                                      \
> -        tcg_gen_mulu2_i64(t0, t1, cpu_avrl[rA(ctx->opcode)], ten);      \
> -        tcg_gen_andi_i64(t2, cpu_avrl[rB(ctx->opcode)], 0xF);           \
> -        tcg_gen_add2_i64(cpu_avrl[rD(ctx->opcode)], t2, t0, t1, t2, z); \
> +        get_avr64(avr, rA(ctx->opcode), false);                         \
> +        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
> +        get_avr64(avr, rB(ctx->opcode), false);                         \
> +        tcg_gen_andi_i64(t2, avr, 0xF);                                 \
> +        tcg_gen_add2_i64(avr, t2, t0, t1, t2, z);                       \
> +        set_avr64(rD(ctx->opcode), avr, false);                         \
>      } else {                                                            \
> -        tcg_gen_mulu2_i64(cpu_avrl[rD(ctx->opcode)], t2,                \
> -                          cpu_avrl[rA(ctx->opcode)], ten);              \
> +        get_avr64(avr, rA(ctx->opcode), false);                         \
> +        tcg_gen_mulu2_i64(avr, t2, avr, ten);                           \
> +        set_avr64(rD(ctx->opcode), avr, false);                         \
>      }                                                                   \
>                                                                          \
>      if (ret_carry) {                                                    \
> -        tcg_gen_mulu2_i64(t0, t1, cpu_avrh[rA(ctx->opcode)], ten);      \
> -        tcg_gen_add2_i64(t0, cpu_avrl[rD(ctx->opcode)], t0, t1, t2, z); \
> -        tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);                 \
> +        get_avr64(avr, rA(ctx->opcode), true);                          \
> +        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
> +        tcg_gen_add2_i64(t0, avr, t0, t1, t2, z);                       \
> +        set_avr64(rD(ctx->opcode), avr, false);                         \
> +        set_avr64(rD(ctx->opcode), z, true);                            \
>      } else {                                                            \
> -        tcg_gen_mul_i64(t0, cpu_avrh[rA(ctx->opcode)], ten);            \
> -        tcg_gen_add_i64(cpu_avrh[rD(ctx->opcode)], t0, t2);             \
> +        get_avr64(avr, rA(ctx->opcode), true);                          \
> +        tcg_gen_mul_i64(t0, avr, ten);                                  \
> +        tcg_gen_add_i64(avr, t0, t2);                                   \
> +        set_avr64(rD(ctx->opcode), avr, true);                          \
>      }                                                                   \
>                                                                          \
>      tcg_temp_free_i64(t0);                                              \
>      tcg_temp_free_i64(t1);                                              \
>      tcg_temp_free_i64(t2);                                              \
> +    tcg_temp_free_i64(avr);                                             \
>      tcg_temp_free_i64(ten);                                             \
>      tcg_temp_free_i64(z);                                               \
>  }                                                                       \
> @@ -232,12 +261,27 @@ GEN_VX_VMUL10(vmul10ecuq, 1, 1);
>  #define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
>  static void glue(gen_, name)(DisasContext *ctx)                                 \
>  {                                                                       \
> +    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
> +    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
> +    TCGv_i64 avr = tcg_temp_new_i64();                                  \
> +                                                                        \
>      if (unlikely(!ctx->altivec_enabled)) {                              \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                           \
>          return;                                                         \
>      }                                                                   \
> -    tcg_op(cpu_avrh[rD(ctx->opcode)], cpu_avrh[rA(ctx->opcode)], cpu_avrh[rB(ctx->opcode)]); \
> -    tcg_op(cpu_avrl[rD(ctx->opcode)], cpu_avrl[rA(ctx->opcode)], cpu_avrl[rB(ctx->opcode)]); \
> +    get_avr64(t0, rA(ctx->opcode), true);                               \
> +    get_avr64(t1, rB(ctx->opcode), true);                               \
> +    tcg_op(avr, t0, t1);                                                \
> +    set_avr64(rD(ctx->opcode), avr, true);                              \
> +                                                                        \
> +    get_avr64(t0, rA(ctx->opcode), false);                              \
> +    get_avr64(t1, rB(ctx->opcode), false);                              \
> +    tcg_op(avr, t0, t1);                                                \
> +    set_avr64(rD(ctx->opcode), avr, false);                             \
> +                                                                        \
> +    tcg_temp_free_i64(t0);                                              \
> +    tcg_temp_free_i64(t1);                                              \
> +    tcg_temp_free_i64(avr);                                             \
>  }
>  
>  GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
> @@ -406,6 +450,7 @@ GEN_VXFORM(vmrglw, 6, 6);
>  static void gen_vmrgew(DisasContext *ctx)
>  {
>      TCGv_i64 tmp;
> +    TCGv_i64 avr;
>      int VT, VA, VB;
>      if (unlikely(!ctx->altivec_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VPU);
> @@ -415,15 +460,28 @@ static void gen_vmrgew(DisasContext *ctx)
>      VA = rA(ctx->opcode);
>      VB = rB(ctx->opcode);
>      tmp = tcg_temp_new_i64();
> -    tcg_gen_shri_i64(tmp, cpu_avrh[VB], 32);
> -    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VA], tmp, 0, 32);
> -    tcg_gen_shri_i64(tmp, cpu_avrl[VB], 32);
> -    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VA], tmp, 0, 32);
> +    avr = tcg_temp_new_i64();
> +
> +    get_avr64(avr, VB, true);
> +    tcg_gen_shri_i64(tmp, avr, 32);
> +    get_avr64(avr, VA, true);
> +    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
> +    set_avr64(VT, avr, true);
> +
> +    get_avr64(avr, VB, false);
> +    tcg_gen_shri_i64(tmp, avr, 32);
> +    get_avr64(avr, VA, false);
> +    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
> +    set_avr64(VT, avr, false);
> +
>      tcg_temp_free_i64(tmp);
> +    tcg_temp_free_i64(avr);
>  }
>  
>  static void gen_vmrgow(DisasContext *ctx)
>  {
> +    TCGv_i64 t0, t1;
> +    TCGv_i64 avr;
>      int VT, VA, VB;
>      if (unlikely(!ctx->altivec_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VPU);
> @@ -432,9 +490,23 @@ static void gen_vmrgow(DisasContext *ctx)
>      VT = rD(ctx->opcode);
>      VA = rA(ctx->opcode);
>      VB = rB(ctx->opcode);
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
> +    avr = tcg_temp_new_i64();
>  
> -    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VB], cpu_avrh[VA], 32, 32);
> -    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VB], cpu_avrl[VA], 32, 32);
> +    get_avr64(t0, VB, true);
> +    get_avr64(t1, VA, true);
> +    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
> +    set_avr64(VT, avr, true);
> +
> +    get_avr64(t0, VB, false);
> +    get_avr64(t1, VA, false);
> +    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
> +    set_avr64(VT, avr, false);
> +
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(avr);
>  }
>  
>  GEN_VXFORM(vmuloub, 4, 0);


* Re: [Qemu-devel] [PATCH 13/34] target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 13/34] target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR " Richard Henderson
@ 2018-12-19  6:17   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 67108 bytes --]

On Mon, Dec 17, 2018 at 10:38:50PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> These helpers allow us to move VSR register values to/from the specified TCGv_i64
> argument.
> 
> To prevent VSX helpers accessing the cpu_vsr array directly, add extra TCG
> temporaries as required.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-4-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate/vsx-impl.inc.c | 782 ++++++++++++++++++++--------
>  1 file changed, 561 insertions(+), 221 deletions(-)
> 
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index 85ed135d44..e9a05d66f7 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1,20 +1,48 @@
>  /***                           VSX extension                               ***/
>  
> -static inline TCGv_i64 cpu_vsrh(int n)
> +static inline void get_vsr(TCGv_i64 dst, int n)
> +{
> +    tcg_gen_mov_i64(dst, cpu_vsr[n]);
> +}
> +
> +static inline void set_vsr(int n, TCGv_i64 src)
> +{
> +    tcg_gen_mov_i64(cpu_vsr[n], src);
> +}
> +
> +static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
>  {
>      if (n < 32) {
> -        return cpu_fpr[n];
> +        get_fpr(dst, n);
>      } else {
> -        return cpu_avrh[n-32];
> +        get_avr64(dst, n - 32, true);
>      }
>  }
>  
> -static inline TCGv_i64 cpu_vsrl(int n)
> +static inline void get_cpu_vsrl(TCGv_i64 dst, int n)
>  {
>      if (n < 32) {
> -        return cpu_vsr[n];
> +        get_vsr(dst, n);
>      } else {
> -        return cpu_avrl[n-32];
> +        get_avr64(dst, n - 32, false);
> +    }
> +}
> +
> +static inline void set_cpu_vsrh(int n, TCGv_i64 src)
> +{
> +    if (n < 32) {
> +        set_fpr(n, src);
> +    } else {
> +        set_avr64(n - 32, src, true);
> +    }
> +}
> +
> +static inline void set_cpu_vsrl(int n, TCGv_i64 src)
> +{
> +    if (n < 32) {
> +        set_vsr(n, src);
> +    } else {
> +        set_avr64(n - 32, src, false);
>      }
>  }
>  
> @@ -22,16 +50,20 @@ static inline TCGv_i64 cpu_vsrl(int n)
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
>      TCGv EA;                                                  \
> +    TCGv_i64 t0;                                              \
>      if (unlikely(!ctx->vsx_enabled)) {                        \
>          gen_exception(ctx, POWERPC_EXCP_VSXU);                \
>          return;                                               \
>      }                                                         \
> +    t0 = tcg_temp_new_i64();                                  \
>      gen_set_access_type(ctx, ACCESS_INT);                     \
>      EA = tcg_temp_new();                                      \
>      gen_addr_reg_index(ctx, EA);                              \
> -    gen_qemu_##operation(ctx, cpu_vsrh(xT(ctx->opcode)), EA); \
> +    gen_qemu_##operation(ctx, t0, EA);                        \
> +    set_cpu_vsrh(xT(ctx->opcode), t0);                        \
>      /* NOTE: cpu_vsrl is undefined */                         \
>      tcg_temp_free(EA);                                        \
> +    tcg_temp_free_i64(t0);                                    \
>  }
>  
>  VSX_LOAD_SCALAR(lxsdx, ld64_i64)
> @@ -44,39 +76,54 @@ VSX_LOAD_SCALAR(lxsspx, ld32fs)
>  static void gen_lxvd2x(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
> +    gen_qemu_ld64_i64(ctx, t0, EA);
> +    set_cpu_vsrh(xT(ctx->opcode), t0);
>      tcg_gen_addi_tl(EA, EA, 8);
> -    gen_qemu_ld64_i64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
> +    gen_qemu_ld64_i64(ctx, t0, EA);
> +    set_cpu_vsrl(xT(ctx->opcode), t0);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  static void gen_lxvdsx(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0;
> +    TCGv_i64 t1;
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> +    t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
> -    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
> +    gen_qemu_ld64_i64(ctx, t0, EA);
> +    set_cpu_vsrh(xT(ctx->opcode), t0);
> +    tcg_gen_mov_i64(t1, t0);
> +    set_cpu_vsrl(xT(ctx->opcode), t1);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  static void gen_lxvw4x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xth, xT(ctx->opcode));
> +    get_cpu_vsrh(xtl, xT(ctx->opcode));
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
> @@ -104,6 +151,8 @@ static void gen_lxvw4x(DisasContext *ctx)
>          tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
>  }
>  
>  static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
> @@ -151,8 +200,10 @@ static void gen_bswap32x4(TCGv_i64 outh, TCGv_i64 outl,
>  static void gen_lxvh8x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xth, xT(ctx->opcode));
> +    get_cpu_vsrh(xtl, xT(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -169,13 +220,17 @@ static void gen_lxvh8x(DisasContext *ctx)
>          gen_bswap16x8(xth, xtl, xth, xtl);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
>  }
>  
>  static void gen_lxvb16x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xth, xT(ctx->opcode));
> +    get_cpu_vsrh(xtl, xT(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -188,6 +243,8 @@ static void gen_lxvb16x(DisasContext *ctx)
>      tcg_gen_addi_tl(EA, EA, 8);
>      tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
>  }
>  
>  #define VSX_VECTOR_LOAD_STORE(name, op, indexed)            \
> @@ -195,15 +252,16 @@ static void gen_##name(DisasContext *ctx)                   \
>  {                                                           \
>      int xt;                                                 \
>      TCGv EA;                                                \
> -    TCGv_i64 xth, xtl;                                      \
> +    TCGv_i64 xth = tcg_temp_new_i64();                      \
> +    TCGv_i64 xtl = tcg_temp_new_i64();                      \
>                                                              \
>      if (indexed) {                                          \
>          xt = xT(ctx->opcode);                               \
>      } else {                                                \
>          xt = DQxT(ctx->opcode);                             \
>      }                                                       \
> -    xth = cpu_vsrh(xt);                                     \
> -    xtl = cpu_vsrl(xt);                                     \
> +    get_cpu_vsrh(xth, xt);                                  \
> +    get_cpu_vsrl(xtl, xt);                                  \
>                                                              \
>      if (xt < 32) {                                          \
>          if (unlikely(!ctx->vsx_enabled)) {                  \
> @@ -225,14 +283,20 @@ static void gen_##name(DisasContext *ctx)                   \
>      }                                                       \
>      if (ctx->le_mode) {                                     \
>          tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_LEQ);   \
> +        set_cpu_vsrl(xt, xtl);                              \
>          tcg_gen_addi_tl(EA, EA, 8);                         \
>          tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_LEQ);   \
> +        set_cpu_vsrh(xt, xth);                              \
>      } else {                                                \
>          tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_BEQ);   \
> +        set_cpu_vsrh(xt, xth);                              \
>          tcg_gen_addi_tl(EA, EA, 8);                         \
>          tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_BEQ);   \
> +        set_cpu_vsrl(xt, xtl);                              \
>      }                                                       \
>      tcg_temp_free(EA);                                      \
> +    tcg_temp_free_i64(xth);                                 \
> +    tcg_temp_free_i64(xtl);                                 \
>  }
>  
>  VSX_VECTOR_LOAD_STORE(lxv, ld_i64, 0)
> @@ -276,7 +340,8 @@ VSX_VECTOR_LOAD_STORE_LENGTH(stxvll)
>  static void gen_##name(DisasContext *ctx)                         \
>  {                                                                 \
>      TCGv EA;                                                      \
> -    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
> +    TCGv_i64 xth = tcg_temp_new_i64();                            \
> +    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
>                                                                    \
>      if (unlikely(!ctx->altivec_enabled)) {                        \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                     \
> @@ -286,8 +351,10 @@ static void gen_##name(DisasContext *ctx)                         \
>      EA = tcg_temp_new();                                          \
>      gen_addr_imm_index(ctx, EA, 0x03);                            \
>      gen_qemu_##operation(ctx, xth, EA);                           \
> +    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);                      \
>      /* NOTE: cpu_vsrl is undefined */                             \
>      tcg_temp_free(EA);                                            \
> +    tcg_temp_free_i64(xth);                                       \
>  }
>  
>  VSX_LOAD_SCALAR_DS(lxsd, ld64_i64)
> @@ -297,15 +364,19 @@ VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
>      TCGv EA;                                                  \
> +    TCGv_i64 t0;                                              \
>      if (unlikely(!ctx->vsx_enabled)) {                        \
>          gen_exception(ctx, POWERPC_EXCP_VSXU);                \
>          return;                                               \
>      }                                                         \
> +    t0 = tcg_temp_new_i64();                                  \
>      gen_set_access_type(ctx, ACCESS_INT);                     \
>      EA = tcg_temp_new();                                      \
>      gen_addr_reg_index(ctx, EA);                              \
> -    gen_qemu_##operation(ctx, cpu_vsrh(xS(ctx->opcode)), EA); \
> +    gen_qemu_##operation(ctx, t0, EA);                        \
> +    set_cpu_vsrh(xS(ctx->opcode), t0);                        \
>      tcg_temp_free(EA);                                        \
> +    tcg_temp_free_i64(t0);                                    \
>  }
>  
>  VSX_STORE_SCALAR(stxsdx, st64_i64)
> @@ -318,6 +389,7 @@ VSX_STORE_SCALAR(stxsspx, st32fs)
>  static void gen_stxvd2x(DisasContext *ctx)
>  {
>      TCGv EA;
> +    TCGv_i64 t0 = tcg_temp_new_i64();
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
> @@ -325,17 +397,23 @@ static void gen_stxvd2x(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_st64_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
> +    get_cpu_vsrh(t0, xS(ctx->opcode));
> +    gen_qemu_st64_i64(ctx, t0, EA);
>      tcg_gen_addi_tl(EA, EA, 8);
> -    gen_qemu_st64_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
> +    get_cpu_vsrl(t0, xS(ctx->opcode));
> +    gen_qemu_st64_i64(ctx, t0, EA);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  static void gen_stxvw4x(DisasContext *ctx)
>  {
> -    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> -    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
>      TCGv EA;
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    TCGv_i64 xsl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xsh, xS(ctx->opcode));
> +    get_cpu_vsrl(xsl, xS(ctx->opcode));
> +
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
> @@ -362,13 +440,17 @@ static void gen_stxvw4x(DisasContext *ctx)
>          tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xsh);
> +    tcg_temp_free_i64(xsl);
>  }
>  
>  static void gen_stxvh8x(DisasContext *ctx)
>  {
> -    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> -    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
>      TCGv EA;
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    TCGv_i64 xsl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xsh, xS(ctx->opcode));
> +    get_cpu_vsrl(xsl, xS(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -393,13 +475,17 @@ static void gen_stxvh8x(DisasContext *ctx)
>          tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
>      }
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xsh);
> +    tcg_temp_free_i64(xsl);
>  }
>  
>  static void gen_stxvb16x(DisasContext *ctx)
>  {
> -    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> -    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
>      TCGv EA;
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    TCGv_i64 xsl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xsh, xS(ctx->opcode));
> +    get_cpu_vsrl(xsl, xS(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -412,13 +498,16 @@ static void gen_stxvb16x(DisasContext *ctx)
>      tcg_gen_addi_tl(EA, EA, 8);
>      tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
>      tcg_temp_free(EA);
> +    tcg_temp_free_i64(xsh);
> +    tcg_temp_free_i64(xsl);
>  }
>  
>  #define VSX_STORE_SCALAR_DS(name, operation)                      \
>  static void gen_##name(DisasContext *ctx)                         \
>  {                                                                 \
>      TCGv EA;                                                      \
> -    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
> +    TCGv_i64 xth = tcg_temp_new_i64();                            \
> +    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
>                                                                    \
>      if (unlikely(!ctx->altivec_enabled)) {                        \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                     \
> @@ -430,62 +519,119 @@ static void gen_##name(DisasContext *ctx)                         \
>      gen_qemu_##operation(ctx, xth, EA);                           \
>      /* NOTE: cpu_vsrl is undefined */                             \
>      tcg_temp_free(EA);                                            \
> +    tcg_temp_free_i64(xth);                                       \
>  }
>  
> -VSX_LOAD_SCALAR_DS(stxsd, st64_i64)
> -VSX_LOAD_SCALAR_DS(stxssp, st32fs)
> +VSX_STORE_SCALAR_DS(stxsd, st64_i64)
> +VSX_STORE_SCALAR_DS(stxssp, st32fs)
>  
> -#define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
> -static void gen_##name(DisasContext *ctx)                       \
> -{                                                               \
> -    if (xS(ctx->opcode) < 32) {                                 \
> -        if (unlikely(!ctx->fpu_enabled)) {                      \
> -            gen_exception(ctx, POWERPC_EXCP_FPU);               \
> -            return;                                             \
> -        }                                                       \
> -    } else {                                                    \
> -        if (unlikely(!ctx->altivec_enabled)) {                  \
> -            gen_exception(ctx, POWERPC_EXCP_VPU);               \
> -            return;                                             \
> -        }                                                       \
> -    }                                                           \
> -    TCGv_i64 tmp = tcg_temp_new_i64();                          \
> -    tcg_gen_##tcgop1(tmp, source);                              \
> -    tcg_gen_##tcgop2(target, tmp);                              \
> -    tcg_temp_free_i64(tmp);                                     \
> +static void gen_mfvsrwz(DisasContext *ctx)
> +{
> +    if (xS(ctx->opcode) < 32) {
> +        if (unlikely(!ctx->fpu_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_FPU);
> +            return;
> +        }
> +    } else {
> +        if (unlikely(!ctx->altivec_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_VPU);
> +            return;
> +        }
> +    }
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    get_cpu_vsrh(xsh, xS(ctx->opcode));
> +    tcg_gen_ext32u_i64(tmp, xsh);
> +    tcg_gen_trunc_i64_tl(cpu_gpr[rA(ctx->opcode)], tmp);
> +    tcg_temp_free_i64(tmp);
> +    tcg_temp_free_i64(xsh);
>  }
>  
> +static void gen_mtvsrwa(DisasContext *ctx)
> +{
> +    if (xS(ctx->opcode) < 32) {
> +        if (unlikely(!ctx->fpu_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_FPU);
> +            return;
> +        }
> +    } else {
> +        if (unlikely(!ctx->altivec_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_VPU);
> +            return;
> +        }
> +    }
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
> +    tcg_gen_ext32s_i64(xsh, tmp);
> +    set_cpu_vsrh(xT(ctx->opcode), xsh);
> +    tcg_temp_free_i64(tmp);
> +    tcg_temp_free_i64(xsh);
> +}
>  
> -MV_VSRW(mfvsrwz, ext32u_i64, trunc_i64_tl, cpu_gpr[rA(ctx->opcode)], \
> -        cpu_vsrh(xS(ctx->opcode)))
> -MV_VSRW(mtvsrwa, extu_tl_i64, ext32s_i64, cpu_vsrh(xT(ctx->opcode)), \
> -        cpu_gpr[rA(ctx->opcode)])
> -MV_VSRW(mtvsrwz, extu_tl_i64, ext32u_i64, cpu_vsrh(xT(ctx->opcode)), \
> -        cpu_gpr[rA(ctx->opcode)])
> +static void gen_mtvsrwz(DisasContext *ctx)
> +{
> +    if (xS(ctx->opcode) < 32) {
> +        if (unlikely(!ctx->fpu_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_FPU);
> +            return;
> +        }
> +    } else {
> +        if (unlikely(!ctx->altivec_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_VPU);
> +            return;
> +        }
> +    }
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +    TCGv_i64 xsh = tcg_temp_new_i64();
> +    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
> +    tcg_gen_ext32u_i64(xsh, tmp);
> +    set_cpu_vsrh(xT(ctx->opcode), xsh);
> +    tcg_temp_free_i64(tmp);
> +    tcg_temp_free_i64(xsh);
> +}
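
The only difference between gen_mtvsrwa and gen_mtvsrwz above is tcg_gen_ext32s_i64
vs tcg_gen_ext32u_i64.  A host-side sketch of the two extensions, in plain C for
illustration only (the helper names are invented, this is not QEMU code):

```c
#include <stdint.h>

/* mtvsrwa-style: sign-extend the low 32 bits into a 64-bit doubleword. */
static uint64_t ext32s(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* mtvsrwz-style: zero-extend the low 32 bits into a 64-bit doubleword. */
static uint64_t ext32u(uint64_t x)
{
    return x & 0xffffffffULL;
}
```

The two agree whenever bit 31 of the source is clear and differ in the high
doubleword otherwise.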
>  
>  #if defined(TARGET_PPC64)
> -#define MV_VSRD(name, target, source)                           \
> -static void gen_##name(DisasContext *ctx)                       \
> -{                                                               \
> -    if (xS(ctx->opcode) < 32) {                                 \
> -        if (unlikely(!ctx->fpu_enabled)) {                      \
> -            gen_exception(ctx, POWERPC_EXCP_FPU);               \
> -            return;                                             \
> -        }                                                       \
> -    } else {                                                    \
> -        if (unlikely(!ctx->altivec_enabled)) {                  \
> -            gen_exception(ctx, POWERPC_EXCP_VPU);               \
> -            return;                                             \
> -        }                                                       \
> -    }                                                           \
> -    tcg_gen_mov_i64(target, source);                            \
> +static void gen_mfvsrd(DisasContext *ctx)
> +{
> +    TCGv_i64 t0;
> +    if (xS(ctx->opcode) < 32) {
> +        if (unlikely(!ctx->fpu_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_FPU);
> +            return;
> +        }
> +    } else {
> +        if (unlikely(!ctx->altivec_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_VPU);
> +            return;
> +        }
> +    }
> +    t0 = tcg_temp_new_i64();
> +    get_cpu_vsrh(t0, xS(ctx->opcode));
> +    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
> +    tcg_temp_free_i64(t0);
>  }
>  
> -MV_VSRD(mfvsrd, cpu_gpr[rA(ctx->opcode)], cpu_vsrh(xS(ctx->opcode)))
> -MV_VSRD(mtvsrd, cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)])
> +static void gen_mtvsrd(DisasContext *ctx)
> +{
> +    TCGv_i64 t0;
> +    if (xS(ctx->opcode) < 32) {
> +        if (unlikely(!ctx->fpu_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_FPU);
> +            return;
> +        }
> +    } else {
> +        if (unlikely(!ctx->altivec_enabled)) {
> +            gen_exception(ctx, POWERPC_EXCP_VPU);
> +            return;
> +        }
> +    }
> +    t0 = tcg_temp_new_i64();
> +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
> +    set_cpu_vsrh(xT(ctx->opcode), t0);
> +    tcg_temp_free_i64(t0);
> +}
>  
>  static void gen_mfvsrld(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (xS(ctx->opcode) < 32) {
>          if (unlikely(!ctx->vsx_enabled)) {
>              gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -497,12 +643,14 @@ static void gen_mfvsrld(DisasContext *ctx)
>              return;
>          }
>      }
> -
> -    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
> +    t0 = tcg_temp_new_i64();
> +    get_cpu_vsrl(t0, xS(ctx->opcode));
> +    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  static void gen_mtvsrdd(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (xT(ctx->opcode) < 32) {
>          if (unlikely(!ctx->vsx_enabled)) {
>              gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -516,16 +664,20 @@ static void gen_mtvsrdd(DisasContext *ctx)
>      }
>  
> +    t0 = tcg_temp_new_i64();
>      if (!rA(ctx->opcode)) {
> -        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);
> +        tcg_gen_movi_i64(t0, 0);
>      } else {
> -        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)]);
> +        tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
>      }
> +    set_cpu_vsrh(xT(ctx->opcode), t0);
>  
> -    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
> +    tcg_gen_mov_i64(t0, cpu_gpr[rB(ctx->opcode)]);
> +    set_cpu_vsrl(xT(ctx->opcode), t0);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  static void gen_mtvsrws(DisasContext *ctx)
>  {
> +    TCGv_i64 t0;
>      if (xT(ctx->opcode) < 32) {
>          if (unlikely(!ctx->vsx_enabled)) {
>              gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -538,55 +690,60 @@ static void gen_mtvsrws(DisasContext *ctx)
>          }
>      }
>  
> +    t0 = tcg_temp_new_i64();
> -    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)],
> +    tcg_gen_deposit_i64(t0, cpu_gpr[rA(ctx->opcode)],
>                          cpu_gpr[rA(ctx->opcode)], 32, 32);
> -    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xT(ctx->opcode)));
> +    set_cpu_vsrl(xT(ctx->opcode), t0);
> +    set_cpu_vsrh(xT(ctx->opcode), t0);
> +    tcg_temp_free_i64(t0);
>  }
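
For reference, the semantics of the tcg_gen_deposit_i64() used by gen_mtvsrws
above, sketched in plain C (illustrative only; deposit64 and splat_word are
invented names, not QEMU APIs):

```c
#include <stdint.h>

/* tcg_gen_deposit_i64(t, a, b, pos, len) semantics: take a, and replace
 * its `len` bits starting at bit `pos` with the low `len` bits of b.
 * Valid for 0 < len <= 63 here (len == 64 would need a special case). */
static uint64_t deposit64(uint64_t a, uint64_t b, unsigned pos, unsigned len)
{
    uint64_t mask = (~0ULL >> (64 - len)) << pos;
    return (a & ~mask) | ((b << pos) & mask);
}

/* mtvsrws-style splat: both 32-bit words of the result doubleword
 * become the low word of the source GPR. */
static uint64_t splat_word(uint64_t ra)
{
    return deposit64(ra, ra, 32, 32);
}
```

Both VSR doublewords are then set to splat_word(ra), giving four copies of the
word across the 128-bit register.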
>  
>  #endif
>  
>  static void gen_xxpermdi(DisasContext *ctx)
>  {
> +    TCGv_i64 xh, xl;
> +
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
>  
> +    xh = tcg_temp_new_i64();
> +    xl = tcg_temp_new_i64();
> +
>      if (unlikely((xT(ctx->opcode) == xA(ctx->opcode)) ||
>                   (xT(ctx->opcode) == xB(ctx->opcode)))) {
> -        TCGv_i64 xh, xl;
> -
> -        xh = tcg_temp_new_i64();
> -        xl = tcg_temp_new_i64();
> -
>          if ((DM(ctx->opcode) & 2) == 0) {
> -            tcg_gen_mov_i64(xh, cpu_vsrh(xA(ctx->opcode)));
> +            get_cpu_vsrh(xh, xA(ctx->opcode));
>          } else {
> -            tcg_gen_mov_i64(xh, cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrl(xh, xA(ctx->opcode));
>          }
>          if ((DM(ctx->opcode) & 1) == 0) {
> -            tcg_gen_mov_i64(xl, cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrh(xl, xB(ctx->opcode));
>          } else {
> -            tcg_gen_mov_i64(xl, cpu_vsrl(xB(ctx->opcode)));
> +            get_cpu_vsrl(xl, xB(ctx->opcode));
>          }
>  
> -        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xh);
> -        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xl);
> -
> -        tcg_temp_free_i64(xh);
> -        tcg_temp_free_i64(xl);
> +        set_cpu_vsrh(xT(ctx->opcode), xh);
> +        set_cpu_vsrl(xT(ctx->opcode), xl);
>      } else {
>          if ((DM(ctx->opcode) & 2) == 0) {
> -            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)));
> +            get_cpu_vsrh(xh, xA(ctx->opcode));
> +            set_cpu_vsrh(xT(ctx->opcode), xh);
>          } else {
> -            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrl(xh, xA(ctx->opcode));
> +            set_cpu_vsrh(xT(ctx->opcode), xh);
>          }
>          if ((DM(ctx->opcode) & 1) == 0) {
> -            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrh(xl, xB(ctx->opcode));
> +            set_cpu_vsrl(xT(ctx->opcode), xl);
>          } else {
> -            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xB(ctx->opcode)));
> +            get_cpu_vsrl(xl, xB(ctx->opcode));
> +            set_cpu_vsrl(xT(ctx->opcode), xl);
>          }
>      }
> +    tcg_temp_free_i64(xh);
> +    tcg_temp_free_i64(xl);
>  }
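
The doubleword selection that gen_xxpermdi implements, as a plain-C sketch
(the vsr128 struct and permdi name are made up for illustration; they are not
part of QEMU):

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vsr128;

/* xxpermdi-style select: DM bit 1 picks which half of A becomes the
 * result high doubleword, DM bit 0 picks which half of B becomes the
 * low doubleword. */
static vsr128 permdi(vsr128 a, vsr128 b, unsigned dm)
{
    vsr128 t;
    t.hi = (dm & 2) ? a.lo : a.hi;
    t.lo = (dm & 1) ? b.lo : b.hi;
    return t;
}
```

The two branches in the generated code exist only because xT may alias xA or
xB; the temporaries make the aliased case safe.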
>  
>  #define OP_ABS 1
> @@ -606,7 +763,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
>          }                                                         \
>          xb = tcg_temp_new_i64();                                  \
>          sgm = tcg_temp_new_i64();                                 \
> -        tcg_gen_mov_i64(xb, cpu_vsrh(xB(ctx->opcode)));           \
> +        get_cpu_vsrh(xb, xB(ctx->opcode));                        \
>          tcg_gen_movi_i64(sgm, sgn_mask);                          \
>          switch (op) {                                             \
>              case OP_ABS: {                                        \
> @@ -623,7 +780,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
>              }                                                     \
>              case OP_CPSGN: {                                      \
>                  TCGv_i64 xa = tcg_temp_new_i64();                 \
> -                tcg_gen_mov_i64(xa, cpu_vsrh(xA(ctx->opcode)));   \
> +                get_cpu_vsrh(xa, xA(ctx->opcode));                \
>                  tcg_gen_and_i64(xa, xa, sgm);                     \
>                  tcg_gen_andc_i64(xb, xb, sgm);                    \
>                  tcg_gen_or_i64(xb, xb, xa);                       \
> @@ -631,7 +788,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
>                  break;                                            \
>              }                                                     \
>          }                                                         \
> -        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xb);           \
> +        set_cpu_vsrh(xT(ctx->opcode), xb);                        \
>          tcg_temp_free_i64(xb);                                    \
>          tcg_temp_free_i64(sgm);                                   \
>      }
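
The four sign-bit operations in the macro above reduce to simple mask
arithmetic on the raw IEEE bits.  A plain-C sketch (function names invented;
only the SGN_MASK_DP value is taken from the patch context):

```c
#include <stdint.h>

#define SGN_MASK_DP 0x8000000000000000ULL  /* sign bit of a binary64 */

static uint64_t f_abs(uint64_t x)  { return x & ~SGN_MASK_DP; }  /* OP_ABS  */
static uint64_t f_nabs(uint64_t x) { return x |  SGN_MASK_DP; }  /* OP_NABS */
static uint64_t f_neg(uint64_t x)  { return x ^  SGN_MASK_DP; }  /* OP_NEG  */

/* OP_CPSGN: sign bit from a, magnitude from b. */
static uint64_t f_cpsgn(uint64_t a, uint64_t b)
{
    return (a & SGN_MASK_DP) | (b & ~SGN_MASK_DP);
}
```

Because these never interpret the payload, they work unchanged on NaNs and
denormals, which is why no FP status flags are touched.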
> @@ -647,7 +804,7 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
>      int xa;                                                       \
>      int xt = rD(ctx->opcode) + 32;                                \
>      int xb = rB(ctx->opcode) + 32;                                \
> -    TCGv_i64 xah, xbh, xbl, sgm;                                  \
> +    TCGv_i64 xah, xbh, xbl, sgm, tmp;                             \
>                                                                    \
>      if (unlikely(!ctx->vsx_enabled)) {                            \
>          gen_exception(ctx, POWERPC_EXCP_VSXU);                    \
> @@ -656,8 +813,9 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
>      xbh = tcg_temp_new_i64();                                     \
>      xbl = tcg_temp_new_i64();                                     \
>      sgm = tcg_temp_new_i64();                                     \
> -    tcg_gen_mov_i64(xbh, cpu_vsrh(xb));                           \
> -    tcg_gen_mov_i64(xbl, cpu_vsrl(xb));                           \
> +    tmp = tcg_temp_new_i64();                                     \
> +    get_cpu_vsrh(xbh, xb);                                        \
> +    get_cpu_vsrl(xbl, xb);                                        \
>      tcg_gen_movi_i64(sgm, sgn_mask);                              \
>      switch (op) {                                                 \
>      case OP_ABS:                                                  \
> @@ -672,17 +830,19 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
>      case OP_CPSGN:                                                \
>          xah = tcg_temp_new_i64();                                 \
>          xa = rA(ctx->opcode) + 32;                                \
> -        tcg_gen_and_i64(xah, cpu_vsrh(xa), sgm);                  \
> +        get_cpu_vsrh(tmp, xa);                                    \
> +        tcg_gen_and_i64(xah, tmp, sgm);                           \
>          tcg_gen_andc_i64(xbh, xbh, sgm);                          \
>          tcg_gen_or_i64(xbh, xbh, xah);                            \
>          tcg_temp_free_i64(xah);                                   \
>          break;                                                    \
>      }                                                             \
> -    tcg_gen_mov_i64(cpu_vsrh(xt), xbh);                           \
> -    tcg_gen_mov_i64(cpu_vsrl(xt), xbl);                           \
> +    set_cpu_vsrh(xt, xbh);                                        \
> +    set_cpu_vsrl(xt, xbl);                                        \
>      tcg_temp_free_i64(xbl);                                       \
>      tcg_temp_free_i64(xbh);                                       \
>      tcg_temp_free_i64(sgm);                                       \
> +    tcg_temp_free_i64(tmp);                                       \
>  }
>  
>  VSX_SCALAR_MOVE_QP(xsabsqp, OP_ABS, SGN_MASK_DP)
> @@ -701,8 +861,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
>          xbh = tcg_temp_new_i64();                                \
>          xbl = tcg_temp_new_i64();                                \
>          sgm = tcg_temp_new_i64();                                \
> -        tcg_gen_mov_i64(xbh, cpu_vsrh(xB(ctx->opcode)));         \
> -        tcg_gen_mov_i64(xbl, cpu_vsrl(xB(ctx->opcode)));         \
> +        get_cpu_vsrh(xbh, xB(ctx->opcode));                      \
> +        get_cpu_vsrl(xbl, xB(ctx->opcode));                      \
>          tcg_gen_movi_i64(sgm, sgn_mask);                         \
>          switch (op) {                                            \
>              case OP_ABS: {                                       \
> @@ -723,8 +883,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
>              case OP_CPSGN: {                                     \
>                  TCGv_i64 xah = tcg_temp_new_i64();               \
>                  TCGv_i64 xal = tcg_temp_new_i64();               \
> -                tcg_gen_mov_i64(xah, cpu_vsrh(xA(ctx->opcode))); \
> -                tcg_gen_mov_i64(xal, cpu_vsrl(xA(ctx->opcode))); \
> +                get_cpu_vsrh(xah, xA(ctx->opcode));              \
> +                get_cpu_vsrl(xal, xA(ctx->opcode));              \
>                  tcg_gen_and_i64(xah, xah, sgm);                  \
>                  tcg_gen_and_i64(xal, xal, sgm);                  \
>                  tcg_gen_andc_i64(xbh, xbh, sgm);                 \
> @@ -736,8 +896,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
>                  break;                                           \
>              }                                                    \
>          }                                                        \
> -        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xbh);         \
> -        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xbl);         \
> +        set_cpu_vsrh(xT(ctx->opcode), xbh);                      \
> +        set_cpu_vsrl(xT(ctx->opcode), xbl);                      \
>          tcg_temp_free_i64(xbh);                                  \
>          tcg_temp_free_i64(xbl);                                  \
>          tcg_temp_free_i64(sgm);                                  \
> @@ -768,12 +928,17 @@ static void gen_##name(DisasContext * ctx)                                    \
>  #define GEN_VSX_HELPER_XT_XB_ENV(name, op1, op2, inval, type) \
>  static void gen_##name(DisasContext * ctx)                    \
>  {                                                             \
> +    TCGv_i64 t0;                                              \
> +    TCGv_i64 t1;                                              \
>      if (unlikely(!ctx->vsx_enabled)) {                        \
>          gen_exception(ctx, POWERPC_EXCP_VSXU);                \
>          return;                                               \
>      }                                                         \
> -    gen_helper_##name(cpu_vsrh(xT(ctx->opcode)), cpu_env,     \
> -                      cpu_vsrh(xB(ctx->opcode)));             \
> +    t0 = tcg_temp_new_i64();                                  \
> +    t1 = tcg_temp_new_i64();                                  \
> +    get_cpu_vsrh(t0, xB(ctx->opcode));                        \
> +    gen_helper_##name(t1, cpu_env, t0);                       \
> +    set_cpu_vsrh(xT(ctx->opcode), t1);                        \
> +    tcg_temp_free_i64(t0);                                    \
> +    tcg_temp_free_i64(t1);                                    \
>  }
>  
>  GEN_VSX_HELPER_2(xsadddp, 0x00, 0x04, 0, PPC2_VSX)
> @@ -949,10 +1114,13 @@ GEN_VSX_HELPER_2(xxpermr, 0x08, 0x07, 0, PPC2_ISA300)
>  
>  static void gen_xxbrd(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth;
> +    TCGv_i64 xtl;
> +    TCGv_i64 xbh;
> +    TCGv_i64 xbl;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -960,28 +1128,49 @@ static void gen_xxbrd(DisasContext *ctx)
>      }
> +    xth = tcg_temp_new_i64();
> +    xtl = tcg_temp_new_i64();
> +    xbh = tcg_temp_new_i64();
> +    xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>      tcg_gen_bswap64_i64(xth, xbh);
>      tcg_gen_bswap64_i64(xtl, xbl);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  static void gen_xxbrh(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth;
> +    TCGv_i64 xtl;
> +    TCGv_i64 xbh;
> +    TCGv_i64 xbl;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> +    xth = tcg_temp_new_i64();
> +    xtl = tcg_temp_new_i64();
> +    xbh = tcg_temp_new_i64();
> +    xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>      gen_bswap16x8(xth, xtl, xbh, xbl);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
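
What gen_bswap16x8 computes per doubleword is the classic mask-and-shift
halfword byte reversal.  Plain-C sketch for one 64-bit half (the bswap16x4
name is invented; this is host-side illustration, not QEMU code):

```c
#include <stdint.h>

/* Reverse the two bytes inside each of the four 16-bit lanes of a
 * 64-bit word: one half of what gen_bswap16x8 does for xxbrh. */
static uint64_t bswap16x4(uint64_t x)
{
    const uint64_t m = 0x00ff00ff00ff00ffULL;
    return ((x & m) << 8) | ((x >> 8) & m);
}
```

Applying it independently to the high and low doublewords gives the full
128-bit xxbrh result; the lanes never exchange across the doubleword boundary.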
>  
>  static void gen_xxbrq(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth;
> +    TCGv_i64 xtl;
> +    TCGv_i64 xbh;
> +    TCGv_i64 xbl;
> -    TCGv_i64 t0 = tcg_temp_new_i64();
> +    TCGv_i64 t0;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -990,35 +1179,65 @@ static void gen_xxbrq(DisasContext *ctx)
>      }
> +    xth = tcg_temp_new_i64();
> +    xtl = tcg_temp_new_i64();
> +    xbh = tcg_temp_new_i64();
> +    xbl = tcg_temp_new_i64();
> +    t0 = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>      tcg_gen_bswap64_i64(t0, xbl);
>      tcg_gen_bswap64_i64(xtl, xbh);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
>      tcg_gen_mov_i64(xth, t0);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
> +
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
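
A full quadword byte reversal is just "byte-swap each doubleword, then
exchange the halves", which is why gen_xxbrq needs t0 to survive the aliased
xT == xB case.  Plain-C sketch (vsr128, bswap64u and brq are invented names,
not QEMU APIs):

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vsr128;

/* Portable 64-bit byte swap. */
static uint64_t bswap64u(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* xxbrq-style quadword reversal: swap the doublewords and byte-swap each. */
static vsr128 brq(vsr128 b)
{
    vsr128 t;
    t.hi = bswap64u(b.lo);
    t.lo = bswap64u(b.hi);
    return t;
}
```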
>  
>  static void gen_xxbrw(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth;
> +    TCGv_i64 xtl;
> +    TCGv_i64 xbh;
> +    TCGv_i64 xbl;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> +    xth = tcg_temp_new_i64();
> +    xtl = tcg_temp_new_i64();
> +    xbh = tcg_temp_new_i64();
> +    xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>      gen_bswap32x4(xth, xtl, xbh, xbl);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  #define VSX_LOGICAL(name, tcg_op)                                    \
>  static void glue(gen_, name)(DisasContext * ctx)                     \
>      {                                                                \
> +        TCGv_i64 t0;                                                 \
> +        TCGv_i64 t1;                                                 \
> +        TCGv_i64 t2;                                                 \
>          if (unlikely(!ctx->vsx_enabled)) {                           \
>              gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
>              return;                                                  \
>          }                                                            \
> -        tcg_op(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)), \
> -            cpu_vsrh(xB(ctx->opcode)));                              \
> -        tcg_op(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)), \
> -            cpu_vsrl(xB(ctx->opcode)));                              \
> +        t0 = tcg_temp_new_i64();                                     \
> +        t1 = tcg_temp_new_i64();                                     \
> +        t2 = tcg_temp_new_i64();                                     \
> +        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
> +        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
> +        tcg_op(t2, t0, t1);                                          \
> +        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
> +        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
> +        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
> +        tcg_op(t2, t0, t1);                                          \
> +        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
> +        tcg_temp_free_i64(t0);                                       \
> +        tcg_temp_free_i64(t1);                                       \
> +        tcg_temp_free_i64(t2);                                       \
>      }
>  
>  VSX_LOGICAL(xxland, tcg_gen_and_i64)
> @@ -1033,7 +1252,7 @@ VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
>  #define VSX_XXMRG(name, high)                               \
>  static void glue(gen_, name)(DisasContext * ctx)            \
>      {                                                       \
> -        TCGv_i64 a0, a1, b0, b1;                            \
> +        TCGv_i64 a0, a1, b0, b1, tmp;                       \
>          if (unlikely(!ctx->vsx_enabled)) {                  \
>              gen_exception(ctx, POWERPC_EXCP_VSXU);          \
>              return;                                         \
> @@ -1042,27 +1261,29 @@ static void glue(gen_, name)(DisasContext * ctx)            \
>          a1 = tcg_temp_new_i64();                            \
>          b0 = tcg_temp_new_i64();                            \
>          b1 = tcg_temp_new_i64();                            \
> +        tmp = tcg_temp_new_i64();                           \
>          if (high) {                                         \
> -            tcg_gen_mov_i64(a0, cpu_vsrh(xA(ctx->opcode))); \
> -            tcg_gen_mov_i64(a1, cpu_vsrh(xA(ctx->opcode))); \
> -            tcg_gen_mov_i64(b0, cpu_vsrh(xB(ctx->opcode))); \
> -            tcg_gen_mov_i64(b1, cpu_vsrh(xB(ctx->opcode))); \
> +            get_cpu_vsrh(a0, xA(ctx->opcode));              \
> +            get_cpu_vsrh(a1, xA(ctx->opcode));              \
> +            get_cpu_vsrh(b0, xB(ctx->opcode));              \
> +            get_cpu_vsrh(b1, xB(ctx->opcode));              \
>          } else {                                            \
> -            tcg_gen_mov_i64(a0, cpu_vsrl(xA(ctx->opcode))); \
> -            tcg_gen_mov_i64(a1, cpu_vsrl(xA(ctx->opcode))); \
> -            tcg_gen_mov_i64(b0, cpu_vsrl(xB(ctx->opcode))); \
> -            tcg_gen_mov_i64(b1, cpu_vsrl(xB(ctx->opcode))); \
> +            get_cpu_vsrl(a0, xA(ctx->opcode));              \
> +            get_cpu_vsrl(a1, xA(ctx->opcode));              \
> +            get_cpu_vsrl(b0, xB(ctx->opcode));              \
> +            get_cpu_vsrl(b1, xB(ctx->opcode));              \
>          }                                                   \
>          tcg_gen_shri_i64(a0, a0, 32);                       \
>          tcg_gen_shri_i64(b0, b0, 32);                       \
> -        tcg_gen_deposit_i64(cpu_vsrh(xT(ctx->opcode)),      \
> -                            b0, a0, 32, 32);                \
> -        tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)),      \
> -                            b1, a1, 32, 32);                \
> +        tcg_gen_deposit_i64(tmp, b0, a0, 32, 32);           \
> +        set_cpu_vsrh(xT(ctx->opcode), tmp);                 \
> +        tcg_gen_deposit_i64(tmp, b1, a1, 32, 32);           \
> +        set_cpu_vsrl(xT(ctx->opcode), tmp);                 \
>          tcg_temp_free_i64(a0);                              \
>          tcg_temp_free_i64(a1);                              \
>          tcg_temp_free_i64(b0);                              \
>          tcg_temp_free_i64(b1);                              \
> +        tcg_temp_free_i64(tmp);                             \
>      }
>  
>  VSX_XXMRG(xxmrghw, 1)
> @@ -1070,7 +1291,7 @@ VSX_XXMRG(xxmrglw, 0)
>  
>  static void gen_xxsel(DisasContext * ctx)
>  {
> -    TCGv_i64 a, b, c;
> +    TCGv_i64 a, b, c, tmp;
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
> @@ -1078,34 +1299,43 @@ static void gen_xxsel(DisasContext * ctx)
>      a = tcg_temp_new_i64();
>      b = tcg_temp_new_i64();
>      c = tcg_temp_new_i64();
> +    tmp = tcg_temp_new_i64();
>  
> -    tcg_gen_mov_i64(a, cpu_vsrh(xA(ctx->opcode)));
> -    tcg_gen_mov_i64(b, cpu_vsrh(xB(ctx->opcode)));
> -    tcg_gen_mov_i64(c, cpu_vsrh(xC(ctx->opcode)));
> +    get_cpu_vsrh(a, xA(ctx->opcode));
> +    get_cpu_vsrh(b, xB(ctx->opcode));
> +    get_cpu_vsrh(c, xC(ctx->opcode));
>  
>      tcg_gen_and_i64(b, b, c);
>      tcg_gen_andc_i64(a, a, c);
> -    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), a, b);
> +    tcg_gen_or_i64(tmp, a, b);
> +    set_cpu_vsrh(xT(ctx->opcode), tmp);
>  
> -    tcg_gen_mov_i64(a, cpu_vsrl(xA(ctx->opcode)));
> -    tcg_gen_mov_i64(b, cpu_vsrl(xB(ctx->opcode)));
> -    tcg_gen_mov_i64(c, cpu_vsrl(xC(ctx->opcode)));
> +    get_cpu_vsrl(a, xA(ctx->opcode));
> +    get_cpu_vsrl(b, xB(ctx->opcode));
> +    get_cpu_vsrl(c, xC(ctx->opcode));
>  
>      tcg_gen_and_i64(b, b, c);
>      tcg_gen_andc_i64(a, a, c);
> -    tcg_gen_or_i64(cpu_vsrl(xT(ctx->opcode)), a, b);
> +    tcg_gen_or_i64(tmp, a, b);
> +    set_cpu_vsrl(xT(ctx->opcode), tmp);
>  
>      tcg_temp_free_i64(a);
>      tcg_temp_free_i64(b);
>      tcg_temp_free_i64(c);
> +    tcg_temp_free_i64(tmp);
>  }
>  
>  static void gen_xxspltw(DisasContext *ctx)
>  {
>      TCGv_i64 b, b2;
> -    TCGv_i64 vsr = (UIM(ctx->opcode) & 2) ?
> -                   cpu_vsrl(xB(ctx->opcode)) :
> -                   cpu_vsrh(xB(ctx->opcode));
> +    TCGv_i64 vsr;
> +
> +    vsr = tcg_temp_new_i64();
> +    if (UIM(ctx->opcode) & 2) {
> +        get_cpu_vsrl(vsr, xB(ctx->opcode));
> +    } else {
> +        get_cpu_vsrh(vsr, xB(ctx->opcode));
> +    }
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -1122,9 +1352,11 @@ static void gen_xxspltw(DisasContext *ctx)
>      }
>  
>      tcg_gen_shli_i64(b2, b, 32);
> -    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), b, b2);
> -    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
> +    tcg_gen_or_i64(vsr, b, b2);
> +    set_cpu_vsrh(xT(ctx->opcode), vsr);
> +    set_cpu_vsrl(xT(ctx->opcode), vsr);
>  
> +    tcg_temp_free_i64(vsr);
>      tcg_temp_free_i64(b);
>      tcg_temp_free_i64(b2);
>  }
> @@ -1134,6 +1366,7 @@ static void gen_xxspltw(DisasContext *ctx)
>  static void gen_xxspltib(DisasContext *ctx)
>  {
>      unsigned char uim8 = IMM8(ctx->opcode);
> +    TCGv_i64 vsr = tcg_temp_new_i64();
>      if (xS(ctx->opcode) < 32) {
>          if (unlikely(!ctx->altivec_enabled)) {
>              gen_exception(ctx, POWERPC_EXCP_VPU);
> @@ -1145,8 +1378,10 @@ static void gen_xxspltib(DisasContext *ctx)
>              return;
>          }
>      }
> -    tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), pattern(uim8));
> -    tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), pattern(uim8));
> +    tcg_gen_movi_i64(vsr, pattern(uim8));
> +    set_cpu_vsrh(xT(ctx->opcode), vsr);
> +    set_cpu_vsrl(xT(ctx->opcode), vsr);
> +    tcg_temp_free_i64(vsr);
>  }
>  
>  static void gen_xxsldwi(DisasContext *ctx)
> @@ -1161,40 +1396,40 @@ static void gen_xxsldwi(DisasContext *ctx)
>  
>      switch (SHW(ctx->opcode)) {
>          case 0: {
> -            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
> -            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrh(xth, xA(ctx->opcode));
> +            get_cpu_vsrl(xtl, xA(ctx->opcode));
>              break;
>          }
>          case 1: {
>              TCGv_i64 t0 = tcg_temp_new_i64();
> -            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
> +            get_cpu_vsrh(xth, xA(ctx->opcode));
>              tcg_gen_shli_i64(xth, xth, 32);
> -            tcg_gen_mov_i64(t0, cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrl(t0, xA(ctx->opcode));
>              tcg_gen_shri_i64(t0, t0, 32);
>              tcg_gen_or_i64(xth, xth, t0);
> -            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrl(xtl, xA(ctx->opcode));
>              tcg_gen_shli_i64(xtl, xtl, 32);
> -            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrh(t0, xB(ctx->opcode));
>              tcg_gen_shri_i64(t0, t0, 32);
>              tcg_gen_or_i64(xtl, xtl, t0);
>              tcg_temp_free_i64(t0);
>              break;
>          }
>          case 2: {
> -            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
> -            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrl(xth, xA(ctx->opcode));
> +            get_cpu_vsrh(xtl, xB(ctx->opcode));
>              break;
>          }
>          case 3: {
>              TCGv_i64 t0 = tcg_temp_new_i64();
> -            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
> +            get_cpu_vsrl(xth, xA(ctx->opcode));
>              tcg_gen_shli_i64(xth, xth, 32);
> -            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrh(t0, xB(ctx->opcode));
>              tcg_gen_shri_i64(t0, t0, 32);
>              tcg_gen_or_i64(xth, xth, t0);
> -            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
> +            get_cpu_vsrh(xtl, xB(ctx->opcode));
>              tcg_gen_shli_i64(xtl, xtl, 32);
> -            tcg_gen_mov_i64(t0, cpu_vsrl(xB(ctx->opcode)));
> +            get_cpu_vsrl(t0, xB(ctx->opcode));
>              tcg_gen_shri_i64(t0, t0, 32);
>              tcg_gen_or_i64(xtl, xtl, t0);
>              tcg_temp_free_i64(t0);
> @@ -1202,8 +1437,8 @@ static void gen_xxsldwi(DisasContext *ctx)
>          }
>      }
>  
> -    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xth);
> -    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xtl);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
>  
>      tcg_temp_free_i64(xth);
>      tcg_temp_free_i64(xtl);
> @@ -1214,6 +1449,7 @@ static void gen_##name(DisasContext *ctx)                       \
>  {                                                               \
>      TCGv xt, xb;                                                \
>      TCGv_i32 t0 = tcg_temp_new_i32();                           \
> +    TCGv_i64 t1 = tcg_temp_new_i64();                           \
>      uint8_t uimm = UIMM4(ctx->opcode);                          \
>                                                                  \
>      if (unlikely(!ctx->vsx_enabled)) {                          \
> @@ -1226,8 +1462,9 @@ static void gen_##name(DisasContext *ctx)                       \
>       * uimm > 12 handle as per hardware in helper               \
>       */                                                         \
>      if (uimm > 15) {                                            \
> -        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);         \
> -        tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), 0);         \
> +        tcg_gen_movi_i64(t1, 0);                                \
> +        set_cpu_vsrh(xT(ctx->opcode), t1);                      \
> +        set_cpu_vsrl(xT(ctx->opcode), t1);                      \
>          return;                                                 \
>      }                                                           \
>      tcg_gen_movi_i32(t0, uimm);                                 \
> @@ -1235,6 +1472,7 @@ static void gen_##name(DisasContext *ctx)                       \
>      tcg_temp_free(xb);                                          \
>      tcg_temp_free(xt);                                          \
>      tcg_temp_free_i32(t0);                                      \
> +    tcg_temp_free_i64(t1);                                      \
>  }
>  
>  VSX_EXTRACT_INSERT(xxextractuw)
> @@ -1244,30 +1482,41 @@ VSX_EXTRACT_INSERT(xxinsertw)
>  static void gen_xsxexpdp(DisasContext *ctx)
>  {
>      TCGv rt = cpu_gpr[rD(ctx->opcode)];
> +    TCGv_i64 t0 = tcg_temp_new_i64();
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> -    tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11);
> +    get_cpu_vsrh(t0, xB(ctx->opcode));
> +    tcg_gen_extract_i64(rt, t0, 52, 11);
> +    tcg_temp_free_i64(t0);
>  }
>  
>  static void gen_xsxexpqp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
> -    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
> -    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
>      tcg_gen_extract_i64(xth, xbh, 48, 15);
> +    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
>      tcg_gen_movi_i64(xtl, 0);
> +    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
> +
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
>  }
>  
>  static void gen_xsiexpdp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> +    TCGv_i64 xth;
>      TCGv ra = cpu_gpr[rA(ctx->opcode)];
>      TCGv rb = cpu_gpr[rB(ctx->opcode)];
>      TCGv_i64 t0;
> @@ -1277,21 +1526,30 @@ static void gen_xsiexpdp(DisasContext *ctx)
>          return;
>      }
>      t0 = tcg_temp_new_i64();
> +    xth = tcg_temp_new_i64();
>      tcg_gen_andi_i64(xth, ra, 0x800FFFFFFFFFFFFF);
>      tcg_gen_andi_i64(t0, rb, 0x7FF);
>      tcg_gen_shli_i64(t0, t0, 52);
>      tcg_gen_or_i64(xth, xth, t0);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>      /* dword[1] is undefined */
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(xth);
>  }
>  
>  static void gen_xsiexpqp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
> -    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
> -    TCGv_i64 xah = cpu_vsrh(rA(ctx->opcode) + 32);
> -    TCGv_i64 xal = cpu_vsrl(rA(ctx->opcode) + 32);
> -    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xah = tcg_temp_new_i64();
> +    TCGv_i64 xal = tcg_temp_new_i64();
> +    get_cpu_vsrh(xah, rA(ctx->opcode) + 32);
> +    get_cpu_vsrl(xal, rA(ctx->opcode) + 32);
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
> +
>      TCGv_i64 t0;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -1303,14 +1561,22 @@ static void gen_xsiexpqp(DisasContext *ctx)
>      tcg_gen_andi_i64(t0, xbh, 0x7FFF);
>      tcg_gen_shli_i64(t0, t0, 48);
>      tcg_gen_or_i64(xth, xth, t0);
> +    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
>      tcg_gen_mov_i64(xtl, xal);
> +    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
> +
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xah);
> +    tcg_temp_free_i64(xal);
> +    tcg_temp_free_i64(xbh);
>  }
>  
>  static void gen_xsxsigdp(DisasContext *ctx)
>  {
>      TCGv rt = cpu_gpr[rD(ctx->opcode)];
> -    TCGv_i64 t0, zr, nan, exp;
> +    TCGv_i64 t0, t1, zr, nan, exp;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -1318,17 +1584,21 @@ static void gen_xsxsigdp(DisasContext *ctx)
>      }
>      exp = tcg_temp_new_i64();
>      t0 = tcg_temp_new_i64();
> +    t1 = tcg_temp_new_i64();
>      zr = tcg_const_i64(0);
>      nan = tcg_const_i64(2047);
>  
> -    tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11);
> +    get_cpu_vsrh(t1, xB(ctx->opcode));
> +    tcg_gen_extract_i64(exp, t1, 52, 11);
>      tcg_gen_movi_i64(t0, 0x0010000000000000);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
> -    tcg_gen_andi_i64(rt, cpu_vsrh(xB(ctx->opcode)), 0x000FFFFFFFFFFFFF);
> +    get_cpu_vsrh(t1, xB(ctx->opcode));
> +    tcg_gen_andi_i64(rt, t1, 0x000FFFFFFFFFFFFF);
>      tcg_gen_or_i64(rt, rt, t0);
>  
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>      tcg_temp_free_i64(exp);
>      tcg_temp_free_i64(zr);
>      tcg_temp_free_i64(nan);
> @@ -1337,8 +1607,13 @@ static void gen_xsxsigdp(DisasContext *ctx)
>  static void gen_xsxsigqp(DisasContext *ctx)
>  {
>      TCGv_i64 t0, zr, nan, exp;
> -    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
> -    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
> +    get_cpu_vsrl(xbl, rB(ctx->opcode) + 32);
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -1349,29 +1624,41 @@ static void gen_xsxsigqp(DisasContext *ctx)
>      zr = tcg_const_i64(0);
>      nan = tcg_const_i64(32767);
>  
> -    tcg_gen_extract_i64(exp, cpu_vsrh(rB(ctx->opcode) + 32), 48, 15);
> +    tcg_gen_extract_i64(exp, xbh, 48, 15);
>      tcg_gen_movi_i64(t0, 0x0001000000000000);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
> -    tcg_gen_andi_i64(xth, cpu_vsrh(rB(ctx->opcode) + 32), 0x0000FFFFFFFFFFFF);
> +    tcg_gen_andi_i64(xth, xbh, 0x0000FFFFFFFFFFFF);
>      tcg_gen_or_i64(xth, xth, t0);
> -    tcg_gen_mov_i64(xtl, cpu_vsrl(rB(ctx->opcode) + 32));
> +    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
> +    tcg_gen_mov_i64(xtl, xbl);
> +    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
>  
>      tcg_temp_free_i64(t0);
>      tcg_temp_free_i64(exp);
>      tcg_temp_free_i64(zr);
>      tcg_temp_free_i64(nan);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  #endif
>  
>  static void gen_xviexpsp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
> -    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xah = tcg_temp_new_i64();
> +    TCGv_i64 xal = tcg_temp_new_i64();
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xah, xA(ctx->opcode));
> +    get_cpu_vsrl(xal, xA(ctx->opcode));
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
> +
>      TCGv_i64 t0;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -1383,21 +1670,36 @@ static void gen_xviexpsp(DisasContext *ctx)
>      tcg_gen_andi_i64(t0, xbh, 0xFF000000FF);
>      tcg_gen_shli_i64(t0, t0, 23);
>      tcg_gen_or_i64(xth, xth, t0);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>      tcg_gen_andi_i64(xtl, xal, 0x807FFFFF807FFFFF);
>      tcg_gen_andi_i64(t0, xbl, 0xFF000000FF);
>      tcg_gen_shli_i64(t0, t0, 23);
>      tcg_gen_or_i64(xtl, xtl, t0);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xah);
> +    tcg_temp_free_i64(xal);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  static void gen_xviexpdp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
> -    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xah = tcg_temp_new_i64();
> +    TCGv_i64 xal = tcg_temp_new_i64();
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xah, xA(ctx->opcode));
> +    get_cpu_vsrl(xal, xA(ctx->opcode));
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
> +
>      TCGv_i64 t0;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -1409,19 +1711,31 @@ static void gen_xviexpdp(DisasContext *ctx)
>      tcg_gen_andi_i64(t0, xbh, 0x7FF);
>      tcg_gen_shli_i64(t0, t0, 52);
>      tcg_gen_or_i64(xth, xth, t0);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>      tcg_gen_andi_i64(xtl, xal, 0x800FFFFFFFFFFFFF);
>      tcg_gen_andi_i64(t0, xbl, 0x7FF);
>      tcg_gen_shli_i64(t0, t0, 52);
>      tcg_gen_or_i64(xtl, xtl, t0);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xah);
> +    tcg_temp_free_i64(xal);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  static void gen_xvxexpsp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -1429,33 +1743,53 @@ static void gen_xvxexpsp(DisasContext *ctx)
>      }
>      tcg_gen_shri_i64(xth, xbh, 23);
>      tcg_gen_andi_i64(xth, xth, 0xFF000000FF);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>      tcg_gen_shri_i64(xtl, xbl, 23);
>      tcg_gen_andi_i64(xtl, xtl, 0xFF000000FF);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  static void gen_xvxexpdp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
>      tcg_gen_extract_i64(xth, xbh, 52, 11);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>      tcg_gen_extract_i64(xtl, xbl, 52, 11);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
> +
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
>  
>  static void gen_xvxsigdp(DisasContext *ctx)
>  {
> -    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> -    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> -    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
> -    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
> +    TCGv_i64 xth = tcg_temp_new_i64();
> +    TCGv_i64 xtl = tcg_temp_new_i64();
> +
> +    TCGv_i64 xbh = tcg_temp_new_i64();
> +    TCGv_i64 xbl = tcg_temp_new_i64();
> +    get_cpu_vsrh(xbh, xB(ctx->opcode));
> +    get_cpu_vsrl(xbl, xB(ctx->opcode));
>  
>      TCGv_i64 t0, zr, nan, exp;
>  
> @@ -1474,6 +1808,7 @@ static void gen_xvxsigdp(DisasContext *ctx)
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
>      tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
>      tcg_gen_or_i64(xth, xth, t0);
> +    set_cpu_vsrh(xT(ctx->opcode), xth);
>  
>      tcg_gen_extract_i64(exp, xbl, 52, 11);
>      tcg_gen_movi_i64(t0, 0x0010000000000000);
> @@ -1481,11 +1816,16 @@ static void gen_xvxsigdp(DisasContext *ctx)
>      tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
>      tcg_gen_andi_i64(xtl, xbl, 0x000FFFFFFFFFFFFF);
>      tcg_gen_or_i64(xtl, xtl, t0);
> +    set_cpu_vsrl(xT(ctx->opcode), xtl);
>  
>      tcg_temp_free_i64(t0);
>      tcg_temp_free_i64(exp);
>      tcg_temp_free_i64(zr);
>      tcg_temp_free_i64(nan);
> +    tcg_temp_free_i64(xth);
> +    tcg_temp_free_i64(xtl);
> +    tcg_temp_free_i64(xbh);
> +    tcg_temp_free_i64(xbl);
>  }
>  
>  #undef GEN_XX2FORM

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Richard Henderson
@ 2018-12-19  6:20   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:51PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Instead of accessing the FPR, VMX and VSX registers through static arrays of
> TCGv_i64 globals, remove them and change the helpers to load/store data
> directly from cpu_env.
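
The pattern this converts to can be modeled outside QEMU: each register access becomes an explicit load or store at a fixed offset inside the CPU state block, rather than a move between per-register TCG globals. In the sketch below a plain struct and memcpy stand in for CPUPPCState and the tcg_gen_ld_i64/tcg_gen_st_i64 calls; the type and function names are illustrative, not QEMU's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the direct-access helpers: CPUPPCState and the
 * tcg_gen_ld_i64/tcg_gen_st_i64 calls are stood in for by a plain
 * struct and memcpy.  Names are illustrative only. */
typedef struct {
    uint64_t fpr[32];
} CPUStateModel;

static void get_fpr_model(uint64_t *dst, const CPUStateModel *env, int regno)
{
    /* load from env + offsetof(CPUStateModel, fpr[regno]) */
    memcpy(dst,
           (const char *)env + offsetof(CPUStateModel, fpr)
               + regno * sizeof(uint64_t),
           sizeof(*dst));
}

static void set_fpr_model(CPUStateModel *env, int regno, uint64_t src)
{
    /* store to env + offsetof(CPUStateModel, fpr[regno]) */
    memcpy((char *)env + offsetof(CPUStateModel, fpr)
               + regno * sizeof(uint64_t),
           &src, sizeof(src));
}
```

The translation-time cost is one load into a local temp and one store back per access, instead of keeping a live global per register.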
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-6-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate.c              | 59 ++++++++---------------------
>  target/ppc/translate/vsx-impl.inc.c |  4 +-
>  2 files changed, 18 insertions(+), 45 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index fa3e8dc114..5923c688cd 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -55,15 +55,9 @@
>  /* global register indexes */
>  static char cpu_reg_names[10*3 + 22*4 /* GPR */
>      + 10*4 + 22*5 /* SPE GPRh */
> -    + 10*4 + 22*5 /* FPR */
> -    + 2*(10*6 + 22*7) /* AVRh, AVRl */
> -    + 10*5 + 22*6 /* VSR */
>      + 8*5 /* CRF */];
>  static TCGv cpu_gpr[32];
>  static TCGv cpu_gprh[32];
> -static TCGv_i64 cpu_fpr[32];
> -static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
> -static TCGv_i64 cpu_vsr[32];
>  static TCGv_i32 cpu_crf[8];
>  static TCGv cpu_nip;
>  static TCGv cpu_msr;
> @@ -108,39 +102,6 @@ void ppc_translate_init(void)
>                                           offsetof(CPUPPCState, gprh[i]), p);
>          p += (i < 10) ? 4 : 5;
>          cpu_reg_names_size -= (i < 10) ? 4 : 5;
> -
> -        snprintf(p, cpu_reg_names_size, "fp%d", i);
> -        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env,
> -                                            offsetof(CPUPPCState, fpr[i]), p);
> -        p += (i < 10) ? 4 : 5;
> -        cpu_reg_names_size -= (i < 10) ? 4 : 5;
> -
> -        snprintf(p, cpu_reg_names_size, "avr%dH", i);
> -#ifdef HOST_WORDS_BIGENDIAN
> -        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
> -                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
> -#else
> -        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
> -                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
> -#endif
> -        p += (i < 10) ? 6 : 7;
> -        cpu_reg_names_size -= (i < 10) ? 6 : 7;
> -
> -        snprintf(p, cpu_reg_names_size, "avr%dL", i);
> -#ifdef HOST_WORDS_BIGENDIAN
> -        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
> -                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
> -#else
> -        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
> -                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
> -#endif
> -        p += (i < 10) ? 6 : 7;
> -        cpu_reg_names_size -= (i < 10) ? 6 : 7;
> -        snprintf(p, cpu_reg_names_size, "vsr%d", i);
> -        cpu_vsr[i] = tcg_global_mem_new_i64(cpu_env,
> -                                            offsetof(CPUPPCState, vsr[i]), p);
> -        p += (i < 10) ? 5 : 6;
> -        cpu_reg_names_size -= (i < 10) ? 5 : 6;
>      }
>  
>      cpu_nip = tcg_global_mem_new(cpu_env,
> @@ -6696,22 +6657,34 @@ GEN_TM_PRIV_NOOP(trechkpt);
>  
>  static inline void get_fpr(TCGv_i64 dst, int regno)
>  {
> -    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
>  }
>  
>  static inline void set_fpr(int regno, TCGv_i64 src)
>  {
> -    tcg_gen_mov_i64(cpu_fpr[regno], src);
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
>  }
>  
>  static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
>  {
> -    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
> +#ifdef HOST_WORDS_BIGENDIAN
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
> +                                          avr[regno].u64[(high ? 0 : 1)]));
> +#else
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
> +                                          avr[regno].u64[(high ? 1 : 0)]));
> +#endif
>  }
>  
>  static inline void set_avr64(int regno, TCGv_i64 src, bool high)
>  {
> -    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
> +#ifdef HOST_WORDS_BIGENDIAN
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
> +                                          avr[regno].u64[(high ? 0 : 1)]));
> +#else
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
> +                                          avr[regno].u64[(high ? 1 : 0)]));
> +#endif
>  }
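
The #ifdef in get_avr64/set_avr64 above picks which uint64_t half of avr[] holds the architecturally high doubleword, and the mapping flips with host byte order. A standalone sketch of the same index selection, with QEMU's compile-time HOST_WORDS_BIGENDIAN replaced by a runtime probe (an assumption made only so the example is portable):

```c
#include <assert.h>
#include <stdint.h>

/* Runtime stand-in for QEMU's compile-time HOST_WORDS_BIGENDIAN test. */
static int host_big_endian(void)
{
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}

/* Index of the architecturally-high 64-bit half of a 128-bit Altivec
 * register stored as two host-order uint64_t halves, mirroring the
 * #ifdef in get_avr64/set_avr64. */
static int avr64_index(int high)
{
    if (host_big_endian()) {
        return high ? 0 : 1;
    }
    return high ? 1 : 0;
}
```

Whatever the host order, the two halves always land at distinct indices that together cover u64[0] and u64[1].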
>  
>  #include "translate/fp-impl.inc.c"
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index e9a05d66f7..20e1fd9324 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -2,12 +2,12 @@
>  
>  static inline void get_vsr(TCGv_i64 dst, int n)
>  {
> -    tcg_gen_mov_i64(dst, cpu_vsr[n]);
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
>  }
>  
>  static inline void set_vsr(int n, TCGv_i64 src)
>  {
> -    tcg_gen_mov_i64(cpu_vsr[n], src);
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
>  }
>  
>  static inline void get_cpu_vsrh(TCGv_i64 dst, int n)




* Re: [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Richard Henderson
@ 2018-12-19  6:21   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:52PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Since the VSX registers are a superset of the VMX registers, they can be
> represented by the same type. Merge ppc_avr_t into ppc_vsr_t and change
> ppc_avr_t to be a simple typedef alias.
> 
> Note that because the float32 member is named differently in ppc_avr_t
> and ppc_vsr_t, references to the ppc_avr_t f member must be replaced
> with f32.
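
A minimal model of the merged type (omitting the signed, float64, float128 and Int128 views the real ppc_vsr_t carries) shows both the alias and the f → f32 rename:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal sketch of the merged register union; the real ppc_vsr_t in
 * the patch below also has s8/s16/s32/s64, float64, float128 and
 * Int128 members. */
typedef union {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
    float    f32[4];   /* formerly the 'f' member of ppc_avr_t */
} ppc_vsr_model_t;

/* After the merge, the VMX type is just an alias for the VSX type:
 * both name the same 128 bits of storage. */
typedef ppc_vsr_model_t ppc_avr_model_t;
```

Since every view overlays the same 16 bytes, a helper taking ppc_vsr_t can operate on an Altivec register without conversion.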
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-7-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/cpu.h        | 17 +++++++------
>  target/ppc/internal.h   | 11 --------
>  target/ppc/int_helper.c | 56 +++++++++++++++++++++--------------------
>  3 files changed, 39 insertions(+), 45 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index ab68abe8a2..5445d4c3c1 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -230,7 +230,6 @@ typedef struct opc_handler_t opc_handler_t;
>  /* Types used to describe some PowerPC registers etc. */
>  typedef struct DisasContext DisasContext;
>  typedef struct ppc_spr_t ppc_spr_t;
> -typedef union ppc_avr_t ppc_avr_t;
>  typedef union ppc_tlb_t ppc_tlb_t;
>  typedef struct ppc_hash_pte64 ppc_hash_pte64_t;
>  
> @@ -254,22 +253,26 @@ struct ppc_spr_t {
>  #endif
>  };
>  
> -/* Altivec registers (128 bits) */
> -union ppc_avr_t {
> -    float32 f[4];
> +/* VSX/Altivec registers (128 bits) */
> +typedef union _ppc_vsr_t {
>      uint8_t u8[16];
>      uint16_t u16[8];
>      uint32_t u32[4];
> +    uint64_t u64[2];
>      int8_t s8[16];
>      int16_t s16[8];
>      int32_t s32[4];
> -    uint64_t u64[2];
>      int64_t s64[2];
> +    float32 f32[4];
> +    float64 f64[2];
> +    float128 f128;
>  #ifdef CONFIG_INT128
>      __uint128_t u128;
>  #endif
> -    Int128 s128;
> -};
> +    Int128  s128;
> +} ppc_vsr_t;
> +
> +typedef ppc_vsr_t ppc_avr_t;
>  
>  #if !defined(CONFIG_USER_ONLY)
>  /* Software TLB cache */
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index a9bcadff42..b4b1f7b3db 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -204,17 +204,6 @@ EXTRACT_HELPER(IMM8, 11, 8);
>  EXTRACT_HELPER(DCMX, 16, 7);
>  EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
>  
> -typedef union _ppc_vsr_t {
> -    uint8_t u8[16];
> -    uint16_t u16[8];
> -    uint32_t u32[4];
> -    uint64_t u64[2];
> -    float32 f32[4];
> -    float64 f64[2];
> -    float128 f128;
> -    Int128  s128;
> -} ppc_vsr_t;
> -
>  #if defined(HOST_WORDS_BIGENDIAN)
>  #define VsrB(i) u8[i]
>  #define VsrH(i) u16[i]
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index fcac90a4a9..9d715be25c 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -548,8 +548,8 @@ VARITH_DO(muluwm, *, u32)
>      {                                                                   \
>          int i;                                                          \
>                                                                          \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
> -            r->f[i] = func(a->f[i], b->f[i], &env->vec_status);         \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
> +            r->f32[i] = func(a->f32[i], b->f32[i], &env->vec_status);   \
>          }                                                               \
>      }
>  VARITHFP(addfp, float32_add)
> @@ -563,9 +563,9 @@ VARITHFP(maxfp, float32_max)
>                             ppc_avr_t *b, ppc_avr_t *c)                  \
>      {                                                                   \
>          int i;                                                          \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
> -            r->f[i] = float32_muladd(a->f[i], c->f[i], b->f[i],         \
> -                                     type, &env->vec_status);           \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
> +            r->f32[i] = float32_muladd(a->f32[i], c->f32[i], b->f32[i], \
> +                                       type, &env->vec_status);         \
>          }                                                               \
>      }
>  VARITHFPFMA(maddfp, 0);
> @@ -670,9 +670,9 @@ VABSDU(w, u32)
>      {                                                                   \
>          int i;                                                          \
>                                                                          \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
>              float32 t = cvt(b->element[i], &env->vec_status);           \
> -            r->f[i] = float32_scalbn(t, -uim, &env->vec_status);        \
> +            r->f32[i] = float32_scalbn(t, -uim, &env->vec_status);      \
>          }                                                               \
>      }
>  VCF(ux, uint32_to_float32, u32)
> @@ -782,9 +782,9 @@ VCMPNE(w, u32, uint32_t, 0)
>          uint32_t none = 0;                                              \
>          int i;                                                          \
>                                                                          \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
>              uint32_t result;                                            \
> -            int rel = float32_compare_quiet(a->f[i], b->f[i],           \
> +            int rel = float32_compare_quiet(a->f32[i], b->f32[i],       \
>                                              &env->vec_status);          \
>              if (rel == float_relation_unordered) {                      \
>                  result = 0;                                             \
> @@ -816,14 +816,16 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
>      int i;
>      int all_in = 0;
>  
> -    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
> -        int le_rel = float32_compare_quiet(a->f[i], b->f[i], &env->vec_status);
> +    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
> +        int le_rel = float32_compare_quiet(a->f32[i], b->f32[i],
> +                                           &env->vec_status);
>          if (le_rel == float_relation_unordered) {
>              r->u32[i] = 0xc0000000;
>              all_in = 1;
>          } else {
> -            float32 bneg = float32_chs(b->f[i]);
> -            int ge_rel = float32_compare_quiet(a->f[i], bneg, &env->vec_status);
> +            float32 bneg = float32_chs(b->f32[i]);
> +            int ge_rel = float32_compare_quiet(a->f32[i], bneg,
> +                                               &env->vec_status);
>              int le = le_rel != float_relation_greater;
>              int ge = ge_rel != float_relation_less;
>  
> @@ -856,11 +858,11 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>          float_status s = env->vec_status;                               \
>                                                                          \
>          set_float_rounding_mode(float_round_to_zero, &s);               \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
> -            if (float32_is_any_nan(b->f[i])) {                          \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
> +            if (float32_is_any_nan(b->f32[i])) {                        \
>                  r->element[i] = 0;                                      \
>              } else {                                                    \
> -                float64 t = float32_to_float64(b->f[i], &s);            \
> +                float64 t = float32_to_float64(b->f32[i], &s);          \
>                  int64_t j;                                              \
>                                                                          \
>                  t = float64_scalbn(t, uim, &s);                         \
> @@ -1661,8 +1663,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
>  {
>      int i;
>  
> -    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
> -        r->f[i] = float32_div(float32_one, b->f[i], &env->vec_status);
> +    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
> +        r->f32[i] = float32_div(float32_one, b->f32[i], &env->vec_status);
>      }
>  }
>  
> @@ -1674,8 +1676,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
>          float_status s = env->vec_status;                       \
>                                                                  \
>          set_float_rounding_mode(rounding, &s);                  \
> -        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                \
> -            r->f[i] = float32_round_to_int (b->f[i], &s);       \
> +        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {              \
> +            r->f32[i] = float32_round_to_int (b->f32[i], &s);   \
>          }                                                       \
>      }
>  VRFI(n, float_round_nearest_even)
> @@ -1705,10 +1707,10 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
>  {
>      int i;
>  
> -    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
> -        float32 t = float32_sqrt(b->f[i], &env->vec_status);
> +    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
> +        float32 t = float32_sqrt(b->f32[i], &env->vec_status);
>  
> -        r->f[i] = float32_div(float32_one, t, &env->vec_status);
> +        r->f32[i] = float32_div(float32_one, t, &env->vec_status);
>      }
>  }
>  
> @@ -1751,8 +1753,8 @@ void helper_vexptefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
>  {
>      int i;
>  
> -    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
> -        r->f[i] = float32_exp2(b->f[i], &env->vec_status);
> +    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
> +        r->f32[i] = float32_exp2(b->f32[i], &env->vec_status);
>      }
>  }
>  
> @@ -1760,8 +1762,8 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
>  {
>      int i;
>  
> -    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
> -        r->f[i] = float32_log2(b->f[i], &env->vec_status);
> +    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
> +        r->f32[i] = float32_log2(b->f32[i], &env->vec_status);
>      }
>  }
>  
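[Editorial note] The hunks above mechanically rename the `f` union member to `f32`, making the lane width explicit at every access site. The pattern can be shown with a small stand-alone sketch; the `avr_t` type and plain `float` arithmetic below are simplified stand-ins (QEMU's real code uses `ppc_avr_t` and softfloat's `float32_add()` with `&env->vec_status`), not the actual definitions:

```c
/* Stand-alone sketch (types simplified, NOT the real QEMU definitions) of
 * the pattern the hunks above use: the vector register is a union, and the
 * float lanes are reached through an explicitly sized f32[] view rather
 * than an ambiguously named f[]. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef union {
    uint32_t u32[4];
    float    f32[4];   /* QEMU uses the softfloat float32 type here */
} avr_t;

/* Lane-wise add in the style of the VARITHFP(addfp, ...) expansion;
 * the real helper calls float32_add(a, b, &env->vec_status). */
static void vaddfp(avr_t *r, const avr_t *a, const avr_t *b)
{
    for (size_t i = 0; i < ARRAY_SIZE(r->f32); i++) {
        r->f32[i] = a->f32[i] + b->f32[i];
    }
}
```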

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array Richard Henderson
@ 2018-12-19  6:27   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 23686 bytes --]

On Mon, Dec 17, 2018 at 10:38:53PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> The VSX register array is a block of 64 128-bit registers where the first 32
> registers consist of the existing 64-bit FP registers extended to 128-bit
> using new VSR registers, and the last 32 registers are the VMX 128-bit
> registers as shown below:
> 
>             64-bit               64-bit
>     +--------------------+--------------------+
>     |        FP0         |                    |  VSR0
>     +--------------------+--------------------+
>     |        FP1         |                    |  VSR1
>     +--------------------+--------------------+
>     |        ...         |        ...         |  ...
>     +--------------------+--------------------+
>     |        FP30        |                    |  VSR30
>     +--------------------+--------------------+
>     |        FP31        |                    |  VSR31
>     +--------------------+--------------------+
>     |                  VMX0                   |  VSR32
>     +-----------------------------------------+
>     |                  VMX1                   |  VSR33
>     +-----------------------------------------+
>     |                  ...                    |  ...
>     +-----------------------------------------+
>     |                  VMX30                  |  VSR62
>     +-----------------------------------------+
>     |                  VMX31                  |  VSR63
>     +-----------------------------------------+
> 
> In order to allow for future conversion of VSX instructions to use TCG vector
> operations, recreate the same layout using an aligned version of the existing
> vsr register array.
> 
> Since the old fpr and avr register arrays are removed, the existing callers
> must also be updated to use the correct offset in the vsr register array. This
> also includes switching the relevant VMState fields over to using subarrays
> to make sure that migration compatibility is preserved.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-8-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/cpu.h                    |  9 ++--
>  target/ppc/internal.h               | 18 ++------
>  linux-user/ppc/signal.c             | 24 +++++-----
>  target/ppc/arch_dump.c              | 12 ++---
>  target/ppc/gdbstub.c                |  8 ++--
>  target/ppc/machine.c                | 72 +++++++++++++++++++++++++++--
>  target/ppc/monitor.c                |  4 +-
>  target/ppc/translate.c              | 14 +++---
>  target/ppc/translate/dfp-impl.inc.c |  2 +-
>  target/ppc/translate/vmx-impl.inc.c |  7 ++-
>  target/ppc/translate/vsx-impl.inc.c |  4 +-
>  target/ppc/translate_init.inc.c     | 24 +++++-----
>  12 files changed, 126 insertions(+), 72 deletions(-)
> 
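[Editorial note] The unified layout described in the commit message — FP0..FP31 as the high halves of VSR0..31, the old VSX doublewords as the low halves, and VMX0..VMX31 occupying VSR32..63 whole — can be modelled with a small stand-alone sketch. The type and accessor names below are hypothetical stand-ins, not the actual QEMU definitions (which also account for host byte order via VsrD()):

```c
/* Hypothetical model (NOT the actual QEMU definitions) of the unified
 * register file introduced by this patch:
 *   fpr[n] -> vsr[n].u64[0]     FP0..FP31 are the high halves of VSR0..31
 *   vsr[n] -> vsr[n].u64[1]     old VSX doublewords are the low halves
 *   avr[n] -> vsr[32 + n]       VMX0..VMX31 occupy VSR32..VSR63 whole
 */
#include <assert.h>
#include <stdint.h>

typedef union {
    uint64_t u64[2];
} vsr_t;                       /* stand-in for QEMU's ppc_vsr_t */

typedef struct {
    vsr_t vsr[64];             /* unified; 16-byte aligned in the real code */
} cpu_state_t;

/* Accessors mirroring the offsets used throughout the patch. */
static uint64_t *fpr(cpu_state_t *env, int n)  { return &env->vsr[n].u64[0]; }
static uint64_t *vsrl(cpu_state_t *env, int n) { return &env->vsr[n].u64[1]; }
static vsr_t    *avr(cpu_state_t *env, int n)  { return &env->vsr[32 + n]; }
```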
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 5445d4c3c1..c8f449081d 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1016,8 +1016,6 @@ struct CPUPPCState {
>  
>      /* Floating point execution context */
>      float_status fp_status;
> -    /* floating point registers */
> -    float64 fpr[32];
>      /* floating point status and control register */
>      target_ulong fpscr;
>  
> @@ -1067,11 +1065,10 @@ struct CPUPPCState {
>      /* Special purpose registers */
>      target_ulong spr[1024];
>      ppc_spr_t spr_cb[1024];
> -    /* Altivec registers */
> -    ppc_avr_t avr[32];
> +    /* Vector status and control register */
>      uint32_t vscr;
> -    /* VSX registers */
> -    uint64_t vsr[32];
> +    /* VSX registers (including FP and AVR) */
> +    ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
>      /* SPE registers */
>      uint64_t spe_acc;
>      uint32_t spe_fscr;
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index b4b1f7b3db..b77d564a65 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -218,24 +218,14 @@ EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
>  
>  static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
>  {
> -    if (n < 32) {
> -        vsr->VsrD(0) = env->fpr[n];
> -        vsr->VsrD(1) = env->vsr[n];
> -    } else {
> -        vsr->u64[0] = env->avr[n - 32].u64[0];
> -        vsr->u64[1] = env->avr[n - 32].u64[1];
> -    }
> +    vsr->VsrD(0) = env->vsr[n].u64[0];
> +    vsr->VsrD(1) = env->vsr[n].u64[1];
>  }
>  
>  static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
>  {
> -    if (n < 32) {
> -        env->fpr[n] = vsr->VsrD(0);
> -        env->vsr[n] = vsr->VsrD(1);
> -    } else {
> -        env->avr[n - 32].u64[0] = vsr->u64[0];
> -        env->avr[n - 32].u64[1] = vsr->u64[1];
> -    }
> +    env->vsr[n].u64[0] = vsr->VsrD(0);
> +    env->vsr[n].u64[1] = vsr->VsrD(1);
>  }
>  
>  void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
> diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
> index 2ae120a2bc..a053dd5b84 100644
> --- a/linux-user/ppc/signal.c
> +++ b/linux-user/ppc/signal.c
> @@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
>      /* Save Altivec registers if necessary.  */
>      if (env->insns_flags & PPC_ALTIVEC) {
>          uint32_t *vrsave;
> -        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
> -            ppc_avr_t *avr = &env->avr[i];
> +        for (i = 0; i < 32; i++) {
> +            ppc_avr_t *avr = &env->vsr[32 + i];
>              ppc_avr_t *vreg = (ppc_avr_t *)&frame->mc_vregs.altivec[i];
>  
>              __put_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
> @@ -281,15 +281,15 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
>      /* Save VSX second halves */
>      if (env->insns_flags2 & PPC2_VSX) {
>          uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
> -        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
> -            __put_user(env->vsr[i], &vsregs[i]);
> +        for (i = 0; i < 32; i++) {
> +            __put_user(env->vsr[i].u64[1], &vsregs[i]);
>          }
>      }
>  
>      /* Save floating point registers.  */
>      if (env->insns_flags & PPC_FLOAT) {
> -        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
> -            __put_user(env->fpr[i], &frame->mc_fregs[i]);
> +        for (i = 0; i < 32; i++) {
> +            __put_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
>          }
>          __put_user((uint64_t) env->fpscr, &frame->mc_fregs[32]);
>      }
> @@ -373,8 +373,8 @@ static void restore_user_regs(CPUPPCState *env,
>  #else
>          v_regs = (ppc_avr_t *)frame->mc_vregs.altivec;
>  #endif
> -        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
> -            ppc_avr_t *avr = &env->avr[i];
> +        for (i = 0; i < 32; i++) {
> +            ppc_avr_t *avr = &env->vsr[32 + i];
>              ppc_avr_t *vreg = &v_regs[i];
>  
>              __get_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
> @@ -393,16 +393,16 @@ static void restore_user_regs(CPUPPCState *env,
>      /* Restore VSX second halves */
>      if (env->insns_flags2 & PPC2_VSX) {
>          uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
> -        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
> -            __get_user(env->vsr[i], &vsregs[i]);
> +        for (i = 0; i < 32; i++) {
> +            __get_user(env->vsr[i].u64[1], &vsregs[i]);
>          }
>      }
>  
>      /* Restore floating point registers.  */
>      if (env->insns_flags & PPC_FLOAT) {
>          uint64_t fpscr;
> -        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
> -            __get_user(env->fpr[i], &frame->mc_fregs[i]);
> +        for (i = 0; i < 32; i++) {
> +            __get_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
>          }
>          __get_user(fpscr, &frame->mc_fregs[32]);
>          env->fpscr = (uint32_t) fpscr;
> diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
> index cc1460e4e3..c272d0d3d4 100644
> --- a/target/ppc/arch_dump.c
> +++ b/target/ppc/arch_dump.c
> @@ -140,7 +140,7 @@ static void ppc_write_elf_fpregset(NoteFuncArg *arg, PowerPCCPU *cpu)
>      memset(fpregset, 0, sizeof(*fpregset));
>  
>      for (i = 0; i < 32; i++) {
> -        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.fpr[i]);
> +        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[0]);
>      }
>      fpregset->fpscr = cpu_to_dump_reg(s, cpu->env.fpscr);
>  }
> @@ -166,11 +166,11 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
>  #endif
>  
>          if (needs_byteswap) {
> -            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
> -            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
> +            vmxregset->avr[i].u64[0] = bswap64(cpu->env.vsr[32 + i].u64[1]);
> +            vmxregset->avr[i].u64[1] = bswap64(cpu->env.vsr[32 + i].u64[0]);
>          } else {
> -            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
> -            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
> +            vmxregset->avr[i].u64[0] = cpu->env.vsr[32 + i].u64[0];
> +            vmxregset->avr[i].u64[1] = cpu->env.vsr[32 + i].u64[1];
>          }
>      }
>      vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
> @@ -188,7 +188,7 @@ static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
>      memset(vsxregset, 0, sizeof(*vsxregset));
>  
>      for (i = 0; i < 32; i++) {
> -        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i]);
> +        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[1]);
>      }
>  }
>  
> diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
> index b6f6693583..8c9dc284c4 100644
> --- a/target/ppc/gdbstub.c
> +++ b/target/ppc/gdbstub.c
> @@ -126,7 +126,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
>          gdb_get_regl(mem_buf, env->gpr[n]);
>      } else if (n < 64) {
>          /* fprs */
> -        stfq_p(mem_buf, env->fpr[n-32]);
> +        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
>      } else {
>          switch (n) {
>          case 64:
> @@ -178,7 +178,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
>          gdb_get_reg64(mem_buf, env->gpr[n]);
>      } else if (n < 64) {
>          /* fprs */
> -        stfq_p(mem_buf, env->fpr[n-32]);
> +        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
>      } else if (n < 96) {
>          /* Altivec */
>          stq_p(mem_buf, n - 64);
> @@ -234,7 +234,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
>          env->gpr[n] = ldtul_p(mem_buf);
>      } else if (n < 64) {
>          /* fprs */
> -        env->fpr[n-32] = ldfq_p(mem_buf);
> +        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
>      } else {
>          switch (n) {
>          case 64:
> @@ -284,7 +284,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
>          env->gpr[n] = ldq_p(mem_buf);
>      } else if (n < 64) {
>          /* fprs */
> -        env->fpr[n-32] = ldfq_p(mem_buf);
> +        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
>      } else {
>          switch (n) {
>          case 64 + 32:
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index e7b3725273..451cf376b4 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -45,7 +45,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>              uint64_t l;
>          } u;
>          u.l = qemu_get_be64(f);
> -        env->fpr[i] = u.d;
> +        env->vsr[i].u64[0] = u.d;
>      }
>      qemu_get_be32s(f, &fpscr);
>      env->fpscr = fpscr;
> @@ -138,11 +138,73 @@ static const VMStateInfo vmstate_info_avr = {
>  };
>  
>  #define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
> -    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
> +    VMSTATE_SUB_ARRAY(_f, _s, 32, _n, _v, vmstate_info_avr, ppc_avr_t)
>  
>  #define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
>      VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
>  
> +static int get_fpr(QEMUFile *f, void *pv, size_t size,
> +                   const VMStateField *field)
> +{
> +    ppc_vsr_t *v = pv;
> +
> +    v->u64[0] = qemu_get_be64(f);
> +
> +    return 0;
> +}
> +
> +static int put_fpr(QEMUFile *f, void *pv, size_t size,
> +                   const VMStateField *field, QJSON *vmdesc)
> +{
> +    ppc_vsr_t *v = pv;
> +
> +    qemu_put_be64(f, v->u64[0]);
> +    return 0;
> +}
> +
> +static const VMStateInfo vmstate_info_fpr = {
> +    .name = "fpr",
> +    .get  = get_fpr,
> +    .put  = put_fpr,
> +};
> +
> +#define VMSTATE_FPR_ARRAY_V(_f, _s, _n, _v)                       \
> +    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_fpr, ppc_vsr_t)
> +
> +#define VMSTATE_FPR_ARRAY(_f, _s, _n)                             \
> +    VMSTATE_FPR_ARRAY_V(_f, _s, _n, 0)
> +
> +static int get_vsr(QEMUFile *f, void *pv, size_t size,
> +                   const VMStateField *field)
> +{
> +    ppc_vsr_t *v = pv;
> +
> +    v->u64[1] = qemu_get_be64(f);
> +
> +    return 0;
> +}
> +
> +static int put_vsr(QEMUFile *f, void *pv, size_t size,
> +                   const VMStateField *field, QJSON *vmdesc)
> +{
> +    ppc_vsr_t *v = pv;
> +
> +    qemu_put_be64(f, v->u64[1]);
> +    return 0;
> +}
> +
> +static const VMStateInfo vmstate_info_vsr = {
> +    .name = "vsr",
> +    .get  = get_vsr,
> +    .put  = put_vsr,
> +};
> +
> +#define VMSTATE_VSR_ARRAY_V(_f, _s, _n, _v)                       \
> +    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_vsr, ppc_vsr_t)
> +
> +#define VMSTATE_VSR_ARRAY(_f, _s, _n)                             \
> +    VMSTATE_VSR_ARRAY_V(_f, _s, _n, 0)
> +
>  static bool cpu_pre_2_8_migration(void *opaque, int version_id)
>  {
>      PowerPCCPU *cpu = opaque;
> @@ -354,7 +416,7 @@ static const VMStateDescription vmstate_fpu = {
>      .minimum_version_id = 1,
>      .needed = fpu_needed,
>      .fields = (VMStateField[]) {
> -        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
> +        VMSTATE_FPR_ARRAY(env.vsr, PowerPCCPU, 32),
>          VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
>          VMSTATE_END_OF_LIST()
>      },
> @@ -373,7 +435,7 @@ static const VMStateDescription vmstate_altivec = {
>      .minimum_version_id = 1,
>      .needed = altivec_needed,
>      .fields = (VMStateField[]) {
> -        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
> +        VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
>          VMSTATE_UINT32(env.vscr, PowerPCCPU),
>          VMSTATE_END_OF_LIST()
>      },
> @@ -392,7 +454,7 @@ static const VMStateDescription vmstate_vsx = {
>      .minimum_version_id = 1,
>      .needed = vsx_needed,
>      .fields = (VMStateField[]) {
> -        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
> +        VMSTATE_VSR_ARRAY(env.vsr, PowerPCCPU, 32),
>          VMSTATE_END_OF_LIST()
>      },
>  };
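[Editorial note] The get_fpr/put_fpr and get_vsr/put_vsr VMState handlers above stream exactly one doubleword per ppc_vsr_t element, so the bytes put on the wire match what the old float64 fpr[32] and uint64_t vsr[32] arrays produced. A simplified, hypothetical model of that equivalence (not the real VMState code, which goes through QEMUFile and VMSTATE_SUB_ARRAY):

```c
/* Simplified model (NOT the real VMState code): serialising u64[0] of the
 * first 32 entries of the unified vsr[] array yields the same stream as
 * the old fpr[32] array did, which is why migration compatibility is
 * preserved by the sub-array handlers. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union { uint64_t u64[2]; } vsr_t;

/* Old wire format: 32 doublewords straight from fpr[]. */
static void save_old(const uint64_t fpr[32], uint64_t wire[32])
{
    memcpy(wire, fpr, 32 * sizeof(uint64_t));
}

/* New wire format: the FPR half (u64[0]) of each of VSR0..VSR31. */
static void save_new(const vsr_t vsr[64], uint64_t wire[32])
{
    for (int i = 0; i < 32; i++) {
        wire[i] = vsr[i].u64[0];
    }
}
```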
> diff --git a/target/ppc/monitor.c b/target/ppc/monitor.c
> index 14915119fc..1db9396b2e 100644
> --- a/target/ppc/monitor.c
> +++ b/target/ppc/monitor.c
> @@ -123,8 +123,8 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
>  
>      /* Floating point registers */
>      if ((qemu_tolower(name[0]) == 'f') &&
> -        ppc_cpu_get_reg_num(name + 1, ARRAY_SIZE(env->fpr), &regnum)) {
> -        *pval = env->fpr[regnum];
> +        ppc_cpu_get_reg_num(name + 1, 32, &regnum)) {
> +        *pval = env->vsr[regnum].u64[0];
>          return 0;
>      }
>  
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 5923c688cd..8e89aec14d 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6657,22 +6657,22 @@ GEN_TM_PRIV_NOOP(trechkpt);
>  
>  static inline void get_fpr(TCGv_i64 dst, int regno)
>  {
> -    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
>  }
>  
>  static inline void set_fpr(int regno, TCGv_i64 src)
>  {
> -    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
>  }
>  
>  static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
>  {
>  #ifdef HOST_WORDS_BIGENDIAN
>      tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
> -                                          avr[regno].u64[(high ? 0 : 1)]));
> +                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
>  #else
>      tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
> -                                          avr[regno].u64[(high ? 1 : 0)]));
> +                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
>  #endif
>  }
>  
> @@ -6680,10 +6680,10 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool high)
>  {
>  #ifdef HOST_WORDS_BIGENDIAN
>      tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
> -                                          avr[regno].u64[(high ? 0 : 1)]));
> +                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
>  #else
>      tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
> -                                          avr[regno].u64[(high ? 1 : 0)]));
> +                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
>  #endif
>  }
>  
> @@ -7434,7 +7434,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>              if ((i & (RFPL - 1)) == 0) {
>                  cpu_fprintf(f, "FPR%02d", i);
>              }
> -            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->fpr[i]));
> +            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->vsr[i].u64[0]));
>              if ((i & (RFPL - 1)) == (RFPL - 1)) {
>                  cpu_fprintf(f, "\n");
>              }
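[Editorial note] In the get_avr64()/set_avr64() hunks above, which u64[] element holds the architecturally high doubleword depends on host byte order (the HOST_WORDS_BIGENDIAN #ifdef). A stand-alone illustration of that indexing — QEMU decides this at configure time, while the sketch below probes it at run time instead:

```c
/* Stand-alone illustration (NOT QEMU code) of the doubleword indexing in
 * get_avr64()/set_avr64(): on a big-endian host the architecturally high
 * half of a 128-bit register lives in u64[0], on a little-endian host in
 * u64[1]. */
#include <assert.h>
#include <stdint.h>

typedef union { uint64_t u64[2]; } vsr_t;

static int host_is_big_endian(void)
{
    const union { uint32_t u; uint8_t b[4]; } probe = { 0x01020304 };
    return probe.b[0] == 0x01;
}

/* Index of the requested doubleword, mirroring the #ifdef in the patch. */
static int avr64_index(int high)
{
    return host_is_big_endian() ? (high ? 0 : 1) : (high ? 1 : 0);
}

static uint64_t get_avr64(const vsr_t *reg, int high)
{
    return reg->u64[avr64_index(high)];
}

static void set_avr64(vsr_t *reg, uint64_t val, int high)
{
    reg->u64[avr64_index(high)] = val;
}
```

Because get and set use the same index, the round-trip is host-order independent even though the storage slot differs between hosts.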
> diff --git a/target/ppc/translate/dfp-impl.inc.c b/target/ppc/translate/dfp-impl.inc.c
> index 634ef73b8a..6c556dc2e1 100644
> --- a/target/ppc/translate/dfp-impl.inc.c
> +++ b/target/ppc/translate/dfp-impl.inc.c
> @@ -3,7 +3,7 @@
>  static inline TCGv_ptr gen_fprp_ptr(int reg)
>  {
>      TCGv_ptr r = tcg_temp_new_ptr();
> -    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, fpr[reg]));
> +    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[reg].u64[0]));
>      return r;
>  }
>  
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 30046c6e31..75d2b2280f 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -10,10 +10,15 @@
>  static inline TCGv_ptr gen_avr_ptr(int reg)
>  {
>      TCGv_ptr r = tcg_temp_new_ptr();
> -    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, avr[reg]));
> +    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[32 + reg].u64[0]));
>      return r;
>  }
>  
> +static inline long avr64_offset(int reg, bool high)
> +{
> +    return offsetof(CPUPPCState, vsr[32 + reg].u64[(high ? 0 : 1)]);
> +}
> +
>  #define GEN_VR_LDX(name, opc2, opc3)                                          \
>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>  {                                                                             \
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index 20e1fd9324..1608ad48b1 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -2,12 +2,12 @@
>  
>  static inline void get_vsr(TCGv_i64 dst, int n)
>  {
> -    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
> +    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
>  }
>  
>  static inline void set_vsr(int n, TCGv_i64 src)
>  {
> -    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
> +    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
>  }
>  
>  static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 168d0cec28..b83097141c 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -9486,7 +9486,7 @@ static bool avr_need_swap(CPUPPCState *env)
>  static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  {
>      if (n < 32) {
> -        stfq_p(mem_buf, env->fpr[n]);
> +        stfq_p(mem_buf, env->vsr[n].u64[0]);
>          ppc_maybe_bswap_register(env, mem_buf, 8);
>          return 8;
>      }
> @@ -9502,7 +9502,7 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  {
>      if (n < 32) {
>          ppc_maybe_bswap_register(env, mem_buf, 8);
> -        env->fpr[n] = ldfq_p(mem_buf);
> +        env->vsr[n].u64[0] = ldfq_p(mem_buf);
>          return 8;
>      }
>      if (n == 32) {
> @@ -9517,11 +9517,11 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  {
>      if (n < 32) {
>          if (!avr_need_swap(env)) {
> -            stq_p(mem_buf, env->avr[n].u64[0]);
> -            stq_p(mem_buf+8, env->avr[n].u64[1]);
> +            stq_p(mem_buf, env->vsr[32 + n].u64[0]);
> +            stq_p(mem_buf + 8, env->vsr[32 + n].u64[1]);
>          } else {
> -            stq_p(mem_buf, env->avr[n].u64[1]);
> -            stq_p(mem_buf+8, env->avr[n].u64[0]);
> +            stq_p(mem_buf, env->vsr[32 + n].u64[1]);
> +            stq_p(mem_buf + 8, env->vsr[32 + n].u64[0]);
>          }
>          ppc_maybe_bswap_register(env, mem_buf, 8);
>          ppc_maybe_bswap_register(env, mem_buf + 8, 8);
> @@ -9546,11 +9546,11 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>          ppc_maybe_bswap_register(env, mem_buf, 8);
>          ppc_maybe_bswap_register(env, mem_buf + 8, 8);
>          if (!avr_need_swap(env)) {
> -            env->avr[n].u64[0] = ldq_p(mem_buf);
> -            env->avr[n].u64[1] = ldq_p(mem_buf+8);
> +            env->vsr[32 + n].u64[0] = ldq_p(mem_buf);
> +            env->vsr[32 + n].u64[1] = ldq_p(mem_buf + 8);
>          } else {
> -            env->avr[n].u64[1] = ldq_p(mem_buf);
> -            env->avr[n].u64[0] = ldq_p(mem_buf+8);
> +            env->vsr[32 + n].u64[1] = ldq_p(mem_buf);
> +            env->vsr[32 + n].u64[0] = ldq_p(mem_buf + 8);
>          }
>          return 16;
>      }
> @@ -9623,7 +9623,7 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  {
>      if (n < 32) {
> -        stq_p(mem_buf, env->vsr[n]);
> +        stq_p(mem_buf, env->vsr[n].u64[1]);
>          ppc_maybe_bswap_register(env, mem_buf, 8);
>          return 8;
>      }
> @@ -9634,7 +9634,7 @@ static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>  {
>      if (n < 32) {
>          ppc_maybe_bswap_register(env, mem_buf, 8);
> -        env->vsr[n] = ldq_p(mem_buf);
> +        env->vsr[n].u64[1] = ldq_p(mem_buf);
>          return 8;
>      }
>      return 0;



* Re: [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations Richard Henderson
@ 2018-12-19  6:29   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:54PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-9-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate.c              |  1 +
>  target/ppc/translate/vmx-impl.inc.c | 63 ++++++++++++++++-------------
>  2 files changed, 37 insertions(+), 27 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 8e89aec14d..1b61bfa093 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -24,6 +24,7 @@
>  #include "disas/disas.h"
>  #include "exec/exec-all.h"
>  #include "tcg-op.h"
> +#include "tcg-op-gvec.h"
>  #include "qemu/host-utils.h"
>  #include "exec/cpu_ldst.h"
>  
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 75d2b2280f..c13828a09d 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -262,41 +262,50 @@ GEN_VX_VMUL10(vmul10euq, 1, 0);
>  GEN_VX_VMUL10(vmul10cuq, 0, 1);
>  GEN_VX_VMUL10(vmul10ecuq, 1, 1);
>  
> -/* Logical operations */
> -#define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
> -static void glue(gen_, name)(DisasContext *ctx)                                 \
> +#define GEN_VXFORM_V(name, vece, tcg_op, opc2, opc3)                    \
> +static void glue(gen_, name)(DisasContext *ctx)                         \
>  {                                                                       \
> -    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
> -    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
> -    TCGv_i64 avr = tcg_temp_new_i64();                                  \
> -                                                                        \
>      if (unlikely(!ctx->altivec_enabled)) {                              \
>          gen_exception(ctx, POWERPC_EXCP_VPU);                           \
>          return;                                                         \
>      }                                                                   \
> -    get_avr64(t0, rA(ctx->opcode), true);                               \
> -    get_avr64(t1, rB(ctx->opcode), true);                               \
> -    tcg_op(avr, t0, t1);                                                \
> -    set_avr64(rD(ctx->opcode), avr, true);                              \
>                                                                          \
> -    get_avr64(t0, rA(ctx->opcode), false);                              \
> -    get_avr64(t1, rB(ctx->opcode), false);                              \
> -    tcg_op(avr, t0, t1);                                                \
> -    set_avr64(rD(ctx->opcode), avr, false);                             \
> -                                                                        \
> -    tcg_temp_free_i64(t0);                                              \
> -    tcg_temp_free_i64(t1);                                              \
> -    tcg_temp_free_i64(avr);                                             \
> +    tcg_op(vece,                                                        \
> +           avr64_offset(rD(ctx->opcode), true),                         \
> +           avr64_offset(rA(ctx->opcode), true),                         \
> +           avr64_offset(rB(ctx->opcode), true),                         \
> +           16, 16);                                                     \
>  }
>  
> -GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
> -GEN_VX_LOGICAL(vandc, tcg_gen_andc_i64, 2, 17);
> -GEN_VX_LOGICAL(vor, tcg_gen_or_i64, 2, 18);
> -GEN_VX_LOGICAL(vxor, tcg_gen_xor_i64, 2, 19);
> -GEN_VX_LOGICAL(vnor, tcg_gen_nor_i64, 2, 20);
> -GEN_VX_LOGICAL(veqv, tcg_gen_eqv_i64, 2, 26);
> -GEN_VX_LOGICAL(vnand, tcg_gen_nand_i64, 2, 22);
> -GEN_VX_LOGICAL(vorc, tcg_gen_orc_i64, 2, 21);
> +#define GEN_VXFORM_VN(name, vece, tcg_op, opc2, opc3)                   \
> +static void glue(gen_, name)(DisasContext *ctx)                         \
> +{                                                                       \
> +    if (unlikely(!ctx->altivec_enabled)) {                              \
> +        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
> +        return;                                                         \
> +    }                                                                   \
> +                                                                        \
> +    tcg_op(vece,                                                        \
> +           avr64_offset(rD(ctx->opcode), true),                         \
> +           avr64_offset(rA(ctx->opcode), true),                         \
> +           avr64_offset(rB(ctx->opcode), true),                         \
> +           16, 16);                                                     \
> +                                                                        \
> +    tcg_gen_gvec_not(vece,                                              \
> +                     avr64_offset(rD(ctx->opcode), true),               \
> +                     avr64_offset(rD(ctx->opcode), true),               \
> +                     16, 16);                                           \
> +}
> +
> +/* Logical operations */
> +GEN_VXFORM_V(vand, MO_64, tcg_gen_gvec_and, 2, 16);
> +GEN_VXFORM_V(vandc, MO_64, tcg_gen_gvec_andc, 2, 17);
> +GEN_VXFORM_V(vor, MO_64, tcg_gen_gvec_or, 2, 18);
> +GEN_VXFORM_V(vxor, MO_64, tcg_gen_gvec_xor, 2, 19);
> +GEN_VXFORM_VN(vnor, MO_64, tcg_gen_gvec_or, 2, 20);
> +GEN_VXFORM_VN(veqv, MO_64, tcg_gen_gvec_xor, 2, 26);
> +GEN_VXFORM_VN(vnand, MO_64, tcg_gen_gvec_and, 2, 22);
> +GEN_VXFORM_V(vorc, MO_64, tcg_gen_gvec_orc, 2, 21);
>  
>  #define GEN_VXFORM(name, opc2, opc3)                                    \
>  static void glue(gen_, name)(DisasContext *ctx)                                 \

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over to use vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Richard Henderson
@ 2018-12-19  6:29   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:55PM -0800, Richard Henderson wrote:
> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> Message-Id: <20181217122405.18732-10-mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/helper.h                 |  8 --------
>  target/ppc/int_helper.c             |  7 -------
>  target/ppc/translate/vmx-impl.inc.c | 16 ++++++++--------
>  3 files changed, 8 insertions(+), 23 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index c7de04e068..553ff500c8 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -108,14 +108,6 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
>  #define dh_ctype_avr ppc_avr_t *
>  #define dh_is_signed_avr dh_is_signed_ptr
>  
> -DEF_HELPER_3(vaddubm, void, avr, avr, avr)
> -DEF_HELPER_3(vadduhm, void, avr, avr, avr)
> -DEF_HELPER_3(vadduwm, void, avr, avr, avr)
> -DEF_HELPER_3(vaddudm, void, avr, avr, avr)
> -DEF_HELPER_3(vsububm, void, avr, avr, avr)
> -DEF_HELPER_3(vsubuhm, void, avr, avr, avr)
> -DEF_HELPER_3(vsubuwm, void, avr, avr, avr)
> -DEF_HELPER_3(vsubudm, void, avr, avr, avr)
>  DEF_HELPER_3(vavgub, void, avr, avr, avr)
>  DEF_HELPER_3(vavguh, void, avr, avr, avr)
>  DEF_HELPER_3(vavguw, void, avr, avr, avr)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 9d715be25c..4547453ef1 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -531,13 +531,6 @@ void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
>              r->element[i] = a->element[i] op b->element[i];             \
>          }                                                               \
>      }
> -#define VARITH(suffix, element)                 \
> -    VARITH_DO(add##suffix, +, element)          \
> -    VARITH_DO(sub##suffix, -, element)
> -VARITH(ubm, u8)
> -VARITH(uhm, u16)
> -VARITH(uwm, u32)
> -VARITH(udm, u64)
>  VARITH_DO(muluwm, *, u32)
>  #undef VARITH_DO
>  #undef VARITH
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index c13828a09d..e353d3f174 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -411,18 +411,18 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>      tcg_temp_free_ptr(rb);                                              \
>  }
>  
> -GEN_VXFORM(vaddubm, 0, 0);
> +GEN_VXFORM_V(vaddubm, MO_8, tcg_gen_gvec_add, 0, 0);
>  GEN_VXFORM_DUAL_EXT(vaddubm, PPC_ALTIVEC, PPC_NONE, 0,       \
>                      vmul10cuq, PPC_NONE, PPC2_ISA300, 0x0000F800)
> -GEN_VXFORM(vadduhm, 0, 1);
> +GEN_VXFORM_V(vadduhm, MO_16, tcg_gen_gvec_add, 0, 1);
>  GEN_VXFORM_DUAL(vadduhm, PPC_ALTIVEC, PPC_NONE,  \
>                  vmul10ecuq, PPC_NONE, PPC2_ISA300)
> -GEN_VXFORM(vadduwm, 0, 2);
> -GEN_VXFORM(vaddudm, 0, 3);
> -GEN_VXFORM(vsububm, 0, 16);
> -GEN_VXFORM(vsubuhm, 0, 17);
> -GEN_VXFORM(vsubuwm, 0, 18);
> -GEN_VXFORM(vsubudm, 0, 19);
> +GEN_VXFORM_V(vadduwm, MO_32, tcg_gen_gvec_add, 0, 2);
> +GEN_VXFORM_V(vaddudm, MO_64, tcg_gen_gvec_add, 0, 3);
> +GEN_VXFORM_V(vsububm, MO_8, tcg_gen_gvec_sub, 0, 16);
> +GEN_VXFORM_V(vsubuhm, MO_16, tcg_gen_gvec_sub, 0, 17);
> +GEN_VXFORM_V(vsubuwm, MO_32, tcg_gen_gvec_sub, 0, 18);
> +GEN_VXFORM_V(vsubudm, MO_64, tcg_gen_gvec_sub, 0, 19);
>  GEN_VXFORM(vmaxub, 1, 0);
>  GEN_VXFORM(vmaxuh, 1, 1);
>  GEN_VXFORM(vmaxuw, 1, 2);




* Re: [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] to use vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] " Richard Henderson
@ 2018-12-19  6:31   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:56PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 |  3 ---
>  target/ppc/int_helper.c             | 15 ------------
>  target/ppc/translate/vmx-impl.inc.c | 36 +++++++----------------------
>  3 files changed, 8 insertions(+), 46 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 553ff500c8..2aa60e5d36 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -246,9 +246,6 @@ DEF_HELPER_3(vrld, void, avr, avr, avr)
>  DEF_HELPER_3(vsl, void, avr, avr, avr)
>  DEF_HELPER_3(vsr, void, avr, avr, avr)
>  DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
> -DEF_HELPER_2(vspltisb, void, avr, i32)
> -DEF_HELPER_2(vspltish, void, avr, i32)
> -DEF_HELPER_2(vspltisw, void, avr, i32)
>  DEF_HELPER_3(vspltb, void, avr, avr, i32)
>  DEF_HELPER_3(vsplth, void, avr, avr, i32)
>  DEF_HELPER_3(vspltw, void, avr, avr, i32)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 4547453ef1..e44c0d90ee 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -2066,21 +2066,6 @@ VNEG(vnegw, s32)
>  VNEG(vnegd, s64)
>  #undef VNEG
>  
> -#define VSPLTI(suffix, element, splat_type)                     \
> -    void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
> -    {                                                           \
> -        splat_type x = (int8_t)(splat << 3) >> 3;               \
> -        int i;                                                  \
> -                                                                \
> -        for (i = 0; i < ARRAY_SIZE(r->element); i++) {          \
> -            r->element[i] = x;                                  \
> -        }                                                       \
> -    }
> -VSPLTI(b, s8, int8_t)
> -VSPLTI(h, s16, int16_t)
> -VSPLTI(w, s32, int32_t)
> -#undef VSPLTI
> -
>  #define VSR(suffix, element, mask)                                      \
>      void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
>      {                                                                   \
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index e353d3f174..be638cdb1a 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -720,25 +720,21 @@ GEN_VXRFORM_DUAL(vcmpbfp, PPC_ALTIVEC, PPC_NONE, \
>  GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
>                   vcmpgtud, PPC_NONE, PPC2_ALTIVEC_207)
>  
> -#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
> +#define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
>  static void glue(gen_, name)(DisasContext *ctx)                         \
>      {                                                                   \
> -        TCGv_ptr rd;                                                    \
> -        TCGv_i32 simm;                                                  \
> +        int simm;                                                       \
>          if (unlikely(!ctx->altivec_enabled)) {                          \
>              gen_exception(ctx, POWERPC_EXCP_VPU);                       \
>              return;                                                     \
>          }                                                               \
> -        simm = tcg_const_i32(SIMM5(ctx->opcode));                       \
> -        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> -        gen_helper_##name (rd, simm);                                   \
> -        tcg_temp_free_i32(simm);                                        \
> -        tcg_temp_free_ptr(rd);                                          \
> +        simm = SIMM5(ctx->opcode);                                      \
> +        tcg_op(avr64_offset(rD(ctx->opcode), true), 16, 16, simm);      \
>      }
>  
> -GEN_VXFORM_SIMM(vspltisb, 6, 12);
> -GEN_VXFORM_SIMM(vspltish, 6, 13);
> -GEN_VXFORM_SIMM(vspltisw, 6, 14);
> +GEN_VXFORM_DUPI(vspltisb, tcg_gen_gvec_dup8i, 6, 12);
> +GEN_VXFORM_DUPI(vspltish, tcg_gen_gvec_dup16i, 6, 13);
> +GEN_VXFORM_DUPI(vspltisw, tcg_gen_gvec_dup32i, 6, 14);
>  
>  #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
>  static void glue(gen_, name)(DisasContext *ctx)                                 \
> @@ -818,22 +814,6 @@ GEN_VXFORM_NOA(vprtybw, 1, 24);
>  GEN_VXFORM_NOA(vprtybd, 1, 24);
>  GEN_VXFORM_NOA(vprtybq, 1, 24);
>  
> -#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
> -static void glue(gen_, name)(DisasContext *ctx)                                 \
> -    {                                                                   \
> -        TCGv_ptr rd;                                                    \
> -        TCGv_i32 simm;                                                  \
> -        if (unlikely(!ctx->altivec_enabled)) {                          \
> -            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
> -            return;                                                     \
> -        }                                                               \
> -        simm = tcg_const_i32(SIMM5(ctx->opcode));                       \
> -        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> -        gen_helper_##name (rd, simm);                                   \
> -        tcg_temp_free_i32(simm);                                        \
> -        tcg_temp_free_ptr(rd);                                          \
> -    }
> -
>  #define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
>  static void glue(gen_, name)(DisasContext *ctx)                                 \
>      {                                                                   \
> @@ -1255,7 +1235,7 @@ GEN_VXFORM_DUAL(vsldoi, PPC_ALTIVEC, PPC_NONE,
>  #undef GEN_VXRFORM_DUAL
>  #undef GEN_VXRFORM1
>  #undef GEN_VXRFORM
> -#undef GEN_VXFORM_SIMM
> +#undef GEN_VXFORM_DUPI
>  #undef GEN_VXFORM_NOA
>  #undef GEN_VXFORM_UIMM
>  #undef GEN_VAFORM_PAIRED




* Re: [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] to use vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] " Richard Henderson
@ 2018-12-19  6:32   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:57PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 |  3 --
>  target/ppc/int_helper.c             | 24 ---------------
>  target/ppc/translate/vmx-impl.inc.c | 45 +++++++++++++++++------------
>  3 files changed, 26 insertions(+), 46 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 2aa60e5d36..069daa9883 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -246,9 +246,6 @@ DEF_HELPER_3(vrld, void, avr, avr, avr)
>  DEF_HELPER_3(vsl, void, avr, avr, avr)
>  DEF_HELPER_3(vsr, void, avr, avr, avr)
>  DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
> -DEF_HELPER_3(vspltb, void, avr, avr, i32)
> -DEF_HELPER_3(vsplth, void, avr, avr, i32)
> -DEF_HELPER_3(vspltw, void, avr, avr, i32)
>  DEF_HELPER_3(vextractub, void, avr, avr, i32)
>  DEF_HELPER_3(vextractuh, void, avr, avr, i32)
>  DEF_HELPER_3(vextractuw, void, avr, avr, i32)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index e44c0d90ee..3bf0fdb6c5 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -1918,30 +1918,6 @@ void helper_vslo(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>  #endif
>  }
>  
> -/* Experimental testing shows that hardware masks the immediate.  */
> -#define _SPLAT_MASKED(element) (splat & (ARRAY_SIZE(r->element) - 1))
> -#if defined(HOST_WORDS_BIGENDIAN)
> -#define SPLAT_ELEMENT(element) _SPLAT_MASKED(element)
> -#else
> -#define SPLAT_ELEMENT(element)                                  \
> -    (ARRAY_SIZE(r->element) - 1 - _SPLAT_MASKED(element))
> -#endif
> -#define VSPLT(suffix, element)                                          \
> -    void helper_vsplt##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> -    {                                                                   \
> -        uint32_t s = b->element[SPLAT_ELEMENT(element)];                \
> -        int i;                                                          \
> -                                                                        \
> -        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
> -            r->element[i] = s;                                          \
> -        }                                                               \
> -    }
> -VSPLT(b, u8)
> -VSPLT(h, u16)
> -VSPLT(w, u32)
> -#undef VSPLT
> -#undef SPLAT_ELEMENT
> -#undef _SPLAT_MASKED
>  #if defined(HOST_WORDS_BIGENDIAN)
>  #define VINSERT(suffix, element)                                            \
>      void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t index) \
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index be638cdb1a..529ae0e5f5 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -814,24 +814,31 @@ GEN_VXFORM_NOA(vprtybw, 1, 24);
>  GEN_VXFORM_NOA(vprtybd, 1, 24);
>  GEN_VXFORM_NOA(vprtybq, 1, 24);
>  
> -#define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
> -static void glue(gen_, name)(DisasContext *ctx)                                 \
> -    {                                                                   \
> -        TCGv_ptr rb, rd;                                                \
> -        TCGv_i32 uimm;                                                  \
> -        if (unlikely(!ctx->altivec_enabled)) {                          \
> -            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
> -            return;                                                     \
> -        }                                                               \
> -        uimm = tcg_const_i32(UIMM5(ctx->opcode));                       \
> -        rb = gen_avr_ptr(rB(ctx->opcode));                              \
> -        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> -        gen_helper_##name (rd, rb, uimm);                               \
> -        tcg_temp_free_i32(uimm);                                        \
> -        tcg_temp_free_ptr(rb);                                          \
> -        tcg_temp_free_ptr(rd);                                          \
> +static void gen_vsplt(DisasContext *ctx, int vece)
> +{
> +    int uimm, dofs, bofs;
> +
> +    if (unlikely(!ctx->altivec_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VPU);
> +        return;
>      }
>  
> +    uimm = UIMM5(ctx->opcode);
> +    bofs = avr64_offset(rB(ctx->opcode), true);
> +    dofs = avr64_offset(rD(ctx->opcode), true);
> +
> +    /* Experimental testing shows that hardware masks the immediate.  */
> +    bofs += (uimm << vece) & 15;
> +#ifndef HOST_WORDS_BIGENDIAN
> +    bofs ^= 15;
> +#endif
> +
> +    tcg_gen_gvec_dup_mem(vece, dofs, bofs, 16, 16);
> +}
> +
> +#define GEN_VXFORM_VSPLT(name, vece, opc2, opc3) \
> +static void glue(gen_, name)(DisasContext *ctx) { gen_vsplt(ctx, vece); }
> +
>  #define GEN_VXFORM_UIMM_ENV(name, opc2, opc3)                           \
>  static void glue(gen_, name)(DisasContext *ctx)                         \
>      {                                                                   \
> @@ -873,9 +880,9 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>          tcg_temp_free_ptr(rd);                                          \
>      }
>  
> -GEN_VXFORM_UIMM(vspltb, 6, 8);
> -GEN_VXFORM_UIMM(vsplth, 6, 9);
> -GEN_VXFORM_UIMM(vspltw, 6, 10);
> +GEN_VXFORM_VSPLT(vspltb, MO_8, 6, 8);
> +GEN_VXFORM_VSPLT(vsplth, MO_16, 6, 9);
> +GEN_VXFORM_VSPLT(vspltw, MO_32, 6, 10);
>  GEN_VXFORM_UIMM_SPLAT(vextractub, 6, 8, 15);
>  GEN_VXFORM_UIMM_SPLAT(vextractuh, 6, 9, 14);
>  GEN_VXFORM_UIMM_SPLAT(vextractuw, 6, 10, 12);




* Re: [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic " Richard Henderson
@ 2018-12-19  6:32   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:58PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate/vmx-impl.inc.c | 26 +++-----------------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
> 
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 529ae0e5f5..329131d30b 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -277,34 +277,14 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>             16, 16);                                                     \
>  }
>  
> -#define GEN_VXFORM_VN(name, vece, tcg_op, opc2, opc3)                   \
> -static void glue(gen_, name)(DisasContext *ctx)                         \
> -{                                                                       \
> -    if (unlikely(!ctx->altivec_enabled)) {                              \
> -        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
> -        return;                                                         \
> -    }                                                                   \
> -                                                                        \
> -    tcg_op(vece,                                                        \
> -           avr64_offset(rD(ctx->opcode), true),                         \
> -           avr64_offset(rA(ctx->opcode), true),                         \
> -           avr64_offset(rB(ctx->opcode), true),                         \
> -           16, 16);                                                     \
> -                                                                        \
> -    tcg_gen_gvec_not(vece,                                              \
> -                     avr64_offset(rD(ctx->opcode), true),               \
> -                     avr64_offset(rD(ctx->opcode), true),               \
> -                     16, 16);                                           \
> -}
> -
>  /* Logical operations */
>  GEN_VXFORM_V(vand, MO_64, tcg_gen_gvec_and, 2, 16);
>  GEN_VXFORM_V(vandc, MO_64, tcg_gen_gvec_andc, 2, 17);
>  GEN_VXFORM_V(vor, MO_64, tcg_gen_gvec_or, 2, 18);
>  GEN_VXFORM_V(vxor, MO_64, tcg_gen_gvec_xor, 2, 19);
> -GEN_VXFORM_VN(vnor, MO_64, tcg_gen_gvec_or, 2, 20);
> -GEN_VXFORM_VN(veqv, MO_64, tcg_gen_gvec_xor, 2, 26);
> -GEN_VXFORM_VN(vnand, MO_64, tcg_gen_gvec_and, 2, 22);
> +GEN_VXFORM_V(vnor, MO_64, tcg_gen_gvec_nor, 2, 20);
> +GEN_VXFORM_V(veqv, MO_64, tcg_gen_gvec_eqv, 2, 26);
> +GEN_VXFORM_V(vnand, MO_64, tcg_gen_gvec_nand, 2, 22);
>  GEN_VXFORM_V(vorc, MO_64, tcg_gen_gvec_orc, 2, 21);
>  
>  #define GEN_VXFORM(name, opc2, opc3)                                    \




* Re: [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to vector operations
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to " Richard Henderson
@ 2018-12-19  6:33   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:38:59PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate/vsx-impl.inc.c | 43 ++++++++++++-----------------
>  1 file changed, 17 insertions(+), 26 deletions(-)
> 
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index 1608ad48b1..8ab1290026 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -10,6 +10,11 @@ static inline void set_vsr(int n, TCGv_i64 src)
>      tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
>  }
>  
> +static inline int vsr_full_offset(int n)
> +{
> +    return offsetof(CPUPPCState, vsr[n].u64[0]);
> +}
> +
>  static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
>  {
>      if (n < 32) {
> @@ -1214,40 +1219,26 @@ static void gen_xxbrw(DisasContext *ctx)
>      tcg_temp_free_i64(xbl);
>  }
>  
> -#define VSX_LOGICAL(name, tcg_op)                                    \
> +#define VSX_LOGICAL(name, vece, tcg_op)                              \
>  static void glue(gen_, name)(DisasContext * ctx)                     \
>      {                                                                \
> -        TCGv_i64 t0;                                                 \
> -        TCGv_i64 t1;                                                 \
> -        TCGv_i64 t2;                                                 \
>          if (unlikely(!ctx->vsx_enabled)) {                           \
>              gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
>              return;                                                  \
>          }                                                            \
> -        t0 = tcg_temp_new_i64();                                     \
> -        t1 = tcg_temp_new_i64();                                     \
> -        t2 = tcg_temp_new_i64();                                     \
> -        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
> -        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
> -        tcg_op(t2, t0, t1);                                          \
> -        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
> -        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
> -        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
> -        tcg_op(t2, t0, t1);                                          \
> -        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
> -        tcg_temp_free_i64(t0);                                       \
> -        tcg_temp_free_i64(t1);                                       \
> -        tcg_temp_free_i64(t2);                                       \
> +        tcg_op(vece, vsr_full_offset(xT(ctx->opcode)),               \
> +               vsr_full_offset(xA(ctx->opcode)),                     \
> +               vsr_full_offset(xB(ctx->opcode)), 16, 16);            \
>      }
>  
> -VSX_LOGICAL(xxland, tcg_gen_and_i64)
> -VSX_LOGICAL(xxlandc, tcg_gen_andc_i64)
> -VSX_LOGICAL(xxlor, tcg_gen_or_i64)
> -VSX_LOGICAL(xxlxor, tcg_gen_xor_i64)
> -VSX_LOGICAL(xxlnor, tcg_gen_nor_i64)
> -VSX_LOGICAL(xxleqv, tcg_gen_eqv_i64)
> -VSX_LOGICAL(xxlnand, tcg_gen_nand_i64)
> -VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
> +VSX_LOGICAL(xxland, MO_64, tcg_gen_gvec_and)
> +VSX_LOGICAL(xxlandc, MO_64, tcg_gen_gvec_andc)
> +VSX_LOGICAL(xxlor, MO_64, tcg_gen_gvec_or)
> +VSX_LOGICAL(xxlxor, MO_64, tcg_gen_gvec_xor)
> +VSX_LOGICAL(xxlnor, MO_64, tcg_gen_gvec_nor)
> +VSX_LOGICAL(xxleqv, MO_64, tcg_gen_gvec_eqv)
> +VSX_LOGICAL(xxlnand, MO_64, tcg_gen_gvec_nand)
> +VSX_LOGICAL(xxlorc, MO_64, tcg_gen_gvec_orc)
>  
>  #define VSX_XXMRG(name, high)                               \
>  static void glue(gen_, name)(DisasContext * ctx)            \




* Re: [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib to vector operations
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib " Richard Henderson
@ 2018-12-19  6:34   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc


On Mon, Dec 17, 2018 at 10:39:00PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate/vsx-impl.inc.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index 8ab1290026..d88d6bbd74 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1356,9 +1356,10 @@ static void gen_xxspltw(DisasContext *ctx)
>  
>  static void gen_xxspltib(DisasContext *ctx)
>  {
> -    unsigned char uim8 = IMM8(ctx->opcode);
> -    TCGv_i64 vsr = tcg_temp_new_i64();
> -    if (xS(ctx->opcode) < 32) {
> +    uint8_t uim8 = IMM8(ctx->opcode);
> +    int rt = xT(ctx->opcode);
> +
> +    if (rt < 32) {
>          if (unlikely(!ctx->altivec_enabled)) {
>              gen_exception(ctx, POWERPC_EXCP_VPU);
>              return;
> @@ -1369,10 +1370,7 @@ static void gen_xxspltib(DisasContext *ctx)
>              return;
>          }
>      }
> -    tcg_gen_movi_i64(vsr, pattern(uim8));
> -    set_cpu_vsrh(xT(ctx->opcode), vsr);
> -    set_cpu_vsrl(xT(ctx->opcode), vsr);
> -    tcg_temp_free_i64(vsr);
> +    tcg_gen_gvec_dup8i(vsr_full_offset(rt), 16, 16, uim8);
>  }
>  
>  static void gen_xxsldwi(DisasContext *ctx)


* Re: [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw to vector operations
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw " Richard Henderson
@ 2018-12-19  6:35   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]

On Mon, Dec 17, 2018 at 10:39:01PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate/vsx-impl.inc.c | 36 +++++++++--------------------
>  1 file changed, 11 insertions(+), 25 deletions(-)
> 
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index d88d6bbd74..a040038ed4 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1318,38 +1318,24 @@ static void gen_xxsel(DisasContext * ctx)
>  
>  static void gen_xxspltw(DisasContext *ctx)
>  {
> -    TCGv_i64 b, b2;
> -    TCGv_i64 vsr;
> -
> -    vsr = tcg_temp_new_i64();
> -    if (UIM(ctx->opcode) & 2) {
> -        get_cpu_vsrl(vsr, xB(ctx->opcode));
> -    } else {
> -        get_cpu_vsrh(vsr, xB(ctx->opcode));
> -    }
> +    int rt = xT(ctx->opcode);
> +    int rb = xB(ctx->opcode);
> +    int uim = UIM(ctx->opcode);
> +    int tofs, bofs;
>  
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
>  
> -    b = tcg_temp_new_i64();
> -    b2 = tcg_temp_new_i64();
> +    tofs = vsr_full_offset(rt);
> +    bofs = vsr_full_offset(rb);
> +    bofs += uim << MO_32;
> +#ifndef HOST_WORDS_BIG_ENDIAN
> +    bofs ^= 8 | 4;
> +#endif
>  
> -    if (UIM(ctx->opcode) & 1) {
> -        tcg_gen_ext32u_i64(b, vsr);
> -    } else {
> -        tcg_gen_shri_i64(b, vsr, 32);
> -    }
> -
> -    tcg_gen_shli_i64(b2, b, 32);
> -    tcg_gen_or_i64(vsr, b, b2);
> -    set_cpu_vsrh(xT(ctx->opcode), vsr);
> -    set_cpu_vsrl(xT(ctx->opcode), vsr);
> -
> -    tcg_temp_free_i64(vsr);
> -    tcg_temp_free_i64(b);
> -    tcg_temp_free_i64(b2);
> +    tcg_gen_gvec_dup_mem(MO_32, tofs, bofs, 16, 16);
>  }
>  
>  #define pattern(x) (((x) & 0xff) * (~(uint64_t)0 / 0xff))


* Re: [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel to vector operations
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel " Richard Henderson
@ 2018-12-19  6:35   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]

On Mon, Dec 17, 2018 at 10:39:02PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate/vsx-impl.inc.c | 55 ++++++++++++++---------------
>  1 file changed, 27 insertions(+), 28 deletions(-)
> 
> diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
> index a040038ed4..dc32471cd7 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1280,40 +1280,39 @@ static void glue(gen_, name)(DisasContext * ctx)            \
>  VSX_XXMRG(xxmrghw, 1)
>  VSX_XXMRG(xxmrglw, 0)
>  
> +static void xxsel_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c)
> +{
> +    tcg_gen_and_i64(b, b, c);
> +    tcg_gen_andc_i64(a, a, c);
> +    tcg_gen_or_i64(t, a, b);
> +}
> +
> +static void xxsel_vec(unsigned vece, TCGv_vec t, TCGv_vec a,
> +                      TCGv_vec b, TCGv_vec c)
> +{
> +    tcg_gen_and_vec(vece, b, b, c);
> +    tcg_gen_andc_vec(vece, a, a, c);
> +    tcg_gen_or_vec(vece, t, a, b);
> +}
> +
>  static void gen_xxsel(DisasContext * ctx)
>  {
> -    TCGv_i64 a, b, c, tmp;
> +    static const GVecGen4 g = {
> +        .fni8 = xxsel_i64,
> +        .fniv = xxsel_vec,
> +        .vece = MO_64,
> +    };
> +    int rt = xT(ctx->opcode);
> +    int ra = xA(ctx->opcode);
> +    int rb = xB(ctx->opcode);
> +    int rc = xC(ctx->opcode);
> +
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
>          return;
>      }
> -    a = tcg_temp_new_i64();
> -    b = tcg_temp_new_i64();
> -    c = tcg_temp_new_i64();
> -    tmp = tcg_temp_new_i64();
> -
> -    get_cpu_vsrh(a, xA(ctx->opcode));
> -    get_cpu_vsrh(b, xB(ctx->opcode));
> -    get_cpu_vsrh(c, xC(ctx->opcode));
> -
> -    tcg_gen_and_i64(b, b, c);
> -    tcg_gen_andc_i64(a, a, c);
> -    tcg_gen_or_i64(tmp, a, b);
> -    set_cpu_vsrh(xT(ctx->opcode), tmp);
> -
> -    get_cpu_vsrl(a, xA(ctx->opcode));
> -    get_cpu_vsrl(b, xB(ctx->opcode));
> -    get_cpu_vsrl(c, xC(ctx->opcode));
> -
> -    tcg_gen_and_i64(b, b, c);
> -    tcg_gen_andc_i64(a, a, c);
> -    tcg_gen_or_i64(tmp, a, b);
> -    set_cpu_vsrl(xT(ctx->opcode), tmp);
> -
> -    tcg_temp_free_i64(a);
> -    tcg_temp_free_i64(b);
> -    tcg_temp_free_i64(c);
> -    tcg_temp_free_i64(tmp);
> +    tcg_gen_gvec_4(vsr_full_offset(rt), vsr_full_offset(ra),
> +                   vsr_full_offset(rb), vsr_full_offset(rc), 16, 16, &g);
>  }
>  
>  static void gen_xxspltw(DisasContext *ctx)


* Re: [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr Richard Henderson
@ 2018-12-19  6:37   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3100 bytes --]

On Mon, Dec 17, 2018 at 10:39:03PM -0800, Richard Henderson wrote:
> We can re-use this helper elsewhere if we're not passing
> in an entire vector register.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 |  2 +-
>  target/ppc/int_helper.c             | 10 +++-------
>  target/ppc/translate/vmx-impl.inc.c | 17 +++++++++++++----
>  3 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 069daa9883..b3ffe28103 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -294,7 +294,7 @@ DEF_HELPER_5(vmsumuhs, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
>  DEF_HELPER_4(vmladduhm, void, avr, avr, avr, avr)
> -DEF_HELPER_2(mtvscr, void, env, avr)
> +DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 3bf0fdb6c5..0443f33cd2 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -469,14 +469,10 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
>      }
>  }
>  
> -void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
> +void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
>  {
> -#if defined(HOST_WORDS_BIGENDIAN)
> -    env->vscr = r->u32[3];
> -#else
> -    env->vscr = r->u32[0];
> -#endif
> -    set_flush_to_zero(vscr_nj, &env->vec_status);
> +    env->vscr = vscr;
> +    set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
>  }
>  
>  void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 329131d30b..ab6da3aa55 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -196,14 +196,23 @@ static void gen_mfvscr(DisasContext *ctx)
>  
>  static void gen_mtvscr(DisasContext *ctx)
>  {
> -    TCGv_ptr p;
> +    TCGv_i32 val;
> +    int bofs;
> +
>      if (unlikely(!ctx->altivec_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VPU);
>          return;
>      }
> -    p = gen_avr_ptr(rB(ctx->opcode));
> -    gen_helper_mtvscr(cpu_env, p);
> -    tcg_temp_free_ptr(p);
> +
> +    val = tcg_temp_new_i32();
> +    bofs = avr64_offset(rB(ctx->opcode), true);
> +#ifdef HOST_WORDS_BIGENDIAN
> +    bofs += 3 * 4;
> +#endif
> +
> +    tcg_gen_ld_i32(val, cpu_env, bofs);
> +    gen_helper_mtvscr(cpu_env, val);
> +    tcg_temp_free_i32(val);
>  }
>  
>  #define GEN_VX_VMUL10(name, add_cin, ret_carry)                         \


* Re: [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb Richard Henderson
@ 2018-12-19  6:38   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1544 bytes --]

On Mon, Dec 17, 2018 at 10:39:04PM -0800, Richard Henderson wrote:
> Not setting flush_to_zero from gdb_set_avr_reg was a bug.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/translate_init.inc.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index b83097141c..292b1df700 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -601,10 +601,9 @@ static void spr_write_excp_vector(DisasContext *ctx, int sprn, int gprn)
>  
>  static inline void vscr_init(CPUPPCState *env, uint32_t val)
>  {
> -    env->vscr = val;
>      /* Altivec always uses round-to-nearest */
>      set_float_rounding_mode(float_round_nearest_even, &env->vec_status);
> -    set_flush_to_zero(vscr_nj, &env->vec_status);
> +    helper_mtvscr(env, val);
>  }
>  
>  #ifdef CONFIG_USER_ONLY
> @@ -9556,7 +9555,7 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>      }
>      if (n == 32) {
>          ppc_maybe_bswap_register(env, mem_buf, 4);
> -        env->vscr = ldl_p(mem_buf);
> +        helper_mtvscr(env, ldl_p(mem_buf));
>          return 4;
>      }
>      if (n == 33) {


* Re: [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat Richard Henderson
@ 2018-12-19  6:38   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]

On Mon, Dec 17, 2018 at 10:39:05PM -0800, Richard Henderson wrote:
> These macros are no longer used.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/cpu.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c8f449081d..a2fe6058b1 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -700,8 +700,6 @@ enum {
>  /* Vector status and control register */
>  #define VSCR_NJ		16 /* Vector non-java */
>  #define VSCR_SAT	0 /* Vector saturation */
> -#define vscr_nj		(((env->vscr) >> VSCR_NJ)	& 0x1)
> -#define vscr_sat	(((env->vscr) >> VSCR_SAT)	& 0x1)
>  
>  /*****************************************************************************/
>  /* BookE e500 MMU registers */


* Re: [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr Richard Henderson
@ 2018-12-19  6:39   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3793 bytes --]

On Mon, Dec 17, 2018 at 10:39:06PM -0800, Richard Henderson wrote:
> This is required before changing the representation of the register.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 | 1 +
>  target/ppc/arch_dump.c              | 3 ++-
>  target/ppc/int_helper.c             | 5 +++++
>  target/ppc/translate/vmx-impl.inc.c | 2 +-
>  target/ppc/translate_init.inc.c     | 2 +-
>  5 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index b3ffe28103..7dbb08b9dd 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -295,6 +295,7 @@ DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
>  DEF_HELPER_4(vmladduhm, void, avr, avr, avr, avr)
>  DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
> +DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env)
>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
> diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
> index c272d0d3d4..f753798789 100644
> --- a/target/ppc/arch_dump.c
> +++ b/target/ppc/arch_dump.c
> @@ -17,6 +17,7 @@
>  #include "elf.h"
>  #include "sysemu/dump.h"
>  #include "sysemu/kvm.h"
> +#include "exec/helper-proto.h"
>  
>  #ifdef TARGET_PPC64
>  #define ELFCLASS ELFCLASS64
> @@ -173,7 +174,7 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
>              vmxregset->avr[i].u64[1] = cpu->env.vsr[32 + i].u64[1];
>          }
>      }
> -    vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
> +    vmxregset->vscr.u32[3] = cpu_to_dump32(s, helper_mfvscr(&cpu->env));
>  }
>  
>  static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 0443f33cd2..75201bbba6 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -475,6 +475,11 @@ void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
>      set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
>  }
>  
> +uint32_t helper_mfvscr(CPUPPCState *env)
> +{
> +    return env->vscr;
> +}
> +
>  void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>  {
>      int i;
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index ab6da3aa55..1c0c461241 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -187,7 +187,7 @@ static void gen_mfvscr(DisasContext *ctx)
>      tcg_gen_movi_i64(avr, 0);
>      set_avr64(rD(ctx->opcode), avr, true);
>      t = tcg_temp_new_i32();
> -    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
> +    gen_helper_mfvscr(t, cpu_env);
>      tcg_gen_extu_i32_i64(avr, t);
>      set_avr64(rD(ctx->opcode), avr, false);
>      tcg_temp_free_i32(t);
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 292b1df700..353285c6bd 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -9527,7 +9527,7 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
>          return 16;
>      }
>      if (n == 32) {
> -        stl_p(mem_buf, env->vscr);
> +        stl_p(mem_buf, helper_mfvscr(env));
>          ppc_maybe_bswap_register(env, mem_buf, 4);
>          return 4;
>      }


* Re: [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate Richard Henderson
@ 2018-12-19  6:40   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]

On Mon, Dec 17, 2018 at 10:39:07PM -0800, Richard Henderson wrote:
> This is required before changing the representation of the register.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/machine.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/machine.c b/target/ppc/machine.c
> index 451cf376b4..3c27a89166 100644
> --- a/target/ppc/machine.c
> +++ b/target/ppc/machine.c
> @@ -10,6 +10,7 @@
>  #include "migration/cpu.h"
>  #include "qapi/error.h"
>  #include "kvm_ppc.h"
> +#include "exec/helper-proto.h"
>  
>  static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>  {
> @@ -17,7 +18,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>      CPUPPCState *env = &cpu->env;
>      unsigned int i, j;
>      target_ulong sdr1;
> -    uint32_t fpscr;
> +    uint32_t fpscr, vscr;
>  #if defined(TARGET_PPC64)
>      int32_t slb_nr;
>  #endif
> @@ -84,7 +85,8 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>      if (!cpu->vhyp) {
>          ppc_store_sdr1(env, sdr1);
>      }
> -    qemu_get_be32s(f, &env->vscr);
> +    qemu_get_be32s(f, &vscr);
> +    helper_mtvscr(env, vscr);
>      qemu_get_be64s(f, &env->spe_acc);
>      qemu_get_be32s(f, &env->spe_fscr);
>      qemu_get_betls(f, &env->msr_mask);
> @@ -429,6 +431,28 @@ static bool altivec_needed(void *opaque)
>      return (cpu->env.insns_flags & PPC_ALTIVEC);
>  }
>  
> +static int get_vscr(QEMUFile *f, void *opaque, size_t size,
> +                    const VMStateField *field)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    helper_mtvscr(&cpu->env, qemu_get_be32(f));
> +    return 0;
> +}
> +
> +static int put_vscr(QEMUFile *f, void *opaque, size_t size,
> +                    const VMStateField *field, QJSON *vmdesc)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    qemu_put_be32(f, helper_mfvscr(&cpu->env));
> +    return 0;
> +}
> +
> +static const VMStateInfo vmstate_vscr = {
> +    .name = "cpu/altivec/vscr",
> +    .get = get_vscr,
> +    .put = put_vscr,
> +};
> +
>  static const VMStateDescription vmstate_altivec = {
>      .name = "cpu/altivec",
>      .version_id = 1,
> @@ -436,7 +460,21 @@ static const VMStateDescription vmstate_altivec = {
>      .needed = altivec_needed,
>      .fields = (VMStateField[]) {
>          VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
> -        VMSTATE_UINT32(env.vscr, PowerPCCPU),
> +        /*
> +         * Save the architecture value of the vscr, not the internally
> +         * expanded version.  Since this architecture value does not
> +         * exist in memory to be stored, this requires a but of hoop
> +         * jumping.  We want OFFSET=0 so that we effectively pass CPU
> +         * to the helper functions.
> +         */
> +        {
> +            .name = "vscr",
> +            .version_id = 0,
> +            .size = sizeof(uint32_t),
> +            .info = &vmstate_vscr,
> +            .flags = VMS_SINGLE,
> +            .offset = 0
> +        },
>          VMSTATE_END_OF_LIST()
>      },
>  };


* Re: [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat Richard Henderson
@ 2018-12-19  6:40   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4960 bytes --]

On Mon, Dec 17, 2018 at 10:39:08PM -0800, Richard Henderson wrote:
> This is required before changing the representation of the register.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/int_helper.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 75201bbba6..38aa3e85a6 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -480,6 +480,11 @@ uint32_t helper_mfvscr(CPUPPCState *env)
>      return env->vscr;
>  }
>  
> +static inline void set_vscr_sat(CPUPPCState *env)
> +{
> +    env->vscr |= 1 << VSCR_SAT;
> +}
> +
>  void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>  {
>      int i;
> @@ -593,7 +598,7 @@ VARITHFPFMA(nmsubfp, float_muladd_negate_result | float_muladd_negate_c);
>              }                                                           \
>          }                                                               \
>          if (sat) {                                                      \
> -            env->vscr |= (1 << VSCR_SAT);                               \
> +            set_vscr_sat(env);                                          \
>          }                                                               \
>      }
>  #define VARITHSAT_SIGNED(suffix, element, optype, cvt)          \
> @@ -865,7 +870,7 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>              }                                                           \
>          }                                                               \
>          if (sat) {                                                      \
> -            env->vscr |= (1 << VSCR_SAT);                               \
> +            set_vscr_sat(env);                                          \
>          }                                                               \
>      }
>  VCT(uxs, cvtsduw, u32)
> @@ -916,7 +921,7 @@ void helper_vmhaddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -933,7 +938,7 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -1061,7 +1066,7 @@ void helper_vmsumshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -1114,7 +1119,7 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -1633,7 +1638,7 @@ void helper_vpkpx(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>          }                                                               \
>          *r = result;                                                    \
>          if (dosat && sat) {                                             \
> -            env->vscr |= (1 << VSCR_SAT);                               \
> +            set_vscr_sat(env);                                          \
>          }                                                               \
>      }
>  #define I(x, y) (x)
> @@ -2106,7 +2111,7 @@ void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>      *r = result;
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -2133,7 +2138,7 @@ void helper_vsum2sws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>  
>      *r = result;
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -2152,7 +2157,7 @@ void helper_vsum4sbs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -2169,7 +2174,7 @@ void helper_vsum4shs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  
> @@ -2188,7 +2193,7 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>      }
>  
>      if (sat) {
> -        env->vscr |= (1 << VSCR_SAT);
> +        set_vscr_sat(env);
>      }
>  }
>  


* Re: [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field Richard Henderson
@ 2018-12-19  6:41   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 2435 bytes --]

On Mon, Dec 17, 2018 at 10:39:09PM -0800, Richard Henderson wrote:
> Change the representation of VSCR_SAT such that it is easy
> to set from vector code.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/cpu.h        |  4 +++-
>  target/ppc/int_helper.c | 11 ++++++++---
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index a2fe6058b1..26d2e16720 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1063,10 +1063,12 @@ struct CPUPPCState {
>      /* Special purpose registers */
>      target_ulong spr[1024];
>      ppc_spr_t spr_cb[1024];
> -    /* Vector status and control register */
> +    /* Vector status and control register, minus VSCR_SAT.  */
>      uint32_t vscr;
>      /* VSX registers (including FP and AVR) */
>      ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
> +    /* Non-zero if and only if VSCR_SAT should be set.  */
> +    ppc_vsr_t vscr_sat;
>      /* SPE registers */
>      uint64_t spe_acc;
>      uint32_t spe_fscr;
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 38aa3e85a6..9dbcbcd87a 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -471,18 +471,23 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
>  
>  void helper_mtvscr(CPUPPCState *env, uint32_t vscr)
>  {
> -    env->vscr = vscr;
> +    env->vscr = vscr & ~(1u << VSCR_SAT);
> +    /* Which bit we set is completely arbitrary, but clear the rest.  */
> +    env->vscr_sat.u64[0] = vscr & (1u << VSCR_SAT);
> +    env->vscr_sat.u64[1] = 0;
>      set_flush_to_zero((vscr >> VSCR_NJ) & 1, &env->vec_status);
>  }
>  
>  uint32_t helper_mfvscr(CPUPPCState *env)
>  {
> -    return env->vscr;
> +    uint32_t sat = (env->vscr_sat.u64[0] | env->vscr_sat.u64[1]) != 0;
> +    return env->vscr | (sat << VSCR_SAT);
>  }
>  
>  static inline void set_vscr_sat(CPUPPCState *env)
>  {
> -    env->vscr |= 1 << VSCR_SAT;
> +    /* The choice of non-zero value is arbitrary.  */
> +    env->vscr_sat.u32[0] = 1;
>  }
>  
>  void helper_vaddcuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)


* Re: [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations Richard Henderson
@ 2018-12-19  6:42   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

On Mon, Dec 17, 2018 at 10:39:10PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 | 24 ++++++------
>  target/ppc/int_helper.c             | 18 ++-------
>  target/ppc/translate/vmx-impl.inc.c | 57 +++++++++++++++++++++++------
>  3 files changed, 61 insertions(+), 38 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 7dbb08b9dd..3daf6bf863 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -219,18 +219,18 @@ DEF_HELPER_2(vprtybq, void, avr, avr)
>  DEF_HELPER_3(vsubcuw, void, avr, avr, avr)
>  DEF_HELPER_2(lvsl, void, avr, tl)
>  DEF_HELPER_2(lvsr, void, avr, tl)
> -DEF_HELPER_4(vaddsbs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vaddshs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vaddsws, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsubsbs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsubshs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsubsws, void, env, avr, avr, avr)
> -DEF_HELPER_4(vaddubs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vadduhs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vadduws, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsububs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsubuhs, void, env, avr, avr, avr)
> -DEF_HELPER_4(vsubuws, void, env, avr, avr, avr)
> +DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsubsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsubshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsubsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vaddubs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vadduhs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vadduws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsububs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsubuhs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> +DEF_HELPER_FLAGS_5(vsubuws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
>  DEF_HELPER_3(vadduqm, void, avr, avr, avr)
>  DEF_HELPER_4(vaddecuq, void, avr, avr, avr, avr)
>  DEF_HELPER_4(vaddeuqm, void, avr, avr, avr, avr)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 9dbcbcd87a..22671c71e5 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -583,27 +583,17 @@ VARITHFPFMA(nmsubfp, float_muladd_negate_result | float_muladd_negate_c);
>      }
>  
>  #define VARITHSAT_DO(name, op, optype, cvt, element)                    \
> -    void helper_v##name(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,   \
> -                        ppc_avr_t *b)                                   \
> +    void helper_v##name(ppc_avr_t *r, ppc_avr_t *vscr_sat,              \
> +                        ppc_avr_t *a, ppc_avr_t *b, uint32_t desc)      \
>      {                                                                   \
>          int sat = 0;                                                    \
>          int i;                                                          \
>                                                                          \
>          for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
> -            switch (sizeof(r->element[0])) {                            \
> -            case 1:                                                     \
> -                VARITHSAT_CASE(optype, op, cvt, element);               \
> -                break;                                                  \
> -            case 2:                                                     \
> -                VARITHSAT_CASE(optype, op, cvt, element);               \
> -                break;                                                  \
> -            case 4:                                                     \
> -                VARITHSAT_CASE(optype, op, cvt, element);               \
> -                break;                                                  \
> -            }                                                           \
> +            VARITHSAT_CASE(optype, op, cvt, element);                   \
>          }                                                               \
>          if (sat) {                                                      \
> -            set_vscr_sat(env);                                          \
> +            vscr_sat->u32[0] = 1;                                       \
>          }                                                               \
>      }
>  #define VARITHSAT_SIGNED(suffix, element, optype, cvt)          \
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index 1c0c461241..c6a53a9f63 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -548,22 +548,55 @@ GEN_VXFORM(vslo, 6, 16);
>  GEN_VXFORM(vsro, 6, 17);
>  GEN_VXFORM(vaddcuw, 0, 6);
>  GEN_VXFORM(vsubcuw, 0, 22);
> -GEN_VXFORM_ENV(vaddubs, 0, 8);
> +
> +#define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
> +static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
> +                                         TCGv_vec sat, TCGv_vec a,      \
> +                                         TCGv_vec b)                    \
> +{                                                                       \
> +    TCGv_vec x = tcg_temp_new_vec_matching(t);                          \
> +    glue(glue(tcg_gen_, NORM), _vec)(VECE, x, a, b);                    \
> +    glue(glue(tcg_gen_, SAT), _vec)(VECE, t, a, b);                     \
> +    tcg_gen_cmp_vec(TCG_COND_NE, VECE, x, x, t);                        \
> +    tcg_gen_or_vec(VECE, sat, sat, x);                                  \
> +    tcg_temp_free_vec(x);                                               \
> +}                                                                       \
> +static void glue(gen_, NAME)(DisasContext *ctx)                         \
> +{                                                                       \
> +    static const GVecGen4 g = {                                         \
> +        .fniv = glue(glue(gen_, NAME), _vec),                           \
> +        .fno = glue(gen_helper_, NAME),                                 \
> +        .opc = glue(glue(INDEX_op_, NORM), _vec),                       \
> +        .write_aofs = true,                                             \
> +        .vece = VECE,                                                   \
> +    };                                                                  \
> +    if (unlikely(!ctx->altivec_enabled)) {                              \
> +        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
> +        return;                                                         \
> +    }                                                                   \
> +    tcg_gen_gvec_4(avr64_offset(rD(ctx->opcode), true),                 \
> +                   offsetof(CPUPPCState, vscr_sat),                     \
> +                   avr64_offset(rA(ctx->opcode), true),                 \
> +                   avr64_offset(rB(ctx->opcode), true),                 \
> +                   16, 16, &g);                                         \
> +}
> +
> +GEN_VXFORM_SAT(vaddubs, MO_8, add, usadd, 0, 8);
>  GEN_VXFORM_DUAL_EXT(vaddubs, PPC_ALTIVEC, PPC_NONE, 0,       \
>                      vmul10uq, PPC_NONE, PPC2_ISA300, 0x0000F800)
> -GEN_VXFORM_ENV(vadduhs, 0, 9);
> +GEN_VXFORM_SAT(vadduhs, MO_16, add, usadd, 0, 9);
>  GEN_VXFORM_DUAL(vadduhs, PPC_ALTIVEC, PPC_NONE, \
>                  vmul10euq, PPC_NONE, PPC2_ISA300)
> -GEN_VXFORM_ENV(vadduws, 0, 10);
> -GEN_VXFORM_ENV(vaddsbs, 0, 12);
> -GEN_VXFORM_ENV(vaddshs, 0, 13);
> -GEN_VXFORM_ENV(vaddsws, 0, 14);
> -GEN_VXFORM_ENV(vsububs, 0, 24);
> -GEN_VXFORM_ENV(vsubuhs, 0, 25);
> -GEN_VXFORM_ENV(vsubuws, 0, 26);
> -GEN_VXFORM_ENV(vsubsbs, 0, 28);
> -GEN_VXFORM_ENV(vsubshs, 0, 29);
> -GEN_VXFORM_ENV(vsubsws, 0, 30);
> +GEN_VXFORM_SAT(vadduws, MO_32, add, usadd, 0, 10);
> +GEN_VXFORM_SAT(vaddsbs, MO_8, add, ssadd, 0, 12);
> +GEN_VXFORM_SAT(vaddshs, MO_16, add, ssadd, 0, 13);
> +GEN_VXFORM_SAT(vaddsws, MO_32, add, ssadd, 0, 14);
> +GEN_VXFORM_SAT(vsububs, MO_8, sub, ussub, 0, 24);
> +GEN_VXFORM_SAT(vsubuhs, MO_16, sub, ussub, 0, 25);
> +GEN_VXFORM_SAT(vsubuws, MO_32, sub, ussub, 0, 26);
> +GEN_VXFORM_SAT(vsubsbs, MO_8, sub, sssub, 0, 28);
> +GEN_VXFORM_SAT(vsubshs, MO_16, sub, sssub, 0, 29);
> +GEN_VXFORM_SAT(vsubsws, MO_32, sub, sssub, 0, 30);
>  GEN_VXFORM(vadduqm, 0, 4);
>  GEN_VXFORM(vaddcuq, 0, 5);
>  GEN_VXFORM3(vaddeuqm, 30, 0);



* Re: [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* to vector operations
  2018-12-18  6:39 ` [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* " Richard Henderson
@ 2018-12-19  6:42   ` David Gibson
  0 siblings, 0 replies; 75+ messages in thread
From: David Gibson @ 2018-12-19  6:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, qemu-ppc

On Mon, Dec 17, 2018 at 10:39:11PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target/ppc/helper.h                 | 16 ---------------
>  target/ppc/int_helper.c             | 27 ------------------------
>  target/ppc/translate/vmx-impl.inc.c | 32 ++++++++++++++---------------
>  3 files changed, 16 insertions(+), 59 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3daf6bf863..18910d18a4 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -117,22 +117,6 @@ DEF_HELPER_3(vabsduw, void, avr, avr, avr)
>  DEF_HELPER_3(vavgsb, void, avr, avr, avr)
>  DEF_HELPER_3(vavgsh, void, avr, avr, avr)
>  DEF_HELPER_3(vavgsw, void, avr, avr, avr)
> -DEF_HELPER_3(vminsb, void, avr, avr, avr)
> -DEF_HELPER_3(vminsh, void, avr, avr, avr)
> -DEF_HELPER_3(vminsw, void, avr, avr, avr)
> -DEF_HELPER_3(vminsd, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxsb, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxsh, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxsw, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxsd, void, avr, avr, avr)
> -DEF_HELPER_3(vminub, void, avr, avr, avr)
> -DEF_HELPER_3(vminuh, void, avr, avr, avr)
> -DEF_HELPER_3(vminuw, void, avr, avr, avr)
> -DEF_HELPER_3(vminud, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxub, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxuh, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxuw, void, avr, avr, avr)
> -DEF_HELPER_3(vmaxud, void, avr, avr, avr)
>  DEF_HELPER_4(vcmpequb, void, env, avr, avr, avr)
>  DEF_HELPER_4(vcmpequh, void, env, avr, avr, avr)
>  DEF_HELPER_4(vcmpequw, void, env, avr, avr, avr)
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 22671c71e5..b9793364fd 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -937,33 +937,6 @@ void helper_vmhraddshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
>      }
>  }
>  
> -#define VMINMAX_DO(name, compare, element)                              \
> -    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
> -    {                                                                   \
> -        int i;                                                          \
> -                                                                        \
> -        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
> -            if (a->element[i] compare b->element[i]) {                  \
> -                r->element[i] = b->element[i];                          \
> -            } else {                                                    \
> -                r->element[i] = a->element[i];                          \
> -            }                                                           \
> -        }                                                               \
> -    }
> -#define VMINMAX(suffix, element)                \
> -    VMINMAX_DO(min##suffix, >, element)         \
> -    VMINMAX_DO(max##suffix, <, element)
> -VMINMAX(sb, s8)
> -VMINMAX(sh, s16)
> -VMINMAX(sw, s32)
> -VMINMAX(sd, s64)
> -VMINMAX(ub, u8)
> -VMINMAX(uh, u16)
> -VMINMAX(uw, u32)
> -VMINMAX(ud, u64)
> -#undef VMINMAX_DO
> -#undef VMINMAX
> -
>  void helper_vmladduhm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
>  {
>      int i;
> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
> index c6a53a9f63..399d18707f 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -412,22 +412,22 @@ GEN_VXFORM_V(vsububm, MO_8, tcg_gen_gvec_sub, 0, 16);
>  GEN_VXFORM_V(vsubuhm, MO_16, tcg_gen_gvec_sub, 0, 17);
>  GEN_VXFORM_V(vsubuwm, MO_32, tcg_gen_gvec_sub, 0, 18);
>  GEN_VXFORM_V(vsubudm, MO_64, tcg_gen_gvec_sub, 0, 19);
> -GEN_VXFORM(vmaxub, 1, 0);
> -GEN_VXFORM(vmaxuh, 1, 1);
> -GEN_VXFORM(vmaxuw, 1, 2);
> -GEN_VXFORM(vmaxud, 1, 3);
> -GEN_VXFORM(vmaxsb, 1, 4);
> -GEN_VXFORM(vmaxsh, 1, 5);
> -GEN_VXFORM(vmaxsw, 1, 6);
> -GEN_VXFORM(vmaxsd, 1, 7);
> -GEN_VXFORM(vminub, 1, 8);
> -GEN_VXFORM(vminuh, 1, 9);
> -GEN_VXFORM(vminuw, 1, 10);
> -GEN_VXFORM(vminud, 1, 11);
> -GEN_VXFORM(vminsb, 1, 12);
> -GEN_VXFORM(vminsh, 1, 13);
> -GEN_VXFORM(vminsw, 1, 14);
> -GEN_VXFORM(vminsd, 1, 15);
> +GEN_VXFORM_V(vmaxub, MO_8, tcg_gen_gvec_umax, 1, 0);
> +GEN_VXFORM_V(vmaxuh, MO_16, tcg_gen_gvec_umax, 1, 1);
> +GEN_VXFORM_V(vmaxuw, MO_32, tcg_gen_gvec_umax, 1, 2);
> +GEN_VXFORM_V(vmaxud, MO_64, tcg_gen_gvec_umax, 1, 3);
> +GEN_VXFORM_V(vmaxsb, MO_8, tcg_gen_gvec_smax, 1, 4);
> +GEN_VXFORM_V(vmaxsh, MO_16, tcg_gen_gvec_smax, 1, 5);
> +GEN_VXFORM_V(vmaxsw, MO_32, tcg_gen_gvec_smax, 1, 6);
> +GEN_VXFORM_V(vmaxsd, MO_64, tcg_gen_gvec_smax, 1, 7);
> +GEN_VXFORM_V(vminub, MO_8, tcg_gen_gvec_umin, 1, 8);
> +GEN_VXFORM_V(vminuh, MO_16, tcg_gen_gvec_umin, 1, 9);
> +GEN_VXFORM_V(vminuw, MO_32, tcg_gen_gvec_umin, 1, 10);
> +GEN_VXFORM_V(vminud, MO_64, tcg_gen_gvec_umin, 1, 11);
> +GEN_VXFORM_V(vminsb, MO_8, tcg_gen_gvec_smin, 1, 12);
> +GEN_VXFORM_V(vminsh, MO_16, tcg_gen_gvec_smin, 1, 13);
> +GEN_VXFORM_V(vminsw, MO_32, tcg_gen_gvec_smin, 1, 14);
> +GEN_VXFORM_V(vminsd, MO_64, tcg_gen_gvec_smin, 1, 15);
>  GEN_VXFORM(vavgub, 1, 16);
>  GEN_VXFORM(vabsdub, 1, 16);
>  GEN_VXFORM_DUAL(vavgub, PPC_ALTIVEC, PPC_NONE, \



* Re: [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-19  6:15   ` David Gibson
@ 2018-12-19 12:29     ` Mark Cave-Ayland
  2018-12-20 16:52       ` Mark Cave-Ayland
  0 siblings, 1 reply; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-19 12:29 UTC (permalink / raw)
  To: David Gibson, Richard Henderson; +Cc: qemu-ppc, qemu-devel

On 19/12/2018 06:15, David Gibson wrote:

> On Mon, Dec 17, 2018 at 10:38:48PM -0800, Richard Henderson wrote:
>> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>
>> These helpers allow us to move FP register values to/from the specified TCGv_i64
>> argument in the VSR helpers to be introduced shortly.
>>
>> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
>> temporaries as required.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> Message-Id: <20181217122405.18732-2-mark.cave-ayland@ilande.co.uk>
> 
> Acked-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Do you want me to take these, or will you take them via your tree?

Well, as discussed with Richard yesterday, I've already found another couple of bugs
in this version: a sign-extension bug, plus some leaking temporaries, so there will
at least need to be a v3 of my patches.

I'm wondering if it makes sense for me to hand the two vector operation conversion
patches over to Richard, and for you to take my v3 patchset, which does all of the
groundwork, separately first?


ATB,

Mark.


>> ---
>>  target/ppc/translate.c             |  10 +
>>  target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++-------
>>  2 files changed, 390 insertions(+), 110 deletions(-)
>>
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 2b37910248..1d4bf624a3 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -6694,6 +6694,16 @@ static inline void gen_##name(DisasContext *ctx)               \
>>  GEN_TM_PRIV_NOOP(treclaim);
>>  GEN_TM_PRIV_NOOP(trechkpt);
>>  
>> +static inline void get_fpr(TCGv_i64 dst, int regno)
>> +{
>> +    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
>> +}
>> +
>> +static inline void set_fpr(int regno, TCGv_i64 src)
>> +{
>> +    tcg_gen_mov_i64(cpu_fpr[regno], src);
>> +}
>> +
>>  #include "translate/fp-impl.inc.c"
>>  
>>  #include "translate/vmx-impl.inc.c"
>> diff --git a/target/ppc/translate/fp-impl.inc.c b/target/ppc/translate/fp-impl.inc.c
>> index 08770ba9f5..04b8733055 100644
>> --- a/target/ppc/translate/fp-impl.inc.c
>> +++ b/target/ppc/translate/fp-impl.inc.c
>> @@ -34,24 +34,38 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>>  #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)           \
>>  static void gen_f##name(DisasContext *ctx)                                    \
>>  {                                                                             \
>> +    TCGv_i64 t0;                                                              \
>> +    TCGv_i64 t1;                                                              \
>> +    TCGv_i64 t2;                                                              \
>> +    TCGv_i64 t3;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>> +    t0 = tcg_temp_new_i64();                                                  \
>> +    t1 = tcg_temp_new_i64();                                                  \
>> +    t2 = tcg_temp_new_i64();                                                  \
>> +    t3 = tcg_temp_new_i64();                                                  \
>>      gen_reset_fpstatus();                                                     \
>> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
>> -                     cpu_fpr[rA(ctx->opcode)],                                \
>> -                     cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);     \
>> +    get_fpr(t0, rA(ctx->opcode));                                             \
>> +    get_fpr(t1, rC(ctx->opcode));                                             \
>> +    get_fpr(t2, rB(ctx->opcode));                                             \
>> +    gen_helper_f##op(t3, cpu_env, t0, t1, t2);                                \
>>      if (isfloat) {                                                            \
>> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
>> -                        cpu_fpr[rD(ctx->opcode)]);                            \
>> +        get_fpr(t0, rD(ctx->opcode));                                         \
>> +        gen_helper_frsp(t3, cpu_env, t0);                                     \
>>      }                                                                         \
>> +    set_fpr(rD(ctx->opcode), t3);                                             \
>>      if (set_fprf) {                                                           \
>> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
>> +        gen_compute_fprf_float64(t3);                                         \
>>      }                                                                         \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>>          gen_set_cr1_from_fpscr(ctx);                                          \
>>      }                                                                         \
>> +    tcg_temp_free_i64(t0);                                                    \
>> +    tcg_temp_free_i64(t1);                                                    \
>> +    tcg_temp_free_i64(t2);                                                    \
>> +    tcg_temp_free_i64(t3);                                                    \
>>  }
>>  
>>  #define GEN_FLOAT_ACB(name, op2, set_fprf, type)                              \
>> @@ -61,24 +75,34 @@ _GEN_FLOAT_ACB(name##s, name, 0x3B, op2, 1, set_fprf, type);
>>  #define _GEN_FLOAT_AB(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
>>  static void gen_f##name(DisasContext *ctx)                                    \
>>  {                                                                             \
>> +    TCGv_i64 t0;                                                              \
>> +    TCGv_i64 t1;                                                              \
>> +    TCGv_i64 t2;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>> +    t0 = tcg_temp_new_i64();                                                  \
>> +    t1 = tcg_temp_new_i64();                                                  \
>> +    t2 = tcg_temp_new_i64();                                                  \
>>      gen_reset_fpstatus();                                                     \
>> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
>> -                     cpu_fpr[rA(ctx->opcode)],                                \
>> -                     cpu_fpr[rB(ctx->opcode)]);                               \
>> +    get_fpr(t0, rA(ctx->opcode));                                             \
>> +    get_fpr(t1, rB(ctx->opcode));                                             \
>> +    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
>>      if (isfloat) {                                                            \
>> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
>> -                        cpu_fpr[rD(ctx->opcode)]);                            \
>> +        get_fpr(t0, rD(ctx->opcode));                                         \
>> +        gen_helper_frsp(t2, cpu_env, t0);                                     \
>>      }                                                                         \
>> +    set_fpr(rD(ctx->opcode), t2);                                             \
>>      if (set_fprf) {                                                           \
>> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
>> +        gen_compute_fprf_float64(t2);                                         \
>>      }                                                                         \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>>          gen_set_cr1_from_fpscr(ctx);                                          \
>>      }                                                                         \
>> +    tcg_temp_free_i64(t0);                                                    \
>> +    tcg_temp_free_i64(t1);                                                    \
>> +    tcg_temp_free_i64(t2);                                                    \
>>  }
>>  #define GEN_FLOAT_AB(name, op2, inval, set_fprf, type)                        \
>>  _GEN_FLOAT_AB(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
>> @@ -87,24 +111,35 @@ _GEN_FLOAT_AB(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
>>  #define _GEN_FLOAT_AC(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
>>  static void gen_f##name(DisasContext *ctx)                                    \
>>  {                                                                             \
>> +    TCGv_i64 t0;                                                              \
>> +    TCGv_i64 t1;                                                              \
>> +    TCGv_i64 t2;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>> +    t0 = tcg_temp_new_i64();                                                  \
>> +    t1 = tcg_temp_new_i64();                                                  \
>> +    t2 = tcg_temp_new_i64();                                                  \
>>      gen_reset_fpstatus();                                                     \
>> -    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
>> -                     cpu_fpr[rA(ctx->opcode)],                                \
>> -                     cpu_fpr[rC(ctx->opcode)]);                               \
>> +    get_fpr(t0, rA(ctx->opcode));                                             \
>> +    get_fpr(t1, rC(ctx->opcode));                                             \
>> +    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
>> +    set_fpr(rD(ctx->opcode), t2);                                             \
>>      if (isfloat) {                                                            \
>> -        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
>> -                        cpu_fpr[rD(ctx->opcode)]);                            \
>> +        get_fpr(t0, rD(ctx->opcode));                                         \
>> +        gen_helper_frsp(t2, cpu_env, t0);                                     \
>> +        set_fpr(rD(ctx->opcode), t2);                                         \
>>      }                                                                         \
>>      if (set_fprf) {                                                           \
>> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
>> +        gen_compute_fprf_float64(t2);                                         \
>>      }                                                                         \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>>          gen_set_cr1_from_fpscr(ctx);                                          \
>>      }                                                                         \
>> +    tcg_temp_free_i64(t0);                                                    \
>> +    tcg_temp_free_i64(t1);                                                    \
>> +    tcg_temp_free_i64(t2);                                                    \
>>  }
>>  #define GEN_FLOAT_AC(name, op2, inval, set_fprf, type)                        \
>>  _GEN_FLOAT_AC(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
>> @@ -113,37 +148,51 @@ _GEN_FLOAT_AC(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
>>  #define GEN_FLOAT_B(name, op2, op3, set_fprf, type)                           \
>>  static void gen_f##name(DisasContext *ctx)                                    \
>>  {                                                                             \
>> +    TCGv_i64 t0;                                                              \
>> +    TCGv_i64 t1;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>> +    t0 = tcg_temp_new_i64();                                                  \
>> +    t1 = tcg_temp_new_i64();                                                  \
>>      gen_reset_fpstatus();                                                     \
>> -    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
>> -                       cpu_fpr[rB(ctx->opcode)]);                             \
>> +    get_fpr(t0, rB(ctx->opcode));                                             \
>> +    gen_helper_f##name(t1, cpu_env, t0);                                      \
>> +    set_fpr(rD(ctx->opcode), t1);                                             \
>>      if (set_fprf) {                                                           \
>> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
>> +        gen_compute_fprf_float64(t1);                                         \
>>      }                                                                         \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>>          gen_set_cr1_from_fpscr(ctx);                                          \
>>      }                                                                         \
>> +    tcg_temp_free_i64(t0);                                                    \
>> +    tcg_temp_free_i64(t1);                                                    \
>>  }
>>  
>>  #define GEN_FLOAT_BS(name, op1, op2, set_fprf, type)                          \
>>  static void gen_f##name(DisasContext *ctx)                                    \
>>  {                                                                             \
>> +    TCGv_i64 t0;                                                              \
>> +    TCGv_i64 t1;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>> +    t0 = tcg_temp_new_i64();                                                  \
>> +    t1 = tcg_temp_new_i64();                                                  \
>>      gen_reset_fpstatus();                                                     \
>> -    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
>> -                       cpu_fpr[rB(ctx->opcode)]);                             \
>> +    get_fpr(t0, rB(ctx->opcode));                                             \
>> +    gen_helper_f##name(t1, cpu_env, t0);                                      \
>> +    set_fpr(rD(ctx->opcode), t1);                                             \
>>      if (set_fprf) {                                                           \
>> -        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
>> +        gen_compute_fprf_float64(t1);                                         \
>>      }                                                                         \
>>      if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
>>          gen_set_cr1_from_fpscr(ctx);                                          \
>>      }                                                                         \
>> +    tcg_temp_free_i64(t0);                                                    \
>> +    tcg_temp_free_i64(t1);                                                    \
>>  }
>>  
>>  /* fadd - fadds */
>> @@ -165,19 +214,25 @@ GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE);
>>  /* frsqrtes */
>>  static void gen_frsqrtes(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>> -    gen_helper_frsqrte(cpu_fpr[rD(ctx->opcode)], cpu_env,
>> -                       cpu_fpr[rB(ctx->opcode)]);
>> -    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
>> -                    cpu_fpr[rD(ctx->opcode)]);
>> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    gen_helper_frsqrte(t1, cpu_env, t0);
>> +    gen_helper_frsp(t1, cpu_env, t1);
>> +    set_fpr(rD(ctx->opcode), t1);
>> +    gen_compute_fprf_float64(t1);
>>      if (unlikely(Rc(ctx->opcode) != 0)) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* fsel */
>> @@ -189,34 +244,47 @@ GEN_FLOAT_AB(sub, 0x14, 0x000007C0, 1, PPC_FLOAT);
>>  /* fsqrt */
>>  static void gen_fsqrt(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>> -    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
>> -                     cpu_fpr[rB(ctx->opcode)]);
>> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    gen_helper_fsqrt(t1, cpu_env, t0);
>> +    set_fpr(rD(ctx->opcode), t1);
>> +    gen_compute_fprf_float64(t1);
>>      if (unlikely(Rc(ctx->opcode) != 0)) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  static void gen_fsqrts(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>> -    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
>> -                     cpu_fpr[rB(ctx->opcode)]);
>> -    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
>> -                    cpu_fpr[rD(ctx->opcode)]);
>> -    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    gen_helper_fsqrt(t1, cpu_env, t0);
>> +    gen_helper_frsp(t1, cpu_env, t1);
>> +    set_fpr(rD(ctx->opcode), t1);
>> +    gen_compute_fprf_float64(t1);
>>      if (unlikely(Rc(ctx->opcode) != 0)) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /***                     Floating-Point multiply-and-add                   ***/
>> @@ -268,21 +336,32 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
>>  
>>  static void gen_ftdiv(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
>> -                     cpu_fpr[rB(ctx->opcode)]);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t0, rA(ctx->opcode));
>> +    get_fpr(t1, rB(ctx->opcode));
>> +    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], t0, t1);
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  static void gen_ftsqrt(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
>> +    t0 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], t0);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  
>> @@ -293,32 +372,46 @@ static void gen_ftsqrt(DisasContext *ctx)
>>  static void gen_fcmpo(DisasContext *ctx)
>>  {
>>      TCGv_i32 crf;
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>>      crf = tcg_const_i32(crfD(ctx->opcode));
>> -    gen_helper_fcmpo(cpu_env, cpu_fpr[rA(ctx->opcode)],
>> -                     cpu_fpr[rB(ctx->opcode)], crf);
>> +    get_fpr(t0, rA(ctx->opcode));
>> +    get_fpr(t1, rB(ctx->opcode));
>> +    gen_helper_fcmpo(cpu_env, t0, t1, crf);
>>      tcg_temp_free_i32(crf);
>>      gen_helper_float_check_status(cpu_env);
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* fcmpu */
>>  static void gen_fcmpu(DisasContext *ctx)
>>  {
>>      TCGv_i32 crf;
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>>      crf = tcg_const_i32(crfD(ctx->opcode));
>> -    gen_helper_fcmpu(cpu_env, cpu_fpr[rA(ctx->opcode)],
>> -                     cpu_fpr[rB(ctx->opcode)], crf);
>> +    get_fpr(t0, rA(ctx->opcode));
>> +    get_fpr(t1, rB(ctx->opcode));
>> +    gen_helper_fcmpu(cpu_env, t0, t1, crf);
>>      tcg_temp_free_i32(crf);
>>      gen_helper_float_check_status(cpu_env);
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /***                         Floating-point move                           ***/
>> @@ -326,100 +419,153 @@ static void gen_fcmpu(DisasContext *ctx)
>>  /* XXX: beware that fabs never checks for NaNs nor updates FPSCR */
>>  static void gen_fabs(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_andi_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
>> -                     ~(1ULL << 63));
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    tcg_gen_andi_i64(t1, t0, ~(1ULL << 63));
>> +    set_fpr(rD(ctx->opcode), t1);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* fmr  - fmr. */
>>  /* XXX: beware that fmr never checks for NaNs nor updates FPSCR */
>>  static void gen_fmr(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_mov_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
>> +    t0 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    set_fpr(rD(ctx->opcode), t0);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* fnabs */
>>  /* XXX: beware that fnabs never checks for NaNs nor updates FPSCR */
>>  static void gen_fnabs(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_ori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
>> -                    1ULL << 63);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    tcg_gen_ori_i64(t1, t0, 1ULL << 63);
>> +    set_fpr(rD(ctx->opcode), t1);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* fneg */
>>  /* XXX: beware that fneg never checks for NaNs nor updates FPSCR */
>>  static void gen_fneg(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_xori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
>> -                     1ULL << 63);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    tcg_gen_xori_i64(t1, t0, 1ULL << 63);
>> +    set_fpr(rD(ctx->opcode), t1);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
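
The three sign-bit moves above (fabs, fnabs, fneg) are pure bit operations on the
raw IEEE-754 encoding, which is why they bypass the FP helpers entirely. A
standalone sketch of the masks being emitted — the function names here are
illustrative, not QEMU APIs:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only -- these mirror the masks the translations emit:
   fabs clears bit 63, fnabs sets it, fneg flips it. */
static uint64_t fp_abs_bits(uint64_t x)  { return x & ~(1ULL << 63); }
static uint64_t fp_nabs_bits(uint64_t x) { return x |  (1ULL << 63); }
static uint64_t fp_neg_bits(uint64_t x)  { return x ^  (1ULL << 63); }

/* View a double's raw IEEE-754 bit pattern. */
static uint64_t dbl_to_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof(u));
    return u;
}
```

Since only the sign bit changes, NaN payloads pass through untouched, matching
the "never checks for NaNs" caveat in the comments above.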
>>  
>>  /* fcpsgn: PowerPC 2.05 specification */
>>  /* XXX: beware that fcpsgn never checks for NaNs nor updates FPSCR */
>>  static void gen_fcpsgn(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>> +    TCGv_i64 t2;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
>> -                        cpu_fpr[rB(ctx->opcode)], 0, 63);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    t2 = tcg_temp_new_i64();
>> +    get_fpr(t0, rA(ctx->opcode));
>> +    get_fpr(t1, rB(ctx->opcode));
>> +    tcg_gen_deposit_i64(t2, t0, t1, 0, 63);
>> +    set_fpr(rD(ctx->opcode), t2);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>  }
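
For reference, tcg_gen_deposit_i64(t2, t0, t1, 0, 63) implements exactly the
fcpsgn semantics: bits 0..62 (exponent and fraction) come from rB, while bit 63
(the sign) is kept from rA. A host-side sketch of the same deposit — the helper
names are invented for illustration:

```c
#include <stdint.h>

/* Generic deposit: place the low `len` bits of val into base at `pos`. */
static uint64_t deposit64(uint64_t base, uint64_t val, int pos, int len)
{
    uint64_t mask = ((len == 64) ? ~0ULL : ((1ULL << len) - 1)) << pos;
    return (base & ~mask) | ((val << pos) & mask);
}

/* fcpsgn: sign bit from a, magnitude (bits 0..62) from b. */
static uint64_t fcpsgn_bits(uint64_t a, uint64_t b)
{
    return deposit64(a, b, 0, 63);
}
```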
>>  
>>  static void gen_fmrgew(DisasContext *ctx)
>>  {
>>      TCGv_i64 b0;
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>>      b0 = tcg_temp_new_i64();
>> -    tcg_gen_shri_i64(b0, cpu_fpr[rB(ctx->opcode)], 32);
>> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
>> -                        b0, 0, 32);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    tcg_gen_shri_i64(b0, t0, 32);
>> +    get_fpr(t0, rA(ctx->opcode));
>> +    tcg_gen_deposit_i64(t1, t0, b0, 0, 32);
>> +    set_fpr(rD(ctx->opcode), t1);
>>      tcg_temp_free_i64(b0);
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  static void gen_fmrgow(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>> +    TCGv_i64 t1;
>> +    TCGv_i64 t2;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> -    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)],
>> -                        cpu_fpr[rB(ctx->opcode)],
>> -                        cpu_fpr[rA(ctx->opcode)],
>> -                        32, 32);
>> +    t0 = tcg_temp_new_i64();
>> +    t1 = tcg_temp_new_i64();
>> +    t2 = tcg_temp_new_i64();
>> +    get_fpr(t0, rB(ctx->opcode));
>> +    get_fpr(t1, rA(ctx->opcode));
>> +    tcg_gen_deposit_i64(t2, t0, t1, 32, 32);
>> +    set_fpr(rD(ctx->opcode), t2);
>> +    tcg_temp_free_i64(t0);
>> +    tcg_temp_free_i64(t1);
>> +    tcg_temp_free_i64(t2);
>>  }
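
The two word-merge instructions above also reduce to deposits: fmrgew takes the
high 32-bit word of each source, fmrgow the low word of each. A plain-C model of
the result layout (function names invented here, not QEMU APIs):

```c
#include <stdint.h>

/* fmrgew: result = high word of a (bits 32..63) : high word of b. */
static uint64_t fmrgew_bits(uint64_t a, uint64_t b)
{
    return (a & 0xFFFFFFFF00000000ULL) | (b >> 32);
}

/* fmrgow: result = low word of a : low word of b. */
static uint64_t fmrgow_bits(uint64_t a, uint64_t b)
{
    return (a << 32) | (uint32_t)b;
}
```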
>>  
>>  /***                  Floating-Point status & ctrl register                ***/
>> @@ -458,15 +604,19 @@ static void gen_mcrfs(DisasContext *ctx)
>>  /* mffs */
>>  static void gen_mffs(DisasContext *ctx)
>>  {
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>> +    t0 = tcg_temp_new_i64();
>>      gen_reset_fpstatus();
>> -    tcg_gen_extu_tl_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpscr);
>> +    tcg_gen_extu_tl_i64(t0, cpu_fpscr);
>> +    set_fpr(rD(ctx->opcode), t0);
>>      if (unlikely(Rc(ctx->opcode))) {
>>          gen_set_cr1_from_fpscr(ctx);
>>      }
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* mtfsb0 */
>> @@ -522,6 +672,7 @@ static void gen_mtfsb1(DisasContext *ctx)
>>  static void gen_mtfsf(DisasContext *ctx)
>>  {
>>      TCGv_i32 t0;
>> +    TCGv_i64 t1;
>>      int flm, l, w;
>>  
>>      if (unlikely(!ctx->fpu_enabled)) {
>> @@ -541,7 +692,9 @@ static void gen_mtfsf(DisasContext *ctx)
>>      } else {
>>          t0 = tcg_const_i32(flm << (w * 8));
>>      }
>> -    gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
>> +    t1 = tcg_temp_new_i64();
>> +    get_fpr(t1, rB(ctx->opcode));
>> +    gen_helper_store_fpscr(cpu_env, t1, t0);
>>      tcg_temp_free_i32(t0);
>>      if (unlikely(Rc(ctx->opcode) != 0)) {
>>          tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
>> @@ -549,6 +702,7 @@ static void gen_mtfsf(DisasContext *ctx)
>>      }
>>      /* We can raise a deferred exception */
>>      gen_helper_float_check_status(cpu_env);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* mtfsfi */
>> @@ -588,21 +742,26 @@ static void gen_mtfsfi(DisasContext *ctx)
>>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_imm_index(ctx, EA, 0);                                           \
>> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
>> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
>> +    set_fpr(rD(ctx->opcode), t0);                                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_LDUF(name, ldop, opc, type)                                       \
>>  static void glue(gen_, name##u)(DisasContext *ctx)                                    \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>> @@ -613,20 +772,25 @@ static void glue(gen_, name##u)(DisasContext *ctx)
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_imm_index(ctx, EA, 0);                                           \
>> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
>> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
>> +    set_fpr(rD(ctx->opcode), t0);                                             \
>>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_LDUXF(name, ldop, opc, type)                                      \
>>  static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>>      if (unlikely(rA(ctx->opcode) == 0)) {                                     \
>>          gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);                   \
>>          return;                                                               \
>> @@ -634,24 +798,30 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_reg_index(ctx, EA);                                              \
>> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
>> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
>> +    set_fpr(rD(ctx->opcode), t0);                                             \
>>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_LDXF(name, ldop, opc2, opc3, type)                                \
>>  static void glue(gen_, name##x)(DisasContext *ctx)                                    \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_reg_index(ctx, EA);                                              \
>> -    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
>> +    gen_qemu_##ldop(ctx, t0, EA);                                             \
>> +    set_fpr(rD(ctx->opcode), t0);                                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_LDFS(name, ldop, op, type)                                        \
>> @@ -677,6 +847,7 @@ GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
>>  static void gen_lfdepx(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      CHK_SV;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>> @@ -684,16 +855,19 @@ static void gen_lfdepx(DisasContext *ctx)
>>      }
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>> +    t0 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, EA);
>> -    tcg_gen_qemu_ld_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_LOAD,
>> -        DEF_MEMOP(MO_Q));
>> +    tcg_gen_qemu_ld_i64(t0, EA, PPC_TLB_EPID_LOAD, DEF_MEMOP(MO_Q));
>> +    set_fpr(rD(ctx->opcode), t0);
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* lfdp */
>>  static void gen_lfdp(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>> @@ -701,24 +875,31 @@ static void gen_lfdp(DisasContext *ctx)
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>>      gen_addr_imm_index(ctx, EA, 0);
>> +    t0 = tcg_temp_new_i64();
>>      /* We only need to swap high and low halves. gen_qemu_ld64_i64 does the
>>         necessary 64-bit byteswap already. */
>>      if (unlikely(ctx->le_mode)) {
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode) + 1, t0);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode), t0);
>>      } else {
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode), t0);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode) + 1, t0);
>>      }
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
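
The le_mode branch above only swaps which register of the pair each doubleword
lands in; gen_qemu_ld64_i64 already handles the byte order within each
doubleword. The pair ordering can be modelled as (function name invented for
illustration):

```c
/* Which memory doubleword (0 or 1) is loaded into register rD+i:
   the pair order is reversed in little-endian mode. */
static int lfdp_mem_index(int le_mode, int i)
{
    return le_mode ? 1 - i : i;
}
```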
>>  
>>  /* lfdpx */
>>  static void gen_lfdpx(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>> @@ -726,18 +907,24 @@ static void gen_lfdpx(DisasContext *ctx)
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>>      gen_addr_reg_index(ctx, EA);
>> +    t0 = tcg_temp_new_i64();
>>      /* We only need to swap high and low halves. gen_qemu_ld64_i64 does the
>>         necessary 64-bit byteswap already. */
>>      if (unlikely(ctx->le_mode)) {
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode) + 1, t0);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode), t0);
>>      } else {
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode), t0);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        gen_qemu_ld64_i64(ctx, t0, EA);
>> +        set_fpr(rD(ctx->opcode) + 1, t0);
>>      }
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* lfiwax */
>> @@ -745,6 +932,7 @@ static void gen_lfiwax(DisasContext *ctx)
>>  {
>>      TCGv EA;
>>      TCGv t0;
>> +    TCGv_i64 t1;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>> @@ -752,47 +940,59 @@ static void gen_lfiwax(DisasContext *ctx)
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>>      t0 = tcg_temp_new();
>> +    t1 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, EA);
>>      gen_qemu_ld32s(ctx, t0, EA);
>> -    tcg_gen_ext_tl_i64(cpu_fpr[rD(ctx->opcode)], t0);
>> +    tcg_gen_ext_tl_i64(t1, t0);
>> +    set_fpr(rD(ctx->opcode), t1);
>>      tcg_temp_free(EA);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* lfiwzx */
>>  static void gen_lfiwzx(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>> +    t0 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, EA);
>> -    gen_qemu_ld32u_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +    gen_qemu_ld32u_i64(ctx, t0, EA);
>> +    set_fpr(rD(ctx->opcode), t0);
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
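
The only difference between lfiwax and lfiwzx above is the extension applied to
the loaded word: tcg_gen_ext_tl_i64 sign-extends, while gen_qemu_ld32u_i64
zero-extends. In plain C (illustrative names, not QEMU helpers):

```c
#include <stdint.h>

/* lfiwax: sign-extend the 32-bit word into the 64-bit FPR image. */
static uint64_t lfiwax_extend(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* lfiwzx: zero-extend the 32-bit word. */
static uint64_t lfiwzx_extend(uint32_t w)
{
    return (uint64_t)w;
}
```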
>>  /***                         Floating-point store                          ***/
>>  #define GEN_STF(name, stop, opc, type)                                        \
>>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_imm_index(ctx, EA, 0);                                           \
>> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
>> +    get_fpr(t0, rS(ctx->opcode));                                             \
>> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_STUF(name, stop, opc, type)                                       \
>>  static void glue(gen_, name##u)(DisasContext *ctx)                                    \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>> @@ -803,16 +1003,20 @@ static void glue(gen_, name##u)(DisasContext *ctx)
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_imm_index(ctx, EA, 0);                                           \
>> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
>> +    get_fpr(t0, rS(ctx->opcode));                                             \
>> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_STUXF(name, stop, opc, type)                                      \
>>  static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>> @@ -823,25 +1027,32 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_reg_index(ctx, EA);                                              \
>> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
>> +    get_fpr(t0, rS(ctx->opcode));                                             \
>> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>>      tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_STXF(name, stop, opc2, opc3, type)                                \
>>  static void glue(gen_, name##x)(DisasContext *ctx)                                    \
>>  {                                                                             \
>>      TCGv EA;                                                                  \
>> +    TCGv_i64 t0;                                                              \
>>      if (unlikely(!ctx->fpu_enabled)) {                                        \
>>          gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
>>          return;                                                               \
>>      }                                                                         \
>>      gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
>>      EA = tcg_temp_new();                                                      \
>> +    t0 = tcg_temp_new_i64();                                                  \
>>      gen_addr_reg_index(ctx, EA);                                              \
>> -    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
>> +    get_fpr(t0, rS(ctx->opcode));                                             \
>> +    gen_qemu_##stop(ctx, t0, EA);                                             \
>>      tcg_temp_free(EA);                                                        \
>> +    tcg_temp_free_i64(t0);                                                    \
>>  }
>>  
>>  #define GEN_STFS(name, stop, op, type)                                        \
>> @@ -867,6 +1078,7 @@ GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
>>  static void gen_stfdepx(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      CHK_SV;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>> @@ -874,60 +1086,76 @@ static void gen_stfdepx(DisasContext *ctx)
>>      }
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>> +    t0 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, EA);
>> -    tcg_gen_qemu_st_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_STORE,
>> -                       DEF_MEMOP(MO_Q));
>> +    get_fpr(t0, rD(ctx->opcode));
>> +    tcg_gen_qemu_st_i64(t0, EA, PPC_TLB_EPID_STORE, DEF_MEMOP(MO_Q));
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* stfdp */
>>  static void gen_stfdp(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>> +    t0 = tcg_temp_new_i64();
>>      gen_addr_imm_index(ctx, EA, 0);
>>      /* We only need to swap high and low halves. gen_qemu_st64_i64 does
>>         necessary 64-bit byteswap already. */
>>      if (unlikely(ctx->le_mode)) {
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        get_fpr(t0, rD(ctx->opcode) + 1);
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        get_fpr(t0, rD(ctx->opcode));
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>      } else {
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        get_fpr(t0, rD(ctx->opcode));
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        get_fpr(t0, rD(ctx->opcode) + 1);
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>      }
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* stfdpx */
>>  static void gen_stfdpx(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> +    TCGv_i64 t0;
>>      if (unlikely(!ctx->fpu_enabled)) {
>>          gen_exception(ctx, POWERPC_EXCP_FPU);
>>          return;
>>      }
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      EA = tcg_temp_new();
>> +    t0 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, EA);
>>      /* We only need to swap high and low halves. gen_qemu_st64_i64 does
>>         necessary 64-bit byteswap already. */
>>      if (unlikely(ctx->le_mode)) {
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        get_fpr(t0, rD(ctx->opcode) + 1);
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        get_fpr(t0, rD(ctx->opcode));
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>      } else {
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
>> +        get_fpr(t0, rD(ctx->opcode));
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>          tcg_gen_addi_tl(EA, EA, 8);
>> -        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
>> +        get_fpr(t0, rD(ctx->opcode) + 1);
>> +        gen_qemu_st64_i64(ctx, t0, EA);
>>      }
>>      tcg_temp_free(EA);
>> +    tcg_temp_free_i64(t0);
>>  }
>>  
>>  /* Optional: */
>> @@ -949,13 +1177,18 @@ static void gen_lfq(DisasContext *ctx)
>>  {
>>      int rd = rD(ctx->opcode);
>>      TCGv t0;
>> +    TCGv_i64 t1;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      t0 = tcg_temp_new();
>> +    t1 = tcg_temp_new_i64();
>>      gen_addr_imm_index(ctx, t0, 0);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
>> +    gen_qemu_ld64_i64(ctx, t1, t0);
>> +    set_fpr(rd, t1);
>>      gen_addr_add(ctx, t0, t0, 8);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
>> +    gen_qemu_ld64_i64(ctx, t1, t0);
>> +    set_fpr((rd + 1) % 32, t1);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* lfqu */
>> @@ -964,17 +1197,22 @@ static void gen_lfqu(DisasContext *ctx)
>>      int ra = rA(ctx->opcode);
>>      int rd = rD(ctx->opcode);
>>      TCGv t0, t1;
>> +    TCGv_i64 t2;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      t0 = tcg_temp_new();
>>      t1 = tcg_temp_new();
>> +    t2 = tcg_temp_new_i64();
>>      gen_addr_imm_index(ctx, t0, 0);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
>> +    gen_qemu_ld64_i64(ctx, t2, t0);
>> +    set_fpr(rd, t2);
>>      gen_addr_add(ctx, t1, t0, 8);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
>> +    gen_qemu_ld64_i64(ctx, t2, t1);
>> +    set_fpr((rd + 1) % 32, t2);
>>      if (ra != 0)
>>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>>      tcg_temp_free(t0);
>>      tcg_temp_free(t1);
>> +    tcg_temp_free_i64(t2);
>>  }
>>  
>>  /* lfqux */
>> @@ -984,16 +1222,21 @@ static void gen_lfqux(DisasContext *ctx)
>>      int rd = rD(ctx->opcode);
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      TCGv t0, t1;
>> +    TCGv_i64 t2;
>> +    t2 = tcg_temp_new_i64();
>>      t0 = tcg_temp_new();
>>      gen_addr_reg_index(ctx, t0);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
>> +    gen_qemu_ld64_i64(ctx, t2, t0);
>> +    set_fpr(rd, t2);
>>      t1 = tcg_temp_new();
>>      gen_addr_add(ctx, t1, t0, 8);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
>> +    gen_qemu_ld64_i64(ctx, t2, t1);
>> +    set_fpr((rd + 1) % 32, t2);
>>      tcg_temp_free(t1);
>>      if (ra != 0)
>>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t2);
>>  }
>>  
>>  /* lfqx */
>> @@ -1001,13 +1244,18 @@ static void gen_lfqx(DisasContext *ctx)
>>  {
>>      int rd = rD(ctx->opcode);
>>      TCGv t0;
>> +    TCGv_i64 t1;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      t0 = tcg_temp_new();
>> +    t1 = tcg_temp_new_i64();
>>      gen_addr_reg_index(ctx, t0);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
>> +    gen_qemu_ld64_i64(ctx, t1, t0);
>> +    set_fpr(rd, t1);
>>      gen_addr_add(ctx, t0, t0, 8);
>> -    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
>> +    gen_qemu_ld64_i64(ctx, t1, t0);
>> +    set_fpr((rd + 1) % 32, t1);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* stfq */
>> @@ -1015,13 +1263,18 @@ static void gen_stfq(DisasContext *ctx)
>>  {
>>      int rd = rD(ctx->opcode);
>>      TCGv t0;
>> +    TCGv_i64 t1;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>>      t0 = tcg_temp_new();
>> +    t1 = tcg_temp_new_i64();
>>      gen_addr_imm_index(ctx, t0, 0);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
>> +    get_fpr(t1, rd);
>> +    gen_qemu_st64_i64(ctx, t1, t0);
>>      gen_addr_add(ctx, t0, t0, 8);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
>> +    get_fpr(t1, (rd + 1) % 32);
>> +    gen_qemu_st64_i64(ctx, t1, t0);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  /* stfqu */
>> @@ -1030,17 +1283,23 @@ static void gen_stfqu(DisasContext *ctx)
>>      int ra = rA(ctx->opcode);
>>      int rd = rD(ctx->opcode);
>>      TCGv t0, t1;
>> +    TCGv_i64 t2;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>> +    t2 = tcg_temp_new_i64();
>>      t0 = tcg_temp_new();
>>      gen_addr_imm_index(ctx, t0, 0);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
>> +    get_fpr(t2, rd);
>> +    gen_qemu_st64_i64(ctx, t2, t0);
>>      t1 = tcg_temp_new();
>>      gen_addr_add(ctx, t1, t0, 8);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
>> +    get_fpr(t2, (rd + 1) % 32);
>> +    gen_qemu_st64_i64(ctx, t2, t1);
>>      tcg_temp_free(t1);
>> -    if (ra != 0)
>> +    if (ra != 0) {
>>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>> +    }
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t2);
>>  }
>>  
>>  /* stfqux */
>> @@ -1049,17 +1308,23 @@ static void gen_stfqux(DisasContext *ctx)
>>      int ra = rA(ctx->opcode);
>>      int rd = rD(ctx->opcode);
>>      TCGv t0, t1;
>> +    TCGv_i64 t2;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>> +    t2 = tcg_temp_new_i64();
>>      t0 = tcg_temp_new();
>>      gen_addr_reg_index(ctx, t0);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
>> +    get_fpr(t2, rd);
>> +    gen_qemu_st64_i64(ctx, t2, t0);
>>      t1 = tcg_temp_new();
>>      gen_addr_add(ctx, t1, t0, 8);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
>> +    get_fpr(t2, (rd + 1) % 32);
>> +    gen_qemu_st64_i64(ctx, t2, t1);
>>      tcg_temp_free(t1);
>> -    if (ra != 0)
>> +    if (ra != 0) {
>>          tcg_gen_mov_tl(cpu_gpr[ra], t0);
>> +    }
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t2);
>>  }
>>  
>>  /* stfqx */
>> @@ -1067,13 +1332,18 @@ static void gen_stfqx(DisasContext *ctx)
>>  {
>>      int rd = rD(ctx->opcode);
>>      TCGv t0;
>> +    TCGv_i64 t1;
>>      gen_set_access_type(ctx, ACCESS_FLOAT);
>> +    t1 = tcg_temp_new_i64();
>>      t0 = tcg_temp_new();
>>      gen_addr_reg_index(ctx, t0);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
>> +    get_fpr(t1, rd);
>> +    gen_qemu_st64_i64(ctx, t1, t0);
>>      gen_addr_add(ctx, t0, t0, 8);
>> -    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
>> +    get_fpr(t1, (rd + 1) % 32);
>> +    gen_qemu_st64_i64(ctx, t1, t0);
>>      tcg_temp_free(t0);
>> +    tcg_temp_free_i64(t1);
>>  }
>>  
>>  #undef _GEN_FLOAT_ACB
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-19 12:29     ` Mark Cave-Ayland
@ 2018-12-20 16:52       ` Mark Cave-Ayland
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Cave-Ayland @ 2018-12-20 16:52 UTC (permalink / raw)
  To: David Gibson, Richard Henderson; +Cc: qemu-ppc, qemu-devel

On 19/12/2018 12:29, Mark Cave-Ayland wrote:

> On 19/12/2018 06:15, David Gibson wrote:
> 
>> On Mon, Dec 17, 2018 at 10:38:48PM -0800, Richard Henderson wrote:
>>> From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> These helpers allow us to move FP register values to/from the specified TCGv_i64
>>> argument in the VSR helpers to be introduced shortly.
>>>
>>> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
>>> temporaries as required.
>>>
>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>> Message-Id: <20181217122405.18732-2-mark.cave-ayland@ilande.co.uk>
>>
>> Acked-by: David Gibson <david@gibson.dropbear.id.au>
>>
>> Do you want me to take these, or will you take them via your tree?
> 
> Well, as discussed yesterday with Richard, I've already found another couple of bugs
> in this version: a sign-extension bug, plus some leaking temporaries, so there will at
> least need to be a v3 of my patches.
> 
> I'm wondering if it makes sense for me to pass the 2 vector operation conversion
> patches over to Richard, and for you to take my v3 patchset that does all the
> groundwork separately first?

So this is the approach I've gone for - I've dropped my TCG vector conversion patches
from the previous iteration, and posted v3 with all my latest fixes as a separate
"prepare for conversion to TCG vector operations" patchset.

Richard - I've rebased your "tcg, target/ppc vector improvements" patchset on top of
my v3 patchset and pushed to https://github.com/mcayland/qemu/commits/ppc-altivec-rth
to make it easier for us both to test.

Note that the 2 TCG vector conversion patches I originally created for v2 are now
included as part of your patchset instead (including a squash of your "target/ppc:
nand, nor, eqv are now generic vector operations" patch).


ATB,

Mark.


* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
  2018-12-18 14:51   ` Mark Cave-Ayland
  2018-12-18 15:05   ` Mark Cave-Ayland
@ 2019-01-03 14:58   ` Mark Cave-Ayland
  2 siblings, 0 replies; 75+ messages in thread
From: Mark Cave-Ayland @ 2019-01-03 14:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
> operations" causes corruption of the OS X splash screen
> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
> an endian issue.

After some more digging I've found out what's going on here by dumping out the AVR
registers before and after:

Before the patch:

BEFORE:
uimm: 0  size: 2
sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 99 @ 0x7f54fd715870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad
AFTER:
dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a

BEFORE:
uimm: 1  size: 2
sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a
AFTER:
dreg: 99 @ 0x7f54fd715870 - 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9


After the patch:

BEFORE:
uimm: 0  size: 2
sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 18 @ 0x7fe5a0c4a870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad
AFTER:
dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1

BEFORE:
uimm: 1  size: 2
sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1
AFTER:
dreg: 18 @ 0x7fe5a0c4a870 - 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1


As you can see, the vsplth splat is one byte off with this patch applied. The cause
is the xor in the #ifndef HOST_WORDS_BIGENDIAN block: before the xor is applied, bofs
is aligned to 2 bytes, but bofs ^ 15 sets the LSB back to 1, introducing the one-byte
error.

Applying the following patch to mask bofs based upon the size of vece seems to fix
the issue here for me on little-endian Intel:

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 59d3bc6e02..41ddbd879f 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -815,6 +815,7 @@ static void gen_vsplt(DisasContext *ctx, int vece)
     bofs += (uimm << vece) & 15;
 #ifndef HOST_WORDS_BIGENDIAN
     bofs ^= 15;
+    bofs &= ~((1 << vece) - 1);
 #endif

     tcg_gen_gvec_dup_mem(vece, dofs, bofs, 16, 16);
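
The effect of the mask can be modelled outside QEMU. The sketch below re-derives the
offset computation from the hunk above as a plain C function (the names mirror the
patch, but this is an illustration, not the actual QEMU code):

```c
/* Standalone model of the source-byte-offset computation in gen_vsplt,
 * relative to the start of the 16-byte vector register. */
static int splat_byte_offset(int uimm, int vece, int host_big_endian)
{
    int bofs = (uimm << vece) & 15;     /* element offset within the register */

    if (!host_big_endian) {
        bofs ^= 15;                     /* adjust for reversed byte order on LE hosts */
        bofs &= ~((1 << vece) - 1);     /* re-align to the element boundary */
    }
    return bofs;
}
```

For vece = 1 (halfwords) and uimm = 0, the xor alone yields 15 -- one byte past the
intended halfword -- while the mask brings it back to 14, matching the dumps above.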


ATB,

Mark.


* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
                   ` (34 preceding siblings ...)
  2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
@ 2019-01-03 18:31 ` Mark Cave-Ayland
  2019-01-04 22:33   ` Richard Henderson
  35 siblings, 1 reply; 75+ messages in thread
From: Mark Cave-Ayland @ 2019-01-03 18:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-ppc, david

On 18/12/2018 06:38, Richard Henderson wrote:

> This implements some of the things that I talked about with Mark
> this morning / yesterday.  In particular:
> 
> (0) Implement expanders for nand, nor, eqv logical operations.
> 
> (1) Implement saturating arithmetic for the tcg backend.
> 
>     While I had expanders for these, they always went to helpers.
>     It's easy enough to expand byte and half-word operations for x86.
>     Beyond that, 32 and 64-bit operations can be expanded with integers.
> 
> (2) Implement minmax arithmetic for the tcg backend.
> 
>     While I had integral minmax operations, I had not yet added
>     any vector expanders for this.  (The integral stuff came in
>     for atomic minmax.)
> 
> (3) Trivial conversions to minmax for target/arm.
> 
> (4) Patches 11-18 are identical to Mark's.
> 
> (5) Patches 19-25 implement splat and logicals for VMX and VSX.
> 
>     VSX is no more difficult than VMX for these.  It does seem to be
>     just about everything that we can do for VSX at the moment.
> 
> (6) Patches 26-33 implement saturating arithmetic for VMX.
> 
> (7) Patch 34 implements minmax arithmetic for VMX.
> 
> I've tested the new operations via aarch64 guest, as that's the set
> of risu test cases I've got handy.  The rest is untested so far.
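
As a side note on point (1) above, the "expanded with integers" fallback for the
32-bit case amounts to computing the exact sum in a wider type and clamping. A
minimal sketch of the idea (illustrative C, not the TCG IR the backend actually
emits):

```c
#include <stdint.h>

/* Signed saturating 32-bit add, modelled with 64-bit integer arithmetic. */
static int32_t ssadd32(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a + b;     /* exact sum, cannot overflow in 64 bits */

    if (r > INT32_MAX) {
        r = INT32_MAX;              /* clamp on positive overflow */
    } else if (r < INT32_MIN) {
        r = INT32_MIN;              /* clamp on negative overflow */
    }
    return (int32_t)r;
}
```

A real expansion also has to record that clamping occurred (VSCR.SAT on PPC), which
may be related to the saturation issue described below.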

I've taken my previous PPC patchsets below:

[PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00063.html

[PATCH v2 0/8] target/ppc: remove various endian hacks from int_helper.c
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg06149.html

and then rebased this patchset on top of them (including a squash of the vsplt fix
posted earlier today at
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00287.html) and pushed the
result to https://github.com/mcayland/qemu/tree/ppc-altivec-v5.5-rth.

Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm
still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which
I've bisected down to:


commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Dec 17 22:39:10 2018 -0800

    target/ppc: convert vadd*s and vsub*s to vector operations

    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


So it looks like there's something still not quite right with the saturation flag /
vector saturation implementation.


ATB,

Mark.


* Re: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
  2019-01-03 18:31 ` Mark Cave-Ayland
@ 2019-01-04 22:33   ` Richard Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2019-01-04 22:33 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: qemu-ppc, david

On 1/4/19 4:31 AM, Mark Cave-Ayland wrote:
> Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm
> still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which
> I've bisected down to:
> 
> 
> commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date:   Mon Dec 17 22:39:10 2018 -0800
> 
>     target/ppc: convert vadd*s and vsub*s to vector operations
> 
>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> So looks like there's something still not quite right with the saturation flag/vector
> saturation implementation.
> 

Ok, I'll try and set up some RISU tests to track this down next week.


r~


end of thread, other threads:[~2019-01-04 22:33 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  6:38 [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
2018-12-19  5:36   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 02/34] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
2018-12-19  5:37   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 03/34] tcg: Add gvec expanders for nand, nor, eqv Richard Henderson
2018-12-19  5:39   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 04/34] tcg: Add write_aofs to GVecGen4 Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 05/34] tcg: Add opcodes for vector saturated arithmetic Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 06/34] tcg/i386: Implement vector saturating arithmetic Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 07/34] tcg: Add opcodes for vector minmax arithmetic Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 08/34] tcg/i386: Implement " Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 09/34] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 10/34] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Richard Henderson
2018-12-19  6:15   ` David Gibson
2018-12-19 12:29     ` Mark Cave-Ayland
2018-12-20 16:52       ` Mark Cave-Ayland
2018-12-18  6:38 ` [Qemu-devel] [PATCH 12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Richard Henderson
2018-12-19  6:15   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 13/34] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Richard Henderson
2018-12-19  6:17   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Richard Henderson
2018-12-19  6:20   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Richard Henderson
2018-12-19  6:21   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 16/34] target/ppc: move FP and VMX registers into aligned vsr register array Richard Henderson
2018-12-19  6:27   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 17/34] target/ppc: convert VMX logical instructions to use vector operations Richard Henderson
2018-12-19  6:29   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Richard Henderson
2018-12-19  6:29   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 19/34] target/ppc: convert vspltis[bhw] " Richard Henderson
2018-12-19  6:31   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 20/34] target/ppc: convert vsplt[bhw] " Richard Henderson
2018-12-19  6:32   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 21/34] target/ppc: nand, nor, eqv are now generic " Richard Henderson
2018-12-19  6:32   ` David Gibson
2018-12-18  6:38 ` [Qemu-devel] [PATCH 22/34] target/ppc: convert VSX logical operations to " Richard Henderson
2018-12-19  6:33   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 23/34] target/ppc: convert xxspltib " Richard Henderson
2018-12-19  6:34   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 24/34] target/ppc: convert xxspltw " Richard Henderson
2018-12-19  6:35   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 25/34] target/ppc: convert xxsel " Richard Henderson
2018-12-19  6:35   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 26/34] target/ppc: Pass integer to helper_mtvscr Richard Henderson
2018-12-19  6:37   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 27/34] target/ppc: Use helper_mtvscr for reset and gdb Richard Henderson
2018-12-19  6:38   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 28/34] target/ppc: Remove vscr_nj and vscr_sat Richard Henderson
2018-12-19  6:38   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 29/34] target/ppc: Add helper_mfvscr Richard Henderson
2018-12-19  6:39   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 30/34] target/ppc: Use mtvscr/mfvscr for vmstate Richard Henderson
2018-12-19  6:40   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 31/34] target/ppc: Add set_vscr_sat Richard Henderson
2018-12-19  6:40   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 32/34] target/ppc: Split out VSCR_SAT to a vector field Richard Henderson
2018-12-19  6:41   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 33/34] target/ppc: convert vadd*s and vsub*s to vector operations Richard Henderson
2018-12-19  6:42   ` David Gibson
2018-12-18  6:39 ` [Qemu-devel] [PATCH 34/34] target/ppc: convert vmin* and vmax* " Richard Henderson
2018-12-19  6:42   ` David Gibson
2018-12-18  9:49 ` [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Mark Cave-Ayland
2018-12-18 14:51   ` Mark Cave-Ayland
2018-12-18 15:07     ` Richard Henderson
2018-12-18 15:22       ` Mark Cave-Ayland
2018-12-18 15:05   ` Mark Cave-Ayland
2018-12-18 15:17     ` Richard Henderson
2018-12-18 15:26       ` Mark Cave-Ayland
2018-12-18 16:16         ` Richard Henderson
2019-01-03 14:58   ` Mark Cave-Ayland
2019-01-03 18:31 ` Mark Cave-Ayland
2019-01-04 22:33   ` Richard Henderson
