* [Qemu-devel] [PATCH v5 0/8] target/ppc: remove various endian hacks from int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

From working on the TCG vector operations patchset, it is apparent that there
are a large number of endian-based hacks in int_helper.c which can be removed by
making use of the various Vsr* macros.
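
As a flavour of the change, here is one hunk from patch 8: the host-endian
conditional

    #if defined(HOST_WORDS_BIGENDIAN)
        env->vscr = r->u32[3];
    #else
        env->vscr = r->u32[0];
    #endif

collapses into a single host-independent access:

    env->vscr = r->VsrW(3);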

Patch 1 is simple enough, and implements the complete set of Vsr* macros for
both big endian and little endian hosts.

Patches 2 and 3 rework the vector merge and multiply algorithms so that they
no longer require the endian-dependent HI_IDX and LO_IDX macros.

Patches 4 and 5 then completely remove the HI_IDX/LO_IDX and EL_IDX macros by
replacing the element accesses with their equivalent Vsr* macro instead.

Patches 6 and 7 tidy up the VEXT_SIGNED macro and fix a potential shift bug
in the ROTRu32 and ROTRu64 macros pointed out by Richard during the review of
v1.

Finally patch 8 is an inspection-based removal of other HOST_WORDS_BIGENDIAN
hacks in int_helper.c, again replacing accesses with the relevant Vsr* macro
instead.

Note that there are still some endian hacks left in int_helper.c after this
patchset: in particular, the delightfully evil VECTOR_FOR_INORDER_I macro
remains in places where the index variable is used for purposes other than
accessing elements within the vector.
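
For reference, VECTOR_FOR_INORDER_I iterates over the elements in
architectural (big-endian) order, which on a little-endian host means
walking the array backwards; roughly (the little-endian arm paraphrased
from int_helper.c):

    #if defined(HOST_WORDS_BIGENDIAN)
    #define VECTOR_FOR_INORDER_I(index, element)                    \
        for (index = 0; index < ARRAY_SIZE(r->element); index++)
    #else
    #define VECTOR_FOR_INORDER_I(index, element)                    \
        for (index = ARRAY_SIZE(r->element) - 1; index >= 0; index--)
    #endif

Where the loop body touches the index only to access elements, a plain
forward loop over Vsr* accessors is equivalent; where the index also feeds
something order-sensitive, the in-order form has to stay.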

There were also some parts where additional conversion could be done, but I
wasn't confident enough to make the change without access to PPC64 test images
or real big-endian host hardware.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


v5:
- Rebase onto master
- Update vmrg{l,h} algorithm macros in patch 2 as per Richard's suggestion

v4:
- Rebase onto master
- Add R-B tag from Richard for patch 4
- Use Richard's improved vmrg{l,h} algorithm in patch 2

v3:
- Rebase onto master
- Add R-B tags from Richard

v2:
- Add R-B tags from Richard
- Add patches 6 and 7 to simplify the VEXT_SIGNED macro and fix a potential
  shift bug in the ROTRu32 and ROTRu64 macros
- Don't require ordered access for VNEG and VGENERIC_DO macros as pointed out
  by Richard


Mark Cave-Ayland (8):
  target/ppc: implement complete set of Vsr* macros
  target/ppc: rework vmrg{l,h}{b,h,w} instructions to use Vsr* macros
  target/ppc: rework vmul{e,o}{s,u}{b,h,w} instructions to use Vsr*
    macros
  target/ppc: eliminate use of HI_IDX and LO_IDX macros from
    int_helper.c
  target/ppc: eliminate use of EL_IDX macros from int_helper.c
  target/ppc: simplify VEXT_SIGNED macro in int_helper.c
  target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c
  target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c

 target/ppc/int_helper.c | 523 +++++++++++++++++++-----------------------------
 target/ppc/internal.h   |   9 +-
 2 files changed, 217 insertions(+), 315 deletions(-)

-- 
2.11.0

* [Qemu-devel] [PATCH v5 1/8] target/ppc: implement complete set of Vsr* macros
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

This prepares us for eliminating the use of direct array access within the VMX
instruction implementations.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/internal.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index c7c0f77dd6..f26a71ffcf 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -206,16 +206,23 @@ EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VsrB(i) u8[i]
+#define VsrSB(i) s8[i]
 #define VsrH(i) u16[i]
+#define VsrSH(i) s16[i]
 #define VsrW(i) u32[i]
+#define VsrSW(i) s32[i]
 #define VsrD(i) u64[i]
+#define VsrSD(i) s64[i]
 #else
 #define VsrB(i) u8[15 - (i)]
+#define VsrSB(i) s8[15 - (i)]
 #define VsrH(i) u16[7 - (i)]
+#define VsrSH(i) s16[7 - (i)]
 #define VsrW(i) u32[3 - (i)]
+#define VsrSW(i) s32[3 - (i)]
 #define VsrD(i) u64[1 - (i)]
+#define VsrSD(i) s64[1 - (i)]
 #endif
-
 static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
     vsr->VsrD(0) = env->vsr[n].u64[0];
-- 
2.11.0
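
A quick way to convince yourself of the indexing semantics, namely that
VsrD(0) always names the architecturally most-significant doubleword
whichever way the host stores it, is a standalone test along these lines
(a sketch, not part of the patch; HOST_WORDS_BIGENDIAN normally comes from
QEMU's configury, so define it by hand when testing on a big-endian box):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <inttypes.h>

    typedef union {
        uint8_t u8[16];
        uint64_t u64[2];
    } ppc_avr_t;

    #if defined(HOST_WORDS_BIGENDIAN)
    #define VsrD(i) u64[i]
    #else
    #define VsrD(i) u64[1 - (i)]
    #endif

    int main(void)
    {
        ppc_avr_t v;

        memset(&v, 0, sizeof(v));
        v.VsrD(0) = 0x1122334455667788ULL;  /* architectural high dword */

        /* On a little-endian host the value lands in u64[1], on a
         * big-endian host in u64[0]; VsrD(0) reads it back either way. */
        printf("u64[0]=%016" PRIx64 " u64[1]=%016" PRIx64 "\n",
               v.u64[0], v.u64[1]);
        return 0;
    }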

* [Qemu-devel] [PATCH v5 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

The current implementations make use of the endian-specific macros MRGLO/MRGHI
and also reference HI_IDX and LO_IDX directly to calculate array offsets.

Rework the implementation to use the Vsr* macros so that these per-endian
references can be removed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 target/ppc/int_helper.c | 54 +++++++++++++++++--------------------------------
 1 file changed, 19 insertions(+), 35 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 598731d47a..fe418b3393 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -976,43 +976,27 @@ void helper_vmladduhm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     }
 }
 
-#define VMRG_DO(name, element, highp)                                   \
-    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
-    {                                                                   \
-        ppc_avr_t result;                                               \
-        int i;                                                          \
-        size_t n_elems = ARRAY_SIZE(r->element);                        \
-                                                                        \
-        for (i = 0; i < n_elems / 2; i++) {                             \
-            if (highp) {                                                \
-                result.element[i*2+HI_IDX] = a->element[i];             \
-                result.element[i*2+LO_IDX] = b->element[i];             \
-            } else {                                                    \
-                result.element[n_elems - i * 2 - (1 + HI_IDX)] =        \
-                    b->element[n_elems - i - 1];                        \
-                result.element[n_elems - i * 2 - (1 + LO_IDX)] =        \
-                    a->element[n_elems - i - 1];                        \
-            }                                                           \
-        }                                                               \
-        *r = result;                                                    \
-    }
-#if defined(HOST_WORDS_BIGENDIAN)
-#define MRGHI 0
-#define MRGLO 1
-#else
-#define MRGHI 1
-#define MRGLO 0
-#endif
-#define VMRG(suffix, element)                   \
-    VMRG_DO(mrgl##suffix, element, MRGHI)       \
-    VMRG_DO(mrgh##suffix, element, MRGLO)
-VMRG(b, u8)
-VMRG(h, u16)
-VMRG(w, u32)
+#define VMRG_DO(name, element, access, ofs)                                  \
+    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)            \
+    {                                                                        \
+        ppc_avr_t result;                                                    \
+        int i, half = ARRAY_SIZE(r->element) / 2;                            \
+                                                                             \
+        for (i = 0; i < half; i++) {                                         \
+            result.access(i * 2 + 0) = a->access(i + ofs);                   \
+            result.access(i * 2 + 1) = b->access(i + ofs);                   \
+        }                                                                    \
+        *r = result;                                                         \
+    }
+
+#define VMRG(suffix, element, access)          \
+    VMRG_DO(mrgl##suffix, element, access, half)   \
+    VMRG_DO(mrgh##suffix, element, access, 0)
+VMRG(b, u8, VsrB)
+VMRG(h, u16, VsrH)
+VMRG(w, u32, VsrW)
 #undef VMRG_DO
 #undef VMRG
-#undef MRGHI
-#undef MRGLO
 
 void helper_vmsummbm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
                      ppc_avr_t *b, ppc_avr_t *c)
-- 
2.11.0
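
The reworked loop is easy to check against a scalar model in architectural
element order, which is exactly the order the Vsr* accessors provide. For
bytes (a sketch, not code from the patch):

    #include <stdint.h>
    #include <string.h>

    /* ofs = 0 gives vmrghb (elements 0..7), ofs = 8 gives vmrglb
     * (elements 8..15), matching VMRG_DO's use of 0 and half. */
    static void vmrg_model(uint8_t r[16], const uint8_t a[16],
                           const uint8_t b[16], int ofs)
    {
        uint8_t tmp[16];   /* as in the helper, r may alias a or b */
        int i;

        for (i = 0; i < 8; i++) {
            tmp[i * 2 + 0] = a[i + ofs];
            tmp[i * 2 + 1] = b[i + ofs];
        }
        memcpy(r, tmp, 16);
    }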

* [Qemu-devel] [PATCH v5 3/8] target/ppc: rework vmul{e, o}{s, u}{b, h, w} instructions to use Vsr* macros
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

The current implementations make use of the endian-specific macros HI_IDX and
LO_IDX directly to calculate array offsets.

Rework the implementation to use the Vsr* macros so that these per-endian
references can be removed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 48 +++++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index fe418b3393..e531af5294 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1104,33 +1104,39 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 }
 
-#define VMUL_DO(name, mul_element, prod_element, cast, evenp)           \
+#define VMUL_DO_EVN(name, mul_element, mul_access, prod_access, cast)   \
     void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
     {                                                                   \
         int i;                                                          \
                                                                         \
-        VECTOR_FOR_INORDER_I(i, prod_element) {                         \
-            if (evenp) {                                                \
-                r->prod_element[i] =                                    \
-                    (cast)a->mul_element[i * 2 + HI_IDX] *              \
-                    (cast)b->mul_element[i * 2 + HI_IDX];               \
-            } else {                                                    \
-                r->prod_element[i] =                                    \
-                    (cast)a->mul_element[i * 2 + LO_IDX] *              \
-                    (cast)b->mul_element[i * 2 + LO_IDX];               \
-            }                                                           \
+        for (i = 0; i < ARRAY_SIZE(r->mul_element); i += 2) {           \
+            r->prod_access(i >> 1) = (cast)a->mul_access(i) *           \
+                                     (cast)b->mul_access(i);            \
+        }                                                               \
+    }
+
+#define VMUL_DO_ODD(name, mul_element, mul_access, prod_access, cast)   \
+    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
+    {                                                                   \
+        int i;                                                          \
+                                                                        \
+        for (i = 0; i < ARRAY_SIZE(r->mul_element); i += 2) {           \
+            r->prod_access(i >> 1) = (cast)a->mul_access(i + 1) *       \
+                                     (cast)b->mul_access(i + 1);        \
         }                                                               \
     }
-#define VMUL(suffix, mul_element, prod_element, cast)            \
-    VMUL_DO(mule##suffix, mul_element, prod_element, cast, 1)    \
-    VMUL_DO(mulo##suffix, mul_element, prod_element, cast, 0)
-VMUL(sb, s8, s16, int16_t)
-VMUL(sh, s16, s32, int32_t)
-VMUL(sw, s32, s64, int64_t)
-VMUL(ub, u8, u16, uint16_t)
-VMUL(uh, u16, u32, uint32_t)
-VMUL(uw, u32, u64, uint64_t)
-#undef VMUL_DO
+
+#define VMUL(suffix, mul_element, mul_access, prod_access, cast)       \
+    VMUL_DO_EVN(mule##suffix, mul_element, mul_access, prod_access, cast)  \
+    VMUL_DO_ODD(mulo##suffix, mul_element, mul_access, prod_access, cast)
+VMUL(sb, s8, VsrSB, VsrSH, int16_t)
+VMUL(sh, s16, VsrSH, VsrSW, int32_t)
+VMUL(sw, s32, VsrSW, VsrSD, int64_t)
+VMUL(ub, u8, VsrB, VsrH, uint16_t)
+VMUL(uh, u16, VsrH, VsrW, uint32_t)
+VMUL(uw, u32, VsrW, VsrD, uint64_t)
+#undef VMUL_DO_EVN
+#undef VMUL_DO_ODD
 #undef VMUL
 
 void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
-- 
2.11.0
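
Likewise for the multiplies, a scalar model of the even signed-byte case in
architectural element order (a sketch, not code from the patch):

    #include <stdint.h>

    /* vmulesb: multiply the even-numbered signed bytes (0, 2, ..., 14),
     * widening each product to 16 bits. The odd variant (vmulosb) is
     * identical except that it reads a[i + 1] and b[i + 1]. */
    static void vmulesb_model(int16_t prod[8], const int8_t a[16],
                              const int8_t b[16])
    {
        int i;

        for (i = 0; i < 16; i += 2) {
            prod[i >> 1] = (int16_t)a[i] * (int16_t)b[i];
        }
    }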

* [Qemu-devel] [PATCH v5 4/8] target/ppc: eliminate use of HI_IDX and LO_IDX macros from int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

The original purpose of these macros was to correctly reference the high and low
parts of the VSRs regardless of the host endianness.

Replace these direct references to high and low parts with the relevant VsrD
macro instead, and completely remove the now-unused HI_IDX and LO_IDX macros.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 180 +++++++++++++++++++++++-------------------------
 1 file changed, 85 insertions(+), 95 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index e531af5294..7a9c02d4bb 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -389,14 +389,6 @@ target_ulong helper_602_mfrom(target_ulong arg)
 /*****************************************************************************/
 /* Altivec extension helpers */
 #if defined(HOST_WORDS_BIGENDIAN)
-#define HI_IDX 0
-#define LO_IDX 1
-#else
-#define HI_IDX 1
-#define LO_IDX 0
-#endif
-
-#if defined(HOST_WORDS_BIGENDIAN)
 #define VECTOR_FOR_INORDER_I(index, element)                    \
     for (index = 0; index < ARRAY_SIZE(r->element); index++)
 #else
@@ -514,8 +506,8 @@ void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
     res ^= res >> 32;
     res ^= res >> 16;
     res ^= res >> 8;
-    r->u64[LO_IDX] = res & 1;
-    r->u64[HI_IDX] = 0;
+    r->VsrD(1) = res & 1;
+    r->VsrD(0) = 0;
 }
 
 #define VARITH_DO(name, op, element)                                    \
@@ -1229,8 +1221,8 @@ void helper_vbpermq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
         }
     }
 
-    r->u64[HI_IDX] = perm;
-    r->u64[LO_IDX] = 0;
+    r->VsrD(0) = perm;
+    r->VsrD(1) = 0;
 }
 
 #undef VBPERMQ_INDEX
@@ -1559,25 +1551,25 @@ void helper_vpmsumd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t prod[2];
 
     VECTOR_FOR_INORDER_I(i, u64) {
-        prod[i].u64[LO_IDX] = prod[i].u64[HI_IDX] = 0;
+        prod[i].VsrD(1) = prod[i].VsrD(0) = 0;
         for (j = 0; j < 64; j++) {
             if (a->u64[i] & (1ull<<j)) {
                 ppc_avr_t bshift;
                 if (j == 0) {
-                    bshift.u64[HI_IDX] = 0;
-                    bshift.u64[LO_IDX] = b->u64[i];
+                    bshift.VsrD(0) = 0;
+                    bshift.VsrD(1) = b->u64[i];
                 } else {
-                    bshift.u64[HI_IDX] = b->u64[i] >> (64-j);
-                    bshift.u64[LO_IDX] = b->u64[i] << j;
+                    bshift.VsrD(0) = b->u64[i] >> (64 - j);
+                    bshift.VsrD(1) = b->u64[i] << j;
                 }
-                prod[i].u64[LO_IDX] ^= bshift.u64[LO_IDX];
-                prod[i].u64[HI_IDX] ^= bshift.u64[HI_IDX];
+                prod[i].VsrD(1) ^= bshift.VsrD(1);
+                prod[i].VsrD(0) ^= bshift.VsrD(0);
             }
         }
     }
 
-    r->u64[LO_IDX] = prod[0].u64[LO_IDX] ^ prod[1].u64[LO_IDX];
-    r->u64[HI_IDX] = prod[0].u64[HI_IDX] ^ prod[1].u64[HI_IDX];
+    r->VsrD(1) = prod[0].VsrD(1) ^ prod[1].VsrD(1);
+    r->VsrD(0) = prod[0].VsrD(0) ^ prod[1].VsrD(0);
 #endif
 }
 
@@ -1795,7 +1787,7 @@ VEXTU_X_DO(vextuwrx, 32, 0)
 #define VSHIFT(suffix, leftp)                                           \
     void helper_vs##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)    \
     {                                                                   \
-        int shift = b->u8[LO_IDX*15] & 0x7;                             \
+        int shift = b->VsrB(15) & 0x7;                                  \
         int doit = 1;                                                   \
         int i;                                                          \
                                                                         \
@@ -1806,15 +1798,15 @@ VEXTU_X_DO(vextuwrx, 32, 0)
             if (shift == 0) {                                           \
                 *r = *a;                                                \
             } else if (leftp) {                                         \
-                uint64_t carry = a->u64[LO_IDX] >> (64 - shift);        \
+                uint64_t carry = a->VsrD(1) >> (64 - shift);            \
                                                                         \
-                r->u64[HI_IDX] = (a->u64[HI_IDX] << shift) | carry;     \
-                r->u64[LO_IDX] = a->u64[LO_IDX] << shift;               \
+                r->VsrD(0) = (a->VsrD(0) << shift) | carry;             \
+                r->VsrD(1) = a->VsrD(1) << shift;                       \
             } else {                                                    \
-                uint64_t carry = a->u64[HI_IDX] << (64 - shift);        \
+                uint64_t carry = a->VsrD(0) << (64 - shift);            \
                                                                         \
-                r->u64[LO_IDX] = (a->u64[LO_IDX] >> shift) | carry;     \
-                r->u64[HI_IDX] = a->u64[HI_IDX] >> shift;               \
+                r->VsrD(1) = (a->VsrD(1) >> shift) | carry;             \
+                r->VsrD(0) = a->VsrD(0) >> shift;                       \
             }                                                           \
         }                                                               \
     }
@@ -1900,7 +1892,7 @@ void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
 
 void helper_vslo(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    int sh = (b->u8[LO_IDX*0xf] >> 3) & 0xf;
+    int sh = (b->VsrB(0xf) >> 3) & 0xf;
 
 #if defined(HOST_WORDS_BIGENDIAN)
     memmove(&r->u8[0], &a->u8[sh], 16 - sh);
@@ -2096,7 +2088,7 @@ VSR(d, u64, 0x3F)
 
 void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    int sh = (b->u8[LO_IDX * 0xf] >> 3) & 0xf;
+    int sh = (b->VsrB(0xf) >> 3) & 0xf;
 
 #if defined(HOST_WORDS_BIGENDIAN)
     memmove(&r->u8[sh], &a->u8[0], 16 - sh);
@@ -2352,13 +2344,13 @@ static inline void avr_qw_not(ppc_avr_t *t, ppc_avr_t a)
 
 static int avr_qw_cmpu(ppc_avr_t a, ppc_avr_t b)
 {
-    if (a.u64[HI_IDX] < b.u64[HI_IDX]) {
+    if (a.VsrD(0) < b.VsrD(0)) {
         return -1;
-    } else if (a.u64[HI_IDX] > b.u64[HI_IDX]) {
+    } else if (a.VsrD(0) > b.VsrD(0)) {
         return 1;
-    } else if (a.u64[LO_IDX] < b.u64[LO_IDX]) {
+    } else if (a.VsrD(1) < b.VsrD(1)) {
         return -1;
-    } else if (a.u64[LO_IDX] > b.u64[LO_IDX]) {
+    } else if (a.VsrD(1) > b.VsrD(1)) {
         return 1;
     } else {
         return 0;
@@ -2367,17 +2359,17 @@ static int avr_qw_cmpu(ppc_avr_t a, ppc_avr_t b)
 
 static void avr_qw_add(ppc_avr_t *t, ppc_avr_t a, ppc_avr_t b)
 {
-    t->u64[LO_IDX] = a.u64[LO_IDX] + b.u64[LO_IDX];
-    t->u64[HI_IDX] = a.u64[HI_IDX] + b.u64[HI_IDX] +
-                     (~a.u64[LO_IDX] < b.u64[LO_IDX]);
+    t->VsrD(1) = a.VsrD(1) + b.VsrD(1);
+    t->VsrD(0) = a.VsrD(0) + b.VsrD(0) +
+                     (~a.VsrD(1) < b.VsrD(1));
 }
 
 static int avr_qw_addc(ppc_avr_t *t, ppc_avr_t a, ppc_avr_t b)
 {
     ppc_avr_t not_a;
-    t->u64[LO_IDX] = a.u64[LO_IDX] + b.u64[LO_IDX];
-    t->u64[HI_IDX] = a.u64[HI_IDX] + b.u64[HI_IDX] +
-                     (~a.u64[LO_IDX] < b.u64[LO_IDX]);
+    t->VsrD(1) = a.VsrD(1) + b.VsrD(1);
+    t->VsrD(0) = a.VsrD(0) + b.VsrD(0) +
+                     (~a.VsrD(1) < b.VsrD(1));
     avr_qw_not(&not_a, a);
     return avr_qw_cmpu(not_a, b) < 0;
 }
@@ -2399,11 +2391,11 @@ void helper_vaddeuqm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     r->u128 = a->u128 + b->u128 + (c->u128 & 1);
 #else
 
-    if (c->u64[LO_IDX] & 1) {
+    if (c->VsrD(1) & 1) {
         ppc_avr_t tmp;
 
-        tmp.u64[HI_IDX] = 0;
-        tmp.u64[LO_IDX] = c->u64[LO_IDX] & 1;
+        tmp.VsrD(0) = 0;
+        tmp.VsrD(1) = c->VsrD(1) & 1;
         avr_qw_add(&tmp, *a, tmp);
         avr_qw_add(r, tmp, *b);
     } else {
@@ -2421,8 +2413,8 @@ void helper_vaddcuq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
     avr_qw_not(&not_a, *a);
 
-    r->u64[HI_IDX] = 0;
-    r->u64[LO_IDX] = (avr_qw_cmpu(not_a, *b) < 0);
+    r->VsrD(0) = 0;
+    r->VsrD(1) = (avr_qw_cmpu(not_a, *b) < 0);
 #endif
 }
 
@@ -2437,7 +2429,7 @@ void helper_vaddecuq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     r->u128 = carry_out;
 #else
 
-    int carry_in = c->u64[LO_IDX] & 1;
+    int carry_in = c->VsrD(1) & 1;
     int carry_out = 0;
     ppc_avr_t tmp;
 
@@ -2447,8 +2439,8 @@ void helper_vaddecuq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
         ppc_avr_t one = QW_ONE;
         carry_out = avr_qw_addc(&tmp, tmp, one);
     }
-    r->u64[HI_IDX] = 0;
-    r->u64[LO_IDX] = carry_out;
+    r->VsrD(0) = 0;
+    r->VsrD(1) = carry_out;
 #endif
 }
 
@@ -2476,8 +2468,8 @@ void helper_vsubeuqm(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     avr_qw_not(&tmp, *b);
     avr_qw_add(&sum, *a, tmp);
 
-    tmp.u64[HI_IDX] = 0;
-    tmp.u64[LO_IDX] = c->u64[LO_IDX] & 1;
+    tmp.VsrD(0) = 0;
+    tmp.VsrD(1) = c->VsrD(1) & 1;
     avr_qw_add(r, sum, tmp);
 #endif
 }
@@ -2493,10 +2485,10 @@ void helper_vsubcuq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
         ppc_avr_t tmp;
         avr_qw_not(&tmp, *b);
         avr_qw_add(&tmp, *a, tmp);
-        carry = ((tmp.s64[HI_IDX] == -1ull) && (tmp.s64[LO_IDX] == -1ull));
+        carry = ((tmp.VsrSD(0) == -1ull) && (tmp.VsrSD(1) == -1ull));
     }
-    r->u64[HI_IDX] = 0;
-    r->u64[LO_IDX] = carry;
+    r->VsrD(0) = 0;
+    r->VsrD(1) = carry;
 #endif
 }
 
@@ -2507,17 +2499,17 @@ void helper_vsubecuq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
         (~a->u128 < ~b->u128) ||
         ((c->u128 & 1) && (a->u128 + ~b->u128 == (__uint128_t)-1));
 #else
-    int carry_in = c->u64[LO_IDX] & 1;
+    int carry_in = c->VsrD(1) & 1;
     int carry_out = (avr_qw_cmpu(*a, *b) > 0);
     if (!carry_out && carry_in) {
         ppc_avr_t tmp;
         avr_qw_not(&tmp, *b);
         avr_qw_add(&tmp, *a, tmp);
-        carry_out = ((tmp.u64[HI_IDX] == -1ull) && (tmp.u64[LO_IDX] == -1ull));
+        carry_out = ((tmp.VsrD(0) == -1ull) && (tmp.VsrD(1) == -1ull));
     }
 
-    r->u64[HI_IDX] = 0;
-    r->u64[LO_IDX] = carry_out;
+    r->VsrD(0) = 0;
+    r->VsrD(1) = carry_out;
 #endif
 }
 
@@ -2615,7 +2607,7 @@ static bool bcd_is_valid(ppc_avr_t *bcd)
 
 static int bcd_cmp_zero(ppc_avr_t *bcd)
 {
-    if (bcd->u64[HI_IDX] == 0 && (bcd->u64[LO_IDX] >> 4) == 0) {
+    if (bcd->VsrD(0) == 0 && (bcd->VsrD(1) >> 4) == 0) {
         return CRF_EQ;
     } else {
         return (bcd_get_sgn(bcd) == 1) ? CRF_GT : CRF_LT;
@@ -2735,7 +2727,7 @@ uint32_t helper_bcdadd(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
     }
 
     if (unlikely(invalid)) {
-        result.u64[HI_IDX] = result.u64[LO_IDX] = -1;
+        result.VsrD(0) = result.VsrD(1) = -1;
         cr = CRF_SO;
     } else if (overflow) {
         cr |= CRF_SO;
@@ -2804,7 +2796,7 @@ uint32_t helper_bcdctn(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
     int invalid = (sgnb == 0);
     ppc_avr_t ret = { .u64 = { 0, 0 } };
 
-    int ox_flag = (b->u64[HI_IDX] != 0) || ((b->u64[LO_IDX] >> 32) != 0);
+    int ox_flag = (b->VsrD(0) != 0) || ((b->VsrD(1) >> 32) != 0);
 
     for (i = 1; i < 8; i++) {
         set_national_digit(&ret, 0x30 + bcd_get_digit(b, i, &invalid), i);
@@ -2884,7 +2876,7 @@ uint32_t helper_bcdctz(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
     int invalid = (sgnb == 0);
     ppc_avr_t ret = { .u64 = { 0, 0 } };
 
-    int ox_flag = ((b->u64[HI_IDX] >> 4) != 0);
+    int ox_flag = ((b->VsrD(0) >> 4) != 0);
 
     for (i = 0; i < 16; i++) {
         digit = bcd_get_digit(b, i + 1, &invalid);
@@ -2925,13 +2917,13 @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
     uint64_t hi_value;
     ppc_avr_t ret = { .u64 = { 0, 0 } };
 
-    if (b->s64[HI_IDX] < 0) {
-        lo_value = -b->s64[LO_IDX];
-        hi_value = ~b->u64[HI_IDX] + !lo_value;
+    if (b->VsrSD(0) < 0) {
+        lo_value = -b->VsrSD(1);
+        hi_value = ~b->VsrD(0) + !lo_value;
         bcd_put_digit(&ret, 0xD, 0);
     } else {
-        lo_value = b->u64[LO_IDX];
-        hi_value = b->u64[HI_IDX];
+        lo_value = b->VsrD(1);
+        hi_value = b->VsrD(0);
         bcd_put_digit(&ret, bcd_preferred_sgn(0, ps), 0);
     }
 
@@ -2979,11 +2971,11 @@ uint32_t helper_bcdctsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
     }
 
     if (sgnb == -1) {
-        r->s64[LO_IDX] = -lo_value;
-        r->s64[HI_IDX] = ~hi_value + !r->s64[LO_IDX];
+        r->VsrSD(1) = -lo_value;
+        r->VsrSD(0) = ~hi_value + !r->VsrSD(1);
     } else {
-        r->s64[LO_IDX] = lo_value;
-        r->s64[HI_IDX] = hi_value;
+        r->VsrSD(1) = lo_value;
+        r->VsrSD(0) = hi_value;
     }
 
     cr = bcd_cmp_zero(b);
@@ -3043,7 +3035,7 @@ uint32_t helper_bcds(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
     bool ox_flag = false;
     int sgnb = bcd_get_sgn(b);
     ppc_avr_t ret = *b;
-    ret.u64[LO_IDX] &= ~0xf;
+    ret.VsrD(1) &= ~0xf;
 
     if (bcd_is_valid(b) == false) {
         return CRF_SO;
@@ -3056,9 +3048,9 @@ uint32_t helper_bcds(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
     }
 
     if (i > 0) {
-        ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+        ulshift(&ret.VsrD(1), &ret.VsrD(0), i * 4, &ox_flag);
     } else {
-        urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+        urshift(&ret.VsrD(1), &ret.VsrD(0), -i * 4);
     }
     bcd_put_digit(&ret, bcd_preferred_sgn(sgnb, ps), 0);
 
@@ -3095,13 +3087,13 @@ uint32_t helper_bcdus(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
 #endif
     if (i >= 32) {
         ox_flag = true;
-        ret.u64[LO_IDX] = ret.u64[HI_IDX] = 0;
+        ret.VsrD(1) = ret.VsrD(0) = 0;
     } else if (i <= -32) {
-        ret.u64[LO_IDX] = ret.u64[HI_IDX] = 0;
+        ret.VsrD(1) = ret.VsrD(0) = 0;
     } else if (i > 0) {
-        ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+        ulshift(&ret.VsrD(1), &ret.VsrD(0), i * 4, &ox_flag);
     } else {
-        urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+        urshift(&ret.VsrD(1), &ret.VsrD(0), -i * 4);
     }
     *r = ret;
 
@@ -3121,7 +3113,7 @@ uint32_t helper_bcdsr(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
     bool ox_flag = false;
     int sgnb = bcd_get_sgn(b);
     ppc_avr_t ret = *b;
-    ret.u64[LO_IDX] &= ~0xf;
+    ret.VsrD(1) &= ~0xf;
 
 #if defined(HOST_WORDS_BIGENDIAN)
     int i = a->s8[7];
@@ -3142,9 +3134,9 @@ uint32_t helper_bcdsr(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
     }
 
     if (i > 0) {
-        ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+        ulshift(&ret.VsrD(1), &ret.VsrD(0), i * 4, &ox_flag);
     } else {
-        urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+        urshift(&ret.VsrD(1), &ret.VsrD(0), -i * 4);
 
         if (bcd_get_digit(&ret, 0, &invalid) >= 5) {
             bcd_add_mag(&ret, &ret, &bcd_one, &invalid, &unused);
@@ -3178,19 +3170,19 @@ uint32_t helper_bcdtrunc(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
 
     if (i > 16 && i < 32) {
         mask = (uint64_t)-1 >> (128 - i * 4);
-        if (ret.u64[HI_IDX] & ~mask) {
+        if (ret.VsrD(0) & ~mask) {
             ox_flag = CRF_SO;
         }
 
-        ret.u64[HI_IDX] &= mask;
+        ret.VsrD(0) &= mask;
     } else if (i >= 0 && i <= 16) {
         mask = (uint64_t)-1 >> (64 - i * 4);
-        if (ret.u64[HI_IDX] || (ret.u64[LO_IDX] & ~mask)) {
+        if (ret.VsrD(0) || (ret.VsrD(1) & ~mask)) {
             ox_flag = CRF_SO;
         }
 
-        ret.u64[LO_IDX] &= mask;
-        ret.u64[HI_IDX] = 0;
+        ret.VsrD(1) &= mask;
+        ret.VsrD(0) = 0;
     }
     bcd_put_digit(&ret, bcd_preferred_sgn(bcd_get_sgn(b), ps), 0);
     *r = ret;
@@ -3221,28 +3213,28 @@ uint32_t helper_bcdutrunc(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
 #endif
     if (i > 16 && i < 33) {
         mask = (uint64_t)-1 >> (128 - i * 4);
-        if (ret.u64[HI_IDX] & ~mask) {
+        if (ret.VsrD(0) & ~mask) {
             ox_flag = CRF_SO;
         }
 
-        ret.u64[HI_IDX] &= mask;
+        ret.VsrD(0) &= mask;
     } else if (i > 0 && i <= 16) {
         mask = (uint64_t)-1 >> (64 - i * 4);
-        if (ret.u64[HI_IDX] || (ret.u64[LO_IDX] & ~mask)) {
+        if (ret.VsrD(0) || (ret.VsrD(1) & ~mask)) {
             ox_flag = CRF_SO;
         }
 
-        ret.u64[LO_IDX] &= mask;
-        ret.u64[HI_IDX] = 0;
+        ret.VsrD(1) &= mask;
+        ret.VsrD(0) = 0;
     } else if (i == 0) {
-        if (ret.u64[HI_IDX] || ret.u64[LO_IDX]) {
+        if (ret.VsrD(0) || ret.VsrD(1)) {
             ox_flag = CRF_SO;
         }
-        ret.u64[HI_IDX] = ret.u64[LO_IDX] = 0;
+        ret.VsrD(0) = ret.VsrD(1) = 0;
     }
 
     *r = ret;
-    if (r->u64[HI_IDX] == 0 && r->u64[LO_IDX] == 0) {
+    if (r->VsrD(0) == 0 && r->VsrD(1) == 0) {
         return ox_flag | CRF_EQ;
     }
 
@@ -3414,8 +3406,6 @@ void helper_vpermxor(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 }
 
 #undef VECTOR_FOR_INORDER_I
-#undef HI_IDX
-#undef LO_IDX
 
 /*****************************************************************************/
 /* SPE extension helpers */
-- 
2.11.0
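
One line in the avr_qw_add() hunk above deserves a note: the expression
(~a.VsrD(1) < b.VsrD(1)) is the carry out of the low 64-bit addition.
Unsigned a + b wraps exactly when b > UINT64_MAX - a, and UINT64_MAX - a
is ~a. A minimal standalone check (a sketch, not part of the patch):

    #include <stdint.h>
    #include <assert.h>

    static unsigned carry_out(uint64_t a, uint64_t b)
    {
        return ~a < b;   /* 1 iff a + b wraps past UINT64_MAX */
    }

    int main(void)
    {
        assert(carry_out(UINT64_MAX, 1) == 1);
        assert(carry_out(UINT64_MAX, 0) == 0);
        assert(carry_out(1ULL << 63, 1ULL << 63) == 1);
        return 0;
    }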

* [Qemu-devel] [PATCH v5 5/8] target/ppc: eliminate use of EL_IDX macros from int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

These macros can be eliminated by instead using the relevant Vsr* macros in
the few locations where they appear.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 66 ++++++++++++++++++++-----------------------------
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 7a9c02d4bb..355b6630a2 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -3306,12 +3306,7 @@ void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     *r = result;
 }
 
-#define ROTRu32(v, n) (((v) >> (n)) | ((v) << (32-n)))
-#if defined(HOST_WORDS_BIGENDIAN)
-#define EL_IDX(i) (i)
-#else
-#define EL_IDX(i) (3 - (i))
-#endif
+#define ROTRu32(v, n) (((v) >> (n)) | ((v) << (32 - n)))
 
 void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
 {
@@ -3319,40 +3314,34 @@ void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
     int six = st_six & 0xF;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u32) {
+    for (i = 0; i < ARRAY_SIZE(r->u32); i++) {
         if (st == 0) {
             if ((six & (0x8 >> i)) == 0) {
-                r->u32[EL_IDX(i)] = ROTRu32(a->u32[EL_IDX(i)], 7) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 18) ^
-                                    (a->u32[EL_IDX(i)] >> 3);
+                r->VsrW(i) = ROTRu32(a->VsrW(i), 7) ^
+                             ROTRu32(a->VsrW(i), 18) ^
+                             (a->VsrW(i) >> 3);
             } else { /* six.bit[i] == 1 */
-                r->u32[EL_IDX(i)] = ROTRu32(a->u32[EL_IDX(i)], 17) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 19) ^
-                                    (a->u32[EL_IDX(i)] >> 10);
+                r->VsrW(i) = ROTRu32(a->VsrW(i), 17) ^
+                             ROTRu32(a->VsrW(i), 19) ^
+                             (a->VsrW(i) >> 10);
             }
         } else { /* st == 1 */
             if ((six & (0x8 >> i)) == 0) {
-                r->u32[EL_IDX(i)] = ROTRu32(a->u32[EL_IDX(i)], 2) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 13) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 22);
+                r->VsrW(i) = ROTRu32(a->VsrW(i), 2) ^
+                             ROTRu32(a->VsrW(i), 13) ^
+                             ROTRu32(a->VsrW(i), 22);
             } else { /* six.bit[i] == 1 */
-                r->u32[EL_IDX(i)] = ROTRu32(a->u32[EL_IDX(i)], 6) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 11) ^
-                                    ROTRu32(a->u32[EL_IDX(i)], 25);
+                r->VsrW(i) = ROTRu32(a->VsrW(i), 6) ^
+                             ROTRu32(a->VsrW(i), 11) ^
+                             ROTRu32(a->VsrW(i), 25);
             }
         }
     }
 }
 
 #undef ROTRu32
-#undef EL_IDX
 
 #define ROTRu64(v, n) (((v) >> (n)) | ((v) << (64-n)))
-#if defined(HOST_WORDS_BIGENDIAN)
-#define EL_IDX(i) (i)
-#else
-#define EL_IDX(i) (1 - (i))
-#endif
 
 void helper_vshasigmad(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
 {
@@ -3360,33 +3349,32 @@ void helper_vshasigmad(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
     int six = st_six & 0xF;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u64) {
+    for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
         if (st == 0) {
             if ((six & (0x8 >> (2*i))) == 0) {
-                r->u64[EL_IDX(i)] = ROTRu64(a->u64[EL_IDX(i)], 1) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 8) ^
-                                    (a->u64[EL_IDX(i)] >> 7);
+                r->VsrD(i) = ROTRu64(a->VsrD(i), 1) ^
+                             ROTRu64(a->VsrD(i), 8) ^
+                             (a->VsrD(i) >> 7);
             } else { /* six.bit[2*i] == 1 */
-                r->u64[EL_IDX(i)] = ROTRu64(a->u64[EL_IDX(i)], 19) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 61) ^
-                                    (a->u64[EL_IDX(i)] >> 6);
+                r->VsrD(i) = ROTRu64(a->VsrD(i), 19) ^
+                             ROTRu64(a->VsrD(i), 61) ^
+                             (a->VsrD(i) >> 6);
             }
         } else { /* st == 1 */
             if ((six & (0x8 >> (2*i))) == 0) {
-                r->u64[EL_IDX(i)] = ROTRu64(a->u64[EL_IDX(i)], 28) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 34) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 39);
+                r->VsrD(i) = ROTRu64(a->VsrD(i), 28) ^
+                             ROTRu64(a->VsrD(i), 34) ^
+                             ROTRu64(a->VsrD(i), 39);
             } else { /* six.bit[2*i] == 1 */
-                r->u64[EL_IDX(i)] = ROTRu64(a->u64[EL_IDX(i)], 14) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 18) ^
-                                    ROTRu64(a->u64[EL_IDX(i)], 41);
+                r->VsrD(i) = ROTRu64(a->VsrD(i), 14) ^
+                             ROTRu64(a->VsrD(i), 18) ^
+                             ROTRu64(a->VsrD(i), 41);
             }
         }
     }
 }
 
 #undef ROTRu64
-#undef EL_IDX
 
 void helper_vpermxor(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
-- 
2.11.0
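
For context, the rotate/shift combinations in these two helpers are the
SHA-256 (vshasigmaw) and SHA-512 (vshasigmad) sigma functions. For example,
the st == 0, six-bit-clear branch of vshasigmaw computes SHA-256's small
sigma0; a scalar rendering of that one case (a sketch, with the rotation
counts taken from the code above):

    #include <stdint.h>

    static inline uint32_t rotr32(uint32_t v, unsigned n)  /* 0 < n < 32 */
    {
        return (v >> n) | (v << (32 - n));
    }

    /* SHA-256 small sigma0, per the st == 0 / bit-clear branch: */
    static uint32_t sigma0(uint32_t x)
    {
        return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
    }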

* [Qemu-devel] [PATCH v5 6/8] target/ppc: simplify VEXT_SIGNED macro in int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

As pointed out by Richard, the macro needs neither the mask argument nor the
recast argument: the masking is implied by the cast argument, and the recast
is implied by the assignment.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 355b6630a2..ffc9cbc4ed 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2024,19 +2024,19 @@ void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
     putVSR(xtn, &xt, env);
 }
 
-#define VEXT_SIGNED(name, element, mask, cast, recast)              \
+#define VEXT_SIGNED(name, element, cast)                            \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
     int i;                                                          \
     VECTOR_FOR_INORDER_I(i, element) {                              \
-        r->element[i] = (recast)((cast)(b->element[i] & mask));     \
+        r->element[i] = (cast)b->element[i];                        \
     }                                                               \
 }
-VEXT_SIGNED(vextsb2w, s32, UINT8_MAX, int8_t, int32_t)
-VEXT_SIGNED(vextsb2d, s64, UINT8_MAX, int8_t, int64_t)
-VEXT_SIGNED(vextsh2w, s32, UINT16_MAX, int16_t, int32_t)
-VEXT_SIGNED(vextsh2d, s64, UINT16_MAX, int16_t, int64_t)
-VEXT_SIGNED(vextsw2d, s64, UINT32_MAX, int32_t, int64_t)
+VEXT_SIGNED(vextsb2w, s32, int8_t)
+VEXT_SIGNED(vextsb2d, s64, int8_t)
+VEXT_SIGNED(vextsh2w, s32, int16_t)
+VEXT_SIGNED(vextsh2d, s64, int16_t)
+VEXT_SIGNED(vextsw2d, s64, int32_t)
 #undef VEXT_SIGNED
 
 #define VNEG(name, element)                                         \
-- 
2.11.0
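
The point about the mask being implied by the cast, and the recast by the
assignment, is just standard C conversion rules. A minimal sketch (not part
of the patch):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        int32_t s32 = 0x00000080;   /* element whose low byte is 0x80 */
        int32_t r;

        /* Old form: mask, narrow, then widen again explicitly. */
        r = (int32_t)((int8_t)(s32 & UINT8_MAX));
        assert(r == -128);

        /* New form: the cast to int8_t keeps only the low byte (the
         * mask), and assigning to int32_t sign-extends (the recast). */
        r = (int8_t)s32;
        assert(r == -128);
        return 0;
    }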

* [Qemu-devel] [PATCH v5 7/8] target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

Richard points out that these macros suffer from a shift bug flagged by
-fsanitize=shift: they improperly handle n == 0, turning it into an undefined
shift by 32 or 64 respectively.
Replace them with QEMU's existing ror32() and ror64() functions instead.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ffc9cbc4ed..916d10c25b 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -3306,8 +3306,6 @@ void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     *r = result;
 }
 
-#define ROTRu32(v, n) (((v) >> (n)) | ((v) << (32 - n)))
-
 void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
 {
     int st = (st_six & 0x10) != 0;
@@ -3317,32 +3315,28 @@ void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
     for (i = 0; i < ARRAY_SIZE(r->u32); i++) {
         if (st == 0) {
             if ((six & (0x8 >> i)) == 0) {
-                r->VsrW(i) = ROTRu32(a->VsrW(i), 7) ^
-                             ROTRu32(a->VsrW(i), 18) ^
+                r->VsrW(i) = ror32(a->VsrW(i), 7) ^
+                             ror32(a->VsrW(i), 18) ^
                              (a->VsrW(i) >> 3);
             } else { /* six.bit[i] == 1 */
-                r->VsrW(i) = ROTRu32(a->VsrW(i), 17) ^
-                             ROTRu32(a->VsrW(i), 19) ^
+                r->VsrW(i) = ror32(a->VsrW(i), 17) ^
+                             ror32(a->VsrW(i), 19) ^
                              (a->VsrW(i) >> 10);
             }
         } else { /* st == 1 */
             if ((six & (0x8 >> i)) == 0) {
-                r->VsrW(i) = ROTRu32(a->VsrW(i), 2) ^
-                             ROTRu32(a->VsrW(i), 13) ^
-                             ROTRu32(a->VsrW(i), 22);
+                r->VsrW(i) = ror32(a->VsrW(i), 2) ^
+                             ror32(a->VsrW(i), 13) ^
+                             ror32(a->VsrW(i), 22);
             } else { /* six.bit[i] == 1 */
-                r->VsrW(i) = ROTRu32(a->VsrW(i), 6) ^
-                             ROTRu32(a->VsrW(i), 11) ^
-                             ROTRu32(a->VsrW(i), 25);
+                r->VsrW(i) = ror32(a->VsrW(i), 6) ^
+                             ror32(a->VsrW(i), 11) ^
+                             ror32(a->VsrW(i), 25);
             }
         }
     }
 }
 
-#undef ROTRu32
-
-#define ROTRu64(v, n) (((v) >> (n)) | ((v) << (64-n)))
-
 void helper_vshasigmad(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
 {
     int st = (st_six & 0x10) != 0;
@@ -3352,30 +3346,28 @@ void helper_vshasigmad(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
     for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
         if (st == 0) {
             if ((six & (0x8 >> (2*i))) == 0) {
-                r->VsrD(i) = ROTRu64(a->VsrD(i), 1) ^
-                             ROTRu64(a->VsrD(i), 8) ^
+                r->VsrD(i) = ror64(a->VsrD(i), 1) ^
+                             ror64(a->VsrD(i), 8) ^
                              (a->VsrD(i) >> 7);
             } else { /* six.bit[2*i] == 1 */
-                r->VsrD(i) = ROTRu64(a->VsrD(i), 19) ^
-                             ROTRu64(a->VsrD(i), 61) ^
+                r->VsrD(i) = ror64(a->VsrD(i), 19) ^
+                             ror64(a->VsrD(i), 61) ^
                              (a->VsrD(i) >> 6);
             }
         } else { /* st == 1 */
             if ((six & (0x8 >> (2*i))) == 0) {
-                r->VsrD(i) = ROTRu64(a->VsrD(i), 28) ^
-                             ROTRu64(a->VsrD(i), 34) ^
-                             ROTRu64(a->VsrD(i), 39);
+                r->VsrD(i) = ror64(a->VsrD(i), 28) ^
+                             ror64(a->VsrD(i), 34) ^
+                             ror64(a->VsrD(i), 39);
             } else { /* six.bit[2*i] == 1 */
-                r->VsrD(i) = ROTRu64(a->VsrD(i), 14) ^
-                             ROTRu64(a->VsrD(i), 18) ^
-                             ROTRu64(a->VsrD(i), 41);
+                r->VsrD(i) = ror64(a->VsrD(i), 14) ^
+                             ror64(a->VsrD(i), 18) ^
+                             ror64(a->VsrD(i), 41);
             }
         }
     }
 }
 
-#undef ROTRu64
-
 void helper_vpermxor(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;
-- 
2.11.0
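
To make the n == 0 hazard concrete: ROTRu32(v, 0) expanded to
(v >> 0) | (v << 32), and shifting a 32-bit value by 32 is undefined
behaviour in C, which is exactly what -fsanitize=shift reports. QEMU's
ror32()/ror64() in include/qemu/bitops.h avoid this; a conventional
shift-safe formulation looks something like the following (a sketch, not
QEMU's exact implementation):

    #include <stdint.h>

    static inline uint32_t rotr32_safe(uint32_t v, unsigned n)
    {
        n &= 31;                         /* both shift counts stay < 32 */
        return (v >> n) | (v << (-n & 31));
    }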

* [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c
From: Mark Cave-Ayland @ 2019-01-30 20:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, richard.henderson, david

Following on from the previous work, there are numerous endian-related hacks
in int_helper.c that can now be replaced with Vsr* macros.

There are also a few places where the VECTOR_FOR_INORDER_I macro can be
replaced with a normal iterator since the processing order is irrelevant.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 155 ++++++++++++++----------------------------------
 1 file changed, 45 insertions(+), 110 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 916d10c25b..8efc283388 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -443,8 +443,8 @@ void helper_lvsl(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = (sh & 0xf);
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }
 
@@ -452,18 +452,14 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = 0x10 - (sh & 0xf);
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }
 
 void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    env->vscr = r->u32[3];
-#else
-    env->vscr = r->u32[0];
-#endif
+    env->vscr = r->VsrW(3);
     set_flush_to_zero(vscr_nj, &env->vec_status);
 }
 
@@ -870,8 +866,8 @@ target_ulong helper_vclzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-    VECTOR_FOR_INORDER_I(i, u8) {
-        if (r->u8[i] & 0x01) {
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -883,12 +879,8 @@ target_ulong helper_vctzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = ARRAY_SIZE(r->u8) - 1; i >= 0; i--) {
-#else
-    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
-#endif
-        if (r->u8[i] & 0x01) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -1137,18 +1129,14 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = s & 0xf;
-#else
-        int index = 15 - (s & 0xf);
-#endif
 
         if (s & 0x10) {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         } else {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         }
     }
     *r = result;
@@ -1160,18 +1148,14 @@ void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = 15 - (s & 0xf);
-#else
-        int index = s & 0xf;
-#endif
 
         if (s & 0x10) {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         } else {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         }
     }
     *r = result;
@@ -1868,25 +1852,14 @@ void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
     int i;
     ppc_avr_t result;
 
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
         int index = sh + i;
         if (index > 0xf) {
-            result.u8[i] = b->u8[index - 0x10];
-        } else {
-            result.u8[i] = a->u8[index];
-        }
-    }
-#else
-    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
-        int index = (16 - sh) + i;
-        if (index > 0xf) {
-            result.u8[i] = a->u8[index - 0x10];
+            result.VsrB(i) = b->VsrB(index - 0x10);
         } else {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         }
     }
-#endif
     *r = result;
 }
 
@@ -1905,25 +1878,20 @@ void helper_vslo(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 /* Experimental testing shows that hardware masks the immediate.  */
 #define _SPLAT_MASKED(element) (splat & (ARRAY_SIZE(r->element) - 1))
-#if defined(HOST_WORDS_BIGENDIAN)
 #define SPLAT_ELEMENT(element) _SPLAT_MASKED(element)
-#else
-#define SPLAT_ELEMENT(element)                                  \
-    (ARRAY_SIZE(r->element) - 1 - _SPLAT_MASKED(element))
-#endif
-#define VSPLT(suffix, element)                                          \
+#define VSPLT(suffix, element, access)                                  \
     void helper_vsplt##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
     {                                                                   \
-        uint32_t s = b->element[SPLAT_ELEMENT(element)];                \
+        uint32_t s = b->access(SPLAT_ELEMENT(element));                 \
         int i;                                                          \
                                                                         \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            r->element[i] = s;                                          \
+            r->access(i) = s;                                           \
         }                                                               \
     }
-VSPLT(b, u8)
-VSPLT(h, u16)
-VSPLT(w, u32)
+VSPLT(b, u8, VsrB)
+VSPLT(h, u16, VsrH)
+VSPLT(w, u32, VsrW)
 #undef VSPLT
 #undef SPLAT_ELEMENT
 #undef _SPLAT_MASKED
@@ -1984,17 +1952,10 @@ void helper_xxextractuw(CPUPPCState *env, target_ulong xtn,
     getVSR(xbn, &xb, env);
     memset(&xt, 0, sizeof(xt));
 
-#if defined(HOST_WORDS_BIGENDIAN)
     ext_index = index;
     for (i = 0; i < es; i++, ext_index++) {
-        xt.u8[8 - es + i] = xb.u8[ext_index % 16];
-    }
-#else
-    ext_index = 15 - index;
-    for (i = es - 1; i >= 0; i--, ext_index--) {
-        xt.u8[8 + i] = xb.u8[ext_index % 16];
+        xt.VsrB(8 - es + i) = xb.VsrB(ext_index % 16);
     }
-#endif
 
     putVSR(xtn, &xt, env);
 }
@@ -2009,17 +1970,10 @@ void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
     getVSR(xbn, &xb, env);
     getVSR(xtn, &xt, env);
 
-#if defined(HOST_WORDS_BIGENDIAN)
     ins_index = index;
     for (i = 0; i < es && ins_index < 16; i++, ins_index++) {
-        xt.u8[ins_index] = xb.u8[8 - es + i];
-    }
-#else
-    ins_index = 15 - index;
-    for (i = es - 1; i >= 0 && ins_index >= 0; i--, ins_index--) {
-        xt.u8[ins_index] = xb.u8[8 + i];
+        xt.VsrB(ins_index) = xb.VsrB(8 - es + i);
     }
-#endif
 
     putVSR(xtn, &xt, env);
 }
@@ -2028,7 +1982,7 @@ void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
     int i;                                                          \
-    VECTOR_FOR_INORDER_I(i, element) {                              \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
         r->element[i] = (cast)b->element[i];                        \
     }                                                               \
 }
@@ -2043,7 +1997,7 @@ VEXT_SIGNED(vextsw2d, s64, int32_t)
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
     int i;                                                          \
-    VECTOR_FOR_INORDER_I(i, element) {                              \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
         r->element[i] = -b->element[i];                             \
     }                                                               \
 }
@@ -2115,17 +2069,13 @@ void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t result;
     int sat = 0;
 
-#if defined(HOST_WORDS_BIGENDIAN)
-    upper = ARRAY_SIZE(r->s32)-1;
-#else
-    upper = 0;
-#endif
-    t = (int64_t)b->s32[upper];
+    upper = ARRAY_SIZE(r->s32) - 1;
+    t = (int64_t)b->VsrSW(upper);
     for (i = 0; i < ARRAY_SIZE(r->s32); i++) {
-        t += a->s32[i];
-        result.s32[i] = 0;
+        t += a->VsrSW(i);
+        result.VsrSW(i) = 0;
     }
-    result.s32[upper] = cvtsdsw(t, &sat);
+    result.VsrSW(upper) = cvtsdsw(t, &sat);
     *r = result;
 
     if (sat) {
@@ -2139,19 +2089,15 @@ void helper_vsum2sws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t result;
     int sat = 0;
 
-#if defined(HOST_WORDS_BIGENDIAN)
     upper = 1;
-#else
-    upper = 0;
-#endif
     for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
-        int64_t t = (int64_t)b->s32[upper + i * 2];
+        int64_t t = (int64_t)b->VsrSW(upper + i * 2);
 
-        result.u64[i] = 0;
+        result.VsrW(i) = 0;
         for (j = 0; j < ARRAY_SIZE(r->u64); j++) {
-            t += a->s32[2 * i + j];
+            t += a->VsrSW(2 * i + j);
         }
-        result.s32[upper + i * 2] = cvtsdsw(t, &sat);
+        result.VsrSW(upper + i * 2) = cvtsdsw(t, &sat);
     }
 
     *r = result;
@@ -2276,7 +2222,7 @@ VUPK(lsw, s64, s32, UPKLO)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        VECTOR_FOR_INORDER_I(i, element) {                              \
+        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
             r->element[i] = name(b->element[i]);                        \
         }                                                               \
     }
@@ -2616,20 +2562,12 @@ static int bcd_cmp_zero(ppc_avr_t *bcd)
 
 static uint16_t get_national_digit(ppc_avr_t *reg, int n)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    return reg->u16[7 - n];
-#else
-    return reg->u16[n];
-#endif
+    return reg->VsrH(7 - n);
 }
 
 static void set_national_digit(ppc_avr_t *reg, uint8_t val, int n)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    reg->u16[7 - n] = val;
-#else
-    reg->u16[n] = val;
-#endif
+    reg->VsrH(7 - n) = val;
 }
 
 static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
@@ -3373,14 +3311,11 @@ void helper_vpermxor(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int indexA = c->u8[i] >> 4;
-        int indexB = c->u8[i] & 0xF;
-#if defined(HOST_WORDS_BIGENDIAN)
-        result.u8[i] = a->u8[indexA] ^ b->u8[indexB];
-#else
-        result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB];
-#endif
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int indexA = c->VsrB(i) >> 4;
+        int indexB = c->VsrB(i) & 0xF;
+
+        result.VsrB(i) = a->VsrB(indexA) ^ b->VsrB(indexB);
     }
     *r = result;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/8] target/ppc: remove various endian hacks from int_helper.c
  2019-01-30 20:36 [Qemu-devel] [PATCH v5 0/8] target/ppc: remove various endian hacks from int_helper.c Mark Cave-Ayland
                   ` (7 preceding siblings ...)
  2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c Mark Cave-Ayland
@ 2019-01-31  3:09 ` David Gibson
  8 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2019-01-31  3:09 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: qemu-devel, qemu-ppc, richard.henderson

[-- Attachment #1: Type: text/plain, Size: 3225 bytes --]

On Wed, Jan 30, 2019 at 08:36:30PM +0000, Mark Cave-Ayland wrote:
> >From working on the TCG vector operations patchset, it is apparent that there
> are a large number of endian-based hacks in int_helper.c which can be removed by
> making use of the various Vsr* macros.
> 
> Patch 1 is simple enough, and implements the complete set of Vsr* macros for
> both big endian and little endian hosts.
> 
> Patches 2 and 3 rework the vector merge and multiply algorithms so that they
> no longer require the endian-dependent HI_IDX and LO_IDX macros.
> 
> Patches 4 and 5 then completely remove the HI_IDX/LO_IDX and EL_IDX macros by
> replacing the element accesses with their equivalent Vsr* macro instead.
> 
> Patches 6 and 7 tidy up the VEXT_SIGNED macro and fix a potential shift bug
> in the ROTRu32 and ROTRu64 macros pointed out by Richard during the review of
> v1.
> 
> Finally patch 8 is an inspection-based removal of other HOST_WORDS_BIGENDIAN
> hacks in int_helper.c, again replacing accesses with the relevant Vsr* macro
> instead.
> 
> Note that there are still some endian hacks left in int_helper.c after this
> patchset: in particular the delightfully evil VECTOR_FOR_INORDER_I macro still
> remains in places where the index variable was used for purposes other than
> accessing elements within the vector.
> 
> There were also some parts where additional conversion could be done, but I
> wasn't confident enough to make the change without access to PPC64 test images
> or real big-endian host hardware.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Applied to ppc-for-4.0, thanks.

> 
> 
> v5:
> - Rebase onto master
> - Update vmrg{l,h} algorithm macros in patch 2 as per Richard's suggestion
> 
> v4:
> - Rebase onto master
> - Add R-B tag from Richard for patch 4
> - Use Richard's improved vmrg{l,h} algorithm in patch 2
> 
> v3:
> - Rebase onto master
> - Add R-B tags from Richard
> 
> v2:
> - Add R-B tags from Richard
> - Add patches 6 and 7 to simplify the VEXT_SIGNED macro and fix a potential
>   shift bug in the ROTRu32 and ROTRu64 macros
> - Don't require ordered access for VNEG and VGENERIC_DO macros as pointed out
>   by Richard
> 
> 
> Mark Cave-Ayland (8):
>   target/ppc: implement complete set of Vsr* macros
>   target/ppc: rework vmrg{l,h}{b,h,w} instructions to use Vsr* macros
>   target/ppc: rework vmul{e,o}{s,u}{b,h,w} instructions to use Vsr*
>     macros
>   target/ppc: eliminate use of HI_IDX and LO_IDX macros from
>     int_helper.c
>   target/ppc: eliminate use of EL_IDX macros from int_helper.c
>   target/ppc: simplify VEXT_SIGNED macro in int_helper.c
>   target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c
>   target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c
> 
>  target/ppc/int_helper.c | 523 +++++++++++++++++++-----------------------------
>  target/ppc/internal.h   |   9 +-
>  2 files changed, 217 insertions(+), 315 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use Vsr* macros
  2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use " Mark Cave-Ayland
@ 2019-02-01  5:25   ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2019-02-01  5:25 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david

On 1/30/19 12:36 PM, Mark Cave-Ayland wrote:
> The current implementations make use of the endian-specific macros MRGLO/MRGHI
> and also reference HI_IDX and LO_IDX directly to calculate array offsets.
> 
> Rework the implementation to use the Vsr* macros so that these per-endian
> references can be removed.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/int_helper.c | 54 +++++++++++++++++--------------------------------
>  1 file changed, 19 insertions(+), 35 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] R: [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c
  2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c Mark Cave-Ayland
@ 2019-02-03 14:58   ` Dino Papararo
  0 siblings, 0 replies; 12+ messages in thread
From: Dino Papararo @ 2019-02-03 14:58 UTC (permalink / raw)
  To: 'Mark Cave-Ayland',
	qemu-devel, qemu-ppc, richard.henderson, david

Hello Mark,
I have a question about improving speed by manually unrolling loops like this one.

Assuming ARRAY_SIZE(r->u8) is always a multiple of 4, you can manually unroll the loop as shown below; on modern CPUs the independent per-element stores can be executed nearly for free:

> {
>     int i, j = (sh & 0xf);
>
> -    VECTOR_FOR_INORDER_I(i, u8) {
> -        r->u8[i] = j++;
> +    for (i = 0; i < ARRAY_SIZE(r->u8); i += 4, j += 4) {
> +        r->VsrB(i) = j;
> +        r->VsrB(i + 1) = j + 1;
> +        r->VsrB(i + 2) = j + 2;
> +        r->VsrB(i + 3) = j + 3;
> +    }
> }

There are a lot of functions in this patch that could benefit from unrolling their loops, for a potentially large speed improvement.
Maybe the compiler could do it by itself, but aren't humans still better? 😊
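
For illustration, here is a minimal standalone sketch of the idea (a
hypothetical test harness of my own, not QEMU code -- the names
fill_rolled/fill_unrolled are made up); it fills a 16-byte array the way
helper_lvsl does, four elements per iteration:

#include <stdint.h>
#include <stdio.h>

#define N 16   /* ARRAY_SIZE(r->u8) for a 128-bit vector */

/* the straightforward loop, as in the patch */
static void fill_rolled(uint8_t *v, unsigned sh)
{
    int i, j = sh & 0xf;

    for (i = 0; i < N; i++) {
        v[i] = j++;
    }
}

/* the same loop unrolled by hand: four independent stores per iteration */
static void fill_unrolled(uint8_t *v, unsigned sh)
{
    int i, j = sh & 0xf;

    for (i = 0; i < N; i += 4, j += 4) {
        v[i]     = j;
        v[i + 1] = j + 1;
        v[i + 2] = j + 2;
        v[i + 3] = j + 3;
    }
}

int main(void)
{
    uint8_t a[N], b[N];
    int i;

    fill_rolled(a, 3);
    fill_unrolled(b, 3);
    for (i = 0; i < N; i++) {
        printf("%2d: %d %d\n", i, a[i], b[i]);   /* both columns identical */
    }
    return 0;
}

That said, at -O2/-O3 GCC and Clang will usually unroll or vectorize the
plain loop on their own, so it would be worth comparing the generated code
before committing to the hand-unrolled form.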

Best Regards,
Dino Papararo

-----Original Message-----
From: Qemu-devel <qemu-devel-bounces+skizzato73=msn.com@nongnu.org> On Behalf Of Mark Cave-Ayland
Sent: Wednesday, 30 January 2019 21:37
To: qemu-devel@nongnu.org; qemu-ppc@nongnu.org; richard.henderson@linaro.org; david@gibson.dropbear.id.au
Subject: [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c

Following on from the previous work, there are numerous endian-related hacks in int_helper.c that can now be replaced with Vsr* macros.

There are also a few places where the VECTOR_FOR_INORDER_I macro can be replaced with a normal iterator since the processing order is irrelevant.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 155 ++++++++++++++----------------------------------
 1 file changed, 45 insertions(+), 110 deletions(-)
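
(For reference, the Vsr* accessors implemented in patch 1 give every
element a fixed big-endian index regardless of host byte order. A sketch
of their shape in target/ppc/internal.h -- abbreviated, and the real
header also provides the signed and 64-bit variants:

#if defined(HOST_WORDS_BIGENDIAN)
#define VsrB(i) u8[i]           /* byte element i */
#define VsrH(i) u16[i]          /* halfword element i */
#define VsrW(i) u32[i]          /* word element i */
#else
#define VsrB(i) u8[15 - (i)]    /* same logical element on an LE host */
#define VsrH(i) u16[7 - (i)]
#define VsrW(i) u32[3 - (i)]
#endif

A plain for loop over VsrB(i) therefore visits elements in the same
logical order on either host endianness, which is what allows the
#ifdef blocks below to be dropped.)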

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 916d10c25b..8efc283388 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -443,8 +443,8 @@ void helper_lvsl(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = (sh & 0xf);
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }
 
@@ -452,18 +452,14 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = 0x10 - (sh & 0xf);
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }
 
 
 void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    env->vscr = r->u32[3];
-#else
-    env->vscr = r->u32[0];
-#endif
+    env->vscr = r->VsrW(3);
     set_flush_to_zero(vscr_nj, &env->vec_status);
 }
 
@@ -870,8 +866,8 @@ target_ulong helper_vclzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-    VECTOR_FOR_INORDER_I(i, u8) {
-        if (r->u8[i] & 0x01) {
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -883,12 +879,8 @@ target_ulong helper_vctzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = ARRAY_SIZE(r->u8) - 1; i >= 0; i--) {
-#else
-    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
-#endif
-        if (r->u8[i] & 0x01) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -1137,18 +1129,14 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = s & 0xf;
-#else
-        int index = 15 - (s & 0xf);
-#endif
 
         if (s & 0x10) {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         } else {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         }
     }
     *r = result;
@@ -1160,18 +1148,14 @@ void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = 15 - (s & 0xf);
-#else
-        int index = s & 0xf;
-#endif
 
         if (s & 0x10) {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         } else {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         }
     }
     *r = result;
@@ -1868,25 +1852,14 @@ void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
     int i;
     ppc_avr_t result;
 
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
         int index = sh + i;
         if (index > 0xf) {
-            result.u8[i] = b->u8[index - 0x10];
-        } else {
-            result.u8[i] = a->u8[index];
-        }
-    }
-#else
-    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
-        int index = (16 - sh) + i;
-        if (index > 0xf) {
-            result.u8[i] = a->u8[index - 0x10];
+            result.VsrB(i) = b->VsrB(index - 0x10);
         } else {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         }
     }
-#endif
     *r = result;
 }
 
@@ -1905,25 +1878,20 @@ void helper_vslo(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 /* Experimental testing shows that hardware masks the immediate.  */
 #define _SPLAT_MASKED(element) (splat & (ARRAY_SIZE(r->element) - 1))
-#if defined(HOST_WORDS_BIGENDIAN)
 #define SPLAT_ELEMENT(element) _SPLAT_MASKED(element)
-#else
-#define SPLAT_ELEMENT(element)                                  \
-    (ARRAY_SIZE(r->element) - 1 - _SPLAT_MASKED(element))
-#endif
-#define VSPLT(suffix, element)                                          \
+#define VSPLT(suffix, element, access)                                  \
     void helper_vsplt##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
     {                                                                   \
-        uint32_t s = b->element[SPLAT_ELEMENT(element)];                \
+        uint32_t s = b->access(SPLAT_ELEMENT(element));                 \
         int i;                                                          \
                                                                         \
         for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            r->element[i] = s;                                          \
+            r->access(i) = s;                                           \
         }                                                               \
     }
-VSPLT(b, u8)
-VSPLT(h, u16)
-VSPLT(w, u32)
+VSPLT(b, u8, VsrB)
+VSPLT(h, u16, VsrH)
+VSPLT(w, u32, VsrW)
 #undef VSPLT
 #undef SPLAT_ELEMENT
 #undef _SPLAT_MASKED
@@ -1984,17 +1952,10 @@ void helper_xxextractuw(CPUPPCState *env, target_ulong xtn,
     getVSR(xbn, &xb, env);
     memset(&xt, 0, sizeof(xt));
 
-#if defined(HOST_WORDS_BIGENDIAN)
     ext_index = index;
     for (i = 0; i < es; i++, ext_index++) {
-        xt.u8[8 - es + i] = xb.u8[ext_index % 16];
-    }
-#else
-    ext_index = 15 - index;
-    for (i = es - 1; i >= 0; i--, ext_index--) {
-        xt.u8[8 + i] = xb.u8[ext_index % 16];
+        xt.VsrB(8 - es + i) = xb.VsrB(ext_index % 16);
     }
-#endif
 
     putVSR(xtn, &xt, env);
 }
@@ -2009,17 +1970,10 @@ void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
     getVSR(xbn, &xb, env);
     getVSR(xtn, &xt, env);
 
-#if defined(HOST_WORDS_BIGENDIAN)
     ins_index = index;
     for (i = 0; i < es && ins_index < 16; i++, ins_index++) {
-        xt.u8[ins_index] = xb.u8[8 - es + i];
-    }
-#else
-    ins_index = 15 - index;
-    for (i = es - 1; i >= 0 && ins_index >= 0; i--, ins_index--) {
-        xt.u8[ins_index] = xb.u8[8 + i];
+        xt.VsrB(ins_index) = xb.VsrB(8 - es + i);
     }
-#endif
 
     putVSR(xtn, &xt, env);
 }
@@ -2028,7 +1982,7 @@ void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
     int i;                                                          \
-    VECTOR_FOR_INORDER_I(i, element) {                              \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
         r->element[i] = (cast)b->element[i];                        \
     }                                                               \
 }
@@ -2043,7 +1997,7 @@ VEXT_SIGNED(vextsw2d, s64, int32_t)
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
     int i;                                                          \
-    VECTOR_FOR_INORDER_I(i, element) {                              \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
         r->element[i] = -b->element[i];                             \
     }                                                               \
 }
@@ -2115,17 +2069,13 @@ void helper_vsumsws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t result;
     int sat = 0;
 
-#if defined(HOST_WORDS_BIGENDIAN)
-    upper = ARRAY_SIZE(r->s32)-1;
-#else
-    upper = 0;
-#endif
-    t = (int64_t)b->s32[upper];
+    upper = ARRAY_SIZE(r->s32) - 1;
+    t = (int64_t)b->VsrSW(upper);
     for (i = 0; i < ARRAY_SIZE(r->s32); i++) {
-        t += a->s32[i];
-        result.s32[i] = 0;
+        t += a->VsrSW(i);
+        result.VsrSW(i) = 0;
     }
-    result.s32[upper] = cvtsdsw(t, &sat);
+    result.VsrSW(upper) = cvtsdsw(t, &sat);
     *r = result;
 
     if (sat) {
@@ -2139,19 +2089,15 @@ void helper_vsum2sws(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     ppc_avr_t result;
     int sat = 0;
 
-#if defined(HOST_WORDS_BIGENDIAN)
     upper = 1;
-#else
-    upper = 0;
-#endif
     for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
-        int64_t t = (int64_t)b->s32[upper + i * 2];
+        int64_t t = (int64_t)b->VsrSW(upper + i * 2);
 
-        result.u64[i] = 0;
+        result.VsrD(i) = 0;
         for (j = 0; j < ARRAY_SIZE(r->u64); j++) {
-            t += a->s32[2 * i + j];
+            t += a->VsrSW(2 * i + j);
         }
-        result.s32[upper + i * 2] = cvtsdsw(t, &sat);
+        result.VsrSW(upper + i * 2) = cvtsdsw(t, &sat);
     }
 
     *r = result;
@@ -2276,7 +2222,7 @@ VUPK(lsw, s64, s32, UPKLO)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        VECTOR_FOR_INORDER_I(i, element) {                              \
+        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
             r->element[i] = name(b->element[i]);                        \
         }                                                               \
     }
@@ -2616,20 +2562,12 @@ static int bcd_cmp_zero(ppc_avr_t *bcd)
 
 static uint16_t get_national_digit(ppc_avr_t *reg, int n)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    return reg->u16[7 - n];
-#else
-    return reg->u16[n];
-#endif
+    return reg->VsrH(7 - n);
 }
 
 static void set_national_digit(ppc_avr_t *reg, uint8_t val, int n)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    reg->u16[7 - n] = val;
-#else
-    reg->u16[n] = val;
-#endif
+    reg->VsrH(7 - n) = val;
 }
 
 static int bcd_cmp_mag(ppc_avr_t *a, ppc_avr_t *b)
@@ -3373,14 +3311,11 @@ void helper_vpermxor(ppc_avr_t *r,  ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     ppc_avr_t result;
     int i;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        int indexA = c->u8[i] >> 4;
-        int indexB = c->u8[i] & 0xF;
-#if defined(HOST_WORDS_BIGENDIAN)
-        result.u8[i] = a->u8[indexA] ^ b->u8[indexB];
-#else
-        result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB];
-#endif
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int indexA = c->VsrB(i) >> 4;
+        int indexB = c->VsrB(i) & 0xF;
+
+        result.VsrB(i) = a->VsrB(indexA) ^ b->VsrB(indexB);
     }
     *r = result;
 }
--
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-02-03 14:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-30 20:36 [Qemu-devel] [PATCH v5 0/8] target/ppc: remove various endian hacks from int_helper.c Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 1/8] target/ppc: implement complete set of Vsr* macros Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 2/8] target/ppc: rework vmrg{l, h}{b, h, w} instructions to use " Mark Cave-Ayland
2019-02-01  5:25   ` Richard Henderson
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 3/8] target/ppc: rework vmul{e, o}{s, u}{b, " Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 4/8] target/ppc: eliminate use of HI_IDX and LO_IDX macros from int_helper.c Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 5/8] target/ppc: eliminate use of EL_IDX " Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 6/8] target/ppc: simplify VEXT_SIGNED macro in int_helper.c Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 7/8] target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c Mark Cave-Ayland
2019-01-30 20:36 ` [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c Mark Cave-Ayland
2019-02-03 14:58   ` [Qemu-devel] R: " Dino Papararo
2019-01-31  3:09 ` [Qemu-devel] [PATCH v5 0/8] target/ppc: remove various endian hacks from int_helper.c David Gibson
