* [Qemu-devel] [PATCH 0/2] target-arm: fix Neon VUZP, VZIP instructions
@ 2011-02-11 16:13 Peter Maydell
  2011-02-11 16:14 ` [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function Peter Maydell
  2011-02-11 16:14 ` [Qemu-devel] [PATCH 2/2] target-arm: Move Neon VZIP " Peter Maydell
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Maydell @ 2011-02-11 16:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

This patch series is a pair of patches from the meego tree which
fix bugs in the Neon VZIP and VUZP instructions by abandoning
the existing inline implementations in favour of calling out to
a straightforward helper function. The inline routines could
generate 50+ TCG ops each, which is well over the recommended
limit in tcg/README for using helpers instead; they also did
not give the correct results...

I've tested these patches using the usual random instruction
generation approach.
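
For readers who want to check the expected results independently, the
doubleword forms can be modelled with a short loop. This is only an
illustrative sketch, not code from the series: ref_vuzp_d and ref_vzip_d
are hypothetical names, and the element ordering assumed here is the one
the helpers in this series implement (VUZP sends the even elements of
Vm:Vd to Vd and the odd ones to Vm; VZIP interleaves Vd and Vm).

```c
#include <assert.h>
#include <stdint.h>

/* Extract element N (SIZE bits wide) from a 64-bit value, as in the patch. */
#define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))

/* Hypothetical reference model: unzip two D registers in place.
 * esize is the element width in bits (8 or 16 for the doubleword form). */
void ref_vuzp_d(uint64_t *d, uint64_t *m, int esize)
{
    assert(esize == 8 || esize == 16);
    int n = 64 / esize;                 /* elements per D register */
    uint64_t rd = 0, rm = 0;
    for (int i = 0; i < 2 * n; i++) {
        /* Element i of the concatenation Vm:Vd (Vd is the low half). */
        uint64_t src = (i < n) ? *d : *m;
        uint64_t e = ELEM(src, i % n, esize);
        if (i & 1) {
            rm |= e << ((i / 2) * esize);   /* odd elements go to Vm */
        } else {
            rd |= e << ((i / 2) * esize);   /* even elements go to Vd */
        }
    }
    *d = rd;
    *m = rm;
}

/* Hypothetical reference model: zip (interleave) two D registers in place. */
void ref_vzip_d(uint64_t *d, uint64_t *m, int esize)
{
    assert(esize == 8 || esize == 16);
    int n = 64 / esize;
    uint64_t out[2] = {0, 0};           /* out[0] -> Vd, out[1] -> Vm */
    for (int i = 0; i < n; i++) {
        uint64_t ed = ELEM(*d, i, esize);
        uint64_t em = ELEM(*m, i, esize);
        int k = 2 * i;                  /* result index of the Vd element */
        out[k / n] |= ed << ((k % n) * esize);
        out[(k + 1) / n] |= em << (((k + 1) % n) * esize);
    }
    *d = out[0];
    *m = out[1];
}
```

Under this model, zipping the result of an unzip gives back the original
register pair, which makes a convenient sanity check alongside random
instruction testing.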

Juha Riihimäki (2):
  target-arm: Move Neon VUZP to a helper function
  target-arm: Move Neon VZIP to a helper function

 target-arm/helpers.h     |    3 +
 target-arm/neon_helper.c |  166 ++++++++++++++++++++++++++++++++++++++++++++++
 target-arm/translate.c   |  163 ++-------------------------------------------
 3 files changed, 177 insertions(+), 155 deletions(-)

* [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function
  2011-02-11 16:13 [Qemu-devel] [PATCH 0/2] target-arm: fix Neon VUZP, VZIP instructions Peter Maydell
@ 2011-02-11 16:14 ` Peter Maydell
  2011-02-11 16:53   ` Peter Maydell
  2011-02-11 16:14 ` [Qemu-devel] [PATCH 2/2] target-arm: Move Neon VZIP " Peter Maydell
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2011-02-11 16:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Move the implementation of the Neon VUZP unzip instruction from inline
code to a helper function. (At 50+ TCG ops it was well over the
recommended limit for coding inline.) The helper implementation also
fixes the handling of the quadword version of the instruction.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helpers.h     |    2 +
 target-arm/neon_helper.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++
 target-arm/translate.c   |   82 ++------------------------------------------
 3 files changed, 91 insertions(+), 78 deletions(-)

diff --git a/target-arm/helpers.h b/target-arm/helpers.h
index 77f1635..893503f 100644
--- a/target-arm/helpers.h
+++ b/target-arm/helpers.h
@@ -460,4 +460,6 @@ DEF_HELPER_3(iwmmxt_muladdswl, i64, i64, i32, i32)
 
 DEF_HELPER_2(set_teecr, void, env, i32)
 
+DEF_HELPER_2(neon_unzip, void, env, i32)
+
 #include "def-helper.h"
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index dc09968..f8d4b90 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1663,3 +1663,88 @@ uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b)
     float32 f1 = float32_abs(vfp_itos(b));
     return (float32_compare_quiet(f0, f1, NFS) > 0) ? ~0 : 0;
 }
+
+#define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))
+
+void HELPER(neon_unzip)(CPUState *env, uint32_t insn)
+{
+    int rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
+    int rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
+    int size = (insn >> 18) & 3;
+    if (insn & 0x40) { /* Q */
+        uint64_t zm0 = float64_val(env->vfp.regs[rm]);
+        uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
+        uint64_t zd0 = float64_val(env->vfp.regs[rd]);
+        uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+        uint64_t m0 = 0, m1 = 0, d0 = 0, d1 = 0;
+        switch (size) {
+            case 0:
+                d0 = ELEM(zd0, 0, 8) | (ELEM(zd0, 2, 8) << 8)
+                     | (ELEM(zd0, 4, 8) << 16) | (ELEM(zd0, 6, 8) << 24)
+                     | (ELEM(zd1, 0, 8) << 32) | (ELEM(zd1, 2, 8) << 40)
+                     | (ELEM(zd1, 4, 8) << 48) | (ELEM(zd1, 6, 8) << 56);
+                d1 = ELEM(zm0, 0, 8) | (ELEM(zm0, 2, 8) << 8)
+                     | (ELEM(zm0, 4, 8) << 16) | (ELEM(zm0, 6, 8) << 24)
+                     | (ELEM(zm1, 0, 8) << 32) | (ELEM(zm1, 2, 8) << 40)
+                     | (ELEM(zm1, 4, 8) << 48) | (ELEM(zm1, 6, 8) << 56);
+                m0 = ELEM(zd0, 1, 8) | (ELEM(zd0, 3, 8) << 8)
+                     | (ELEM(zd0, 5, 8) << 16) | (ELEM(zd0, 7, 8) << 24)
+                     | (ELEM(zd1, 1, 8) << 32) | (ELEM(zd1, 3, 8) << 40)
+                     | (ELEM(zd1, 5, 8) << 48) | (ELEM(zd1, 7, 8) << 56);
+                m1 = ELEM(zm0, 1, 8) | (ELEM(zm0, 3, 8) << 8)
+                     | (ELEM(zm0, 5, 8) << 16) | (ELEM(zm0, 7, 8) << 24)
+                     | (ELEM(zm1, 1, 8) << 32) | (ELEM(zm1, 3, 8) << 40)
+                     | (ELEM(zm1, 5, 8) << 48) | (ELEM(zm1, 7, 8) << 56);
+                break;
+            case 1:
+                d0 = ELEM(zd0, 0, 16) | (ELEM(zd0, 2, 16) << 16)
+                     | (ELEM(zd1, 0, 16) << 32) | (ELEM(zd1, 2, 16) << 48);
+                d1 = ELEM(zm0, 0, 16) | (ELEM(zm0, 2, 16) << 16)
+                     | (ELEM(zm1, 0, 16) << 32) | (ELEM(zm1, 2, 16) << 48);
+                m0 = ELEM(zd0, 1, 16) | (ELEM(zd0, 3, 16) << 16)
+                     | (ELEM(zd1, 1, 16) << 32) | (ELEM(zd1, 3, 16) << 48);
+                m1 = ELEM(zm0, 1, 16) | (ELEM(zm0, 3, 16) << 16)
+                     | (ELEM(zm1, 1, 16) << 32) | (ELEM(zm1, 3, 16) << 48);
+                break;
+            case 2:
+                d0 = ELEM(zd0, 0, 32) | (ELEM(zd1, 0, 32) << 32);
+                d1 = ELEM(zm0, 0, 32) | (ELEM(zm1, 0, 32) << 32);
+                m0 = ELEM(zd0, 1, 32) | (ELEM(zd1, 1, 32) << 32);
+                m1 = ELEM(zm0, 1, 32) | (ELEM(zm1, 1, 32) << 32);
+                break;
+            default:
+                break;
+        }
+        env->vfp.regs[rm] = make_float64(m0);
+        env->vfp.regs[rm + 1] = make_float64(m1);
+        env->vfp.regs[rd] = make_float64(d0);
+        env->vfp.regs[rd + 1] = make_float64(d1);
+    } else {
+        uint64_t zm = float64_val(env->vfp.regs[rm]);
+        uint64_t zd = float64_val(env->vfp.regs[rd]);
+        uint64_t m = 0, d = 0;
+        switch (size) {
+            case 0:
+                d = ELEM(zd, 0, 8) | (ELEM(zd, 2, 8) << 8)
+                    | (ELEM(zd, 4, 8) << 16) | (ELEM(zd, 6, 8) << 24)
+                    | (ELEM(zm, 0, 8) << 32) | (ELEM(zm, 2, 8) << 40)
+                    | (ELEM(zm, 4, 8) << 48) | (ELEM(zm, 6, 8) << 56);
+                m = ELEM(zd, 1, 8) | (ELEM(zd, 3, 8) << 8)
+                    | (ELEM(zd, 5, 8) << 16) | (ELEM(zd, 7, 8) << 24)
+                    | (ELEM(zm, 1, 8) << 32) | (ELEM(zm, 3, 8) << 40)
+                    | (ELEM(zm, 5, 8) << 48) | (ELEM(zm, 7, 8) << 56);
+                break;
+            case 1:
+                d = ELEM(zd, 0, 16) | (ELEM(zd, 2, 16) << 16)
+                    | (ELEM(zm, 0, 16) << 32) | (ELEM(zm, 2, 16) << 48);
+                m = ELEM(zd, 1, 16) | (ELEM(zd, 3, 16) << 16)
+                    | (ELEM(zm, 1, 16) << 32) | (ELEM(zm, 3, 16) << 48);
+                break;
+            default:
+                /* size == 2 is a no-op for doubleword vectors */
+                break;
+        }
+        env->vfp.regs[rm] = make_float64(m);
+        env->vfp.regs[rd] = make_float64(d);
+    }
+}
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 362d1d0..3200742 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3614,42 +3614,6 @@ static inline TCGv neon_get_scalar(int size, int reg)
     return tmp;
 }
 
-static void gen_neon_unzip_u8(TCGv t0, TCGv t1)
-{
-    TCGv rd, rm, tmp;
-
-    rd = new_tmp();
-    rm = new_tmp();
-    tmp = new_tmp();
-
-    tcg_gen_andi_i32(rd, t0, 0xff);
-    tcg_gen_shri_i32(tmp, t0, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff00);
-    tcg_gen_or_i32(rd, rd, tmp);
-    tcg_gen_shli_i32(tmp, t1, 16);
-    tcg_gen_andi_i32(tmp, tmp, 0xff0000);
-    tcg_gen_or_i32(rd, rd, tmp);
-    tcg_gen_shli_i32(tmp, t1, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff000000);
-    tcg_gen_or_i32(rd, rd, tmp);
-
-    tcg_gen_shri_i32(rm, t0, 8);
-    tcg_gen_andi_i32(rm, rm, 0xff);
-    tcg_gen_shri_i32(tmp, t0, 16);
-    tcg_gen_andi_i32(tmp, tmp, 0xff00);
-    tcg_gen_or_i32(rm, rm, tmp);
-    tcg_gen_shli_i32(tmp, t1, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff0000);
-    tcg_gen_or_i32(rm, rm, tmp);
-    tcg_gen_andi_i32(tmp, t1, 0xff000000);
-    tcg_gen_or_i32(t1, rm, tmp);
-    tcg_gen_mov_i32(t0, rd);
-
-    dead_tmp(tmp);
-    dead_tmp(rm);
-    dead_tmp(rd);
-}
-
 static void gen_neon_zip_u8(TCGv t0, TCGv t1)
 {
     TCGv rd, rm, tmp;
@@ -3705,25 +3669,6 @@ static void gen_neon_zip_u16(TCGv t0, TCGv t1)
     dead_tmp(tmp);
 }
 
-static void gen_neon_unzip(int reg, int q, int tmp, int size)
-{
-    int n;
-    TCGv t0, t1;
-
-    for (n = 0; n < q + 1; n += 2) {
-        t0 = neon_load_reg(reg, n);
-        t1 = neon_load_reg(reg, n + 1);
-        switch (size) {
-        case 0: gen_neon_unzip_u8(t0, t1); break;
-        case 1: gen_neon_zip_u16(t0, t1); break; /* zip and unzip are the same.  */
-        case 2: /* no-op */; break;
-        default: abort();
-        }
-        neon_store_scratch(tmp + n, t0);
-        neon_store_scratch(tmp + n + 1, t1);
-    }
-}
-
 static void gen_neon_trn_u8(TCGv t0, TCGv t1)
 {
     TCGv rd, tmp;
@@ -5436,31 +5381,12 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     }
                     break;
                 case 34: /* VUZP */
-                    /* Reg  Before       After
-                       Rd   A3 A2 A1 A0  B2 B0 A2 A0
-                       Rm   B3 B2 B1 B0  B3 B1 A3 A1
-                     */
-                    if (size == 3)
+                    if (size == 3 || (!q && size == 2)) {
                         return 1;
-                    gen_neon_unzip(rd, q, 0, size);
-                    gen_neon_unzip(rm, q, 4, size);
-                    if (q) {
-                        static int unzip_order_q[8] =
-                            {0, 2, 4, 6, 1, 3, 5, 7};
-                        for (n = 0; n < 8; n++) {
-                            int reg = (n < 4) ? rd : rm;
-                            tmp = neon_load_scratch(unzip_order_q[n]);
-                            neon_store_reg(reg, n % 4, tmp);
-                        }
-                    } else {
-                        static int unzip_order[4] =
-                            {0, 4, 1, 5};
-                        for (n = 0; n < 4; n++) {
-                            int reg = (n < 2) ? rd : rm;
-                            tmp = neon_load_scratch(unzip_order[n]);
-                            neon_store_reg(reg, n % 2, tmp);
-                        }
                     }
+                    tmp = tcg_const_i32(insn);
+                    gen_helper_neon_unzip(cpu_env, tmp);
+                    tcg_temp_free_i32(tmp);
                     break;
                 case 35: /* VZIP */
                     /* Reg  Before       After
-- 
1.7.1

* [Qemu-devel] [PATCH 2/2] target-arm: Move Neon VZIP to a helper function
  2011-02-11 16:13 [Qemu-devel] [PATCH 0/2] target-arm: fix Neon VUZP, VZIP instructions Peter Maydell
  2011-02-11 16:14 ` [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function Peter Maydell
@ 2011-02-11 16:14 ` Peter Maydell
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2011-02-11 16:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Move the implementation of the Neon VZIP zip instruction from inline
code to a helper function. (At 50+ TCG ops it was well over the
recommended limit for coding inline.) The helper implementation also
gives the correct answers where the inline implementation did not.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helpers.h     |    1 +
 target-arm/neon_helper.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++
 target-arm/translate.c   |   81 ++-------------------------------------------
 3 files changed, 86 insertions(+), 77 deletions(-)

diff --git a/target-arm/helpers.h b/target-arm/helpers.h
index 893503f..9dc55c1 100644
--- a/target-arm/helpers.h
+++ b/target-arm/helpers.h
@@ -461,5 +461,6 @@ DEF_HELPER_3(iwmmxt_muladdswl, i64, i64, i32, i32)
 DEF_HELPER_2(set_teecr, void, env, i32)
 
 DEF_HELPER_2(neon_unzip, void, env, i32)
+DEF_HELPER_2(neon_zip, void, env, i32)
 
 #include "def-helper.h"
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index f8d4b90..b348896 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1748,3 +1748,84 @@ void HELPER(neon_unzip)(CPUState *env, uint32_t insn)
         env->vfp.regs[rd] = make_float64(d);
     }
 }
+
+void HELPER(neon_zip)(CPUState *env, uint32_t insn)
+{
+    int rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
+    int rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
+    int size = (insn >> 18) & 3;
+    if (insn & 0x40) { /* Q */
+        uint64_t zm0 = float64_val(env->vfp.regs[rm]);
+        uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
+        uint64_t zd0 = float64_val(env->vfp.regs[rd]);
+        uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+        uint64_t m0 = 0, m1 = 0, d0 = 0, d1 = 0;
+        switch (size) {
+            case 0:
+                d0 = ELEM(zd0, 0, 8) | (ELEM(zm0, 0, 8) << 8)
+                     | (ELEM(zd0, 1, 8) << 16) | (ELEM(zm0, 1, 8) << 24)
+                     | (ELEM(zd0, 2, 8) << 32) | (ELEM(zm0, 2, 8) << 40)
+                     | (ELEM(zd0, 3, 8) << 48) | (ELEM(zm0, 3, 8) << 56);
+                d1 = ELEM(zd0, 4, 8) | (ELEM(zm0, 4, 8) << 8)
+                     | (ELEM(zd0, 5, 8) << 16) | (ELEM(zm0, 5, 8) << 24)
+                     | (ELEM(zd0, 6, 8) << 32) | (ELEM(zm0, 6, 8) << 40)
+                     | (ELEM(zd0, 7, 8) << 48) | (ELEM(zm0, 7, 8) << 56);
+                m0 = ELEM(zd1, 0, 8) | (ELEM(zm1, 0, 8) << 8)
+                     | (ELEM(zd1, 1, 8) << 16) | (ELEM(zm1, 1, 8) << 24)
+                     | (ELEM(zd1, 2, 8) << 32) | (ELEM(zm1, 2, 8) << 40)
+                     | (ELEM(zd1, 3, 8) << 48) | (ELEM(zm1, 3, 8) << 56);
+                m1 = ELEM(zd1, 4, 8) | (ELEM(zm1, 4, 8) << 8)
+                     | (ELEM(zd1, 5, 8) << 16) | (ELEM(zm1, 5, 8) << 24)
+                     | (ELEM(zd1, 6, 8) << 32) | (ELEM(zm1, 6, 8) << 40)
+                     | (ELEM(zd1, 7, 8) << 48) | (ELEM(zm1, 7, 8) << 56);
+                break;
+            case 1:
+                d0 = ELEM(zd0, 0, 16) | (ELEM(zm0, 0, 16) << 16)
+                     | (ELEM(zd0, 1, 16) << 32) | (ELEM(zm0, 1, 16) << 48);
+                d1 = ELEM(zd0, 2, 16) | (ELEM(zm0, 2, 16) << 16)
+                     | (ELEM(zd0, 3, 16) << 32) | (ELEM(zm0, 3, 16) << 48);
+                m0 = ELEM(zd1, 0, 16) | (ELEM(zm1, 0, 16) << 16)
+                     | (ELEM(zd1, 1, 16) << 32) | (ELEM(zm1, 1, 16) << 48);
+                m1 = ELEM(zd1, 2, 16) | (ELEM(zm1, 2, 16) << 16)
+                     | (ELEM(zd1, 3, 16) << 32) | (ELEM(zm1, 3, 16) << 48);
+                break;
+            case 2:
+                d0 = ELEM(zd0, 0, 32) | (ELEM(zm0, 0, 32) << 32);
+                d1 = ELEM(zd0, 1, 32) | (ELEM(zm0, 1, 32) << 32);
+                m0 = ELEM(zd1, 0, 32) | (ELEM(zm1, 0, 32) << 32);
+                m1 = ELEM(zd1, 1, 32) | (ELEM(zm1, 1, 32) << 32);
+                break;
+        }
+        env->vfp.regs[rm] = make_float64(m0);
+        env->vfp.regs[rm + 1] = make_float64(m1);
+        env->vfp.regs[rd] = make_float64(d0);
+        env->vfp.regs[rd + 1] = make_float64(d1);
+    } else {
+        uint64_t zm = float64_val(env->vfp.regs[rm]);
+        uint64_t zd = float64_val(env->vfp.regs[rd]);
+        uint64_t m = 0, d = 0;
+        switch (size) {
+            case 0:
+                d = ELEM(zd, 0, 8) | (ELEM(zm, 0, 8) << 8)
+                    | (ELEM(zd, 1, 8) << 16) | (ELEM(zm, 1, 8) << 24)
+                    | (ELEM(zd, 2, 8) << 32) | (ELEM(zm, 2, 8) << 40)
+                    | (ELEM(zd, 3, 8) << 48) | (ELEM(zm, 3, 8) << 56);
+                m = ELEM(zd, 4, 8) | (ELEM(zm, 4, 8) << 8)
+                    | (ELEM(zd, 5, 8) << 16) | (ELEM(zm, 5, 8) << 24)
+                    | (ELEM(zd, 6, 8) << 32) | (ELEM(zm, 6, 8) << 40)
+                    | (ELEM(zd, 7, 8) << 48) | (ELEM(zm, 7, 8) << 56);
+                break;
+            case 1:
+                d = ELEM(zd, 0, 16) | (ELEM(zm, 0, 16) << 16)
+                    | (ELEM(zd, 1, 16) << 32) | (ELEM(zm, 1, 16) << 48);
+                m = ELEM(zd, 2, 16) | (ELEM(zm, 2, 16) << 16)
+                    | (ELEM(zd, 3, 16) << 32) | (ELEM(zm, 3, 16) << 48);
+                break;
+            default:
+                /* size == 2 is a no-op for doubleword vectors */
+                break;
+        }
+        env->vfp.regs[rm] = make_float64(m);
+        env->vfp.regs[rd] = make_float64(d);
+    }
+}
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 3200742..d72a37a 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3614,61 +3614,6 @@ static inline TCGv neon_get_scalar(int size, int reg)
     return tmp;
 }
 
-static void gen_neon_zip_u8(TCGv t0, TCGv t1)
-{
-    TCGv rd, rm, tmp;
-
-    rd = new_tmp();
-    rm = new_tmp();
-    tmp = new_tmp();
-
-    tcg_gen_andi_i32(rd, t0, 0xff);
-    tcg_gen_shli_i32(tmp, t1, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff00);
-    tcg_gen_or_i32(rd, rd, tmp);
-    tcg_gen_shli_i32(tmp, t0, 16);
-    tcg_gen_andi_i32(tmp, tmp, 0xff0000);
-    tcg_gen_or_i32(rd, rd, tmp);
-    tcg_gen_shli_i32(tmp, t1, 24);
-    tcg_gen_andi_i32(tmp, tmp, 0xff000000);
-    tcg_gen_or_i32(rd, rd, tmp);
-
-    tcg_gen_andi_i32(rm, t1, 0xff000000);
-    tcg_gen_shri_i32(tmp, t0, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff0000);
-    tcg_gen_or_i32(rm, rm, tmp);
-    tcg_gen_shri_i32(tmp, t1, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0xff00);
-    tcg_gen_or_i32(rm, rm, tmp);
-    tcg_gen_shri_i32(tmp, t0, 16);
-    tcg_gen_andi_i32(tmp, tmp, 0xff);
-    tcg_gen_or_i32(t1, rm, tmp);
-    tcg_gen_mov_i32(t0, rd);
-
-    dead_tmp(tmp);
-    dead_tmp(rm);
-    dead_tmp(rd);
-}
-
-static void gen_neon_zip_u16(TCGv t0, TCGv t1)
-{
-    TCGv tmp, tmp2;
-
-    tmp = new_tmp();
-    tmp2 = new_tmp();
-
-    tcg_gen_andi_i32(tmp, t0, 0xffff);
-    tcg_gen_shli_i32(tmp2, t1, 16);
-    tcg_gen_or_i32(tmp, tmp, tmp2);
-    tcg_gen_andi_i32(t1, t1, 0xffff0000);
-    tcg_gen_shri_i32(tmp2, t0, 16);
-    tcg_gen_or_i32(t1, t1, tmp2);
-    tcg_gen_mov_i32(t0, tmp);
-
-    dead_tmp(tmp2);
-    dead_tmp(tmp);
-}
-
 static void gen_neon_trn_u8(TCGv t0, TCGv t1)
 {
     TCGv rd, tmp;
@@ -5389,30 +5334,12 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     tcg_temp_free_i32(tmp);
                     break;
                 case 35: /* VZIP */
-                    /* Reg  Before       After
-                       Rd   A3 A2 A1 A0  B1 A1 B0 A0
-                       Rm   B3 B2 B1 B0  B3 A3 B2 A2
-                     */
-                    if (size == 3)
+                    if (size == 3 || (!q && size == 2)) {
                         return 1;
-                    count = (q ? 4 : 2);
-                    for (n = 0; n < count; n++) {
-                        tmp = neon_load_reg(rd, n);
-                        tmp2 = neon_load_reg(rd, n);
-                        switch (size) {
-                        case 0: gen_neon_zip_u8(tmp, tmp2); break;
-                        case 1: gen_neon_zip_u16(tmp, tmp2); break;
-                        case 2: /* no-op */; break;
-                        default: abort();
-                        }
-                        neon_store_scratch(n * 2, tmp);
-                        neon_store_scratch(n * 2 + 1, tmp2);
-                    }
-                    for (n = 0; n < count * 2; n++) {
-                        int reg = (n < count) ? rd : rm;
-                        tmp = neon_load_scratch(n);
-                        neon_store_reg(reg, n % count, tmp);
                     }
+                    tmp = tcg_const_i32(insn);
+                    gen_helper_neon_zip(cpu_env, tmp);
+                    tcg_temp_free_i32(tmp);
                     break;
                 case 36: case 37: /* VMOVN, VQMOVUN, VQMOVN */
                     if (size == 3)
-- 
1.7.1

* Re: [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function
  2011-02-11 16:14 ` [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function Peter Maydell
@ 2011-02-11 16:53   ` Peter Maydell
  2011-02-11 17:03     ` Nathan Froyd
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2011-02-11 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

On 11 February 2011 16:14, Peter Maydell <peter.maydell@linaro.org> wrote:
> +void HELPER(neon_unzip)(CPUState *env, uint32_t insn)
> +{
> +    int rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
> +    int rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
> +    int size = (insn >> 18) & 3;
> +    if (insn & 0x40) { /* Q */
> +        uint64_t zm0 = float64_val(env->vfp.regs[rm]);
> +        uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
> +        uint64_t zd0 = float64_val(env->vfp.regs[rd]);
> +        uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);

I can rework these patches if people don't like the way this is
effectively doing decoding in a helper function, incidentally,
although I'm not convinced it would end up any nicer overall.

-- PMM

* Re: [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function
  2011-02-11 16:53   ` Peter Maydell
@ 2011-02-11 17:03     ` Nathan Froyd
  2011-02-11 17:12       ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Nathan Froyd @ 2011-02-11 17:03 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, patches

On Fri, Feb 11, 2011 at 04:53:30PM +0000, Peter Maydell wrote:
> On 11 February 2011 16:14, Peter Maydell <peter.maydell@linaro.org> wrote:
> > +void HELPER(neon_unzip)(CPUState *env, uint32_t insn)
> > +{
> > +    int rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
> > +    int rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
> > +    int size = (insn >> 18) & 3;
> > +    if (insn & 0x40) { /* Q */
> > +        uint64_t zm0 = float64_val(env->vfp.regs[rm]);
> > +        uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
> > +        uint64_t zd0 = float64_val(env->vfp.regs[rd]);
> > +        uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
> 
> I can rework these patches if people don't like the way this is
> effectively doing decoding in a helper function, incidentally,
> although I'm not convinced it would end up any nicer overall.

I do think the preferred way would be to extract rd, rm, size, and Q
up-front, rather than having the helper twiddle instruction bits.
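
A minimal sketch of what extracting the fields up-front could look like,
using the same bit positions as the quoted helper. The struct and function
names (neon_2reg_fields, decode_neon_2reg, neon_zip_is_undef,
decode_example) are hypothetical and exist only for illustration; in the
real translator these values would presumably be passed to the helper as
TCG constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields; not a QEMU type. */
struct neon_2reg_fields {
    int rd;    /* D:Vd, index of the first destination D register */
    int rm;    /* M:Vm, index of the first second-operand D register */
    int size;  /* element size field, insn bits [19:18] */
    int q;     /* Q bit, insn bit 6: 1 for the quadword form */
};

/* Same bit extraction as the quoted helper, done once by the caller. */
struct neon_2reg_fields decode_neon_2reg(uint32_t insn)
{
    struct neon_2reg_fields f;
    f.rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
    f.rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
    f.size = (insn >> 18) & 3;
    f.q = (insn >> 6) & 1;
    return f;
}

/* Mirrors the check the patch adds in translate.c: these combinations are
 * rejected before the helper is ever called. */
int neon_zip_is_undef(int size, int q)
{
    return size == 3 || (!q && size == 2);
}

/* Usage: a synthetic word with only the relevant fields set
 * (size = 1, Vd = 2, Q = 1, M = 1, Vm = 4). */
void decode_example(void)
{
    uint32_t insn = (1u << 18) | (2u << 12) | (1u << 6) | (1u << 5) | 4u;
    struct neon_2reg_fields f = decode_neon_2reg(insn);
    assert(f.rd == 2 && f.rm == 20 && f.size == 1 && f.q == 1);
    assert(!neon_zip_is_undef(f.size, f.q));
}
```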

-Nathan

* Re: [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function
  2011-02-11 17:03     ` Nathan Froyd
@ 2011-02-11 17:12       ` Peter Maydell
  2011-02-11 17:14         ` Nathan Froyd
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2011-02-11 17:12 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: qemu-devel, patches

On 11 February 2011 17:03, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On Fri, Feb 11, 2011 at 04:53:30PM +0000, Peter Maydell wrote:
>> On 11 February 2011 16:14, Peter Maydell <peter.maydell@linaro.org> wrote:
>> > +void HELPER(neon_unzip)(CPUState *env, uint32_t insn)
>> > +{
>> > +    int rd = ((insn >> 18) & 0x10) | ((insn >> 12) & 0x0f);
>> > +    int rm = ((insn >> 1) & 0x10) | (insn & 0x0f);
>> > +    int size = (insn >> 18) & 3;
>> > +    if (insn & 0x40) { /* Q */
>> > +        uint64_t zm0 = float64_val(env->vfp.regs[rm]);
>> > +        uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
>> > +        uint64_t zd0 = float64_val(env->vfp.regs[rd]);
>> > +        uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
>>
>> I can rework these patches if people don't like the way this is
>> effectively doing decoding in a helper function, incidentally,
>> although I'm not convinced it would end up any nicer overall.
>
> I do think the preferred way would be to extract rd, rm, size, and Q
> up-front, rather than having the helper twiddle instruction bits.

OK. You're happy to still have the helper do the reading and
writing of env->vfp.regs[] directly, though?

-- PMM

* Re: [Qemu-devel] [PATCH 1/2] target-arm: Move Neon VUZP to a helper function
  2011-02-11 17:12       ` Peter Maydell
@ 2011-02-11 17:14         ` Nathan Froyd
  0 siblings, 0 replies; 7+ messages in thread
From: Nathan Froyd @ 2011-02-11 17:14 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, patches

On Fri, Feb 11, 2011 at 05:12:32PM +0000, Peter Maydell wrote:
> On 11 February 2011 17:03, Nathan Froyd <froydnj@codesourcery.com> wrote:
> > I do think the preferred way would be to extract rd, rm, size, and Q
> > up-front, rather than having the helper twiddle instruction bits.
> 
> OK. You're happy to still have the helper do the reading and
> writing of env->vfp.regs[] directly, though?

I think you can make a case either way, but you're passing enough values
already that accessing env->vfp.regs directly in the helper seems
reasonable.
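
As a rough illustration of that shape, here is a sketch of an unzip helper
that takes rd, rm, size and q as arguments and still reads and writes the
register file itself. It is written against a stand-in struct (mock_env)
rather than CPUState, uses a generic loop instead of the unrolled
expressions in the patch, and mock_neon_unzip is a hypothetical name, so
treat it as a model of the interface rather than a proposed replacement.

```c
#include <assert.h>
#include <stdint.h>

#define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))

/* Stand-in for the register file; real code would use env->vfp.regs[]. */
struct mock_env {
    uint64_t regs[32];
};

/* Hypothetical helper with rd, rm, size and q already decoded by the caller. */
void mock_neon_unzip(struct mock_env *env, int rd, int rm, int size, int q)
{
    /* The translator is assumed to have rejected the invalid encodings. */
    assert(size >= 0 && size <= 2 && (q || size != 2));

    int esize = 8 << size;            /* element width in bits */
    int nregs = q ? 2 : 1;            /* D registers per operand */
    int per_reg = 64 / esize;         /* elements per D register */
    int total = 2 * nregs * per_reg;  /* elements in Vm:Vd */
    uint64_t in[4] = {0}, out[4] = {0};

    /* Read both operands before writing anything back. */
    for (int r = 0; r < nregs; r++) {
        in[r] = env->regs[rd + r];
        in[nregs + r] = env->regs[rm + r];
    }
    for (int i = 0; i < total; i++) {
        uint64_t e = ELEM(in[i / per_reg], i % per_reg, esize);
        int j = i / 2;                            /* index within the destination operand */
        int dst = (i & 1) * nregs + j / per_reg;  /* even -> Vd regs, odd -> Vm regs */
        out[dst] |= e << ((j % per_reg) * esize);
    }
    for (int r = 0; r < nregs; r++) {
        env->regs[rd + r] = out[r];
        env->regs[rm + r] = out[nregs + r];
    }
}
```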

-Nathan
