[Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
@ 2019-04-17 15:33 ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize and refactor MSA instructions ILVEV.<B|H|W|D>,
ILVOD.<B|H|W|D>, ILVL.<B|H|W|D> and ILVR.<B|H|W|D>.

v7:
 - Use tcg constants, instead of uint64_t constants in
   ILVEV.<B|H> and ILVOD.<B|H> instructions.
 - Refactor gen_ilvod_b and gen_ilvod_h functions. Use
   the shared function gen_ilvod_bh, which has two extra
   arguments mask and shift, because mask and shift are
   the only differences in the implementation of those
   two functions.
   Same applies for gen_ilvev_b and gen_ilvev_h.
 - Use assigning uint64_t constant values to the bit mask,
   instead of shifting the bit mask, in ILVR.<H|W> and
   ILVL.<H|W> instructions.
 - Use only one helper for ILVEV.D and ILVR.D instructions,
   because they are equivalent. 
   Same applies for ILVOD.D and ILVL.D.
 - Minor changes in the commit messages.

v6:
 - Add ILVL.<B|H|W|D> and ILVR.<B|H|W|D> MSA instructions
   with mixed approaches (with helpers and with tcg
   registers).
 - Test the performance for ILVL.<B|H|W|D> and
   ILVR.<B|H|W|D> MSA instructions, with helpers,
   with tcg and with the mixed approach.
 - Use a tcg register instead of an int variable for
   storing a constant value of the mask (for logic
   operations).
 - Eliminate some unnecessary tcg_gen calls.
 - Changes in commit messages and the cover letter.

v5:
 - Use tcg_gen_deposit function.
 - Added performance number for no-deposit and
   with-deposit cases of ILVEV.W.
 - Minor changes in commit messages and the cover letter.

v4:
 - Clean up typing errors.
 - Change the commit message and the cover letter.
 - Fix bug for ILVEV.D, in case where the destination
   and one of the sources are the same register.

v3:
 - Reduce the number of logic operations to a
   minimum.
 - Add comments.

v2:
 - Minor changes in commit messages and the cover letter.

Mateja Marjanovic (6):
  target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVL.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
  target/mips: Merge implementation of ILVEV.D and ILVR.D
  target/mips: Merge implementation of ILVOD.D and ILVL.D

 target/mips/helper.h     |   7 +-
 target/mips/msa_helper.c |  82 +++++++----
 target/mips/translate.c  | 376 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 425 insertions(+), 40 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
@ 2019-04-17 15:33 ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize and refactor MSA instructions ILVEV.<B|H|W|D>,
ILVOD.<B|H|W|D>, ILVL.<B|H|W|D> and ILVR.<B|H|W|D>.

v7:
 - Use tcg constants, instead of uint64_t constants in
   ILVEV.<B|H> and ILVOD.<B|H> instructions.
 - Refactor gen_ilvod_b and gen_ilvod_h functions. Use
   the shared function gen_ilvod_bh, which has two extra
   arguments mask and shift, because mask and shift are
   the only differences in the implementation of those
   two functions.
   Same applies for gen_ilvev_b and gen_ilvev_h.
 - Use assigning uint64_t constant values to the bit mask,
   instead of shifting the bit mask, in ILVR.<H|W> and
   ILVL.<H|W> instructions.
 - Use only one helper for ILVEV.D and ILVR.D instructions,
   because they are equivalent. 
   Same applies for ILVOD.D and ILVL.D.
 - Minor changes in the commit messages.

v6:
 - Add ILVL.<B|H|W|D> and ILVR.<B|H|W|D> MSA instructions
   with mixed approaches (with helpers and with tcg
   registers).
 - Test the performance for ILVL.<B|H|W|D> and
   ILVR.<B|H|W|D> MSA instructions, with helpers,
   with tcg and with the mixed approach.
 - Use a tcg register instead of an int variable for
   storing a constant value of the mask (for logic
   operations).
 - Eliminate some unnecessary tcg_gen calls.
 - Changes in commit messages and the cover letter.

v5:
 - Use tcg_gen_deposit function.
 - Added performance number for no-deposit and
   with-deposit cases of ILVEV.W.
 - Minor changes in commit messages and the cover letter.

v4:
 - Clean up typing errors.
 - Change the commit message and the cover letter.
 - Fix bug for ILVEV.D, in case where the destination
   and one of the sources are the same register.

v3:
 - Reduce the number of logic operations to a
   minimum.
 - Add comments.

v2:
 - Minor changes in commit messages and the cover letter.

Mateja Marjanovic (6):
  target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVL.<B|H|W|D> MSA instructions
  target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
  target/mips: Merge implementation of ILVEV.D and ILVR.D
  target/mips: Merge implementation of ILVOD.D and ILVL.D

 target/mips/helper.h     |   7 +-
 target/mips/msa_helper.c |  82 +++++++----
 target/mips/translate.c  | 376 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 425 insertions(+), 40 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize set of MSA instructions ILVOD.<B|H|W|D>, using
directly tcg registers and performing logic on them instead
of using helpers.

In the following table, the first column is the performance
before this patch. The second represents the performance
after converting from helpers to tcg, but without using
tcg_gen_deposit function. The third one is with the deposit
function and with using a uint64_t constant bit mask, and
the fourth is with the deposit function and with a mask
which is a tcg constant. The fourth is implemented in this
patch.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

==================================================================
|| instruction ||     1     ||     2    ||     3    ||     4    ||
==================================================================
||   ilvod.b   || 117.50 ms || 24.13 ms || 24.45 ms || 23.24 ms ||
||   ilvod.h   ||  93.16 ms || 24.21 ms || 24.28 ms || 23.20 ms ||
||   ilvod.w   || 119.90 ms || 24.15 ms || 23.19 ms || 22.95 ms ||
||   ilvod.d   ||  43.01 ms || 21.17 ms || 23.07 ms || 22.59 ms ||
==================================================================
1 - before
2 - no-deposit-no-mask-as-tcg-constant
3 - with-deposit-no-mask-as-tcg-constant
4 - with-deposit-with-mask-as-tcg-constant (final)

The deposit function is used only in ILVOD.W.

No-deposit version of the ILVOD.W implementation:

static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
                               uint32_t ws, uint32_t wt)
{
    TCGv_i64 t1 = tcg_temp_new_i64();
    TCGv_i64 t2 = tcg_temp_new_i64();
    TCGv_i64 mask = tcg_const_i64(0xffffffff00000000ULL);

    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
    tcg_gen_shri_i64(t1, t1, 32);
    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask);
    tcg_gen_shri_i64(t1, t1, 32);
    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask);
    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);

    tcg_temp_free_i64(mask);
    tcg_temp_free_i64(t1);
    tcg_temp_free_i64(t2);
}

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |  1 -
 target/mips/msa_helper.c |  7 ----
 target/mips/translate.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 2863f60..02e16c7 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -865,7 +865,6 @@ DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 6c57281..a7ea6aa 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1206,13 +1206,6 @@ MSA_FN_DF(ilvr_df)
 MSA_FN_DF(ilvev_df)
 #undef MSA_DO
 
-#define MSA_DO(DF)                          \
-    do {                                    \
-        pwx->DF[2*i]   = pwt->DF[2*i+1];    \
-        pwx->DF[2*i+1] = pws->DF[2*i+1];    \
-    } while (0)
-MSA_FN_DF(ilvod_df)
-#undef MSA_DO
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
diff --git a/target/mips/translate.c b/target/mips/translate.c
index bba8b6c..4c8fef0 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28884,6 +28884,80 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
     tcg_temp_free_i32(tws);
 }
 
+/*
+ * [MSA] ILVOD.<B|H> wd, ws, wt
+ *
+ *   Vector Interleave Odd (<byte|halfword> data elements)
+ *
+ */
+static inline void gen_ilvod_bh(CPUMIPSState *env, uint32_t wd,
+                                uint32_t ws, uint32_t wt,
+                                uint64_t mask, uint32_t shift)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 mask_tcg = tcg_const_i64(mask);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask_tcg);
+    tcg_gen_shri_i64(t1, t1, shift);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask_tcg);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask_tcg);
+    tcg_gen_shri_i64(t1, t1, shift);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask_tcg);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
+
+    tcg_temp_free_i64(mask_tcg);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static inline void gen_ilvod_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvod_bh(env, wd, ws, wt, 0xff00ff00ff00ff00ULL, 8);
+}
+
+static inline void gen_ilvod_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvod_bh(env, wd, ws, wt, 0xffff0000ffff0000ULL, 16);
+}
+
+/*
+ * [MSA] ILVOD.W wd, ws, wt
+ *
+ *   Vector Interleave Odd (word data elements)
+ *
+ */
+static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_shri_i64(t1, msa_wr_d[wt * 2], 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[ws * 2], t1, 0, 32);
+
+    tcg_gen_shri_i64(t1, msa_wr_d[wt * 2 + 1], 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1], t1, 0, 32);
+
+    tcg_temp_free_i64(t1);
+}
+
+/*
+ * [MSA] ILVOD.D wd, ws, wt
+ *
+ *   Vector Interleave Odd (doubleword data elements)
+ *
+ */
+static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29055,7 +29129,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_u_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVOD_df:
-        gen_helper_msa_ilvod_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvod_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvod_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvod_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvod_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
 
     case OPC_DOTP_S_df:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize set of MSA instructions ILVOD.<B|H|W|D>, using
directly tcg registers and performing logic on them instead
of using helpers.

In the following table, the first column is the performance
before this patch. The second represents the performance
after converting from helpers to tcg, but without using
tcg_gen_deposit function. The third one is with the deposit
function and with using a uint64_t constant bit mask, and
the fourth is with the deposit function and with a mask
which is a tcg constant. The fourth is implemented in this
patch.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

==================================================================
|| instruction ||     1     ||     2    ||     3    ||     4    ||
==================================================================
||   ilvod.b   || 117.50 ms || 24.13 ms || 24.45 ms || 23.24 ms ||
||   ilvod.h   ||  93.16 ms || 24.21 ms || 24.28 ms || 23.20 ms ||
||   ilvod.w   || 119.90 ms || 24.15 ms || 23.19 ms || 22.95 ms ||
||   ilvod.d   ||  43.01 ms || 21.17 ms || 23.07 ms || 22.59 ms ||
==================================================================
1 - before
2 - no-deposit-no-mask-as-tcg-constant
3 - with-deposit-no-mask-as-tcg-constant
4 - with-deposit-with-mask-as-tcg-constant (final)

The deposit function is used only in ILVOD.W.

No-deposit version of the ILVOD.W implementation:

static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
                               uint32_t ws, uint32_t wt)
{
    TCGv_i64 t1 = tcg_temp_new_i64();
    TCGv_i64 t2 = tcg_temp_new_i64();
    TCGv_i64 mask = tcg_const_i64(0xffffffff00000000ULL);

    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
    tcg_gen_shri_i64(t1, t1, 32);
    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask);
    tcg_gen_shri_i64(t1, t1, 32);
    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask);
    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);

    tcg_temp_free_i64(mask);
    tcg_temp_free_i64(t1);
    tcg_temp_free_i64(t2);
}

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |  1 -
 target/mips/msa_helper.c |  7 ----
 target/mips/translate.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 2863f60..02e16c7 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -865,7 +865,6 @@ DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 6c57281..a7ea6aa 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1206,13 +1206,6 @@ MSA_FN_DF(ilvr_df)
 MSA_FN_DF(ilvev_df)
 #undef MSA_DO
 
-#define MSA_DO(DF)                          \
-    do {                                    \
-        pwx->DF[2*i]   = pwt->DF[2*i+1];    \
-        pwx->DF[2*i+1] = pws->DF[2*i+1];    \
-    } while (0)
-MSA_FN_DF(ilvod_df)
-#undef MSA_DO
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
diff --git a/target/mips/translate.c b/target/mips/translate.c
index bba8b6c..4c8fef0 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28884,6 +28884,80 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
     tcg_temp_free_i32(tws);
 }
 
+/*
+ * [MSA] ILVOD.<B|H> wd, ws, wt
+ *
+ *   Vector Interleave Odd (<byte|halfword> data elements)
+ *
+ */
+static inline void gen_ilvod_bh(CPUMIPSState *env, uint32_t wd,
+                                uint32_t ws, uint32_t wt,
+                                uint64_t mask, uint32_t shift)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 mask_tcg = tcg_const_i64(mask);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask_tcg);
+    tcg_gen_shri_i64(t1, t1, shift);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask_tcg);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask_tcg);
+    tcg_gen_shri_i64(t1, t1, shift);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask_tcg);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
+
+    tcg_temp_free_i64(mask_tcg);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static inline void gen_ilvod_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvod_bh(env, wd, ws, wt, 0xff00ff00ff00ff00ULL, 8);
+}
+
+static inline void gen_ilvod_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvod_bh(env, wd, ws, wt, 0xffff0000ffff0000ULL, 16);
+}
+
+/*
+ * [MSA] ILVOD.W wd, ws, wt
+ *
+ *   Vector Interleave Odd (word data elements)
+ *
+ */
+static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_shri_i64(t1, msa_wr_d[wt * 2], 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[ws * 2], t1, 0, 32);
+
+    tcg_gen_shri_i64(t1, msa_wr_d[wt * 2 + 1], 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1], t1, 0, 32);
+
+    tcg_temp_free_i64(t1);
+}
+
+/*
+ * [MSA] ILVOD.D wd, ws, wt
+ *
+ *   Vector Interleave Odd (doubleword data elements)
+ *
+ */
+static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29055,7 +29129,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_u_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVOD_df:
-        gen_helper_msa_ilvod_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvod_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvod_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvod_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvod_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
 
     case OPC_DOTP_S_df:
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 2/6] target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize set of MSA instructions ILVEV.<B|H|W|D>, using
directly tcg registers and performing logic on them
instead of using helpers.

In the following table, the first column is the performance
before this patch. The second represents the performance
after converting from helpers to tcg, but without using
tcg_gen_deposit function. The third one is with using the
tcg_gen_deposit function and with using a uint64_t constant
bit mask, and the fourth is with using the tcg_gen_deposit
function and with a mask which is a tcg constant. The fourth
is implemented in this patch.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

==================================================================
|| instruction ||     1     ||     2    ||     3    ||     4    ||
==================================================================
||   ilvev.b   || 126.92 ms || 24.52 ms || 25.19 ms || 23.89 ms ||
||   ilvev.h   ||  93.67 ms || 23.92 ms || 24.76 ms || 24.31 ms ||
||   ilvev.w   || 117.86 ms || 23.83 ms || 21.84 ms || 21.99 ms ||
||   ilvev.d   ||  45.49 ms || 19.74 ms || 20.21 ms || 20.07 ms ||
==================================================================
1 - before
2 - no-deposit-no-mask-as-tcg-constant
3 - with-deposit-no-mask-as-tcg-constant
4 - with-deposit-with-mask-as-tcg-constant (final)

The deposit function is used only in ILVEV.W.

No-deposit version of the ILVEV.W implementation:

static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
                               uint32_t ws, uint32_t wt)
{
    TCGv_i64 t1 = tcg_temp_new_i64();
    TCGv_i64 t2 = tcg_temp_new_i64();
    uint64_t mask = 0x00000000ffffffffULL;

    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
    tcg_gen_shli_i64(t2, t2, 32);
    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);
    tcg_gen_shli_i64(t2, t2, 32);
    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);

    tcg_temp_free_i64(t1);
    tcg_temp_free_i64(t2);
}

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |  1 -
 target/mips/msa_helper.c |  9 -----
 target/mips/translate.c  | 87 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 02e16c7..82f6a40 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index a7ea6aa..d5c3842 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1197,15 +1197,6 @@ MSA_FN_DF(ilvl_df)
     } while (0)
 MSA_FN_DF(ilvr_df)
 #undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
-        pwx->DF[2*i]   = pwt->DF[2*i];  \
-        pwx->DF[2*i+1] = pws->DF[2*i];  \
-    } while (0)
-MSA_FN_DF(ilvev_df)
-#undef MSA_DO
-
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 4c8fef0..95bcc65 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28958,6 +28958,76 @@ static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
 }
 
+
+/*
+ * [MSA] ILVEV.<B|H> wd, ws, wt
+ *
+ *   Vector Interleave Even (<byte|halfword> data elements)
+ *
+ */
+static inline void gen_ilvev_bh(CPUMIPSState *env, uint32_t wd,
+                                uint32_t ws, uint32_t wt,
+                                uint64_t mask, uint32_t shift)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 mask_tcg = tcg_const_i64(mask);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask_tcg);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask_tcg);
+    tcg_gen_shli_i64(t2, t2, shift);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask_tcg);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask_tcg);
+    tcg_gen_shli_i64(t2, t2, shift);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
+
+    tcg_temp_free_i64(mask_tcg);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvev_bh(env, wd, ws, wt, 0x00ff00ff00ff00ffULL, 8);
+}
+
+static inline void gen_ilvev_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvev_bh(env, wd, ws, wt, 0x0000ffff0000ffffULL, 16);
+}
+
+/*
+ * [MSA] ILVEV.W wd, ws, wt
+ *
+ *   Vector Interleave Even (word data elements)
+ *
+ */
+static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2],
+                        msa_wr_d[ws * 2], 32, 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[wt * 2 + 1],
+                        msa_wr_d[ws * 2 + 1], 32, 32);
+}
+
+/*
+ * [MSA] ILVEV.D wd, ws, wt
+ *
+ *   Vector Interleave Even (Doubleword data elements)
+ *
+ */
+static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29114,7 +29184,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_s_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVEV_df:
-        gen_helper_msa_ilvev_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvev_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvev_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvev_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvev_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BINSR_df:
         gen_helper_msa_binsr_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 2/6] target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize set of MSA instructions ILVEV.<B|H|W|D>, using
directly tcg registers and performing logic on them
instead of using helpers.

In the following table, the first column is the performance
before this patch. The second represents the performance
after converting from helpers to tcg, but without using
tcg_gen_deposit function. The third one is with using the
tcg_gen_deposit function and with using a uint64_t constant
bit mask, and the fourth is with using the tcg_gen_deposit
function and with a mask which is a tcg constant. The fourth
is implemented in this patch.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

==================================================================
|| instruction ||     1     ||     2    ||     3    ||     4    ||
==================================================================
||   ilvev.b   || 126.92 ms || 24.52 ms || 25.19 ms || 23.89 ms ||
||   ilvev.h   ||  93.67 ms || 23.92 ms || 24.76 ms || 24.31 ms ||
||   ilvev.w   || 117.86 ms || 23.83 ms || 21.84 ms || 21.99 ms ||
||   ilvev.d   ||  45.49 ms || 19.74 ms || 20.21 ms || 20.07 ms ||
==================================================================
1 - before
2 - no-deposit-no-mask-as-tcg-constant
3 - with-deposit-no-mask-as-tcg-constant
4 - with-deposit-with-mask-as-tcg-constant (final)

The deposit function is used only in ILVEV.W.

No-deposit version of the ILVEV.W implementation:

static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
                               uint32_t ws, uint32_t wt)
{
    TCGv_i64 t1 = tcg_temp_new_i64();
    TCGv_i64 t2 = tcg_temp_new_i64();
    uint64_t mask = 0x00000000ffffffffULL;

    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
    tcg_gen_shli_i64(t2, t2, 32);
    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);
    tcg_gen_shli_i64(t2, t2, 32);
    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);

    tcg_temp_free_i64(t1);
    tcg_temp_free_i64(t2);
}

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |  1 -
 target/mips/msa_helper.c |  9 -----
 target/mips/translate.c  | 87 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 02e16c7..82f6a40 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index a7ea6aa..d5c3842 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1197,15 +1197,6 @@ MSA_FN_DF(ilvl_df)
     } while (0)
 MSA_FN_DF(ilvr_df)
 #undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
-        pwx->DF[2*i]   = pwt->DF[2*i];  \
-        pwx->DF[2*i+1] = pws->DF[2*i];  \
-    } while (0)
-MSA_FN_DF(ilvev_df)
-#undef MSA_DO
-
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 4c8fef0..95bcc65 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28958,6 +28958,76 @@ static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
 }
 
+
+/*
+ * [MSA] ILVEV.<B|H> wd, ws, wt
+ *
+ *   Vector Interleave Even (<byte|halfword> data elements)
+ *
+ */
+static inline void gen_ilvev_bh(CPUMIPSState *env, uint32_t wd,
+                                uint32_t ws, uint32_t wt,
+                                uint64_t mask, uint32_t shift)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 mask_tcg = tcg_const_i64(mask);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask_tcg);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask_tcg);
+    tcg_gen_shli_i64(t2, t2, shift);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
+
+    tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask_tcg);
+    tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask_tcg);
+    tcg_gen_shli_i64(t2, t2, shift);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
+
+    tcg_temp_free_i64(mask_tcg);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvev_bh(env, wd, ws, wt, 0x00ff00ff00ff00ffULL, 8);
+}
+
+static inline void gen_ilvev_h(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    gen_ilvev_bh(env, wd, ws, wt, 0x0000ffff0000ffffULL, 16);
+}
+
+/*
+ * [MSA] ILVEV.W wd, ws, wt
+ *
+ *   Vector Interleave Even (word data elements)
+ *
+ */
+static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2],
+                        msa_wr_d[ws * 2], 32, 32);
+    tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[wt * 2 + 1],
+                        msa_wr_d[ws * 2 + 1], 32, 32);
+}
+
+/*
+ * [MSA] ILVEV.D wd, ws, wt
+ *
+ *   Vector Interleave Even (Doubleword data elements)
+ *
+ */
+static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd,
+                               uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
+}
+
 static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
 {
 #define MASK_MSA_3R(op)    (MASK_MSA_MINOR(op) | (op & (0x7 << 23)))
@@ -29114,7 +29184,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_mod_s_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVEV_df:
-        gen_helper_msa_ilvev_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_ilvev_b(env, wd, ws, wt);
+            break;
+        case DF_HALF:
+            gen_ilvev_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvev_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvev_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BINSR_df:
         gen_helper_msa_binsr_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 3/6] target/mips: Optimize ILVL.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize ILVL.<B|H|W|D> instructions, using a hybrid
approach. For byte data elements, use a helper with an
unrolled loop (having much better performance than
direct tcg translation), for halfword, word and
doubleword data elements use directly tcg registers
and logic performed on them.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

============================================================
||instruction||  helper  ||   tcg    ||      hybrid       ||
============================================================
||  ilvl.b   || 59.91 ms || 74.41 ms || 60.50 ms (helper) ||
||  ilvl.h   || 41.33 ms || 33.08 ms || 33.34 ms (tcg)    ||
||  ilvl.w   || 30.99 ms || 22.87 ms || 23.19 ms (tcg)    ||
||  ilvl.d   || 26.40 ms || 19.64 ms || 20.49 ms (tcg)    ||
============================================================

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |   3 +-
 target/mips/msa_helper.c |  33 +++++++++++----
 target/mips/translate.c  | 106 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 82f6a40..f737a0e 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -862,7 +862,6 @@ DEF_HELPER_5(msa_sld_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_splat_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
@@ -934,6 +933,8 @@ DEF_HELPER_4(msa_pcnt_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nloc_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nlzc_df, void, env, i32, i32, i32)
 
+DEF_HELPER_4(msa_ilvl_b, void, env, i32, i32, i32)
+
 DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_copy_s_h, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_copy_s_w, void, env, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index d5c3842..20ce447 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1184,14 +1184,6 @@ MSA_FN_DF(pckod_df)
 
 #define MSA_DO(DF)                      \
     do {                                \
-        pwx->DF[2*i]   = L##DF(pwt, i); \
-        pwx->DF[2*i+1] = L##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvl_df)
-#undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
         pwx->DF[2*i]   = R##DF(pwt, i); \
         pwx->DF[2*i+1] = R##DF(pws, i); \
     } while (0)
@@ -1214,6 +1206,31 @@ MSA_FN_DF(vshf_df)
 #undef MSA_LOOP_COND
 #undef MSA_FN_DF
 
+void helper_msa_ilvl_b(CPUMIPSState *env, uint32_t wd,
+                       uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    pwd->b[0]  = pwt->b[8];
+    pwd->b[1]  = pws->b[8];
+    pwd->b[2]  = pwt->b[9];
+    pwd->b[3]  = pws->b[9];
+    pwd->b[4]  = pwt->b[10];
+    pwd->b[5]  = pws->b[10];
+    pwd->b[6]  = pwt->b[11];
+    pwd->b[7]  = pws->b[11];
+    pwd->b[8]  = pwt->b[12];
+    pwd->b[9]  = pws->b[12];
+    pwd->b[10] = pwt->b[13];
+    pwd->b[11] = pws->b[13];
+    pwd->b[12] = pwt->b[14];
+    pwd->b[13] = pws->b[14];
+    pwd->b[14] = pwt->b[15];
+    pwd->b[15] = pws->b[15];
+}
+
 void helper_msa_sldi_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
                         uint32_t ws, uint32_t n)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 95bcc65..ee60f29 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28885,6 +28885,95 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
 }
 
 /*
+ * [MSA] ILVL.H wd, ws, wt
+ *
+ *   Vector Interleave Left (halfword data elements)
+ *
+ */
+static inline void gen_ilvl_h(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x000000000000ffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0x00000000ffff0000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0x0000ffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0xffff000000000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVL.W wd, ws, wt
+ *
+ *   Vector Interleave Left (word data elements)
+ *
+ */
+static inline void gen_ilvl_w(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x00000000ffffffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0xffffffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVL.D wd, ws, wt
+ *
+ *   Vector Interleave Left (doubleword data elements)
+ *
+ */
+static inline void gen_ilvl_d(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
+}
+
+/*
  * [MSA] ILVOD.<B|H> wd, ws, wt
  *
  *   Vector Interleave Odd (<byte|halfword> data elements)
@@ -29148,7 +29237,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_div_s_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVL_df:
-        gen_helper_msa_ilvl_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_helper_msa_ilvl_b(cpu_env, twd, tws, twt);
+            break;
+        case DF_HALF:
+            gen_ilvl_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvl_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvl_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BNEG_df:
         gen_helper_msa_bneg_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 3/6] target/mips: Optimize ILVL.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize ILVL.<B|H|W|D> instructions, using a hybrid
approach. For byte data elements, use a helper with an
unrolled loop (having much better performance than
direct tcg translation), for halfword, word and
doubleword data elements use directly tcg registers
and logic performed on them.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

============================================================
||instruction||  helper  ||   tcg    ||      hybrid       ||
============================================================
||  ilvl.b   || 59.91 ms || 74.41 ms || 60.50 ms (helper) ||
||  ilvl.h   || 41.33 ms || 33.08 ms || 33.34 ms (tcg)    ||
||  ilvl.w   || 30.99 ms || 22.87 ms || 23.19 ms (tcg)    ||
||  ilvl.d   || 26.40 ms || 19.64 ms || 20.49 ms (tcg)    ||
============================================================

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |   3 +-
 target/mips/msa_helper.c |  33 +++++++++++----
 target/mips/translate.c  | 106 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 82f6a40..f737a0e 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -862,7 +862,6 @@ DEF_HELPER_5(msa_sld_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_splat_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
@@ -934,6 +933,8 @@ DEF_HELPER_4(msa_pcnt_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nloc_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nlzc_df, void, env, i32, i32, i32)
 
+DEF_HELPER_4(msa_ilvl_b, void, env, i32, i32, i32)
+
 DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_copy_s_h, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_copy_s_w, void, env, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index d5c3842..20ce447 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1184,14 +1184,6 @@ MSA_FN_DF(pckod_df)
 
 #define MSA_DO(DF)                      \
     do {                                \
-        pwx->DF[2*i]   = L##DF(pwt, i); \
-        pwx->DF[2*i+1] = L##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvl_df)
-#undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
         pwx->DF[2*i]   = R##DF(pwt, i); \
         pwx->DF[2*i+1] = R##DF(pws, i); \
     } while (0)
@@ -1214,6 +1206,31 @@ MSA_FN_DF(vshf_df)
 #undef MSA_LOOP_COND
 #undef MSA_FN_DF
 
+void helper_msa_ilvl_b(CPUMIPSState *env, uint32_t wd,
+                       uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    pwd->b[0]  = pwt->b[8];
+    pwd->b[1]  = pws->b[8];
+    pwd->b[2]  = pwt->b[9];
+    pwd->b[3]  = pws->b[9];
+    pwd->b[4]  = pwt->b[10];
+    pwd->b[5]  = pws->b[10];
+    pwd->b[6]  = pwt->b[11];
+    pwd->b[7]  = pws->b[11];
+    pwd->b[8]  = pwt->b[12];
+    pwd->b[9]  = pws->b[12];
+    pwd->b[10] = pwt->b[13];
+    pwd->b[11] = pws->b[13];
+    pwd->b[12] = pwt->b[14];
+    pwd->b[13] = pws->b[14];
+    pwd->b[14] = pwt->b[15];
+    pwd->b[15] = pws->b[15];
+}
+
 void helper_msa_sldi_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
                         uint32_t ws, uint32_t n)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 95bcc65..ee60f29 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28885,6 +28885,95 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
 }
 
 /*
+ * [MSA] ILVL.H wd, ws, wt
+ *
+ *   Vector Interleave Left (halfword data elements)
+ *
+ */
+static inline void gen_ilvl_h(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x000000000000ffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0x00000000ffff0000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0x0000ffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0xffff000000000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVL.W wd, ws, wt
+ *
+ *   Vector Interleave Left (word data elements)
+ *
+ */
+static inline void gen_ilvl_w(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x00000000ffffffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0xffffffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2 + 1], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVL.D wd, ws, wt
+ *
+ *   Vector Interleave Left (doubleword data elements)
+ *
+ */
+static inline void gen_ilvl_d(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
+}
+
+/*
  * [MSA] ILVOD.<B|H> wd, ws, wt
  *
  *   Vector Interleave Odd (<byte|halfword> data elements)
@@ -29148,7 +29237,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_div_s_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVL_df:
-        gen_helper_msa_ilvl_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_helper_msa_ilvl_b(cpu_env, twd, tws, twt);
+            break;
+        case DF_HALF:
+            gen_ilvl_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvl_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvl_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BNEG_df:
         gen_helper_msa_bneg_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize ILVR.<B|H|W|D> instructions, using a hybrid
approach. For byte data elements, use a helper with an
unrolled loop (having much better performance than
direct tcg translation), for halfword, word and
doubleword data elements use directly tcg registers
and logic performed on them.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

============================================================
||instruction||  helper  ||   tcg    ||      hybrid       ||
============================================================
||  ilvr.b   || 62.87 ms || 74.76 ms || 61.49 ms (helper) ||
||  ilvr.h   || 44.11 ms || 33.00 ms || 33.69 ms (tcg)    ||
||  ilvr.w   || 34.97 ms || 23.06 ms || 23.01 ms (tcg)    ||
||  ilvr.d   || 27.33 ms || 19.87 ms || 19.65 ms (tcg)    ||
============================================================

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |   2 +-
 target/mips/msa_helper.c |  33 +++++++++++----
 target/mips/translate.c  | 107 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index f737a0e..4aee81c 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -862,7 +862,6 @@ DEF_HELPER_5(msa_sld_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_splat_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
@@ -933,6 +932,7 @@ DEF_HELPER_4(msa_pcnt_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nloc_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nlzc_df, void, env, i32, i32, i32)
 
+DEF_HELPER_4(msa_ilvr_b, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_ilvl_b, void, env, i32, i32, i32)
 
 DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 20ce447..fedd9b5 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1181,14 +1181,6 @@ MSA_FN_DF(pckev_df)
     } while (0)
 MSA_FN_DF(pckod_df)
 #undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
-        pwx->DF[2*i]   = R##DF(pwt, i); \
-        pwx->DF[2*i+1] = R##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvr_df)
-#undef MSA_DO
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
@@ -1206,6 +1198,31 @@ MSA_FN_DF(vshf_df)
 #undef MSA_LOOP_COND
 #undef MSA_FN_DF
 
+void helper_msa_ilvr_b(CPUMIPSState *env, uint32_t wd,
+                       uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    pwd->b[15] = pws->b[7];
+    pwd->b[14] = pwt->b[7];
+    pwd->b[13] = pws->b[6];
+    pwd->b[12] = pwt->b[6];
+    pwd->b[11] = pws->b[5];
+    pwd->b[10] = pwt->b[5];
+    pwd->b[9]  = pws->b[4];
+    pwd->b[8]  = pwt->b[4];
+    pwd->b[7]  = pws->b[3];
+    pwd->b[6]  = pwt->b[3];
+    pwd->b[5]  = pws->b[2];
+    pwd->b[4]  = pwt->b[2];
+    pwd->b[3]  = pws->b[1];
+    pwd->b[2]  = pwt->b[1];
+    pwd->b[1]  = pws->b[0];
+    pwd->b[0]  = pwt->b[0];
+}
+
 void helper_msa_ilvl_b(CPUMIPSState *env, uint32_t wd,
                        uint32_t ws, uint32_t wt)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index ee60f29..4c7b076 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28885,6 +28885,96 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
 }
 
 /*
+ * [MSA] ILVR.H wd, ws, wt
+ *
+ *   Vector Interleave Right (halfword data elements)
+ *
+ */
+static inline void gen_ilvr_h(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x000000000000ffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0x00000000ffff0000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0x0000ffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0xffff000000000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVR.W wd, ws, wt
+ *
+ *   Vector Interleave Right (word data elements)
+ *
+ */
+static inline void gen_ilvr_w(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x00000000ffffffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0xffffffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVR.D wd, ws, wt
+ *
+ *   Vector Interleave Right (doubleword data elements)
+ *
+ */
+static inline void gen_ilvr_d(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
+}
+
+
+/*
  * [MSA] ILVL.H wd, ws, wt
  *
  *   Vector Interleave Left (halfword data elements)
@@ -29273,7 +29363,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_div_u_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVR_df:
-        gen_helper_msa_ilvr_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_helper_msa_ilvr_b(cpu_env, twd, tws, twt);
+            break;
+        case DF_HALF:
+            gen_ilvr_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvr_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvr_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BINSL_df:
         gen_helper_msa_binsl_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

Optimize ILVR.<B|H|W|D> instructions, using a hybrid
approach. For byte data elements, use a helper with an
unrolled loop (having much better performance than
direct tcg translation), for halfword, word and
doubleword data elements use directly tcg registers
and logic performed on them.

Performance measurement is done by executing the
instructions 10 million times on a computer
with Intel Core i7-3770 CPU @ 3.40GHz×8.

============================================================
||instruction||  helper  ||   tcg    ||      hybrid       ||
============================================================
||  ilvr.b   || 62.87 ms || 74.76 ms || 61.49 ms (helper) ||
||  ilvr.h   || 44.11 ms || 33.00 ms || 33.69 ms (tcg)    ||
||  ilvr.w   || 34.97 ms || 23.06 ms || 23.01 ms (tcg)    ||
||  ilvr.d   || 27.33 ms || 19.87 ms || 19.65 ms (tcg)    ||
============================================================

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/helper.h     |   2 +-
 target/mips/msa_helper.c |  33 +++++++++++----
 target/mips/translate.c  | 107 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index f737a0e..4aee81c 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -862,7 +862,6 @@ DEF_HELPER_5(msa_sld_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_splat_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
@@ -933,6 +932,7 @@ DEF_HELPER_4(msa_pcnt_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nloc_df, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_nlzc_df, void, env, i32, i32, i32)
 
+DEF_HELPER_4(msa_ilvr_b, void, env, i32, i32, i32)
 DEF_HELPER_4(msa_ilvl_b, void, env, i32, i32, i32)
 
 DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index 20ce447..fedd9b5 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1181,14 +1181,6 @@ MSA_FN_DF(pckev_df)
     } while (0)
 MSA_FN_DF(pckod_df)
 #undef MSA_DO
-
-#define MSA_DO(DF)                      \
-    do {                                \
-        pwx->DF[2*i]   = R##DF(pwt, i); \
-        pwx->DF[2*i+1] = R##DF(pws, i); \
-    } while (0)
-MSA_FN_DF(ilvr_df)
-#undef MSA_DO
 #undef MSA_LOOP_COND
 
 #define MSA_LOOP_COND(DF) \
@@ -1206,6 +1198,31 @@ MSA_FN_DF(vshf_df)
 #undef MSA_LOOP_COND
 #undef MSA_FN_DF
 
+void helper_msa_ilvr_b(CPUMIPSState *env, uint32_t wd,
+                       uint32_t ws, uint32_t wt)
+{
+    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
+    wr_t *pws = &(env->active_fpu.fpr[ws].wr);
+    wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
+
+    pwd->b[15] = pws->b[7];
+    pwd->b[14] = pwt->b[7];
+    pwd->b[13] = pws->b[6];
+    pwd->b[12] = pwt->b[6];
+    pwd->b[11] = pws->b[5];
+    pwd->b[10] = pwt->b[5];
+    pwd->b[9]  = pws->b[4];
+    pwd->b[8]  = pwt->b[4];
+    pwd->b[7]  = pws->b[3];
+    pwd->b[6]  = pwt->b[3];
+    pwd->b[5]  = pws->b[2];
+    pwd->b[4]  = pwt->b[2];
+    pwd->b[3]  = pws->b[1];
+    pwd->b[2]  = pwt->b[1];
+    pwd->b[1]  = pws->b[0];
+    pwd->b[0]  = pwt->b[0];
+}
+
 void helper_msa_ilvl_b(CPUMIPSState *env, uint32_t wd,
                        uint32_t ws, uint32_t wt)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index ee60f29..4c7b076 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28885,6 +28885,96 @@ static void gen_msa_bit(CPUMIPSState *env, DisasContext *ctx)
 }
 
 /*
+ * [MSA] ILVR.H wd, ws, wt
+ *
+ *   Vector Interleave Right (halfword data elements)
+ *
+ */
+static inline void gen_ilvr_h(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x000000000000ffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0x00000000ffff0000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0x0000ffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+
+    mask = 0xffff000000000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_or_i64(t2, t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVR.W wd, ws, wt
+ *
+ *   Vector Interleave Right (word data elements)
+ *
+ */
+static inline void gen_ilvr_w(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint64_t mask = 0x00000000ffffffffULL;
+
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_shli_i64(t1, t1, 32);
+    tcg_gen_or_i64(msa_wr_d[wd * 2], t2, t1);
+
+    mask = 0xffffffff00000000ULL;
+    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+    tcg_gen_shri_i64(t1, t1, 32);
+    tcg_gen_mov_i64(t2, t1);
+    tcg_gen_andi_i64(t1, msa_wr_d[ws * 2], mask);
+    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t2, t1);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+/*
+ * [MSA] ILVR.D wd, ws, wt
+ *
+ *   Vector Interleave Right (doubleword data elements)
+ *
+ */
+static inline void gen_ilvr_d(CPUMIPSState *env, uint32_t wd,
+                              uint32_t ws, uint32_t wt)
+{
+    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
+    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
+}
+
+
+/*
  * [MSA] ILVL.H wd, ws, wt
  *
  *   Vector Interleave Left (halfword data elements)
@@ -29273,7 +29363,22 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
         gen_helper_msa_div_u_df(cpu_env, tdf, twd, tws, twt);
         break;
     case OPC_ILVR_df:
-        gen_helper_msa_ilvr_df(cpu_env, tdf, twd, tws, twt);
+        switch (df) {
+        case DF_BYTE:
+            gen_helper_msa_ilvr_b(cpu_env, twd, tws, twt);
+            break;
+        case DF_HALF:
+            gen_ilvr_h(env, wd, ws, wt);
+            break;
+        case DF_WORD:
+            gen_ilvr_w(env, wd, ws, wt);
+            break;
+        case DF_DOUBLE:
+            gen_ilvr_d(env, wd, ws, wt);
+            break;
+        default:
+            assert(0);
+        }
         break;
     case OPC_BINSL_df:
         gen_helper_msa_binsl_df(cpu_env, tdf, twd, tws, twt);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

>From 55e222d8139e3dd034069fb512b83fd2541ec067 Mon Sep 17 00:00:00 2001
From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
Date: Wed, 17 Apr 2019 14:50:55 +0200
Subject: [PATCH v7 5/6] target/mips: Merge implementation of ILVEV.D and ILVR.D

The implementation for ILVEV.D and ILVR.D instructions
is equivalent, so use a single handler for both of them.

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/translate.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 4c7b076..656153a 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28961,20 +28961,6 @@ static inline void gen_ilvr_w(CPUMIPSState *env, uint32_t wd,
 }
 
 /*
- * [MSA] ILVR.D wd, ws, wt
- *
- *   Vector Interleave Right (doubleword data elements)
- *
- */
-static inline void gen_ilvr_d(CPUMIPSState *env, uint32_t wd,
-                              uint32_t ws, uint32_t wt)
-{
-    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
-    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
-}
-
-
-/*
  * [MSA] ILVL.H wd, ws, wt
  *
  *   Vector Interleave Left (halfword data elements)
@@ -29197,10 +29183,16 @@ static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
 /*
  * [MSA] ILVEV.D wd, ws, wt
  *
- *   Vector Interleave Even (Doubleword data elements)
+ *   Vector Interleave Even (doubleword data elements)
+ *
+ * [MSA] ILVR.D wd, ws, wt
+ *
+ *   Vector Interleave Right (doubleword data elements)
+ *
+ *  These two instructions are functionally equivalent.
  *
  */
-static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd,
+static inline void gen_ilvev_ilvr_d(CPUMIPSState *env, uint32_t wd,
                                     uint32_t ws, uint32_t wt)
 {
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
@@ -29374,7 +29366,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvr_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvr_d(env, wd, ws, wt);
+            gen_ilvev_ilvr_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
@@ -29404,7 +29396,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvev_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvev_d(env, wd, ws, wt);
+            gen_ilvev_ilvr_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] target/mips: Optimize ILVR.<B|H|W|D> MSA instructions
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From 55e222d8139e3dd034069fb512b83fd2541ec067 Mon Sep 17 00:00:00 2001
From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
Date: Wed, 17 Apr 2019 14:50:55 +0200
Subject: [PATCH v7 5/6] target/mips: Merge implementation of ILVEV.D and ILVR.D

The implementation for ILVEV.D and ILVR.D instructions
is equivalent, so use a single handler for both of them.

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/translate.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 4c7b076..656153a 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28961,20 +28961,6 @@ static inline void gen_ilvr_w(CPUMIPSState *env, uint32_t wd,
 }
 
 /*
- * [MSA] ILVR.D wd, ws, wt
- *
- *   Vector Interleave Right (doubleword data elements)
- *
- */
-static inline void gen_ilvr_d(CPUMIPSState *env, uint32_t wd,
-                              uint32_t ws, uint32_t wt)
-{
-    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
-    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2]);
-}
-
-
-/*
  * [MSA] ILVL.H wd, ws, wt
  *
  *   Vector Interleave Left (halfword data elements)
@@ -29197,10 +29183,16 @@ static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
 /*
  * [MSA] ILVEV.D wd, ws, wt
  *
- *   Vector Interleave Even (Doubleword data elements)
+ *   Vector Interleave Even (doubleword data elements)
+ *
+ * [MSA] ILVR.D wd, ws, wt
+ *
+ *   Vector Interleave Right (doubleword data elements)
+ *
+ *  These two instructions are functionally equivalent.
  *
  */
-static inline void gen_ilvev_d(CPUMIPSState *env, uint32_t wd,
+static inline void gen_ilvev_ilvr_d(CPUMIPSState *env, uint32_t wd,
                                     uint32_t ws, uint32_t wt)
 {
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2]);
@@ -29374,7 +29366,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvr_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvr_d(env, wd, ws, wt);
+            gen_ilvev_ilvr_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
@@ -29404,7 +29396,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvev_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvev_d(env, wd, ws, wt);
+            gen_ilvev_ilvr_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 6/6] target/mips: Merge implementation of ILVOD.D and ILVL.D
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien, philmd, richard.henderson, amarkovic, arikalo

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

The implementation for ILVOD.D and ILVL.D instructions
is equivalent, so use a single handler for both of them.

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/translate.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 656153a..14aaf98 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -29037,19 +29037,6 @@ static inline void gen_ilvl_w(CPUMIPSState *env, uint32_t wd,
 }
 
 /*
- * [MSA] ILVL.D wd, ws, wt
- *
- *   Vector Interleave Left (doubleword data elements)
- *
- */
-static inline void gen_ilvl_d(CPUMIPSState *env, uint32_t wd,
-                              uint32_t ws, uint32_t wt)
-{
-    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
-    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
-}
-
-/*
  * [MSA] ILVOD.<B|H> wd, ws, wt
  *
  *   Vector Interleave Odd (<byte|halfword> data elements)
@@ -29115,9 +29102,15 @@ static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
  *
  *   Vector Interleave Odd (doubleword data elements)
  *
+ * [MSA] ILVL.D wd, ws, wt
+ *
+ *   Vector Interleave Left (doubleword data elements)
+ *
+ *  These two instructions are functionally equivalent.
+ *
  */
-static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
-                               uint32_t ws, uint32_t wt)
+static inline void gen_ilvod_ilvl_d(CPUMIPSState *env, uint32_t wd,
+                                    uint32_t ws, uint32_t wt)
 {
     tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
@@ -29330,7 +29323,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvl_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvl_d(env, wd, ws, wt);
+            gen_ilvod_ilvl_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
@@ -29426,7 +29419,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvod_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvod_d(env, wd, ws, wt);
+            gen_ilvod_ilvl_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH v7 6/6] target/mips: Merge implementation of ILVOD.D and ILVL.D
@ 2019-04-17 15:33   ` Mateja Marjanovic
  0 siblings, 0 replies; 20+ messages in thread
From: Mateja Marjanovic @ 2019-04-17 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: arikalo, richard.henderson, philmd, amarkovic, aurelien

From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>

The implementation for ILVOD.D and ILVL.D instructions
is equivalent, so use a single handler for both of them.

Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
---
 target/mips/translate.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 656153a..14aaf98 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -29037,19 +29037,6 @@ static inline void gen_ilvl_w(CPUMIPSState *env, uint32_t wd,
 }
 
 /*
- * [MSA] ILVL.D wd, ws, wt
- *
- *   Vector Interleave Left (doubleword data elements)
- *
- */
-static inline void gen_ilvl_d(CPUMIPSState *env, uint32_t wd,
-                              uint32_t ws, uint32_t wt)
-{
-    tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
-    tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
-}
-
-/*
  * [MSA] ILVOD.<B|H> wd, ws, wt
  *
  *   Vector Interleave Odd (<byte|halfword> data elements)
@@ -29115,9 +29102,15 @@ static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
  *
  *   Vector Interleave Odd (doubleword data elements)
  *
+ * [MSA] ILVL.D wd, ws, wt
+ *
+ *   Vector Interleave Left (doubleword data elements)
+ *
+ *  These two instructions are functionally equivalent.
+ *
  */
-static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
-                               uint32_t ws, uint32_t wt)
+static inline void gen_ilvod_ilvl_d(CPUMIPSState *env, uint32_t wd,
+                                    uint32_t ws, uint32_t wt)
 {
     tcg_gen_mov_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2 + 1]);
     tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
@@ -29330,7 +29323,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvl_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvl_d(env, wd, ws, wt);
+            gen_ilvod_ilvl_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
@@ -29426,7 +29419,7 @@ static void gen_msa_3r(CPUMIPSState *env, DisasContext *ctx)
             gen_ilvod_w(env, wd, ws, wt);
             break;
         case DF_DOUBLE:
-            gen_ilvod_d(env, wd, ws, wt);
+            gen_ilvod_ilvl_d(env, wd, ws, wt);
             break;
         default:
             assert(0);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
@ 2019-04-17 19:29     ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2019-04-17 19:29 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel; +Cc: aurelien, philmd, amarkovic, arikalo

On 4/17/19 5:33 AM, Mateja Marjanovic wrote:
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize set of MSA instructions ILVOD.<B|H|W|D>, using
> directly tcg registers and performing logic on them instead
> of using helpers.
> 
> In the following table, the first column is the performance
> before this patch. The second represents the performance
> after converting from helpers to tcg, but without using
> tcg_gen_deposit function. The third one is with the deposit
> function and with using a uint64_t constant bit mask, and
> the fourth is with the deposit function and with a mask
> which is a tcg constant. The fourth is implemented in this
> patch.
> 
> Performance measurement is done by executing the
> instructions 10 million times on a computer
> with Intel Core i7-3770 CPU @ 3.40GHz×8.
> 
> ==================================================================
> || instruction ||     1     ||     2    ||     3    ||     4    ||
> ==================================================================
> ||   ilvod.b   || 117.50 ms || 24.13 ms || 24.45 ms || 23.24 ms ||
> ||   ilvod.h   ||  93.16 ms || 24.21 ms || 24.28 ms || 23.20 ms ||
> ||   ilvod.w   || 119.90 ms || 24.15 ms || 23.19 ms || 22.95 ms ||
> ||   ilvod.d   ||  43.01 ms || 21.17 ms || 23.07 ms || 22.59 ms ||
> ==================================================================
> 1 - before
> 2 - no-deposit-no-mask-as-tcg-constant
> 3 - with-deposit-no-mask-as-tcg-constant
> 4 - with-deposit-with-mask-as-tcg-constant (final)
> 
> The deposit function is used only in ILVOD.W.
> 
> No-deposit version of the ILVOD.W implementation:
> 
> static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
>                                uint32_t ws, uint32_t wt)
> {
>     TCGv_i64 t1 = tcg_temp_new_i64();
>     TCGv_i64 t2 = tcg_temp_new_i64();
>     TCGv_i64 mask = tcg_const_i64(0xffffffff00000000ULL);
> 
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> 
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
> 
>     tcg_temp_free_i64(mask);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(t2);
> }
> 
> Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> ---
>  target/mips/helper.h     |  1 -
>  target/mips/msa_helper.c |  7 ----
>  target/mips/translate.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 90 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
@ 2019-04-17 19:29     ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2019-04-17 19:29 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel; +Cc: arikalo, philmd, amarkovic, aurelien

On 4/17/19 5:33 AM, Mateja Marjanovic wrote:
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize set of MSA instructions ILVOD.<B|H|W|D>, using
> directly tcg registers and performing logic on them instead
> of using helpers.
> 
> In the following table, the first column is the performance
> before this patch. The second represents the performance
> after converting from helpers to tcg, but without using
> tcg_gen_deposit function. The third one is with the deposit
> function and with using a uint64_t constant bit mask, and
> the fourth is with the deposit function and with a mask
> which is a tcg constant. The fourth is implemented in this
> patch.
> 
> Performance measurement is done by executing the
> instructions 10 million times on a computer
> with Intel Core i7-3770 CPU @ 3.40GHz×8.
> 
> ==================================================================
> || instruction ||     1     ||     2    ||     3    ||     4    ||
> ==================================================================
> ||   ilvod.b   || 117.50 ms || 24.13 ms || 24.45 ms || 23.24 ms ||
> ||   ilvod.h   ||  93.16 ms || 24.21 ms || 24.28 ms || 23.20 ms ||
> ||   ilvod.w   || 119.90 ms || 24.15 ms || 23.19 ms || 22.95 ms ||
> ||   ilvod.d   ||  43.01 ms || 21.17 ms || 23.07 ms || 22.59 ms ||
> ==================================================================
> 1 - before
> 2 - no-deposit-no-mask-as-tcg-constant
> 3 - with-deposit-no-mask-as-tcg-constant
> 4 - with-deposit-with-mask-as-tcg-constant (final)
> 
> The deposit function is used only in ILVOD.W.
> 
> No-deposit version of the ILVOD.W implementation:
> 
> static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
>                                uint32_t ws, uint32_t wt)
> {
>     TCGv_i64 t1 = tcg_temp_new_i64();
>     TCGv_i64 t2 = tcg_temp_new_i64();
>     TCGv_i64 mask = tcg_const_i64(0xffffffff00000000ULL);
> 
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> 
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
> 
>     tcg_temp_free_i64(mask);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(t2);
> }
> 
> Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> ---
>  target/mips/helper.h     |  1 -
>  target/mips/msa_helper.c |  7 ----
>  target/mips/translate.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 90 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
@ 2019-04-17 19:34     ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2019-04-17 19:34 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel; +Cc: aurelien, philmd, amarkovic, arikalo

On 4/17/19 5:33 AM, Mateja Marjanovic wrote:
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize set of MSA instructions ILVEV.<B|H|W|D>, using
> directly tcg registers and performing logic on them
> instead of using helpers.
> 
> In the following table, the first column is the performance
> before this patch. The second represents the performance
> after converting from helpers to tcg, but without using
> tcg_gen_deposit function. The third one is with using the
> tcg_gen_deposit function and with using a uint64_t constant
> bit mask, and the fourth is with using the tcg_gen_deposit
> function and with a mask which is a tcg constant. The fourth
> is implemented in this patch.
> 
> Performance measurement is done by executing the
> instructions 10 million times on a computer
> with Intel Core i7-3770 CPU @ 3.40GHz×8.
> 
> ==================================================================
> || instruction ||     1     ||     2    ||     3    ||     4    ||
> ==================================================================
> ||   ilvev.b   || 126.92 ms || 24.52 ms || 25.19 ms || 23.89 ms ||
> ||   ilvev.h   ||  93.67 ms || 23.92 ms || 24.76 ms || 24.31 ms ||
> ||   ilvev.w   || 117.86 ms || 23.83 ms || 21.84 ms || 21.99 ms ||
> ||   ilvev.d   ||  45.49 ms || 19.74 ms || 20.21 ms || 20.07 ms ||
> ==================================================================
> 1 - before
> 2 - no-deposit-no-mask-as-tcg-constant
> 3 - with-deposit-no-mask-as-tcg-constant
> 4 - with-deposit-with-mask-as-tcg-constant (final)
> 
> The deposit function is used only in ILVEV.W.
> 
> No-deposit version of the ILVEV.W implementation:
> 
> static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
>                                uint32_t ws, uint32_t wt)
> {
>     TCGv_i64 t1 = tcg_temp_new_i64();
>     TCGv_i64 t2 = tcg_temp_new_i64();
>     uint64_t mask = 0x00000000ffffffffULL;
> 
>     tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
>     tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
>     tcg_gen_shli_i64(t2, t2, 32);
>     tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> 
>     tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>     tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);
>     tcg_gen_shli_i64(t2, t2, 32);
>     tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
> 
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(t2);
> }
> 
> Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> ---
>  target/mips/helper.h     |  1 -
>  target/mips/msa_helper.c |  9 -----
>  target/mips/translate.c  | 87 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 86 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] target/mips: Optimize ILVEV.<B|H|W|D> MSA instructions
@ 2019-04-17 19:34     ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2019-04-17 19:34 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel; +Cc: arikalo, philmd, amarkovic, aurelien

On 4/17/19 5:33 AM, Mateja Marjanovic wrote:
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize set of MSA instructions ILVEV.<B|H|W|D>, using
> directly tcg registers and performing logic on them
> instead of using helpers.
> 
> In the following table, the first column is the performance
> before this patch. The second represents the performance
> after converting from helpers to tcg, but without using
> tcg_gen_deposit function. The third one is with using the
> tcg_gen_deposit function and with using a uint64_t constant
> bit mask, and the fourth is with using the tcg_gen_deposit
> function and with a mask which is a tcg constant. The fourth
> is implemented in this patch.
> 
> Performance measurement is done by executing the
> instructions 10 million times on a computer
> with Intel Core i7-3770 CPU @ 3.40GHz×8.
> 
> ==================================================================
> || instruction ||     1     ||     2    ||     3    ||     4    ||
> ==================================================================
> ||   ilvev.b   || 126.92 ms || 24.52 ms || 25.19 ms || 23.89 ms ||
> ||   ilvev.h   ||  93.67 ms || 23.92 ms || 24.76 ms || 24.31 ms ||
> ||   ilvev.w   || 117.86 ms || 23.83 ms || 21.84 ms || 21.99 ms ||
> ||   ilvev.d   ||  45.49 ms || 19.74 ms || 20.21 ms || 20.07 ms ||
> ==================================================================
> 1 - before
> 2 - no-deposit-no-mask-as-tcg-constant
> 3 - with-deposit-no-mask-as-tcg-constant
> 4 - with-deposit-with-mask-as-tcg-constant (final)
> 
> The deposit function is used only in ILVEV.W.
> 
> No-deposit version of the ILVEV.W implementation:
> 
> static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
>                                uint32_t ws, uint32_t wt)
> {
>     TCGv_i64 t1 = tcg_temp_new_i64();
>     TCGv_i64 t2 = tcg_temp_new_i64();
>     uint64_t mask = 0x00000000ffffffffULL;
> 
>     tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
>     tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
>     tcg_gen_shli_i64(t2, t2, 32);
>     tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> 
>     tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>     tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);
>     tcg_gen_shli_i64(t2, t2, 32);
>     tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
> 
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(t2);
> }
> 
> Suggested-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> ---
>  target/mips/helper.h     |  1 -
>  target/mips/msa_helper.c |  9 -----
>  target/mips/translate.c  | 87 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 86 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
@ 2019-04-17 20:31   ` Aleksandar Markovic
  0 siblings, 0 replies; 20+ messages in thread
From: Aleksandar Markovic @ 2019-04-17 20:31 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel
  Cc: aurelien, philmd, richard.henderson, Aleksandar Rikalo

> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> Subject: [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
> 
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize and refactor MSA instructions ILVEV.<B|H|W|D>,
> ILVOD.<B|H|W|D>, ILVL.<B|H|W|D> and ILVR.<B|H|W|D>.

Patch number 5/6 seems to be for some reason lost. Please resend the
complete series.

Thanks,
Aleksandar

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
@ 2019-04-17 20:31   ` Aleksandar Markovic
  0 siblings, 0 replies; 20+ messages in thread
From: Aleksandar Markovic @ 2019-04-17 20:31 UTC (permalink / raw)
  To: Mateja Marjanovic, qemu-devel
  Cc: Aleksandar Rikalo, richard.henderson, philmd, aurelien

> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com>
> Subject: [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions
> 
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com>
> 
> Optimize and refactor MSA instructions ILVEV.<B|H|W|D>,
> ILVOD.<B|H|W|D>, ILVL.<B|H|W|D> and ILVR.<B|H|W|D>.

Patch number 5/6 seems to be for some reason lost. Please resend the
complete series.

Thanks,
Aleksandar


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-04-17 20:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17 15:33 [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions Mateja Marjanovic
2019-04-17 15:33 ` Mateja Marjanovic
2019-04-17 15:33 ` [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 19:29   ` Richard Henderson
2019-04-17 19:29     ` Richard Henderson
2019-04-17 15:33 ` [Qemu-devel] [PATCH v7 2/6] target/mips: Optimize ILVEV.<B|H|W|D> " Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 19:34   ` Richard Henderson
2019-04-17 19:34     ` Richard Henderson
2019-04-17 15:33 ` [Qemu-devel] [PATCH v7 3/6] target/mips: Optimize ILVL.<B|H|W|D> " Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 15:33 ` [Qemu-devel] [PATCH v7 4/6] target/mips: Optimize ILVR.<B|H|W|D> " Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 15:33 ` Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 15:33 ` [Qemu-devel] [PATCH v7 6/6] target/mips: Merge implementation of ILVOD.D and ILVL.D Mateja Marjanovic
2019-04-17 15:33   ` Mateja Marjanovic
2019-04-17 20:31 ` [Qemu-devel] [PATCH v7 0/6] target/mips: Optimize MSA interleave instructions Aleksandar Markovic
2019-04-17 20:31   ` Aleksandar Markovic

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.