From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH for-6.2 12/53] target/arm: Implement MVE VMULL (polynomial)
Date: Thu, 29 Jul 2021 12:14:31 +0100
Message-ID: <20210729111512.16541-13-peter.maydell@linaro.org>
In-Reply-To: <20210729111512.16541-1-peter.maydell@linaro.org>

Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: an 8x8->16 and a 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32, indicating the size of the result elements.  This then
carries through to the helper function names, where it matches up
with the existing pmull_h(), which does an 8x8->16 operation, and a
new pmull_w(), which does the 16x16->32.
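
As background, a polynomial (carry-less) multiply XORs the shifted
partial products together instead of adding them, so no carries
propagate between bit positions.  A minimal scalar sketch of the
8x8->16 case, purely illustrative and not part of this patch (the
clmul_8x8() name is hypothetical), is:

    #include <stdint.h>

    /* Carry-less multiply of two 8-bit values giving a 16-bit result:
     * partial products are combined with XOR rather than ADD.
     */
    static uint16_t clmul_8x8(uint8_t a, uint8_t b)
    {
        uint16_t result = 0;
        for (int i = 0; i < 8; i++) {
            if (b & (1 << i)) {
                result ^= (uint16_t)a << i;
            }
        }
        return result;
    }

The 16x16->32 form handled by pmull_w() is the same operation widened
to 16-bit inputs; a standalone sketch of the bit-sliced technique both
helpers use appears after the patch.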

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  5 +++++
 target/arm/vec_internal.h  | 11 +++++++++++
 target/arm/mve.decode      | 14 ++++++++++----
 target/arm/mve_helper.c    | 16 ++++++++++++++++
 target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 14 +++++++++++++-
 6 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 56e40844ad9..84adfb21517 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -145,6 +145,11 @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index 865d2139447..2a335582906 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -206,4 +206,15 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
+/*
+ * 8 x 8 -> 16 vector polynomial multiply where the inputs are
+ * in the low 8 bits of each 16-bit element
+ */
+uint64_t pmull_h(uint64_t op1, uint64_t op2);
+/*
+ * 16 x 16 -> 32 vector polynomial multiply where the inputs are
+ * in the low 16 bits of each 32-bit element
+ */
+uint64_t pmull_w(uint64_t op1, uint64_t op2);
+
 #endif /* TARGET_ARM_VEC_INTERNALS_H */
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index fa9d921f933..de079ec517d 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -173,10 +173,16 @@ VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
 VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 
-VMULL_BS         111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
-VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+{
+  VMULLP_B       111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
+  VMULL_BS       111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+  VMULL_BU       111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+}
+{
+  VMULLP_T       111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
+  VMULL_TS       111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+  VMULL_TU       111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+}
 
 VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index be8b9545317..91fb346d7e5 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -481,6 +481,22 @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
 DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
 DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
 
+/*
+ * Polynomial multiply. We can always do this generating 64 bits
+ * of the result at a time, so we don't need to use DO_2OP_L.
+ */
+#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
+#define VMULLPW_MASK 0x0000ffff0000ffffULL
+#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
+#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
+#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
+#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
+
+DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
+DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
+DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
+DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
+
 /*
  * Because the computation type is at least twice as large as required,
  * these work for both signed and unsigned source types.
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index a2a45036a0b..d318f34b2bc 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -464,6 +464,34 @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
     return do_2op(s, a, fns[a->size]);
 }
 
+static bool trans_VMULLP_B(DisasContext *s, arg_2op *a)
+{
+    /*
+     * Note that a->size indicates the output size, ie VMULL.P8
+     * is the 8x8->16 operation and a->size is MO_16; VMULL.P16
+     * is the 16x16->32 operation and a->size is MO_32.
+     */
+    static MVEGenTwoOpFn * const fns[] = {
+        NULL,
+        gen_helper_mve_vmullpbh,
+        gen_helper_mve_vmullpbw,
+        NULL,
+    };
+    return do_2op(s, a, fns[a->size]);
+}
+
+static bool trans_VMULLP_T(DisasContext *s, arg_2op *a)
+{
+    /* a->size is as for trans_VMULLP_B */
+    static MVEGenTwoOpFn * const fns[] = {
+        NULL,
+        gen_helper_mve_vmullpth,
+        gen_helper_mve_vmullptw,
+        NULL,
+    };
+    return do_2op(s, a, fns[a->size]);
+}
+
 /*
  * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
  * of the 32-bit elements in each lane of the input vectors, where the
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 034f6b84f78..17fb1583622 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -2028,11 +2028,23 @@ static uint64_t expand_byte_to_half(uint64_t x)
          | ((x & 0xff000000) << 24);
 }
 
-static uint64_t pmull_h(uint64_t op1, uint64_t op2)
+uint64_t pmull_w(uint64_t op1, uint64_t op2)
 {
     uint64_t result = 0;
     int i;
+    for (i = 0; i < 16; ++i) {
+        uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
+        result ^= op2 & mask;
+        op1 >>= 1;
+        op2 <<= 1;
+    }
+    return result;
+}
 
+uint64_t pmull_h(uint64_t op1, uint64_t op2)
+{
+    uint64_t result = 0;
+    int i;
     for (i = 0; i < 8; ++i) {
         uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
         result ^= op2 & mask;
-- 
2.20.1
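
For reference, the mask-and-XOR loop in pmull_w() computes both
16x16->32 carry-less products held in the two 32-bit lanes of a 64-bit
word in one pass: each iteration broadcasts bit 0 of every lane of op1
into a full 32-bit lane mask and XORs in op2, which is shifted left
once per iteration.  The self-contained sketch below is illustrative
only; pmull_w_sketch() and clmul_16() are hypothetical names, not QEMU
code, and it assumes the high half of each input lane is zero (which
the DO_VMULLP* macros guarantee by masking, after shifting for the
top forms):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Bit-sliced 16x16->32 carry-less multiply: each 64-bit argument
     * holds one 16-bit operand in the low half of each 32-bit lane,
     * and both lane products are produced in a single loop.
     */
    static uint64_t pmull_w_sketch(uint64_t op1, uint64_t op2)
    {
        uint64_t result = 0;
        for (int i = 0; i < 16; ++i) {
            /* Expand bit 0 of each 32-bit lane of op1 to a full lane mask */
            uint64_t mask = (op1 & 0x0000000100000001ULL) * 0xffffffff;
            result ^= op2 & mask;   /* conditionally XOR in the shifted op2 */
            op1 >>= 1;
            op2 <<= 1;
        }
        return result;
    }

    /* Plain scalar 16x16->32 carry-less multiply used as a reference */
    static uint32_t clmul_16(uint16_t a, uint16_t b)
    {
        uint32_t r = 0;
        for (int i = 0; i < 16; i++) {
            if (b & (1u << i)) {
                r ^= (uint32_t)a << i;
            }
        }
        return r;
    }

    int main(void)
    {
        uint16_t a0 = 0x1234, a1 = 0xfedc, b0 = 0x00ff, b1 = 0x8001;
        uint64_t op1 = ((uint64_t)a1 << 32) | a0;
        uint64_t op2 = ((uint64_t)b1 << 32) | b0;
        uint64_t r = pmull_w_sketch(op1, op2);

        assert((uint32_t)r == clmul_16(a0, b0));
        assert((uint32_t)(r >> 32) == clmul_16(a1, b1));
        printf("lane0: %08x  lane1: %08x\n",
               (unsigned)r, (unsigned)(r >> 32));
        return 0;
    }

The 8x8->16 pmull_h() has the same structure with an 8-iteration loop
and 16-bit lanes; the "top" macros feed the high halves of the source
elements in by shifting the inputs right before the multiply.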



Thread overview: 80+ messages
2021-07-29 11:14 [PATCH for-6.2 00/53] target/arm: MVE slices 3 and 4 Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 01/53] target/arm: Note that we handle VMOVL as a special case of VSHLL Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 02/53] target/arm: Print MVE VPR in CPU dumps Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 03/53] target/arm: Fix MVE VSLI by 0 and VSRI by <dt> Peter Maydell
2021-07-30 18:56   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 04/53] target/arm: Fix signed VADDV Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 05/53] target/arm: Fix mask handling for MVE narrowing operations Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 06/53] target/arm: Fix 48-bit saturating shifts Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 07/53] target/arm: Fix MVE 48-bit SQRSHRL for small right shifts Peter Maydell
2021-07-30 19:07   ` Richard Henderson
2021-08-12  9:43     ` Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 08/53] target/arm: Fix calculation of LTP mask when LR is 0 Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 09/53] target/arm: Factor out mve_eci_mask() Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 10/53] target/arm: Fix VPT advance when ECI is non-zero Peter Maydell
2021-07-30 19:14   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 11/53] target/arm: Fix VLDRB/H/W for predicated elements Peter Maydell
2021-07-29 11:14 ` Peter Maydell [this message]
2021-07-29 11:14 ` [PATCH for-6.2 13/53] target/arm: Implement MVE incrementing/decrementing dup insns Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 14/53] target/arm: Factor out gen_vpst() Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 15/53] target/arm: Implement MVE integer vector comparisons Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 16/53] target/arm: Implement MVE integer vector-vs-scalar comparisons Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 17/53] target/arm: Implement MVE VPSEL Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 18/53] target/arm: Implement MVE VMLAS Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 19/53] target/arm: Implement MVE shift-by-scalar Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 20/53] target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 21/53] target/arm: Implement MVE integer min/max across vector Peter Maydell
2021-07-30 19:15   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 22/53] target/arm: Implement MVE VABAV Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 23/53] target/arm: Implement MVE narrowing moves Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 24/53] target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 25/53] target/arm: Implement MVE VMLADAV and VMLSLDAV Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 26/53] target/arm: Implement MVE VMLA Peter Maydell
2021-07-30 19:18   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 27/53] target/arm: Implement MVE saturating doubling multiply accumulates Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 28/53] target/arm: Implement MVE VQABS, VQNEG Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 29/53] target/arm: Implement MVE VMAXA, VMINA Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 30/53] target/arm: Implement MVE VMOV to/from 2 general-purpose registers Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 31/53] target/arm: Implement MVE VPNOT Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 32/53] target/arm: Implement MVE VCTP Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 33/53] target/arm: Implement MVE scatter-gather insns Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 34/53] target/arm: Implement MVE scatter-gather immediate forms Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 35/53] target/arm: Implement MVE interleaving loads/stores Peter Maydell
2021-07-29 11:14 ` [PATCH for-6.2 36/53] target/arm: Implement MVE VADD (floating-point) Peter Maydell
2021-07-30 19:27   ` Richard Henderson
2021-07-30 19:37   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 37/53] target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM Peter Maydell
2021-07-30 19:28   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 38/53] target/arm: Implement MVE VCADD Peter Maydell
2021-07-30 19:32   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 39/53] target/arm: Implement MVE VFMA and VFMS Peter Maydell
2021-07-30 19:34   ` Richard Henderson
2021-07-30 19:41   ` Richard Henderson
2021-07-29 11:14 ` [PATCH for-6.2 40/53] target/arm: Implement MVE VCMUL and VCMLA Peter Maydell
2021-07-30 19:47   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 41/53] target/arm: Implement MVE VMAXNMA and VMINNMA Peter Maydell
2021-07-30 19:50   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 42/53] target/arm: Implement MVE scalar fp insns Peter Maydell
2021-07-30 19:55   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 43/53] target/arm: Implement MVE fp-with-scalar VFMA, VFMAS Peter Maydell
2021-07-30 19:58   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 44/53] softfloat: Remove assertion preventing silencing of NaN in default-NaN mode Peter Maydell
2021-07-30 20:00   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 45/53] target/arm: Implement MVE FP max/min across vector Peter Maydell
2021-07-30 20:12   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 46/53] target/arm: Implement MVE fp vector comparisons Peter Maydell
2021-07-30 20:21   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 47/53] target/arm: Implement MVE fp scalar comparisons Peter Maydell
2021-07-30 20:22   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 48/53] target/arm: Implement MVE VCVT between floating and fixed point Peter Maydell
2021-07-30 20:24   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 49/53] target/arm: Implement MVE VCVT between fp and integer Peter Maydell
2021-07-30 20:27   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 50/53] target/arm: Implement MVE VCVT with specified rounding mode Peter Maydell
2021-07-30 20:28   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 51/53] target/arm: Implement MVE VCVT between single and half precision Peter Maydell
2021-07-30 20:33   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 52/53] target/arm: Implement MVE VRINT insns Peter Maydell
2021-07-30 20:47   ` Richard Henderson
2021-07-29 11:15 ` [PATCH for-6.2 53/53] target/arm: Enable MVE in Cortex-M55 Peter Maydell
2021-07-30 20:48   ` Richard Henderson
