All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 02/18] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
Date: Mon, 28 Jun 2021 14:58:19 +0100	[thread overview]
Message-ID: <20210628135835.6690-3-peter.maydell@linaro.org> (raw)
In-Reply-To: <20210628135835.6690-1-peter.maydell@linaro.org>

The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Richard suggested this change in review of v1 of the original
MVE-slice-1 series, but at that time I was incorrectly reading the
pseudocode as requiring the 72-bit accumulation over all four beats.
Testing with a wider range of inputs showed I was wrong...
---
 target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 05552ce7eee..85a552fe070 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "vec_internal.h"
@@ -1100,40 +1099,45 @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 
 /*
- * Rounding multiply add long dual accumulate high: we must keep
- * a 72-bit internal accumulator value and return the top 64 bits.
+ * Rounding multiply add long dual accumulate high. In the pseudocode
+ * this is implemented with a 72-bit internal accumulator value of which
+ * the top 64 bits are returned. We optimize this to avoid having to
+ * use 128-bit arithmetic -- we can do this because the 74-bit accumulator
+ * is squashed back into 64-bits after each beat.
  */
-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
+#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                     void *vm, uint64_t a)               \
     {                                                                   \
         uint16_t mask = mve_element_mask(env);                          \
         unsigned e;                                                     \
         TYPE *n = vn, *m = vm;                                          \
-        Int128 acc = int128_lshift(TO128(a), 8);                        \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
             if (mask & 1) {                                             \
+                LTYPE mul;                                              \
                 if (e & 1) {                                            \
-                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-                                            m[H##ESIZE(e)]));           \
+                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
+                    if (SUB) {                                          \
+                        mul = -mul;                                     \
+                    }                                                   \
                 } else {                                                \
-                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
-                                             m[H##ESIZE(e)]));          \
+                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                 }                                                       \
-                acc = int128_add(acc, int128_make64(1 << 7));           \
+                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
+                a += mul;                                               \
             }                                                           \
         }                                                               \
         mve_advance_vpt(env);                                           \
-        return int128_getlo(int128_rshift(acc, 8));                     \
+        return a;                                                       \
     }
 
-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
+DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
 
-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
+DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
 
-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
+DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 
 /* Vector add across vector */
 #define DO_VADDV(OP, ESIZE, TYPE)                               \
-- 
2.20.1



  parent reply	other threads:[~2021-06-28 14:01 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-28 13:58 [PATCH 00/18] target/arm: Second slice of MVE implementation Peter Maydell
2021-06-28 13:58 ` [PATCH 01/18] target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation Peter Maydell
2021-06-28 15:12   ` Richard Henderson
2021-06-28 13:58 ` Peter Maydell [this message]
2021-06-28 15:17   ` [PATCH 02/18] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH Richard Henderson
2021-06-28 13:58 ` [PATCH 03/18] target/arm: Make asimd_imm_const() public Peter Maydell
2021-06-28 15:19   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 04/18] target/arm: Use asimd_imm_const for A64 decode Peter Maydell
2021-06-28 15:36   ` Richard Henderson
2021-06-28 16:04     ` Peter Maydell
2021-06-28 13:58 ` [PATCH 05/18] target/arm: Use dup_const() instead of bitfield_replicate() Peter Maydell
2021-06-28 15:23   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 06/18] target/arm: Implement MVE logical immediate insns Peter Maydell
2021-06-28 15:37   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 07/18] target/arm: Implement MVE vector shift left by " Peter Maydell
2021-06-28 16:10   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 08/18] target/arm: Implement MVE vector shift right " Peter Maydell
2021-06-28 16:09   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 09/18] target/arm: Implement MVE VSHLL Peter Maydell
2021-06-28 16:18   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 10/18] target/arm: Implement MVE VSRI, VSLI Peter Maydell
2021-06-28 16:26   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 11/18] target/arm: Implement MVE VSHRN, VRSHRN Peter Maydell
2021-06-28 16:30   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 12/18] target/arm: Implement MVE saturating narrowing shifts Peter Maydell
2021-06-28 16:38   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 13/18] target/arm: Implement MVE VSHLC Peter Maydell
2021-06-28 16:39   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 14/18] target/arm: Implement MVE VADDLV Peter Maydell
2021-06-28 16:47   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 15/18] target/arm: Implement MVE long shifts by immediate Peter Maydell
2021-06-28 16:54   ` Richard Henderson
2021-06-28 17:45     ` Richard Henderson
2021-06-29 15:56       ` Peter Maydell
2021-06-29 16:13         ` Richard Henderson
2021-06-28 13:58 ` [PATCH 16/18] target/arm: Implement MVE long shifts by register Peter Maydell
2021-06-28 17:07   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 17/18] target/arm: Implement MVE shifts by immediate Peter Maydell
2021-06-28 17:38   ` Richard Henderson
2021-06-28 13:58 ` [PATCH 18/18] target/arm: Implement MVE shifts by register Peter Maydell
2021-06-28 17:41   ` Richard Henderson
2021-06-28 14:18 ` [PATCH 00/18] target/arm: Second slice of MVE implementation no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210628135835.6690-3-peter.maydell@linaro.org \
    --to=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.