All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] VSX MMA Implementation
@ 2022-05-20 13:51 Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 1/8] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Alex Bennée, Daniel Henrique Barboza, qemu-devel,
	David Gibson, Greg Kurz

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Based-on: <20220517161522.36132-1-victor.colombo@eldorado.org.br>

This patch series is a patch series of the Matrix-Multiply Assist (MMA)
instructions implementation from the PowerISA 3.1

These and the VDIV/VMOD implementation are the last new PowerISA 3.1
instructions left to be implemented.

The XVFGER instructions accumulate the exception status and at the end
set the FPSCR and take a Program interrupt on a trap-enabled exception,
previous versions were based on Victor's rework of FPU exceptions, but
as that patch was rejected this version worked around the fact that
OX/UX/XX and invalid instructions were handled in different functions
by disabling all enable bits then re-enabling them and calling the mtfsf
deferred exception helper.

Patch without review: 5

v4 changes:
    - Changed VSXGER16 accumulation to always use float32_sum and negate
      the elements according to the type of accumulation

v3 changes:
    - GER helpers now use ppc_acc_t instead of ppc_vsr_t for passing acc
    - Removed do_ger_XX3 and updated the decodetree to pass the masks in
      32 bits instructions
    - Removed unnecessary rounding mode function
    - Moved float32_neg to fpu_helper.c and renamed it bfp32_negate to
      make it clearer that it's a 32 bit version of the PowerISA
      bfp_NEGATE
    - Negated accumulation now a subtraction
    - Changed exception handling by disabling all enable FPSCR enable
      bits to set all FPSCR bits (except FEX) correctly, then re-enable
      them and call do_fpscr_check_status to raise the exception
      accordingly and set FEX if necessary

v2 changes:
    - Changed VSXGER, VSXGER16 and XVIGER macros to functions
    - Set rounding mode in floating-point instructions based on RN
      before operations
    - Separated accumulate and with saturation instructions in
      different helpers
    - Used FIELD, FIELD_EX32 and FIELD_DP32 for packing/unpacking masks

Joel Stanley (1):
  linux-user: Add PowerPC ISA 3.1 and MMA to hwcap

Lucas Mateus Castro (alqotel) (7):
  target/ppc: Implement xxm[tf]acc and xxsetaccz
  target/ppc: Implemented xvi*ger* instructions
  target/ppc: Implemented pmxvi*ger* instructions
  target/ppc: Implemented xvf*ger*
  target/ppc: Implemented xvf16ger*
  target/ppc: Implemented pmxvf*ger*
  target/ppc: Implemented [pm]xvbf16ger2*

 linux-user/elfload.c                |   4 +
 target/ppc/cpu.h                    |  13 ++
 target/ppc/fpu_helper.c             | 326 +++++++++++++++++++++++++++-
 target/ppc/helper.h                 |  33 +++
 target/ppc/insn32.decode            |  52 +++++
 target/ppc/insn64.decode            |  79 +++++++
 target/ppc/int_helper.c             | 130 +++++++++++
 target/ppc/internal.h               |  15 ++
 target/ppc/translate/vsx-impl.c.inc | 130 +++++++++++
 9 files changed, 780 insertions(+), 2 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/8] target/ppc: Implement xxm[tf]acc and xxsetaccz
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 2/8] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xxmfacc: VSX Move From Accumulator
xxmtacc: VSX Move To Accumulator
xxsetaccz: VSX Set Accumulator to Zero

The PowerISA 3.1 mentions that for the current version of the
architecture, "the hardware implementation provides the effect of ACC[i]
and VSRs 4*i to 4*i + 3 logically containing the same data" and "The
Accumulators introduce no new logical state at this time" (page 501).
For now it seems unnecessary to create new structures, so this patch
just uses ACC[i] as VSRs 4*i to 4*i+3 and therefore move to and from
accumulators are no-ops.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h                    |  5 +++++
 target/ppc/insn32.decode            |  9 +++++++++
 target/ppc/translate/vsx-impl.c.inc | 31 +++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 901ded79e9..2e80d0978f 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2661,6 +2661,11 @@ static inline int vsr_full_offset(int i)
     return offsetof(CPUPPCState, vsr[i].u64[0]);
 }
 
+static inline int acc_full_offset(int i)
+{
+    return vsr_full_offset(i * 4);
+}
+
 static inline int fpr_offset(int i)
 {
     return vsr64_offset(i, true);
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 39372fe673..7a76bedfa6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -151,6 +151,9 @@
 &X_vrt_frbp     vrt frbp
 @X_vrt_frbp     ...... vrt:5 ..... ....0 .......... .           &X_vrt_frbp frbp=%x_frbp
 
+&X_a            ra
+@X_a            ...... ra:3 .. ..... ..... .......... .         &X_a
+
 %xx_xt          0:1 21:5
 %xx_xb          1:1 11:5
 %xx_xa          2:1 16:5
@@ -710,3 +713,9 @@ XVTLSBB         111100 ... -- 00010 ..... 111011011 . - @XX2_bf_xb
 &XL_s           s:uint8_t
 @XL_s           ......-------------- s:1 .......... -   &XL_s
 RFEBB           010011-------------- .   0010010010 -   @XL_s
+
+## Accumulator Instructions
+
+XXMFACC         011111 ... -- 00000 ----- 0010110001 -   @X_a
+XXMTACC         011111 ... -- 00001 ----- 0010110001 -   @X_a
+XXSETACCZ       011111 ... -- 00011 ----- 0010110001 -   @X_a
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 3692740736..dc8875d5d3 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2787,6 +2787,37 @@ static bool trans_XVCVBF16SPN(DisasContext *ctx, arg_XX2 *a)
     return true;
 }
 
+    /*
+     *  The PowerISA 3.1 mentions that for the current version of the
+     *  architecture, "the hardware implementation provides the effect of
+     *  ACC[i] and VSRs 4*i to 4*i + 3 logically containing the same data"
+     *  and "The Accumulators introduce no new logical state at this time"
+     *  (page 501). For now it seems unnecessary to create new structures,
+     *  so ACC[i] is the same as VSRs 4*i to 4*i+3 and therefore
+     *  move to and from accumulators are no-ops.
+     */
+static bool trans_XXMFACC(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    return true;
+}
+
+static bool trans_XXMTACC(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    return true;
+}
+
+static bool trans_XXSETACCZ(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    tcg_gen_gvec_dup_imm(MO_64, acc_full_offset(a->ra), 64, 64, 0);
+    return true;
+}
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/8] target/ppc: Implemented xvi*ger* instructions
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 1/8] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 3/8] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvi4ger8:     VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
xvi4ger8pp:   VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
Positive multiply, Positive accumulate
xvi8ger4:     VSX Vector 4-bit Signed Integer GER (rank-8 update)
xvi8ger4pp:   VSX Vector 4-bit Signed Integer GER (rank-8 update)
Positive multiply, Positive accumulate
xvi8ger4spp:  VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
with Saturate Positive multiply, Positive accumulate
xvi16ger2:    VSX Vector 16-bit Signed Integer GER (rank-2 update)
xvi16ger2pp:  VSX Vector 16-bit Signed Integer GER (rank-2 update)
Positive multiply, Positive accumulate
xvi16ger2s:   VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation
xvi16ger2spp: VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h                    |   1 +
 target/ppc/helper.h                 |  13 +++
 target/ppc/insn32.decode            |  18 ++++
 target/ppc/int_helper.c             | 130 ++++++++++++++++++++++++++++
 target/ppc/internal.h               |  15 ++++
 target/ppc/translate/vsx-impl.c.inc |  41 +++++++++
 6 files changed, 218 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 2e80d0978f..c8a12a3985 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -238,6 +238,7 @@ typedef union _ppc_vsr_t {
 
 typedef ppc_vsr_t ppc_avr_t;
 typedef ppc_vsr_t ppc_fprp_t;
+typedef ppc_vsr_t ppc_acc_t;
 
 #if !defined(CONFIG_USER_ONLY)
 /* Software TLB cache */
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index aa6773c4a5..29354276f0 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -133,6 +133,10 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_ctype_vsr ppc_vsr_t *
 #define dh_typecode_vsr dh_typecode_ptr
 
+#define dh_alias_acc ptr
+#define dh_ctype_acc ppc_acc_t *
+#define dh_typecode_acc dh_typecode_ptr
+
 DEF_HELPER_3(vavgub, void, avr, avr, avr)
 DEF_HELPER_3(vavguh, void, avr, avr, avr)
 DEF_HELPER_3(vavguw, void, avr, avr, avr)
@@ -537,6 +541,15 @@ DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI4GER8PP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI8GER4, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI8GER4PP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI8GER4SPP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, acc, i32)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
 DEF_HELPER_2(efscfui, i32, env, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 7a76bedfa6..899a04bf77 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -170,6 +170,12 @@
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
+# 32 bit GER instructions have all mask bits considered 1
+&MMIRR_XX3      xa xb xt pmsk xmsk ymsk
+%xx_at          23:3
+@XX3_at         ...... ... .. ..... ..... ........ ...          &MMIRR_XX3 xt=%xx_at xb=%xx_xb \
+                                                                pmsk=255 xmsk=15 ymsk=15
+
 &XX3_dm         xt xa xb dm
 @XX3_dm         ...... ..... ..... ..... . dm:2 ..... ...       &XX3_dm xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
@@ -719,3 +725,15 @@ RFEBB           010011-------------- .   0010010010 -   @XL_s
 XXMFACC         011111 ... -- 00000 ----- 0010110001 -   @X_a
 XXMTACC         011111 ... -- 00001 ----- 0010110001 -   @X_a
 XXSETACCZ       011111 ... -- 00011 ----- 0010110001 -   @X_a
+
+## VSX GER instruction
+
+XVI4GER8        111011 ... -- ..... ..... 00100011 ..-  @XX3_at xa=%xx_xa
+XVI4GER8PP      111011 ... -- ..... ..... 00100010 ..-  @XX3_at xa=%xx_xa
+XVI8GER4        111011 ... -- ..... ..... 00000011 ..-  @XX3_at xa=%xx_xa
+XVI8GER4PP      111011 ... -- ..... ..... 00000010 ..-  @XX3_at xa=%xx_xa
+XVI16GER2       111011 ... -- ..... ..... 01001011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2PP     111011 ... -- ..... ..... 01101011 ..-  @XX3_at xa=%xx_xa
+XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 8c1674510b..32a7d99718 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -782,6 +782,136 @@ VCT(uxs, cvtsduw, u32)
 VCT(sxs, cvtsdsw, s32)
 #undef VCT
 
+typedef int64_t do_ger(uint32_t, uint32_t, uint32_t);
+
+static int64_t ger_rank8(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 8; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 4 * i, 4) * sextract32(b, 4 * i, 4);
+        }
+    }
+    return psum;
+}
+
+static int64_t ger_rank4(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 4; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 8 * i, 8) * (int64_t)extract32(b, 8 * i, 8);
+        }
+    }
+    return psum;
+}
+
+static int64_t ger_rank2(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 2; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 16 * i, 16) * sextract32(b, 16 * i, 16);
+        }
+    }
+    return psum;
+}
+
+static void xviger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
+                   uint32_t mask, bool sat, bool acc, do_ger ger)
+{
+    uint8_t pmsk = FIELD_EX32(mask, GER_MSK, PMSK),
+            xmsk = FIELD_EX32(mask, GER_MSK, XMSK),
+            ymsk = FIELD_EX32(mask, GER_MSK, YMSK);
+    uint8_t xmsk_bit, ymsk_bit;
+    int64_t psum;
+    int i, j;
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                psum = ger(a->VsrW(i), b->VsrW(j), pmsk);
+                if (acc) {
+                    psum += at[i].VsrSW(j);
+                }
+                if (sat && psum > INT32_MAX) {
+                    set_vscr_sat(env);
+                    at[i].VsrSW(j) = INT32_MAX;
+                } else if (sat && psum < INT32_MIN) {
+                    set_vscr_sat(env);
+                    at[i].VsrSW(j) = INT32_MIN;
+                } else {
+                    at[i].VsrSW(j) = (int32_t) psum;
+                }
+            } else {
+                at[i].VsrSW(j) = 0;
+            }
+        }
+    }
+}
+
+QEMU_FLATTEN
+void helper_XVI4GER8(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank8);
+}
+
+QEMU_FLATTEN
+void helper_XVI4GER8PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank8);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4SPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, true, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                      ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2S(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, false, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2SPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, true, ger_rank2);
+}
+
 target_ulong helper_vclzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 8094e0b033..2add128cd1 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -18,6 +18,8 @@
 #ifndef PPC_INTERNAL_H
 #define PPC_INTERNAL_H
 
+#include "hw/registerfields.h"
+
 #define FUNC_MASK(name, ret_type, size, max_val)                  \
 static inline ret_type name(uint##size##_t start,                 \
                               uint##size##_t end)                 \
@@ -291,4 +293,17 @@ G_NORETURN void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
                                             uintptr_t retaddr);
 #endif
 
+FIELD(GER_MSK, XMSK, 0, 4)
+FIELD(GER_MSK, YMSK, 4, 4)
+FIELD(GER_MSK, PMSK, 8, 8)
+
+static inline int ger_pack_masks(int pmsk, int ymsk, int xmsk)
+{
+    int msk = 0;
+    msk = FIELD_DP32(msk, GER_MSK, XMSK, xmsk);
+    msk = FIELD_DP32(msk, GER_MSK, YMSK, ymsk);
+    msk = FIELD_DP32(msk, GER_MSK, PMSK, pmsk);
+    return msk;
+}
+
 #endif /* PPC_INTERNAL_H */
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index dc8875d5d3..9d4309e841 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -17,6 +17,13 @@ static inline TCGv_ptr gen_vsr_ptr(int reg)
     return r;
 }
 
+static inline TCGv_ptr gen_acc_ptr(int reg)
+{
+    TCGv_ptr r = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(r, cpu_env, acc_full_offset(reg));
+    return r;
+}
+
 #define VSX_LOAD_SCALAR(name, operation)                      \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
@@ -2818,6 +2825,40 @@ static bool trans_XXSETACCZ(DisasContext *ctx, arg_X_a *a)
     return true;
 }
 
+static bool do_ger(DisasContext *ctx, arg_MMIRR_XX3 *a,
+                   void (*helper)(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32))
+{
+    uint32_t mask;
+    TCGv_ptr xt, xa, xb;
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    if (unlikely((a->xa / 4 == a->xt) || (a->xb / 4 == a->xt))) {
+        gen_invalid(ctx);
+        return true;
+    }
+
+    xt = gen_acc_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+
+    mask = ger_pack_masks(a->pmsk, a->ymsk, a->xmsk);
+    helper(cpu_env, xa, xb, xt, tcg_constant_i32(mask));
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+    return true;
+}
+
+TRANS(XVI4GER8, do_ger, gen_helper_XVI4GER8)
+TRANS(XVI4GER8PP, do_ger,  gen_helper_XVI4GER8PP)
+TRANS(XVI8GER4, do_ger, gen_helper_XVI8GER4)
+TRANS(XVI8GER4PP, do_ger,  gen_helper_XVI8GER4PP)
+TRANS(XVI8GER4SPP, do_ger, gen_helper_XVI8GER4SPP)
+TRANS(XVI16GER2, do_ger, gen_helper_XVI16GER2)
+TRANS(XVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
+TRANS(XVI16GER2S, do_ger, gen_helper_XVI16GER2S)
+TRANS(XVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/8] target/ppc: Implemented pmxvi*ger* instructions
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 1/8] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 2/8] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 4/8] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
pmxvi4ger8:     Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update)
pmxvi4ger8pp:   Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) Positive multiply, Positive accumulate
pmxvi8ger4:     Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update)
pmxvi8ger4pp:   Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update) Positive multiply, Positive accumulate
pmxvi8ger4spp:  Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) with Saturate Positive multiply, Positive accumulate
pmxvi16ger2:    Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update)
pmxvi16ger2pp:  Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvi16ger2s:   Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation
pmxvi16ger2spp: Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/insn64.decode            | 30 +++++++++++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 10 ++++++++++
 2 files changed, 40 insertions(+)

diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 691e8fe6c0..0eed35c8cd 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -68,6 +68,15 @@
                 ...... ..... ..... ..... ..... .. ....   \
                 &8RR_XX4_uim3 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
 
+# Format MMIRR:XX3
+&MMIRR_XX3      !extern xa xb xt pmsk xmsk ymsk
+%xx3_xa         2:1 16:5
+%xx3_xb         1:1 11:5
+%xx3_at         23:3
+@MMIRR_XX3      ...... .. .... .. . . ........ xmsk:4 ymsk:4  \
+                ...... ... .. ..... ..... ........ ...  \
+                &MMIRR_XX3 xa=%xx3_xa xb=%xx3_xb xt=%xx3_at
+
 ### Fixed-Point Load Instructions
 
 PLBZ            000001 10 0--.-- .................. \
@@ -115,6 +124,27 @@ PSTFS           000001 10 0--.-- .................. \
 PSTFD           000001 10 0--.-- .................. \
                 110110 ..... ..... ................     @PLS_D
 
+## VSX GER instruction
+
+PMXVI4GER8      000001 11 1001 -- - - pmsk:8 ........              \
+                111011 ... -- ..... ..... 00100011 ..-  @MMIRR_XX3
+PMXVI4GER8PP    000001 11 1001 -- - - pmsk:8 ........              \
+                111011 ... -- ..... ..... 00100010 ..-  @MMIRR_XX3
+PMXVI8GER4      000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 00000011 ..-  @MMIRR_XX3
+PMXVI8GER4PP    000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 00000010 ..-  @MMIRR_XX3
+PMXVI16GER2     000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 01001011 ..-  @MMIRR_XX3
+PMXVI16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 01101011 ..-  @MMIRR_XX3
+PMXVI8GER4SPP   000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 01100011 ..-  @MMIRR_XX3
+PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 00101011 ..-  @MMIRR_XX3
+PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
+
 ### Prefixed No-operation Instruction
 
 @PNOP           000001 11 0000-- 000000000000000000     \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 9d4309e841..c9ed898bb6 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2859,6 +2859,16 @@ TRANS(XVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
 TRANS(XVI16GER2S, do_ger, gen_helper_XVI16GER2S)
 TRANS(XVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
 
+TRANS64(PMXVI4GER8, do_ger, gen_helper_XVI4GER8)
+TRANS64(PMXVI4GER8PP, do_ger, gen_helper_XVI4GER8PP)
+TRANS64(PMXVI8GER4, do_ger, gen_helper_XVI8GER4)
+TRANS64(PMXVI8GER4PP, do_ger, gen_helper_XVI8GER4PP)
+TRANS64(PMXVI8GER4SPP, do_ger, gen_helper_XVI8GER4SPP)
+TRANS64(PMXVI16GER2, do_ger, gen_helper_XVI16GER2)
+TRANS64(PMXVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
+TRANS64(PMXVI16GER2S, do_ger, gen_helper_XVI16GER2S)
+TRANS64(PMXVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 4/8] target/ppc: Implemented xvf*ger*
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (2 preceding siblings ...)
  2022-05-20 13:51 ` [PATCH v4 3/8] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 5/8] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvf32ger:   VSX Vector 32-bit Floating-Point GER (rank-1 update)
xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate
xvf64ger:   VSX Vector 64-bit Floating-Point GER (rank-1 update)
xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h                    |   4 +
 target/ppc/fpu_helper.c             | 193 +++++++++++++++++++++++++++-
 target/ppc/helper.h                 |  10 ++
 target/ppc/insn32.decode            |  13 ++
 target/ppc/translate/vsx-impl.c.inc |  12 ++
 5 files changed, 230 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c8a12a3985..bdedf4138e 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2641,6 +2641,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[i]
 #define VsrD(i) u64[i]
 #define VsrSD(i) s64[i]
+#define VsrSF(i) f32[i]
+#define VsrDF(i) f64[i]
 #else
 #define VsrB(i) u8[15 - (i)]
 #define VsrSB(i) s8[15 - (i)]
@@ -2650,6 +2652,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[3 - (i)]
 #define VsrD(i) u64[1 - (i)]
 #define VsrSD(i) s64[1 - (i)]
+#define VsrSF(i) f32[3 - (i)]
+#define VsrDF(i) f64[1 - (i)]
 #endif
 
 static inline int vsr64_offset(int i, bool high)
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 8592727792..1766da5bcf 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -414,7 +414,7 @@ void helper_store_fpscr(CPUPPCState *env, uint64_t val, uint32_t nibbles)
     ppc_store_fpscr(env, val);
 }
 
-void helper_fpscr_check_status(CPUPPCState *env)
+static void do_fpscr_check_status(CPUPPCState *env, uintptr_t raddr)
 {
     CPUState *cs = env_cpu(env);
     target_ulong fpscr = env->fpscr;
@@ -455,13 +455,19 @@ void helper_fpscr_check_status(CPUPPCState *env)
     }
     cs->exception_index = POWERPC_EXCP_PROGRAM;
     env->error_code = error | POWERPC_EXCP_FP;
+    env->fpscr |= error ? FP_FEX : 0;
     /* Deferred floating-point exception after target FPSCR update */
     if (fp_exceptions_enabled(env)) {
         raise_exception_err_ra(env, cs->exception_index,
-                               env->error_code, GETPC());
+                               env->error_code, raddr);
     }
 }
 
+void helper_fpscr_check_status(CPUPPCState *env)
+{
+    do_fpscr_check_status(env, GETPC());
+}
+
 static void do_float_check_status(CPUPPCState *env, bool change_fi,
                                   uintptr_t raddr)
 {
@@ -3469,3 +3475,186 @@ void helper_xssubqp(CPUPPCState *env, uint32_t opcode,
     *xt = t;
     do_float_check_status(env, true, GETPC());
 }
+
+static inline void vsxger_excp(CPUPPCState *env, uintptr_t retaddr)
+{
+    /*
+     * XV*GER instructions execute and set the FPSCR as if exceptions
+     * are disabled and only at the end throw an exception
+     */
+    target_ulong enable;
+    enable = env->fpscr & (FP_ENABLES | FP_FI | FP_FR);
+    env->fpscr &= ~(FP_ENABLES | FP_FI | FP_FR);
+    int status = get_float_exception_flags(&env->fp_status);
+    if (unlikely(status & float_flag_invalid)) {
+        if (status & float_flag_invalid_snan) {
+            float_invalid_op_vxsnan(env, 0);
+        }
+        if (status & float_flag_invalid_imz) {
+            float_invalid_op_vximz(env, false, 0);
+        }
+        if (status & float_flag_invalid_isi) {
+            float_invalid_op_vxisi(env, false, 0);
+        }
+    }
+    do_float_check_status(env, false, retaddr);
+    env->fpscr |= enable;
+    do_fpscr_check_status(env, retaddr);
+}
+
+typedef void vsxger_zero(ppc_vsr_t *at, int, int);
+
+typedef void vsxger_muladd_f(ppc_vsr_t *, ppc_vsr_t *, ppc_vsr_t *, int, int,
+                             int flags, float_status *s);
+
+static void vsxger_muladd32(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b, int i,
+                            int j, int flags, float_status *s)
+{
+    at[i].VsrSF(j) = float32_muladd(a->VsrSF(i), b->VsrSF(j),
+                                    at[i].VsrSF(j), flags, s);
+}
+
+static void vsxger_mul32(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b, int i,
+                         int j, int flags, float_status *s)
+{
+    at[i].VsrSF(j) = float32_mul(a->VsrSF(i), b->VsrSF(j), s);
+}
+
+static void vsxger_zero32(ppc_vsr_t *at, int i, int j)
+{
+    at[i].VsrSF(j) = float32_zero;
+}
+
+static void vsxger_muladd64(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b, int i,
+                            int j, int flags, float_status *s)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_muladd(a[i / 2].VsrDF(i % 2), b->VsrDF(j),
+                                        at[i].VsrDF(j), flags, s);
+    }
+}
+
+static void vsxger_mul64(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b, int i,
+                         int j, int flags, float_status *s)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_mul(a[i / 2].VsrDF(i % 2), b->VsrDF(j), s);
+    }
+}
+
+static void vsxger_zero64(ppc_vsr_t *at, int i, int j)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_zero;
+    }
+}
+
+static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
+                   uint32_t mask, bool acc, bool neg_mul, bool neg_acc,
+                   vsxger_muladd_f mul, vsxger_muladd_f muladd, vsxger_zero zero)
+{
+    int i, j, xmsk_bit, ymsk_bit, op_flags;
+    uint8_t xmsk = mask & 0x0F;
+    uint8_t ymsk = (mask >> 4) & 0x0F;
+    float_status *excp_ptr = &env->fp_status;
+    op_flags = (neg_acc ^ neg_mul) ? float_muladd_negate_c : 0;
+    op_flags |= (neg_mul) ? float_muladd_negate_result : 0;
+    helper_reset_fpstatus(env);
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                if (acc) {
+                    muladd(at, a, b, i, j, op_flags, excp_ptr);
+                } else {
+                    mul(at, a, b, i, j, op_flags, excp_ptr);
+                }
+            } else {
+                zero(at, i, j);
+            }
+        }
+    }
+    vsxger_excp(env, GETPC());
+}
+
+QEMU_FLATTEN
+void helper_XVF32GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, false, false, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERPN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, true, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERNP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERNN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, true, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, false, false, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERPN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, true, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERNP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERNN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, true, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 29354276f0..054d25f3b0 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -550,6 +550,16 @@ DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF32GERNP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF32GERNN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF64GER, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF64GERPP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF64GERPN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF64GERNP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF64GERNN, void, env, vsr, vsr, acc, i32)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
 DEF_HELPER_2(efscfui, i32, env, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 899a04bf77..c561a17c7d 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -173,6 +173,7 @@
 # 32 bit GER instructions have all mask bits considered 1
 &MMIRR_XX3      xa xb xt pmsk xmsk ymsk
 %xx_at          23:3
+%xx_xa_pair     2:1 17:4 !function=times_2
 @XX3_at         ...... ... .. ..... ..... ........ ...          &MMIRR_XX3 xt=%xx_at xb=%xx_xb \
                                                                 pmsk=255 xmsk=15 ymsk=15
 
@@ -737,3 +738,15 @@ XVI16GER2PP     111011 ... -- ..... ..... 01101011 ..-  @XX3_at xa=%xx_xa
 XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
+
+XVF32GER        111011 ... -- ..... ..... 00011011 ..-  @XX3_at xa=%xx_xa
+XVF32GERPP      111011 ... -- ..... ..... 00011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERPN      111011 ... -- ..... ..... 10011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERNP      111011 ... -- ..... ..... 01011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERNN      111011 ... -- ..... ..... 11011010 ..-  @XX3_at xa=%xx_xa
+
+XVF64GER        111011 ... -- .... 0 ..... 00111011 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERPP      111011 ... -- .... 0 ..... 00111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERPN      111011 ... -- .... 0 ..... 10111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERNP      111011 ... -- .... 0 ..... 01111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERNN      111011 ... -- .... 0 ..... 11111010 ..-  @XX3_at xa=%xx_xa_pair
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index c9ed898bb6..76747956bb 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2869,6 +2869,18 @@ TRANS64(PMXVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
 
+TRANS(XVF32GER, do_ger, gen_helper_XVF32GER)
+TRANS(XVF32GERPP, do_ger, gen_helper_XVF32GERPP)
+TRANS(XVF32GERPN, do_ger, gen_helper_XVF32GERPN)
+TRANS(XVF32GERNP, do_ger, gen_helper_XVF32GERNP)
+TRANS(XVF32GERNN, do_ger, gen_helper_XVF32GERNN)
+
+TRANS(XVF64GER, do_ger, gen_helper_XVF64GER)
+TRANS(XVF64GERPP, do_ger, gen_helper_XVF64GERPP)
+TRANS(XVF64GERPN, do_ger, gen_helper_XVF64GERPN)
+TRANS(XVF64GERNP, do_ger, gen_helper_XVF64GERNP)
+TRANS(XVF64GERNN, do_ger, gen_helper_XVF64GERNN)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 5/8] target/ppc: Implemented xvf16ger*
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (3 preceding siblings ...)
  2022-05-20 13:51 ` [PATCH v4 4/8] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 15:47   ` Richard Henderson
  2022-05-20 13:51 ` [PATCH v4 6/8] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvf16ger2:   VSX Vector 16-bit Floating-Point GER (rank-2 update)
xvf16ger2nn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Negative accumulate
xvf16ger2np: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Positive accumulate
xvf16ger2pn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Negative accumulate
xvf16ger2pp: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
It was suggested in last version of this patch that since negate is
being used to use it more, so this version tries to recreate the
operation as it's described in the ISA:
 if "[pm]xvf16ger2pp" then v2 <-bfp_ADD(r1, acc)
 if "[pm]xvf16ger2pn" then v2 <-bfp_ADD(r1, bfp_NEGATE(acc))
 if "[pm]xvf16ger2np" then v2 <-bfp_ADD(bfp_NEGATE(r1), acc)
 if "[pm]xvf16ger2nn" then v2 <-bfp_ADD(bfp_NEGATE(r1), bfp_NEGATE(acc))
A version using float32_sub instead of always add + neg was tried but in
the case of [pm]xvf16ger2np it would be necessary to invert the operations
so in a NaN - NaN situation the result could differ from the hardware.
(xvbf16ger2* behaves the same way as xvf16ger2*)
---
 target/ppc/cpu.h                    |  3 +
 target/ppc/fpu_helper.c             | 93 +++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  5 ++
 target/ppc/insn32.decode            |  6 ++
 target/ppc/translate/vsx-impl.c.inc |  6 ++
 5 files changed, 113 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index bdedf4138e..46769a5647 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -227,6 +227,7 @@ typedef union _ppc_vsr_t {
     int16_t s16[8];
     int32_t s32[4];
     int64_t s64[2];
+    float16 f16[8];
     float32 f32[4];
     float64 f64[2];
     float128 f128;
@@ -2641,6 +2642,7 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[i]
 #define VsrD(i) u64[i]
 #define VsrSD(i) s64[i]
+#define VsrHF(i) f16[i]
 #define VsrSF(i) f32[i]
 #define VsrDF(i) f64[i]
 #else
@@ -2652,6 +2654,7 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[3 - (i)]
 #define VsrD(i) u64[1 - (i)]
 #define VsrSD(i) s64[1 - (i)]
+#define VsrHF(i) f16[7 - (i)]
 #define VsrSF(i) f32[3 - (i)]
 #define VsrDF(i) f64[1 - (i)]
 #endif
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 1766da5bcf..f7da92a51a 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -36,6 +36,15 @@ static inline float128 float128_snan_to_qnan(float128 x)
 #define float32_snan_to_qnan(x) ((x) | 0x00400000)
 #define float16_snan_to_qnan(x) ((x) | 0x0200)
 
+static inline float32 bfp32_neg(float32 a)
+{
+    if (unlikely(float32_is_any_nan(a))) {
+        return a;
+    } else {
+        return float32_chs(a);
+    }
+}
+
 static inline bool fp_exceptions_enabled(CPUPPCState *env)
 {
 #ifdef CONFIG_USER_ONLY
@@ -3502,6 +3511,55 @@ static inline void vsxger_excp(CPUPPCState *env, uintptr_t retaddr)
     do_fpscr_check_status(env, retaddr);
 }
 
+typedef float64 extract_f16(float16, float_status *);
+
+static float64 extract_hf16(float16 in, float_status *fp_status)
+{
+    return float16_to_float64(in, true, fp_status);
+}
+
+static void vsxger16(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t  *at, uint32_t mask, bool acc,
+                     bool neg_mul, bool neg_acc, extract_f16 extract)
+{
+    float32 r, aux_acc;
+    float64 psum, va, vb, vc, vd;
+    int i, j, xmsk_bit, ymsk_bit;
+    uint8_t pmsk = FIELD_EX32(mask, GER_MSK, PMSK),
+            xmsk = FIELD_EX32(mask, GER_MSK, XMSK),
+            ymsk = FIELD_EX32(mask, GER_MSK, YMSK);
+    float_status *excp_ptr = &env->fp_status;
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                va = !(pmsk & 2) ? float64_zero : extract(a->VsrHF(2 * i), excp_ptr);
+                vb = !(pmsk & 2) ? float64_zero : extract(b->VsrHF(2 * j), excp_ptr);
+                vc = !(pmsk & 1) ? float64_zero : extract(a->VsrHF(2 * i + 1), excp_ptr);
+                vd = !(pmsk & 1) ? float64_zero : extract(b->VsrHF(2 * j + 1), excp_ptr);
+                psum = float64_mul(va, vb, excp_ptr);
+                psum = float64r32_muladd(vc, vd, psum, 0, excp_ptr);
+                r = float64_to_float32(psum, excp_ptr);
+                if (acc) {
+                    aux_acc = at[i].VsrSF(j);
+                    if (!neg_mul && !neg_acc) {
+                        r = float32_add(r, aux_acc, excp_ptr);
+                    } else if (!neg_mul) {
+                        r = float32_add(r, bfp32_neg(aux_acc), excp_ptr);
+                    } else if (!neg_acc) {
+                        r = float32_add(bfp32_neg(r), aux_acc, excp_ptr);
+                    } else {
+                        r = float32_add(bfp32_neg(r), bfp32_neg(aux_acc), excp_ptr);
+                    }
+                }
+                at[i].VsrSF(j) = r;
+            } else {
+                at[i].VsrSF(j) = float32_zero;
+            }
+        }
+    }
+    vsxger_excp(env, GETPC());
+}
+
 typedef void vsxger_zero(ppc_vsr_t *at, int, int);
 
 typedef void vsxger_muladd_f(ppc_vsr_t *, ppc_vsr_t *, ppc_vsr_t *, int, int,
@@ -3579,6 +3637,41 @@ static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
     vsxger_excp(env, GETPC());
 }
 
+QEMU_FLATTEN
+void helper_XVF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, false, false, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2PN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, true, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2NP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2NN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, true, extract_hf16);
+}
+
 QEMU_FLATTEN
 void helper_XVF32GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t *at, uint32_t mask)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 054d25f3b0..7ab5ac8ee7 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -550,6 +550,11 @@ DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF16GER2, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF16GER2PP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF16GER2PN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF16GER2NP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVF16GER2NN, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, acc, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index c561a17c7d..c774227d8c 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -739,6 +739,12 @@ XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
 
+XVF16GER2       111011 ... -- ..... ..... 00010011 ..-  @XX3_at xa=%xx_xa
+XVF16GER2PP     111011 ... -- ..... ..... 00010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2PN     111011 ... -- ..... ..... 10010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2NP     111011 ... -- ..... ..... 01010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2NN     111011 ... -- ..... ..... 11010010 ..-  @XX3_at xa=%xx_xa
+
 XVF32GER        111011 ... -- ..... ..... 00011011 ..-  @XX3_at xa=%xx_xa
 XVF32GERPP      111011 ... -- ..... ..... 00011010 ..-  @XX3_at xa=%xx_xa
 XVF32GERPN      111011 ... -- ..... ..... 10011010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 76747956bb..232a4d881e 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2869,6 +2869,12 @@ TRANS64(PMXVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
 
+TRANS(XVF16GER2, do_ger, gen_helper_XVF16GER2)
+TRANS(XVF16GER2PP, do_ger, gen_helper_XVF16GER2PP)
+TRANS(XVF16GER2PN, do_ger, gen_helper_XVF16GER2PN)
+TRANS(XVF16GER2NP, do_ger, gen_helper_XVF16GER2NP)
+TRANS(XVF16GER2NN, do_ger, gen_helper_XVF16GER2NN)
+
 TRANS(XVF32GER, do_ger, gen_helper_XVF32GER)
 TRANS(XVF32GERPP, do_ger, gen_helper_XVF32GERPP)
 TRANS(XVF32GERPN, do_ger, gen_helper_XVF32GERPN)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 6/8] target/ppc: Implemented pmxvf*ger*
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (4 preceding siblings ...)
  2022-05-20 13:51 ` [PATCH v4 5/8] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 7/8] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 8/8] linux-user: Add PowerPC ISA 3.1 and MMA to hwcap Lucas Mateus Castro(alqotel)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
pmxvf16ger2:   Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update)
pmxvf16ger2nn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Negative accumulate
pmxvf16ger2np: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Positive accumulate
pmxvf16ger2pn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Negative accumulate
pmxvf16ger2pp: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvf32ger:    Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update)
pmxvf32gernn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf32gernp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf32gerpn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf32gerpp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate
pmxvf64ger:    Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update)
pmxvf64gernn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf64gernp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf64gerpn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf64gerpp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/insn64.decode            | 38 +++++++++++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 18 ++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 0eed35c8cd..5ecc5c85bf 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -73,10 +73,15 @@
 %xx3_xa         2:1 16:5
 %xx3_xb         1:1 11:5
 %xx3_at         23:3
+%xx3_xa_pair    2:1 17:4 !function=times_2
 @MMIRR_XX3      ...... .. .... .. . . ........ xmsk:4 ymsk:4  \
                 ...... ... .. ..... ..... ........ ...  \
                 &MMIRR_XX3 xa=%xx3_xa xb=%xx3_xb xt=%xx3_at
 
+@MMIRR_XX3_NO_P ...... .. .... .. . . ........ xmsk:4 .... \
+                ...... ... .. ..... ..... ........ ... \
+                &MMIRR_XX3 xb=%xx3_xb xt=%xx3_at pmsk=1
+
 ### Fixed-Point Load Instructions
 
 PLBZ            000001 10 0--.-- .................. \
@@ -145,6 +150,39 @@ PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
 PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
                 111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
 
+PMXVF16GER2     000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00010011 ..-  @MMIRR_XX3
+PMXVF16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00010010 ..-  @MMIRR_XX3
+PMXVF16GER2PN   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 10010010 ..-  @MMIRR_XX3
+PMXVF16GER2NP   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 01010010 ..-  @MMIRR_XX3
+PMXVF16GER2NN   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 11010010 ..-  @MMIRR_XX3
+
+PMXVF32GER      000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 00011011 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERPP    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 00011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERPN    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 10011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERNP    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 01011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERNN    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 11011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+
+PMXVF64GER      000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 00111011 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERPP    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 00111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERPN    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 10111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERNP    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 01111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERNN    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 11111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+
 ### Prefixed No-operation Instruction
 
 @PNOP           000001 11 0000-- 000000000000000000     \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 232a4d881e..7218394b45 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2887,6 +2887,24 @@ TRANS(XVF64GERPN, do_ger, gen_helper_XVF64GERPN)
 TRANS(XVF64GERNP, do_ger, gen_helper_XVF64GERNP)
 TRANS(XVF64GERNN, do_ger, gen_helper_XVF64GERNN)
 
+TRANS64(PMXVF16GER2, do_ger, gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2PP, do_ger, gen_helper_XVF16GER2PP)
+TRANS64(PMXVF16GER2PN, do_ger, gen_helper_XVF16GER2PN)
+TRANS64(PMXVF16GER2NP, do_ger, gen_helper_XVF16GER2NP)
+TRANS64(PMXVF16GER2NN, do_ger, gen_helper_XVF16GER2NN)
+
+TRANS64(PMXVF32GER, do_ger, gen_helper_XVF32GER)
+TRANS64(PMXVF32GERPP, do_ger, gen_helper_XVF32GERPP)
+TRANS64(PMXVF32GERPN, do_ger, gen_helper_XVF32GERPN)
+TRANS64(PMXVF32GERNP, do_ger, gen_helper_XVF32GERNP)
+TRANS64(PMXVF32GERNN, do_ger, gen_helper_XVF32GERNN)
+
+TRANS64(PMXVF64GER, do_ger, gen_helper_XVF64GER)
+TRANS64(PMXVF64GERPP, do_ger, gen_helper_XVF64GERPP)
+TRANS64(PMXVF64GERPN, do_ger, gen_helper_XVF64GERPN)
+TRANS64(PMXVF64GERNP, do_ger, gen_helper_XVF64GERNP)
+TRANS64(PMXVF64GERNN, do_ger, gen_helper_XVF64GERNN)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 7/8] target/ppc: Implemented [pm]xvbf16ger2*
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (5 preceding siblings ...)
  2022-05-20 13:51 ` [PATCH v4 6/8] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  2022-05-20 13:51 ` [PATCH v4 8/8] linux-user: Add PowerPC ISA 3.1 and MMA to hwcap Lucas Mateus Castro(alqotel)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvbf16ger2:   VSX Vector bfloat16 GER (rank-2 update)
xvbf16ger2nn: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Negative accumulate
xvbf16ger2np: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Positive accumulate
xvbf16ger2pn: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Negative accumulate
xvbf16ger2pp: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Positive accumulate
pmxvbf16ger2:   Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
pmxvbf16ger2nn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Negative accumulate
pmxvbf16ger2np: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Positive accumulate
pmxvbf16ger2pn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Negative accumulate
pmxvbf16ger2pp: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/fpu_helper.c             | 40 +++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  5 ++++
 target/ppc/insn32.decode            |  6 +++++
 target/ppc/insn64.decode            | 11 ++++++++
 target/ppc/translate/vsx-impl.c.inc | 12 +++++++++
 5 files changed, 74 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index f7da92a51a..46e82b7b26 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3518,6 +3518,11 @@ static float64 extract_hf16(float16 in, float_status *fp_status)
     return float16_to_float64(in, true, fp_status);
 }
 
+static float64 extract_bf16(bfloat16 in, float_status *fp_status)
+{
+    return bfloat16_to_float64(in, fp_status);
+}
+
 static void vsxger16(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t  *at, uint32_t mask, bool acc,
                      bool neg_mul, bool neg_acc, extract_f16 extract)
@@ -3637,6 +3642,41 @@ static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
     vsxger_excp(env, GETPC());
 }
 
+QEMU_FLATTEN
+void helper_XVBF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, false, false, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2PN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, true, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2NP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2NN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, true, extract_bf16);
+}
+
 QEMU_FLATTEN
 void helper_XVF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t *at, uint32_t mask)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 7ab5ac8ee7..06203fd893 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -555,6 +555,11 @@ DEF_HELPER_5(XVF16GER2PP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF16GER2PN, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF16GER2NP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF16GER2NN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVBF16GER2, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVBF16GER2PP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVBF16GER2PN, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVBF16GER2NP, void, env, vsr, vsr, acc, i32)
+DEF_HELPER_5(XVBF16GER2NN, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, acc, i32)
 DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, acc, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index c774227d8c..dfd12e9801 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -739,6 +739,12 @@ XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
 
+XVBF16GER2      111011 ... -- ..... ..... 00110011 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2PP    111011 ... -- ..... ..... 00110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2PN    111011 ... -- ..... ..... 10110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2NP    111011 ... -- ..... ..... 01110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2NN    111011 ... -- ..... ..... 11110010 ..-  @XX3_at xa=%xx_xa
+
 XVF16GER2       111011 ... -- ..... ..... 00010011 ..-  @XX3_at xa=%xx_xa
 XVF16GER2PP     111011 ... -- ..... ..... 00010010 ..-  @XX3_at xa=%xx_xa
 XVF16GER2PN     111011 ... -- ..... ..... 10010010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 5ecc5c85bf..de115c1943 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -150,6 +150,17 @@ PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
 PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
                 111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
 
+PMXVBF16GER2    000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00110011 ..-  @MMIRR_XX3
+PMXVBF16GER2PP  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00110010 ..-  @MMIRR_XX3
+PMXVBF16GER2PN  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 10110010 ..-  @MMIRR_XX3
+PMXVBF16GER2NP  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 01110010 ..-  @MMIRR_XX3
+PMXVBF16GER2NN  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 11110010 ..-  @MMIRR_XX3
+
 PMXVF16GER2     000001 11 1001 -- - - pmsk:2 ------ ........ \
                 111011 ... -- ..... ..... 00010011 ..-  @MMIRR_XX3
 PMXVF16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........ \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 7218394b45..c1f3205c83 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2869,6 +2869,12 @@ TRANS64(PMXVI16GER2PP, do_ger, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger, gen_helper_XVI16GER2SPP)
 
+TRANS(XVBF16GER2, do_ger, gen_helper_XVBF16GER2)
+TRANS(XVBF16GER2PP, do_ger, gen_helper_XVBF16GER2PP)
+TRANS(XVBF16GER2PN, do_ger, gen_helper_XVBF16GER2PN)
+TRANS(XVBF16GER2NP, do_ger, gen_helper_XVBF16GER2NP)
+TRANS(XVBF16GER2NN, do_ger, gen_helper_XVBF16GER2NN)
+
 TRANS(XVF16GER2, do_ger, gen_helper_XVF16GER2)
 TRANS(XVF16GER2PP, do_ger, gen_helper_XVF16GER2PP)
 TRANS(XVF16GER2PN, do_ger, gen_helper_XVF16GER2PN)
@@ -2887,6 +2893,12 @@ TRANS(XVF64GERPN, do_ger, gen_helper_XVF64GERPN)
 TRANS(XVF64GERNP, do_ger, gen_helper_XVF64GERNP)
 TRANS(XVF64GERNN, do_ger, gen_helper_XVF64GERNN)
 
+TRANS64(PMXVBF16GER2, do_ger, gen_helper_XVBF16GER2)
+TRANS64(PMXVBF16GER2PP, do_ger, gen_helper_XVBF16GER2PP)
+TRANS64(PMXVBF16GER2PN, do_ger, gen_helper_XVBF16GER2PN)
+TRANS64(PMXVBF16GER2NP, do_ger, gen_helper_XVBF16GER2NP)
+TRANS64(PMXVBF16GER2NN, do_ger, gen_helper_XVBF16GER2NN)
+
 TRANS64(PMXVF16GER2, do_ger, gen_helper_XVF16GER2)
 TRANS64(PMXVF16GER2PP, do_ger, gen_helper_XVF16GER2PP)
 TRANS64(PMXVF16GER2PN, do_ger, gen_helper_XVF16GER2PN)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 8/8] linux-user: Add PowerPC ISA 3.1 and MMA to hwcap
  2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (6 preceding siblings ...)
  2022-05-20 13:51 ` [PATCH v4 7/8] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
@ 2022-05-20 13:51 ` Lucas Mateus Castro(alqotel)
  7 siblings, 0 replies; 12+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-20 13:51 UTC (permalink / raw)
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Alex Bennée,
	Daniel Henrique Barboza, qemu-devel, Lucas Mateus Castro,
	Laurent Vivier

From: Joel Stanley <joel@jms.id.au>

These are new hwcap bits added for power10.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/elfload.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 61063fd974..0908692e62 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -779,6 +779,8 @@ enum {
     QEMU_PPC_FEATURE2_DARN = 0x00200000, /* darn random number insn */
     QEMU_PPC_FEATURE2_SCV = 0x00100000, /* scv syscall */
     QEMU_PPC_FEATURE2_HTM_NO_SUSPEND = 0x00080000, /* TM w/o suspended state */
+    QEMU_PPC_FEATURE2_ARCH_3_1 = 0x00040000, /* ISA 3.1 */
+    QEMU_PPC_FEATURE2_MMA = 0x00020000, /* Matrix-Multiply Assist */
 };
 
 #define ELF_HWCAP get_elf_hwcap()
@@ -836,6 +838,8 @@ static uint32_t get_elf_hwcap2(void)
                   QEMU_PPC_FEATURE2_VEC_CRYPTO);
     GET_FEATURE2(PPC2_ISA300, QEMU_PPC_FEATURE2_ARCH_3_00 |
                  QEMU_PPC_FEATURE2_DARN | QEMU_PPC_FEATURE2_HAS_IEEE128);
+    GET_FEATURE2(PPC2_ISA310, QEMU_PPC_FEATURE2_ARCH_3_1 |
+                 QEMU_PPC_FEATURE2_MMA);
 
 #undef GET_FEATURE
 #undef GET_FEATURE2
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 5/8] target/ppc: Implemented xvf16ger*
  2022-05-20 13:51 ` [PATCH v4 5/8] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
@ 2022-05-20 15:47   ` Richard Henderson
  2022-05-20 16:42     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2022-05-20 15:47 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

On 5/20/22 06:51, Lucas Mateus Castro(alqotel) wrote:
> +                if (acc) {
> +                    aux_acc = at[i].VsrSF(j);
> +                    if (!neg_mul && !neg_acc) {
> +                        r = float32_add(r, aux_acc, excp_ptr);
> +                    } else if (!neg_mul) {
> +                        r = float32_add(r, bfp32_neg(aux_acc), excp_ptr);
> +                    } else if (!neg_acc) {
> +                        r = float32_add(bfp32_neg(r), aux_acc, excp_ptr);
> +                    } else {
> +                        r = float32_add(bfp32_neg(r), bfp32_neg(aux_acc), excp_ptr);
> +                    }

There's no point in the 3 if's when using bfp32_neg.
Just use

   if (neg_mul) {
   }
   if (neg_acc) {
   }
   float32_add(...);

With that,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 5/8] target/ppc: Implemented xvf16ger*
  2022-05-20 15:47   ` Richard Henderson
@ 2022-05-20 16:42     ` Lucas Mateus Martins Araujo e Castro
  2022-05-20 19:03       ` Richard Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-20 16:42 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]


On 20/05/2022 12:47, Richard Henderson wrote:
>
> On 5/20/22 06:51, Lucas Mateus Castro(alqotel) wrote:
>> +                if (acc) {
>> +                    aux_acc = at[i].VsrSF(j);
>> +                    if (!neg_mul && !neg_acc) {
>> +                        r = float32_add(r, aux_acc, excp_ptr);
>> +                    } else if (!neg_mul) {
>> +                        r = float32_add(r, bfp32_neg(aux_acc), 
>> excp_ptr);
>> +                    } else if (!neg_acc) {
>> +                        r = float32_add(bfp32_neg(r), aux_acc, 
>> excp_ptr);
>> +                    } else {
>> +                        r = float32_add(bfp32_neg(r), 
>> bfp32_neg(aux_acc), excp_ptr);
>> +                    }
>
> There's no point in the 3 if's when using bfp32_neg.
> Just use
>
>   if (neg_mul) {
>   }
>   if (neg_acc) {
>   }
>   float32_add(...);

You mean negate separately? Like:

     if (neg_mul) {
         r = bfp32_neg(r);
     }
     if (neg_acc) {
         aux_acc = bfp32_neg(aux_acc);
     }
     r = float32_add(r, aux_acc, excp_ptr);

If so I'll send a new version with this change later today

>
> With that,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~
-- 
Lucas Mateus M. Araujo e Castro
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

[-- Attachment #2: Type: text/html, Size: 2989 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 5/8] target/ppc: Implemented xvf16ger*
  2022-05-20 16:42     ` Lucas Mateus Martins Araujo e Castro
@ 2022-05-20 19:03       ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2022-05-20 19:03 UTC (permalink / raw)
  To: Lucas Mateus Martins Araujo e Castro, qemu-ppc
  Cc: Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

On 5/20/22 09:42, Lucas Mateus Martins Araujo e Castro wrote:
> 
> On 20/05/2022 12:47, Richard Henderson wrote:
>>
>> On 5/20/22 06:51, Lucas Mateus Castro(alqotel) wrote:
>>> +                if (acc) {
>>> +                    aux_acc = at[i].VsrSF(j);
>>> +                    if (!neg_mul && !neg_acc) {
>>> +                        r = float32_add(r, aux_acc, excp_ptr);
>>> +                    } else if (!neg_mul) {
>>> +                        r = float32_add(r, bfp32_neg(aux_acc), excp_ptr);
>>> +                    } else if (!neg_acc) {
>>> +                        r = float32_add(bfp32_neg(r), aux_acc, excp_ptr);
>>> +                    } else {
>>> +                        r = float32_add(bfp32_neg(r), bfp32_neg(aux_acc), excp_ptr);
>>> +                    }
>>
>> There's no point in the 3 if's when using bfp32_neg.
>> Just use
>>
>>   if (neg_mul) {
>>   }
>>   if (neg_acc) {
>>   }
>>   float32_add(...);
> 
> You mean negate separately? Like:
> 
>      if (neg_mul) {
>          r = bfp32_neg(r);
>      }
>      if (neg_acc) {
>          aux_acc = bfp32_neg(aux_acc);
>      }
>      r = float32_add(r, aux_acc, excp_ptr);
> 
> If so I'll send a new version with this change later today

Yes.


r~


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-20 19:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20 13:51 [PATCH v4 0/8] VSX MMA Implementation Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 1/8] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 2/8] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 3/8] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 4/8] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 5/8] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
2022-05-20 15:47   ` Richard Henderson
2022-05-20 16:42     ` Lucas Mateus Martins Araujo e Castro
2022-05-20 19:03       ` Richard Henderson
2022-05-20 13:51 ` [PATCH v4 6/8] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 7/8] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
2022-05-20 13:51 ` [PATCH v4 8/8] linux-user: Add PowerPC ISA 3.1 and MMA to hwcap Lucas Mateus Castro(alqotel)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.