* [RFC PATCH v2 0/7] VSX MMA Implementation
@ 2022-05-06 12:18 Lucas Mateus Castro(alqotel)
  2022-05-06 12:18 ` [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Alex Bennée, Daniel Henrique Barboza, qemu-devel,
	David Gibson, Greg Kurz

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

This patch series is an RFC implementation of the Matrix-Multiply Assist
(MMA) instructions from PowerISA 3.1.

Together with the VDIV/VMOD implementation, these are the last new
PowerISA 3.1 instructions left to be implemented.

The XVFGER instructions accumulate the exception status and, at the end,
set the FPSCR and take a Program interrupt on a trap-enabled exception.
However, with the exception functions as they are currently set up in
target/ppc/fpu_helper.c, a call that sets an FPSCR bit could raise the
exception before all bits have been set, and those functions don't set
the invalid operation bits.

Victor is working on a patch series to fix the FPSCR.FI bit that will
reorganize do_float_check_status (in a way that would solve the
aforementioned problem), so for now I am sending this RFC without trying
to solve that problem.
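
As a rough illustration of the intended ordering (collect every exception
bit first, decide on the trap only afterwards), here is a minimal
standalone sketch. The status-register struct, bit names and functions are
hypothetical simplifications, not QEMU's FPSCR layout or the fpu_helper.c
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception bits: NOT the real FPSCR layout. */
enum {
    EXC_INVALID   = 1u << 0,
    EXC_OVERFLOW  = 1u << 1,
    EXC_UNDERFLOW = 1u << 2,
    EXC_INEXACT   = 1u << 3,
};

/* Hypothetical status register: sticky exception bits plus enable bits. */
typedef struct {
    uint32_t sticky;   /* accumulated (sticky) exception bits */
    uint32_t enables;  /* trap-enable bits, same layout as sticky */
} sketch_fpscr;

/*
 * Commit all bits gathered over one instruction in a single step and report
 * whether any of them is trap-enabled (i.e. whether the caller should take
 * the Program interrupt now that every bit is already set).
 */
static bool commit_exceptions(sketch_fpscr *fpscr, uint32_t accumulated)
{
    fpscr->sticky |= accumulated;                /* set every bit first */
    return (accumulated & fpscr->enables) != 0;  /* then decide on a trap */
}

/* GER-like loop: per-element flags are only OR-ed into a local variable. */
static bool run_elements(sketch_fpscr *fpscr, const uint32_t *elem_flags, int n)
{
    uint32_t accumulated = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        accumulated |= elem_flags[i];  /* nothing is raised inside the loop */
    }
    return commit_exceptions(fpscr, accumulated);
}
```

With this ordering an enabled overflow in the second element no longer
prevents the invalid-operation bit of the third element from being recorded.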

v2 changes:
    - Changed VSXGER, VSXGER16 and XVIGER macros to functions
    - Set rounding mode in floating-point instructions based on RN
      before operations
    - Separated the accumulate and saturating instructions into
      different helpers
    - Used FIELD, FIELD_EX32 and FIELD_DP32 for packing/unpacking masks
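
The RN-based rounding-mode selection mentioned above can be sketched as a
small table lookup. This assumes, as the set_rounding_mode_rn() switch in
patch 4 does, that RN is the two least significant bits of the FPSCR; the
enum and macro names are hypothetical stand-ins for the softfloat rounding
modes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for softfloat's rounding modes. */
typedef enum {
    SK_ROUND_NEAREST_EVEN,
    SK_ROUND_TO_ZERO,
    SK_ROUND_UP,    /* toward +infinity */
    SK_ROUND_DOWN,  /* toward -infinity */
} sk_round_mode;

/* Assumption: RN occupies FPSCR bits 1:0 (IBM bits 62:63). */
#define SK_FPSCR_RN_SHIFT 0
#define SK_FPSCR_RN_MASK  (UINT64_C(3) << SK_FPSCR_RN_SHIFT)

static sk_round_mode rn_to_round_mode(uint64_t fpscr)
{
    static const sk_round_mode table[4] = {
        SK_ROUND_NEAREST_EVEN, SK_ROUND_TO_ZERO, SK_ROUND_UP, SK_ROUND_DOWN,
    };
    unsigned rn = (unsigned)((fpscr & SK_FPSCR_RN_MASK) >> SK_FPSCR_RN_SHIFT);
    assert(rn < 4);
    return table[rn];
}
```

A table and a switch are equivalent here; the switch in the patch has the
advantage of naming each softfloat mode explicitly at the call site.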

Lucas Mateus Castro (alqotel) (7):
  target/ppc: Implement xxm[tf]acc and xxsetaccz
  target/ppc: Implemented xvi*ger* instructions
  target/ppc: Implemented pmxvi*ger* instructions
  target/ppc: Implemented xvf*ger*
  target/ppc: Implemented xvf16ger*
  target/ppc: Implemented pmxvf*ger*
  target/ppc: Implemented [pm]xvbf16ger2*

 include/fpu/softfloat.h             |   9 +
 target/ppc/cpu.h                    |  13 ++
 target/ppc/fpu_helper.c             | 303 ++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  29 +++
 target/ppc/insn32.decode            |  49 +++++
 target/ppc/insn64.decode            |  79 ++++++++
 target/ppc/int_helper.c             | 130 ++++++++++++
 target/ppc/internal.h               |  15 ++
 target/ppc/translate/vsx-impl.c.inc | 145 +++++++++++++
 9 files changed, 772 insertions(+)

-- 
2.31.1




* [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  3:28   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xxmfacc: VSX Move From Accumulator
xxmtacc: VSX Move To Accumulator
xxsetaccz: VSX Set Accumulator to Zero

The PowerISA 3.1 mentions that for the current version of the
architecture, "the hardware implementation provides the effect of ACC[i]
and VSRs 4*i to 4*i + 3 logically containing the same data" and "The
Accumulators introduce no new logical state at this time" (page 501).
For now it seems unnecessary to create new structures, so this patch
simply treats ACC[i] as VSRs 4*i to 4*i+3; therefore, moves to and from
the accumulators are no-ops.
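
A minimal model of this aliasing, assuming a simplified CPU state with the
64 VSRs stored contiguously (the struct and function names are hypothetical,
not QEMU's CPUPPCState or acc_full_offset()): ACC[i] is just the byte offset
of VSR 4*i, and xxsetaccz clears the four VSRs starting there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 128-bit VSR (hypothetical, not ppc_vsr_t). */
typedef struct {
    uint64_t u64[2];
} sketch_vsr;

/* Simplified CPU state holding the 64 VSRs contiguously. */
typedef struct {
    uint64_t other_state;
    sketch_vsr vsr[64];
} sketch_env;

static size_t sketch_vsr_offset(int i)
{
    return offsetof(sketch_env, vsr) + (size_t)i * sizeof(sketch_vsr);
}

/* ACC[acc] aliases VSRs 4*acc .. 4*acc+3, so it starts at VSR 4*acc. */
static size_t sketch_acc_offset(int acc)
{
    assert(acc >= 0 && acc < 8);
    return sketch_vsr_offset(acc * 4);
}

/* xxsetaccz: zero the four VSRs backing ACC[acc] (64 contiguous bytes). */
static void sketch_xxsetaccz(sketch_env *env, int acc)
{
    memset((char *)env + sketch_acc_offset(acc), 0, 4 * sizeof(sketch_vsr));
}
```

In this model xxmfacc and xxmtacc need no code at all, since nothing has to
be copied between separate storage.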

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 target/ppc/cpu.h                    |  5 +++++
 target/ppc/insn32.decode            |  9 +++++++++
 target/ppc/translate/vsx-impl.c.inc | 31 +++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 48596cfb25..10c6d7ae43 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2659,6 +2659,11 @@ static inline int vsr_full_offset(int i)
     return offsetof(CPUPPCState, vsr[i].u64[0]);
 }
 
+static inline int acc_full_offset(int i)
+{
+    return vsr_full_offset(i * 4);
+}
+
 static inline int fpr_offset(int i)
 {
     return vsr64_offset(i, true);
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 39372fe673..7a76bedfa6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -151,6 +151,9 @@
 &X_vrt_frbp     vrt frbp
 @X_vrt_frbp     ...... vrt:5 ..... ....0 .......... .           &X_vrt_frbp frbp=%x_frbp
 
+&X_a            ra
+@X_a            ...... ra:3 .. ..... ..... .......... .         &X_a
+
 %xx_xt          0:1 21:5
 %xx_xb          1:1 11:5
 %xx_xa          2:1 16:5
@@ -710,3 +713,9 @@ XVTLSBB         111100 ... -- 00010 ..... 111011011 . - @XX2_bf_xb
 &XL_s           s:uint8_t
 @XL_s           ......-------------- s:1 .......... -   &XL_s
 RFEBB           010011-------------- .   0010010010 -   @XL_s
+
+## Accumulator Instructions
+
+XXMFACC         011111 ... -- 00000 ----- 0010110001 -   @X_a
+XXMTACC         011111 ... -- 00001 ----- 0010110001 -   @X_a
+XXSETACCZ       011111 ... -- 00011 ----- 0010110001 -   @X_a
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 3692740736..dc8875d5d3 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2787,6 +2787,37 @@ static bool trans_XVCVBF16SPN(DisasContext *ctx, arg_XX2 *a)
     return true;
 }
 
+/*
+ * The PowerISA 3.1 mentions that for the current version of the
+ * architecture, "the hardware implementation provides the effect of
+ * ACC[i] and VSRs 4*i to 4*i + 3 logically containing the same data"
+ * and "The Accumulators introduce no new logical state at this time"
+ * (page 501). For now it seems unnecessary to create new structures,
+ * so ACC[i] is the same as VSRs 4*i to 4*i+3 and therefore
+ * moves to and from accumulators are no-ops.
+ */
+static bool trans_XXMFACC(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    return true;
+}
+
+static bool trans_XXMTACC(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    return true;
+}
+
+static bool trans_XXSETACCZ(DisasContext *ctx, arg_X_a *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    tcg_gen_gvec_dup_imm(MO_64, acc_full_offset(a->ra), 64, 64, 0);
+    return true;
+}
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1




* [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
  2022-05-06 12:18 ` [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  3:41   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvi4ger8:     VSX Vector 4-bit Signed Integer GER (rank-8 update)
xvi4ger8pp:   VSX Vector 4-bit Signed Integer GER (rank-8 update)
Positive multiply, Positive accumulate
xvi8ger4:     VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
xvi8ger4pp:   VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
Positive multiply, Positive accumulate
xvi8ger4spp:  VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
with Saturate Positive multiply, Positive accumulate
xvi16ger2:    VSX Vector 16-bit Signed Integer GER (rank-2 update)
xvi16ger2pp:  VSX Vector 16-bit Signed Integer GER (rank-2 update)
Positive multiply, Positive accumulate
xvi16ger2s:   VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation
xvi16ger2spp: VSX Vector 16-bit Signed Integer GER (rank-2 update)
with Saturation Positive multiply, Positive accumulate
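
To make the arithmetic concrete, the sketch below computes a single
accumulator element for the xvi8ger4 family: the signed-by-unsigned rank-4
byte dot product, optional accumulation, and optional saturation to 32 bits.
It mirrors ger_rank4() and the saturation branch of xviger() in this patch,
but the function names are hypothetical and it is not the QEMU code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Rank-4 dot product of the four signed bytes of 'a' with the four
 * unsigned bytes of 'b'. pmsk bit k enables the byte pair at bits
 * 8*k..8*k+7 (bit 0 = least significant byte), as in ger_rank4().
 */
static int64_t rank4_dot(uint32_t a, uint32_t b, uint32_t pmsk)
{
    int64_t sum = 0;
    for (int k = 0; k < 4; k++, pmsk >>= 1) {
        if (pmsk & 1) {
            int8_t sa = (int8_t)(a >> (8 * k));    /* signed byte */
            uint8_t ub = (uint8_t)(b >> (8 * k));  /* unsigned byte */
            sum += (int64_t)sa * ub;
        }
    }
    return sum;
}

/*
 * One accumulator element: optionally add the old value, then either
 * saturate (reporting it through *saturated, like set_vscr_sat()) or
 * truncate to 32 bits with the usual two's-complement conversion.
 */
static int32_t ger_element(int32_t old_acc, uint32_t a, uint32_t b,
                           uint32_t pmsk, bool acc, bool sat, bool *saturated)
{
    int64_t psum = rank4_dot(a, b, pmsk) + (acc ? old_acc : 0);
    if (sat && psum > INT32_MAX) {
        *saturated = true;
        return INT32_MAX;
    }
    if (sat && psum < INT32_MIN) {
        *saturated = true;
        return INT32_MIN;
    }
    return (int32_t)psum;
}
```

The 64-bit intermediate matters only for the saturating forms: it lets the
sum exceed the 32-bit range before being clamped.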

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 target/ppc/cpu.h                    |   1 +
 target/ppc/helper.h                 |   9 ++
 target/ppc/insn32.decode            |  15 ++++
 target/ppc/int_helper.c             | 130 ++++++++++++++++++++++++++++
 target/ppc/internal.h               |  15 ++++
 target/ppc/translate/vsx-impl.c.inc |  42 +++++++++
 6 files changed, 212 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 10c6d7ae43..348a898950 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -238,6 +238,7 @@ typedef union _ppc_vsr_t {
 
 typedef ppc_vsr_t ppc_avr_t;
 typedef ppc_vsr_t ppc_fprp_t;
+typedef ppc_vsr_t ppc_acc_t;
 
 #if !defined(CONFIG_USER_ONLY)
 /* Software TLB cache */
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index aa6773c4a5..61217e0a10 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -537,6 +537,15 @@ DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI4GER8PP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI8GER4, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI8GER4PP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI8GER4SPP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
 DEF_HELPER_2(efscfui, i32, env, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 7a76bedfa6..62fb0214f4 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -170,6 +170,9 @@
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
+%xx_at          23:3
+@XX3_at         ...... ... .. ..... ..... ........ ...          &XX3 xt=%xx_at xb=%xx_xb
+
 &XX3_dm         xt xa xb dm
 @XX3_dm         ...... ..... ..... ..... . dm:2 ..... ...       &XX3_dm xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
@@ -719,3 +722,15 @@ RFEBB           010011-------------- .   0010010010 -   @XL_s
 XXMFACC         011111 ... -- 00000 ----- 0010110001 -   @X_a
 XXMTACC         011111 ... -- 00001 ----- 0010110001 -   @X_a
 XXSETACCZ       011111 ... -- 00011 ----- 0010110001 -   @X_a
+
+## Vector GER instructions
+
+XVI4GER8        111011 ... -- ..... ..... 00100011 ..-  @XX3_at xa=%xx_xa
+XVI4GER8PP      111011 ... -- ..... ..... 00100010 ..-  @XX3_at xa=%xx_xa
+XVI8GER4        111011 ... -- ..... ..... 00000011 ..-  @XX3_at xa=%xx_xa
+XVI8GER4PP      111011 ... -- ..... ..... 00000010 ..-  @XX3_at xa=%xx_xa
+XVI16GER2       111011 ... -- ..... ..... 01001011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2PP     111011 ... -- ..... ..... 01101011 ..-  @XX3_at xa=%xx_xa
+XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
+XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 8c1674510b..32a7d99718 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -782,6 +782,136 @@ VCT(uxs, cvtsduw, u32)
 VCT(sxs, cvtsdsw, s32)
 #undef VCT
 
+typedef int64_t do_ger(uint32_t, uint32_t, uint32_t);
+
+static int64_t ger_rank8(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 8; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 4 * i, 4) * sextract32(b, 4 * i, 4);
+        }
+    }
+    return psum;
+}
+
+static int64_t ger_rank4(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 4; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 8 * i, 8) * (int64_t)extract32(b, 8 * i, 8);
+        }
+    }
+    return psum;
+}
+
+static int64_t ger_rank2(uint32_t a, uint32_t b, uint32_t mask)
+{
+    int64_t psum = 0;
+    for (int i = 0; i < 2; i++, mask >>= 1) {
+        if (mask & 1) {
+            psum += sextract32(a, 16 * i, 16) * sextract32(b, 16 * i, 16);
+        }
+    }
+    return psum;
+}
+
+static void xviger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t *at,
+                   uint32_t mask, bool sat, bool acc, do_ger ger)
+{
+    uint8_t pmsk = FIELD_EX32(mask, GER_MSK, PMSK),
+            xmsk = FIELD_EX32(mask, GER_MSK, XMSK),
+            ymsk = FIELD_EX32(mask, GER_MSK, YMSK);
+    uint8_t xmsk_bit, ymsk_bit;
+    int64_t psum;
+    int i, j;
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                psum = ger(a->VsrW(i), b->VsrW(j), pmsk);
+                if (acc) {
+                    psum += at[i].VsrSW(j);
+                }
+                if (sat && psum > INT32_MAX) {
+                    set_vscr_sat(env);
+                    at[i].VsrSW(j) = INT32_MAX;
+                } else if (sat && psum < INT32_MIN) {
+                    set_vscr_sat(env);
+                    at[i].VsrSW(j) = INT32_MIN;
+                } else {
+                    at[i].VsrSW(j) = (int32_t) psum;
+                }
+            } else {
+                at[i].VsrSW(j) = 0;
+            }
+        }
+    }
+}
+
+QEMU_FLATTEN
+void helper_XVI4GER8(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank8);
+}
+
+QEMU_FLATTEN
+void helper_XVI4GER8PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank8);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI8GER4SPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, true, ger_rank4);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                      ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, false, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2S(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, false, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, false, true, ger_rank2);
+}
+
+QEMU_FLATTEN
+void helper_XVI16GER2SPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    xviger(env, a, b, at, mask, true, true, ger_rank2);
+}
+
 target_ulong helper_vclzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 8094e0b033..2add128cd1 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -18,6 +18,8 @@
 #ifndef PPC_INTERNAL_H
 #define PPC_INTERNAL_H
 
+#include "hw/registerfields.h"
+
 #define FUNC_MASK(name, ret_type, size, max_val)                  \
 static inline ret_type name(uint##size##_t start,                 \
                               uint##size##_t end)                 \
@@ -291,4 +293,17 @@ G_NORETURN void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
                                             uintptr_t retaddr);
 #endif
 
+FIELD(GER_MSK, XMSK, 0, 4)
+FIELD(GER_MSK, YMSK, 4, 4)
+FIELD(GER_MSK, PMSK, 8, 8)
+
+static inline int ger_pack_masks(int pmsk, int ymsk, int xmsk)
+{
+    int msk = 0;
+    msk = FIELD_DP32(msk, GER_MSK, XMSK, xmsk);
+    msk = FIELD_DP32(msk, GER_MSK, YMSK, ymsk);
+    msk = FIELD_DP32(msk, GER_MSK, PMSK, pmsk);
+    return msk;
+}
+
 #endif /* PPC_INTERNAL_H */
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index dc8875d5d3..829e04fc87 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -17,6 +17,13 @@ static inline TCGv_ptr gen_vsr_ptr(int reg)
     return r;
 }
 
+static inline TCGv_ptr gen_acc_ptr(int reg)
+{
+    TCGv_ptr r = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(r, cpu_env, acc_full_offset(reg));
+    return r;
+}
+
 #define VSX_LOAD_SCALAR(name, operation)                      \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
@@ -2818,6 +2825,41 @@ static bool trans_XXSETACCZ(DisasContext *ctx, arg_X_a *a)
     return true;
 }
 
+static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a,
+                             void (*helper)(TCGv_env, TCGv_ptr, TCGv_ptr,
+                                            TCGv_ptr, TCGv_i32))
+{
+    uint32_t mask;
+    TCGv_ptr xt, xa, xb;
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+    if (unlikely((a->xa / 4 == a->xt) || (a->xb / 4 == a->xt))) {
+        gen_invalid(ctx);
+        return true;
+    }
+
+    xt = gen_acc_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+
+    mask = 0xFFFFFFFF;
+    helper(cpu_env, xa, xb, xt, tcg_constant_i32(mask));
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+    return true;
+}
+
+TRANS(XVI4GER8, do_ger_XX3, gen_helper_XVI4GER8)
+TRANS(XVI4GER8PP, do_ger_XX3,  gen_helper_XVI4GER8PP)
+TRANS(XVI8GER4, do_ger_XX3, gen_helper_XVI8GER4)
+TRANS(XVI8GER4PP, do_ger_XX3,  gen_helper_XVI8GER4PP)
+TRANS(XVI8GER4SPP, do_ger_XX3, gen_helper_XVI8GER4SPP)
+TRANS(XVI16GER2, do_ger_XX3, gen_helper_XVI16GER2)
+TRANS(XVI16GER2PP, do_ger_XX3, gen_helper_XVI16GER2PP)
+TRANS(XVI16GER2S, do_ger_XX3, gen_helper_XVI16GER2S)
+TRANS(XVI16GER2SPP, do_ger_XX3, gen_helper_XVI16GER2SPP)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1




* [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
  2022-05-06 12:18 ` [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
  2022-05-06 12:18 ` [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  3:48   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
pmxvi4ger8:     Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update)
pmxvi4ger8pp:   Prefixed Masked VSX Vector 4-bit Signed Integer GER
(rank-8 update) Positive multiply, Positive accumulate
pmxvi8ger4:     Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update)
pmxvi8ger4pp:   Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) Positive multiply, Positive accumulate
pmxvi8ger4spp:  Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer
GER (rank-4 update) with Saturate Positive multiply, Positive accumulate
pmxvi16ger2:    Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update)
pmxvi16ger2pp:  Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvi16ger2s:   Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation
pmxvi16ger2spp: Prefixed Masked VSX Vector 16-bit Signed Integer GER
(rank-2 update) with Saturation Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 target/ppc/insn64.decode            | 30 +++++++++++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 28 +++++++++++++++++++++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 691e8fe6c0..7b65f71a02 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -68,6 +68,15 @@
                 ...... ..... ..... ..... ..... .. ....   \
                 &8RR_XX4_uim3 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
 
+# Format MMIRR:XX3
+&MMIRR_XX3      xa xb xt pmsk xmsk ymsk
+%xx3_xa         2:1 16:5
+%xx3_xb         1:1 11:5
+%xx3_at         23:3
+@MMIRR_XX3      ...... .. .... .. . . ........ xmsk:4 ymsk:4  \
+                ...... ... .. ..... ..... ........ ...  \
+                &MMIRR_XX3 xa=%xx3_xa xb=%xx3_xb xt=%xx3_at
+
 ### Fixed-Point Load Instructions
 
 PLBZ            000001 10 0--.-- .................. \
@@ -115,6 +124,27 @@ PSTFS           000001 10 0--.-- .................. \
 PSTFD           000001 10 0--.-- .................. \
                 110110 ..... ..... ................     @PLS_D
 
+## Vector GER instructions
+
+PMXVI4GER8      000001 11 1001 -- - - pmsk:8 ........              \
+                111011 ... -- ..... ..... 00100011 ..-  @MMIRR_XX3
+PMXVI4GER8PP    000001 11 1001 -- - - pmsk:8 ........              \
+                111011 ... -- ..... ..... 00100010 ..-  @MMIRR_XX3
+PMXVI8GER4      000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 00000011 ..-  @MMIRR_XX3
+PMXVI8GER4PP    000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 00000010 ..-  @MMIRR_XX3
+PMXVI16GER2     000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 01001011 ..-  @MMIRR_XX3
+PMXVI16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 01101011 ..-  @MMIRR_XX3
+PMXVI8GER4SPP   000001 11 1001 -- - - pmsk:4 ---- ........         \
+                111011 ... -- ..... ..... 01100011 ..-  @MMIRR_XX3
+PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 00101011 ..-  @MMIRR_XX3
+PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
+                111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
+
 ### Prefixed No-operation Instruction
 
 @PNOP           000001 11 0000-- 000000000000000000     \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 829e04fc87..06bc83c03a 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2825,7 +2825,7 @@ static bool trans_XXSETACCZ(DisasContext *ctx, arg_X_a *a)
     return true;
 }
 
-static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a,
+static bool do_ger_MMIRR_XX3(DisasContext *ctx, arg_MMIRR_XX3 *a,
                              void (*helper)(TCGv_env, TCGv_ptr, TCGv_ptr,
                                             TCGv_ptr, TCGv_i32))
 {
@@ -2842,12 +2842,26 @@ static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a,
     xa = gen_vsr_ptr(a->xa);
     xb = gen_vsr_ptr(a->xb);
 
-    mask = 0xFFFFFFFF;
+    mask = ger_pack_masks(a->pmsk, a->ymsk, a->xmsk);
     helper(cpu_env, xa, xb, xt, tcg_constant_i32(mask));
     tcg_temp_free_ptr(xt);
     tcg_temp_free_ptr(xa);
     tcg_temp_free_ptr(xb);
     return true;
+}
+
+static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a,
+                       void (*helper)(TCGv_env, TCGv_ptr, TCGv_ptr,
+                                      TCGv_ptr, TCGv_i32))
+{
+    arg_MMIRR_XX3 m;
+    m.xa = a->xa;
+    m.xb = a->xb;
+    m.xt = a->xt;
+    m.pmsk = 0xFF;
+    m.ymsk = 0xF;
+    m.xmsk = 0xF;
+    return do_ger_MMIRR_XX3(ctx, &m, helper);
 }
 
 TRANS(XVI4GER8, do_ger_XX3, gen_helper_XVI4GER8)
@@ -2860,6 +2874,16 @@ TRANS(XVI16GER2PP, do_ger_XX3, gen_helper_XVI16GER2PP)
 TRANS(XVI16GER2S, do_ger_XX3, gen_helper_XVI16GER2S)
 TRANS(XVI16GER2SPP, do_ger_XX3, gen_helper_XVI16GER2SPP)
 
+TRANS64(PMXVI4GER8, do_ger_MMIRR_XX3, gen_helper_XVI4GER8)
+TRANS64(PMXVI4GER8PP, do_ger_MMIRR_XX3, gen_helper_XVI4GER8PP)
+TRANS64(PMXVI8GER4, do_ger_MMIRR_XX3, gen_helper_XVI8GER4)
+TRANS64(PMXVI8GER4PP, do_ger_MMIRR_XX3, gen_helper_XVI8GER4PP)
+TRANS64(PMXVI8GER4SPP, do_ger_MMIRR_XX3, gen_helper_XVI8GER4SPP)
+TRANS64(PMXVI16GER2, do_ger_MMIRR_XX3, gen_helper_XVI16GER2)
+TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2PP)
+TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, gen_helper_XVI16GER2S)
+TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2SPP)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1




* [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger*
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (2 preceding siblings ...)
  2022-05-06 12:18 ` [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  4:03   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvf32ger:   VSX Vector 32-bit Floating-Point GER (rank-1 update)
xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate
xvf64ger:   VSX Vector 64-bit Floating-Point GER (rank-1 update)
xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Negative accumulate
xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
multiply, Positive accumulate
xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Negative accumulate
xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
multiply, Positive accumulate
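
The four sign variants map onto two muladd flags. The sketch below
reproduces the flag selection used by vsxger() in this patch, with
hypothetical stand-in flag values and a plain (non-fused) reference muladd
that ignores rounding, NaN and signed-zero details, to check that pp, pn, np
and nn give +(a*b)+c, +(a*b)-c, -(a*b)+c and -(a*b)-c respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the softfloat muladd flags. */
enum {
    SK_NEGATE_C      = 1 << 0,  /* use -c instead of c */
    SK_NEGATE_RESULT = 1 << 1,  /* negate (a * b) +/- c at the end */
};

/* Same selection logic as op_flags in vsxger(). */
static int ger_flags(bool neg_mul, bool neg_acc)
{
    int flags = (neg_acc ^ neg_mul) ? SK_NEGATE_C : 0;
    flags |= neg_mul ? SK_NEGATE_RESULT : 0;
    return flags;
}

/* Reference muladd honouring the two flags (not fused, exact inputs only). */
static double ref_muladd(double a, double b, double c, int flags)
{
    double r = a * b + ((flags & SK_NEGATE_C) ? -c : c);
    return (flags & SK_NEGATE_RESULT) ? -r : r;
}
```

For example, nn sets only the negate-result flag because -(a*b) - c equals
-((a*b) + c), while np needs both flags: -(a*b) + c equals -((a*b) - c).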

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 target/ppc/cpu.h                    |   4 +
 target/ppc/fpu_helper.c             | 178 ++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  10 ++
 target/ppc/insn32.decode            |  13 ++
 target/ppc/translate/vsx-impl.c.inc |  12 ++
 5 files changed, 217 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 348a898950..eb50ad699e 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2639,6 +2639,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[i]
 #define VsrD(i) u64[i]
 #define VsrSD(i) s64[i]
+#define VsrSF(i) f32[i]
+#define VsrDF(i) f64[i]
 #else
 #define VsrB(i) u8[15 - (i)]
 #define VsrSB(i) s8[15 - (i)]
@@ -2648,6 +2650,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[3 - (i)]
 #define VsrD(i) u64[1 - (i)]
 #define VsrSD(i) s64[1 - (i)]
+#define VsrSF(i) f32[3 - (i)]
+#define VsrDF(i) f64[1 - (i)]
 #endif
 
 static inline int vsr64_offset(int i, bool high)
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index f6c8318a71..138b30d08f 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3462,3 +3462,181 @@ void helper_xssubqp(CPUPPCState *env, uint32_t opcode,
     *xt = t;
     do_float_check_status(env, GETPC());
 }
+
+static void set_rounding_mode_rn(CPUPPCState *env)
+{
+    uint8_t rmode = (env->fpscr & FP_RN) >> FPSCR_RN0;
+    switch (rmode) {
+    case 0:
+        set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
+        break;
+    case 1:
+        set_float_rounding_mode(float_round_to_zero, &env->fp_status);
+        break;
+    case 2:
+        set_float_rounding_mode(float_round_up, &env->fp_status);
+        break;
+    case 3:
+        set_float_rounding_mode(float_round_down, &env->fp_status);
+        break;
+    default:
+        abort();
+    }
+}
+
+typedef void vsxger_zero(ppc_vsr_t *at, int, int);
+
+typedef void vsxger_muladd_f(ppc_vsr_t *, ppc_vsr_t *, ppc_vsr_t *, int, int,
+                             int flags, float_status *s);
+
+static void vsxger_muladd32(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b,
+                            int i, int j, int flags, float_status *s)
+{
+    at[i].VsrSF(j) = float32_muladd(a->VsrSF(i), b->VsrSF(j), at[i].VsrSF(j), flags, s);
+}
+
+static void vsxger_mul32(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b,
+                         int i, int j, int flags, float_status *s)
+{
+    at[i].VsrSF(j) = float32_mul(a->VsrSF(i), b->VsrSF(j), s);
+}
+
+static void vsxger_zero32(ppc_vsr_t *at, int i, int j)
+{
+    at[i].VsrSF(j) = float32_zero;
+}
+
+static void vsxger_muladd64(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b,
+                            int i, int j, int flags, float_status *s)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_muladd(a[i / 2].VsrDF(i % 2), b->VsrDF(j),
+                                        at[i].VsrDF(j), flags, s);
+    }
+}
+
+static void vsxger_mul64(ppc_vsr_t *at, ppc_vsr_t *a, ppc_vsr_t *b,
+                         int i, int j, int flags, float_status *s)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_mul(a[i / 2].VsrDF(i % 2), b->VsrDF(j), s);
+    }
+}
+
+static void vsxger_zero64(ppc_vsr_t *at, int i, int j)
+{
+    if (j >= 2) {
+        j -= 2;
+        at[i].VsrDF(j) = float64_zero;
+    }
+}
+
+static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
+                   uint32_t mask, bool acc, bool neg_mul, bool neg_acc,
+                   vsxger_muladd_f mul, vsxger_muladd_f muladd, vsxger_zero zero)
+{
+    int i, j, xmsk_bit, ymsk_bit, op_flags;
+    uint8_t xmsk = mask & 0x0F;
+    uint8_t ymsk = (mask >> 4) & 0x0F;
+    float_status *excp_ptr = &env->fp_status;
+    op_flags = (neg_acc ^ neg_mul) ? float_muladd_negate_c : 0;
+    op_flags |= (neg_mul) ? float_muladd_negate_result : 0;
+    helper_reset_fpstatus(env);
+    set_rounding_mode_rn(env);
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                if (acc) {
+                    muladd(at, a, b, i, j, op_flags, excp_ptr);
+                } else {
+                    mul(at, a, b, i, j, op_flags, excp_ptr);
+                }
+            } else {
+                zero(at, i, j);
+            }
+        }
+    }
+    do_float_check_status(env, GETPC());
+}
+
+QEMU_FLATTEN
+void helper_XVF32GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, false, false, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERPN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, true, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERNP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, false, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF32GERNN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, true, vsxger_mul32,
+           vsxger_muladd32, vsxger_zero32);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, false, false, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERPP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERPN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, false, true, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERNP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, false, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
+
+QEMU_FLATTEN
+void helper_XVF64GERNN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger(env, a, b, at, mask, true, true, true, vsxger_mul64,
+           vsxger_muladd64, vsxger_zero64);
+}
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 61217e0a10..360aa74ed1 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -546,6 +546,16 @@ DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF32GERNP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF32GERNN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF64GER, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF64GERPP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF64GERPN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF64GERNP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF64GERNN, void, env, vsr, vsr, vsr, i32)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
 DEF_HELPER_2(efscfui, i32, env, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 62fb0214f4..9a3581db2f 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -171,6 +171,7 @@
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
 %xx_at          23:3
+%xx_xa_pair     2:1 17:4 !function=times_2
 @XX3_at         ...... ... .. ..... ..... ........ ...          &XX3 xt=%xx_at xb=%xx_xb
 
 &XX3_dm         xt xa xb dm
@@ -734,3 +735,15 @@ XVI16GER2PP     111011 ... -- ..... ..... 01101011 ..-  @XX3_at xa=%xx_xa
 XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
+
+XVF32GER        111011 ... -- ..... ..... 00011011 ..-  @XX3_at xa=%xx_xa
+XVF32GERPP      111011 ... -- ..... ..... 00011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERPN      111011 ... -- ..... ..... 10011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERNP      111011 ... -- ..... ..... 01011010 ..-  @XX3_at xa=%xx_xa
+XVF32GERNN      111011 ... -- ..... ..... 11011010 ..-  @XX3_at xa=%xx_xa
+
+XVF64GER        111011 ... -- .... 0 ..... 00111011 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERPP      111011 ... -- .... 0 ..... 00111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERPN      111011 ... -- .... 0 ..... 10111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERNP      111011 ... -- .... 0 ..... 01111010 ..-  @XX3_at xa=%xx_xa_pair
+XVF64GERNN      111011 ... -- .... 0 ..... 11111010 ..-  @XX3_at xa=%xx_xa_pair
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 06bc83c03a..764c6ded70 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2884,6 +2884,18 @@ TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2SPP)
 
+TRANS(XVF32GER, do_ger_XX3, gen_helper_XVF32GER)
+TRANS(XVF32GERPP, do_ger_XX3, gen_helper_XVF32GERPP)
+TRANS(XVF32GERPN, do_ger_XX3, gen_helper_XVF32GERPN)
+TRANS(XVF32GERNP, do_ger_XX3, gen_helper_XVF32GERNP)
+TRANS(XVF32GERNN, do_ger_XX3, gen_helper_XVF32GERNN)
+
+TRANS(XVF64GER, do_ger_XX3, gen_helper_XVF64GER)
+TRANS(XVF64GERPP, do_ger_XX3, gen_helper_XVF64GERPP)
+TRANS(XVF64GERPN, do_ger_XX3, gen_helper_XVF64GERPN)
+TRANS(XVF64GERNP, do_ger_XX3, gen_helper_XVF64GERNP)
+TRANS(XVF64GERNN, do_ger_XX3, gen_helper_XVF64GERNN)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1
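A minimal, self-contained C sketch of two pieces of vsxger() above, using plain doubles instead of QEMU's softfloat. It shows how op_flags is derived from neg_mul/neg_acc so that a single muladd covers the PP, PN, NP and NN variants, and how the 8-bit mask selects which accumulator elements are computed. NEG_C, NEG_RESULT, ger_op_flags(), model_muladd() and ger_elem_enabled() are hypothetical names for illustration only. They stand in for float_muladd_negate_c, float_muladd_negate_result and the loop in vsxger().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for softfloat's float_muladd_negate_c and
 * float_muladd_negate_result flags. */
enum { NEG_C = 1, NEG_RESULT = 2 };

/* Same derivation as op_flags in vsxger(). */
static int ger_op_flags(bool neg_mul, bool neg_acc)
{
    int flags = (neg_mul ^ neg_acc) ? NEG_C : 0;
    return flags | (neg_mul ? NEG_RESULT : 0);
}

/* Reference model of muladd(a, b, c, flags): optionally negate c,
 * compute a * b + c, then optionally negate the result. */
static double model_muladd(double a, double b, double c, int flags)
{
    double r = a * b + ((flags & NEG_C) ? -c : c);
    return (flags & NEG_RESULT) ? -r : r;
}

/* Element (i, j) is computed only if bit (3 - i) of XMSK (mask bits 0-3)
 * and bit (3 - j) of YMSK (mask bits 4-7) are set; otherwise vsxger()
 * zeroes it. */
static bool ger_elem_enabled(unsigned mask, int i, int j)
{
    assert(i >= 0 && i < 4 && j >= 0 && j < 4);
    unsigned xmsk = mask & 0x0F;
    unsigned ymsk = (mask >> 4) & 0x0F;
    return ((xmsk >> (3 - i)) & 1) && ((ymsk >> (3 - j)) & 1);
}
```

With a = 2, b = 3 and acc = 1, model_muladd() returns 7 (PP), 5 (PN), -5 (NP) and -7 (NN) for the four flag combinations, i.e. (+/- a*b) +/- acc. The model ignores the signed-zero and NaN details that softfloat handles.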

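The switch in set_rounding_mode_rn() maps the 2-bit FPSCR[RN] field onto softfloat rounding modes. As a hedged host-side sketch, the same mapping can be written against C99 <fenv.h>. ppc_rn_to_fenv() is a hypothetical helper that takes the already-extracted RN value; it is not a QEMU function.

```c
#include <fenv.h>

/* Map the 2-bit FPSCR[RN] value to a C99 rounding mode, mirroring the
 * switch in set_rounding_mode_rn(). Returns -1 for out-of-range input. */
static int ppc_rn_to_fenv(unsigned rn)
{
    switch (rn) {
    case 0:
        return FE_TONEAREST;   /* round to nearest, ties to even */
    case 1:
        return FE_TOWARDZERO;  /* truncate toward zero */
    case 2:
        return FE_UPWARD;      /* toward +infinity */
    case 3:
        return FE_DOWNWARD;    /* toward -infinity */
    default:
        return -1;             /* RN is only two bits wide */
    }
}
```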



* [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger*
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (3 preceding siblings ...)
  2022-05-06 12:18 ` [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  4:24   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
  2022-05-06 12:18 ` [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Aurelien Jarno, Peter Maydell, Alex Bennée,
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvf16ger2:   VSX Vector 16-bit Floating-Point GER (rank-2 update)
xvf16ger2nn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Negative accumulate
xvf16ger2np: VSX Vector 16-bit Floating-Point GER (rank-2 update) Negative
multiply, Positive accumulate
xvf16ger2pn: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Negative accumulate
xvf16ger2pp: VSX Vector 16-bit Floating-Point GER (rank-2 update) Positive
multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 include/fpu/softfloat.h             |  9 +++
 target/ppc/cpu.h                    |  3 +
 target/ppc/fpu_helper.c             | 85 +++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  5 ++
 target/ppc/insn32.decode            |  6 ++
 target/ppc/translate/vsx-impl.c.inc |  6 ++
 6 files changed, 114 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3dcf20e3a2..63d7ff18f0 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -619,6 +619,15 @@ static inline float32 float32_chs(float32 a)
     return make_float32(float32_val(a) ^ 0x80000000);
 }
 
+static inline float32 float32_neg(float32 a)
+{
+    if (((a & 0x7f800000) == 0x7f800000) && (a & 0x007fffff)) {
+        return a;
+    } else {
+        return float32_chs(a);
+    }
+}
+
 static inline bool float32_is_infinity(float32 a)
 {
     return (float32_val(a) & 0x7fffffff) == 0x7f800000;
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index eb50ad699e..c891a23830 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -227,6 +227,7 @@ typedef union _ppc_vsr_t {
     int16_t s16[8];
     int32_t s32[4];
     int64_t s64[2];
+    float16 f16[8];
     float32 f32[4];
     float64 f64[2];
     float128 f128;
@@ -2639,6 +2640,7 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[i]
 #define VsrD(i) u64[i]
 #define VsrSD(i) s64[i]
+#define VsrHF(i) f16[i]
 #define VsrSF(i) f32[i]
 #define VsrDF(i) f64[i]
 #else
@@ -2650,6 +2652,7 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
 #define VsrSW(i) s32[3 - (i)]
 #define VsrD(i) u64[1 - (i)]
 #define VsrSD(i) s64[1 - (i)]
+#define VsrHF(i) f16[7 - (i)]
 #define VsrSF(i) f32[3 - (i)]
 #define VsrDF(i) f64[1 - (i)]
 #endif
diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 138b30d08f..6857be6ccc 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3484,6 +3484,56 @@ static void set_rounding_mode_rn(CPUPPCState *env)
     }
 }
 
+typedef float64 extract_f16(float16, float_status *);
+
+static float64 extract_hf16(float16 in, float_status *fp_status)
+{
+    return float16_to_float64(in, true, fp_status);
+}
+
+static void vsxger16(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t  *at, uint32_t mask, bool acc,
+                     bool neg_mul, bool neg_acc, extract_f16 extract)
+{
+    float32 msum, aux_acc;
+    float64 psum, va, vb, vc, vd;
+    int i, j, xmsk_bit, ymsk_bit;
+    uint8_t pmsk = FIELD_EX32(mask, GER_MSK, PMSK),
+            xmsk = FIELD_EX32(mask, GER_MSK, XMSK),
+            ymsk = FIELD_EX32(mask, GER_MSK, YMSK);
+    float_status *excp_ptr = &env->fp_status;
+    set_rounding_mode_rn(env);
+    for (i = 0, xmsk_bit = 1 << 3; i < 4; i++, xmsk_bit >>= 1) {
+        for (j = 0, ymsk_bit = 1 << 3; j < 4; j++, ymsk_bit >>= 1) {
+            if ((xmsk_bit & xmsk) && (ymsk_bit & ymsk)) {
+                va = !(pmsk & 2) ? float64_zero : extract(a->VsrHF(2 * i), excp_ptr);
+                vb = !(pmsk & 2) ? float64_zero : extract(b->VsrHF(2 * j), excp_ptr);
+                vc = !(pmsk & 1) ? float64_zero : extract(a->VsrHF(2 * i + 1), excp_ptr);
+                vd = !(pmsk & 1) ? float64_zero : extract(b->VsrHF(2 * j + 1), excp_ptr);
+                psum = float64_mul(va, vb, excp_ptr);
+                psum = float64r32_muladd(vc, vd, psum, 0, excp_ptr);
+                msum = float64_to_float32(psum, excp_ptr);
+                if (acc) {
+                    if (neg_mul) {
+                        msum = float32_neg(msum);
+                    }
+                    if (neg_acc) {
+                        aux_acc = float32_neg(at[i].VsrSF(j));
+                    } else {
+                        aux_acc = at[i].VsrSF(j);
+                    }
+                    at[i].VsrSF(j) = float32_add(msum, aux_acc, excp_ptr);
+                } else {
+                    at[i].VsrSF(j) = msum;
+                }
+            } else {
+                at[i].VsrSF(j) = float32_zero;
+            }
+        }
+    }
+    do_float_check_status(env, GETPC());
+}
+
 typedef void vsxger_zero(ppc_vsr_t *at, int, int);
 
 typedef void vsxger_muladd_f(ppc_vsr_t *, ppc_vsr_t *, ppc_vsr_t *, int, int,
@@ -3561,6 +3611,41 @@ static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
     do_float_check_status(env, GETPC());
 }
 
+QEMU_FLATTEN
+void helper_XVF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                     ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, false, false, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2PN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, true, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2NP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, false, extract_hf16);
+}
+
+QEMU_FLATTEN
+void helper_XVF16GER2NN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                        ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, true, extract_hf16);
+}
+
 QEMU_FLATTEN
 void helper_XVF32GER(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t *at, uint32_t mask)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 360aa74ed1..5f2f574d30 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -546,6 +546,11 @@ DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF16GER2, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF16GER2PP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF16GER2PN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF16GER2NP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVF16GER2NN, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 9a3581db2f..bbd4bc80f8 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -736,6 +736,12 @@ XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
 
+XVF16GER2       111011 ... -- ..... ..... 00010011 ..-  @XX3_at xa=%xx_xa
+XVF16GER2PP     111011 ... -- ..... ..... 00010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2PN     111011 ... -- ..... ..... 10010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2NP     111011 ... -- ..... ..... 01010010 ..-  @XX3_at xa=%xx_xa
+XVF16GER2NN     111011 ... -- ..... ..... 11010010 ..-  @XX3_at xa=%xx_xa
+
 XVF32GER        111011 ... -- ..... ..... 00011011 ..-  @XX3_at xa=%xx_xa
 XVF32GERPP      111011 ... -- ..... ..... 00011010 ..-  @XX3_at xa=%xx_xa
 XVF32GERPN      111011 ... -- ..... ..... 10011010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 764c6ded70..a8155b8bee 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2884,6 +2884,12 @@ TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2SPP)
 
+TRANS(XVF16GER2, do_ger_XX3, gen_helper_XVF16GER2)
+TRANS(XVF16GER2PP, do_ger_XX3, gen_helper_XVF16GER2PP)
+TRANS(XVF16GER2PN, do_ger_XX3, gen_helper_XVF16GER2PN)
+TRANS(XVF16GER2NP, do_ger_XX3, gen_helper_XVF16GER2NP)
+TRANS(XVF16GER2NN, do_ger_XX3, gen_helper_XVF16GER2NN)
+
 TRANS(XVF32GER, do_ger_XX3, gen_helper_XVF32GER)
 TRANS(XVF32GERPP, do_ger_XX3, gen_helper_XVF32GERPP)
 TRANS(XVF32GERPN, do_ger_XX3, gen_helper_XVF32GERPN)
-- 
2.31.1
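The float32_neg() added to softfloat.h in this patch differs from float32_chs() only for NaNs. The sign bit is flipped for every other encoding, including zeros and infinities, while a NaN is returned unchanged. A sketch of the same test on raw IEEE-754 binary32 bits, using a hypothetical f32_bits_neg() over uint32_t rather than the softfloat float32 type:

```c
#include <stdint.h>

/* Flip the sign of a binary32 bit pattern unless it encodes a NaN
 * (exponent all ones and a non-zero fraction). */
static uint32_t f32_bits_neg(uint32_t a)
{
    const uint32_t exp_mask = 0x7f800000u;
    const uint32_t frac_mask = 0x007fffffu;

    if ((a & exp_mask) == exp_mask && (a & frac_mask) != 0) {
        return a;              /* NaN: sign left as-is */
    }
    return a ^ 0x80000000u;    /* numbers, zeros and infinities */
}
```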



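For the rank-2 (16-bit) forms, vsxger16() computes each enabled accumulator element from two products. Elements 2i and 2i+1 of XA are multiplied by elements 2j and 2j+1 of XB, with PMSK bit 1 gating the first product and bit 0 the second. The following is a simplified model using host doubles and floats, assuming the inputs have already been widened (as extract_hf16() does with float16_to_float64()). ger2_element() is hypothetical and rounds the product sum to float once. It reproduces neither softfloat's exception flags, the exact intermediate rounding of float64r32_muladd(), nor float32_neg()'s NaN handling.

```c
#include <stdbool.h>

static float ger2_element(float acc, double a0, double a1, double b0,
                          double b1, unsigned pmsk, bool accumulate,
                          bool neg_mul, bool neg_acc)
{
    /* PMSK bit 1 enables the (2i, 2j) product, bit 0 the (2i+1, 2j+1) one. */
    double p0 = (pmsk & 2) ? a0 * b0 : 0.0;
    double p1 = (pmsk & 1) ? a1 * b1 : 0.0;
    float msum = (float)(p0 + p1);   /* single rounding to binary32 */

    if (!accumulate) {
        return msum;                 /* xvf16ger2: overwrite the element */
    }
    if (neg_mul) {
        msum = -msum;                /* N? forms: negate the product sum */
    }
    if (neg_acc) {
        acc = -acc;                  /* ?N forms: negate the accumulator */
    }
    return msum + acc;
}
```

With XA elements 1 and 2, XB elements 3 and 4 and an accumulator of 1, PMSK = 3 gives 12 for the PP form; PMSK = 2 keeps only 1*3 (result 4) and PMSK = 1 keeps only 2*4 (result 9).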

* [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger*
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (4 preceding siblings ...)
  2022-05-06 12:18 ` [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  4:25   ` Richard Henderson
  2022-05-06 12:18 ` [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
pmxvf16ger2:   Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update)
pmxvf16ger2nn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Negative accumulate
pmxvf16ger2np: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Negative multiply, Positive accumulate
pmxvf16ger2pn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Negative accumulate
pmxvf16ger2pp: Prefixed Masked VSX Vector 16-bit Floating-Point GER
(rank-2 update) Positive multiply, Positive accumulate
pmxvf32ger:    Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update)
pmxvf32gernn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf32gernp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf32gerpn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf32gerpp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate
pmxvf64ger:    Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update)
pmxvf64gernn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Negative accumulate
pmxvf64gernp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Negative multiply, Positive accumulate
pmxvf64gerpn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Negative accumulate
pmxvf64gerpp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
(rank-1 update) Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
 target/ppc/insn64.decode            | 38 +++++++++++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 18 ++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 7b65f71a02..a12f11044c 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -73,10 +73,15 @@
 %xx3_xa         2:1 16:5
 %xx3_xb         1:1 11:5
 %xx3_at         23:3
+%xx3_xa_pair    2:1 17:4 !function=times_2
 @MMIRR_XX3      ...... .. .... .. . . ........ xmsk:4 ymsk:4  \
                 ...... ... .. ..... ..... ........ ...  \
                 &MMIRR_XX3 xa=%xx3_xa xb=%xx3_xb xt=%xx3_at
 
+@MMIRR_XX3_NO_P ...... .. .... .. . . ........ xmsk:4 .... \
+                ...... ... .. ..... ..... ........ ... \
+                &MMIRR_XX3 xb=%xx3_xb xt=%xx3_at pmsk=1
+
 ### Fixed-Point Load Instructions
 
 PLBZ            000001 10 0--.-- .................. \
@@ -145,6 +150,39 @@ PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
 PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
                 111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
 
+PMXVF16GER2     000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00010011 ..-  @MMIRR_XX3
+PMXVF16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00010010 ..-  @MMIRR_XX3
+PMXVF16GER2PN   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 10010010 ..-  @MMIRR_XX3
+PMXVF16GER2NP   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 01010010 ..-  @MMIRR_XX3
+PMXVF16GER2NN   000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 11010010 ..-  @MMIRR_XX3
+
+PMXVF32GER      000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 00011011 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERPP    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 00011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERPN    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 10011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERNP    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 01011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+PMXVF32GERNN    000001 11 1001 -- - - -------- .... ymsk:4 \
+                111011 ... -- ..... ..... 11011010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa
+
+PMXVF64GER      000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 00111011 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERPP    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 00111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERPN    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 10111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERNP    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 01111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+PMXVF64GERNN    000001 11 1001 -- - - -------- .... ymsk:2 -- \
+                111011 ... -- ....0 ..... 11111010 ..-  @MMIRR_XX3_NO_P xa=%xx3_xa_pair
+
 ### Prefixed No-operation Instruction
 
 @PNOP           000001 11 0000-- 000000000000000000     \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index a8155b8bee..00eed2b1b9 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2902,6 +2902,24 @@ TRANS(XVF64GERPN, do_ger_XX3, gen_helper_XVF64GERPN)
 TRANS(XVF64GERNP, do_ger_XX3, gen_helper_XVF64GERNP)
 TRANS(XVF64GERNN, do_ger_XX3, gen_helper_XVF64GERNN)
 
+TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, gen_helper_XVF16GER2)
+TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVF16GER2PP)
+TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, gen_helper_XVF16GER2PN)
+TRANS64(PMXVF16GER2NP, do_ger_MMIRR_XX3, gen_helper_XVF16GER2NP)
+TRANS64(PMXVF16GER2NN, do_ger_MMIRR_XX3, gen_helper_XVF16GER2NN)
+
+TRANS64(PMXVF32GER, do_ger_MMIRR_XX3, gen_helper_XVF32GER)
+TRANS64(PMXVF32GERPP, do_ger_MMIRR_XX3, gen_helper_XVF32GERPP)
+TRANS64(PMXVF32GERPN, do_ger_MMIRR_XX3, gen_helper_XVF32GERPN)
+TRANS64(PMXVF32GERNP, do_ger_MMIRR_XX3, gen_helper_XVF32GERNP)
+TRANS64(PMXVF32GERNN, do_ger_MMIRR_XX3, gen_helper_XVF32GERNN)
+
+TRANS64(PMXVF64GER, do_ger_MMIRR_XX3, gen_helper_XVF64GER)
+TRANS64(PMXVF64GERPP, do_ger_MMIRR_XX3, gen_helper_XVF64GERPP)
+TRANS64(PMXVF64GERPN, do_ger_MMIRR_XX3, gen_helper_XVF64GERPN)
+TRANS64(PMXVF64GERNP, do_ger_MMIRR_XX3, gen_helper_XVF64GERNP)
+TRANS64(PMXVF64GERNN, do_ger_MMIRR_XX3, gen_helper_XVF64GERNN)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.31.1
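%xx3_xa_pair (like %xx_xa_pair in insn32.decode) drops the low bit of the A field and doubles the remaining value. The PMXVF64GER* patterns require that bit (bit 16) to be 0, so XA always names an even VSR: the first register of the pair that vsxger_muladd64() and vsxger_mul64() read as a[i / 2]. The sketch below shows both field extractions, following decodetree's "pos:len" convention: bit 0 is the least significant bit, and the first sub-field listed supplies the most significant bits. extract_bits(), decode_xa() and decode_xa_pair() are hypothetical names.

```c
#include <stdint.h>

/* Extract `len` bits starting at bit `pos` (bit 0 = least significant). */
static uint32_t extract_bits(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* %xx3_xa "2:1 16:5": AX (bit 2) on top of the 5-bit A field. */
static uint32_t decode_xa(uint32_t insn)
{
    return (extract_bits(insn, 2, 1) << 5) | extract_bits(insn, 16, 5);
}

/* %xx3_xa_pair "2:1 17:4 !function=times_2": AX on top of the upper four
 * bits of A, then multiplied by 2 to yield an even register number. */
static uint32_t decode_xa_pair(uint32_t insn)
{
    uint32_t field = (extract_bits(insn, 2, 1) << 4) | extract_bits(insn, 17, 4);
    return field * 2;
}
```

For an instruction word with AX = 1 and an even A = 22, both functions return 54. With A = 23, decode_xa() gives 55 while decode_xa_pair() still gives 54, and the decode patterns pin bit 16 to 0 so that such an encoding does not match.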




* [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2*
  2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
                   ` (5 preceding siblings ...)
  2022-05-06 12:18 ` [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-06 12:18 ` Lucas Mateus Castro(alqotel)
  2022-05-08  4:27   ` Richard Henderson
  6 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Castro(alqotel) @ 2022-05-06 12:18 UTC
  To: qemu-ppc
  Cc: richard.henderson, Joel Stanley, Lucas Mateus Castro (alqotel),
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
xvbf16ger2:   VSX Vector bfloat16 GER (rank-2 update)
xvbf16ger2nn: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Negative accumulate
xvbf16ger2np: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
Positive accumulate
xvbf16ger2pn: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Negative accumulate
xvbf16ger2pp: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
Positive accumulate
pmxvbf16ger2:   Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
pmxvbf16ger2nn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Negative accumulate
pmxvbf16ger2np: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Negative multiply, Positive accumulate
pmxvbf16ger2pn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Negative accumulate
pmxvbf16ger2pp: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
Positive multiply, Positive accumulate

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
---
There's a discrepancy between this implementation and mambo/the
hardware: implementing it with float64_mul followed by float64r32_muladd
sometimes gives an incorrect result after an underflow, while
implementing it with float32_mul followed by float32_muladd gives the
wrong sign for some zero or infinite results. I have not been able to
solve this.
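For non-NaN inputs, the widening step is unlikely to contribute to this mismatch. bfloat16 is the upper half of an IEEE-754 binary32 value, and every binary32 value is exactly representable in binary64, so the discrepancy more plausibly lies in how the products and the sum are rounded. The host-side sketch below illustrates the widening that extract_bf16() feeds into vsxger16(). bf16_bits_to_double() is a hypothetical helper, not the softfloat bfloat16_to_float64() used by the patch. It assumes the host float is IEEE-754 binary32 and that denormals are not flushed to zero.

```c
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern: shift it into the top half of a binary32
 * word, reinterpret it as float, then convert (exactly) to double. */
static double bf16_bits_to_double(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float f;

    memcpy(&f, &wide, sizeof f);   /* type-pun without aliasing issues */
    return (double)f;
}
```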
---
 target/ppc/fpu_helper.c             | 40 +++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  5 ++++
 target/ppc/insn32.decode            |  6 +++++
 target/ppc/insn64.decode            | 11 ++++++++
 target/ppc/translate/vsx-impl.c.inc | 12 +++++++++
 5 files changed, 74 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 6857be6ccc..0882702301 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3491,6 +3491,11 @@ static float64 extract_hf16(float16 in, float_status *fp_status)
     return float16_to_float64(in, true, fp_status);
 }
 
+static float64 extract_bf16(bfloat16 in, float_status *fp_status)
+{
+    return bfloat16_to_float64(in, fp_status);
+}
+
 static void vsxger16(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t  *at, uint32_t mask, bool acc,
                      bool neg_mul, bool neg_acc, extract_f16 extract)
@@ -3611,6 +3616,41 @@ static void vsxger(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b, ppc_acc_t  *at,
     do_float_check_status(env, GETPC());
 }
 
+QEMU_FLATTEN
+void helper_XVBF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                       ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, false, false, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2PP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2PN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, false, true, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2NP(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, false, extract_bf16);
+}
+
+QEMU_FLATTEN
+void helper_XVBF16GER2NN(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
+                         ppc_acc_t *at, uint32_t mask)
+{
+    vsxger16(env, a, b, at, mask, true, true, true, extract_bf16);
+}
+
 QEMU_FLATTEN
 void helper_XVF16GER2(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                      ppc_acc_t *at, uint32_t mask)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 5f2f574d30..59e6b74f94 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -551,6 +551,11 @@ DEF_HELPER_5(XVF16GER2PP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF16GER2PN, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF16GER2NP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF16GER2NN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVBF16GER2, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVBF16GER2PP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVBF16GER2PN, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVBF16GER2NP, void, env, vsr, vsr, vsr, i32)
+DEF_HELPER_5(XVBF16GER2NN, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GER, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GERPP, void, env, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XVF32GERPN, void, env, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index bbd4bc80f8..2090c17268 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -736,6 +736,12 @@ XVI8GER4SPP     111011 ... -- ..... ..... 01100011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2S      111011 ... -- ..... ..... 00101011 ..-  @XX3_at xa=%xx_xa
 XVI16GER2SPP    111011 ... -- ..... ..... 00101010 ..-  @XX3_at xa=%xx_xa
 
+XVBF16GER2      111011 ... -- ..... ..... 00110011 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2PP    111011 ... -- ..... ..... 00110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2PN    111011 ... -- ..... ..... 10110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2NP    111011 ... -- ..... ..... 01110010 ..-  @XX3_at xa=%xx_xa
+XVBF16GER2NN    111011 ... -- ..... ..... 11110010 ..-  @XX3_at xa=%xx_xa
+
 XVF16GER2       111011 ... -- ..... ..... 00010011 ..-  @XX3_at xa=%xx_xa
 XVF16GER2PP     111011 ... -- ..... ..... 00010010 ..-  @XX3_at xa=%xx_xa
 XVF16GER2PN     111011 ... -- ..... ..... 10010010 ..-  @XX3_at xa=%xx_xa
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index a12f11044c..78738924c6 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -150,6 +150,17 @@ PMXVI16GER2S    000001 11 1001 -- - - pmsk:2 ------ ........       \
 PMXVI16GER2SPP  000001 11 1001 -- - - pmsk:2 ------ ........       \
                 111011 ... -- ..... ..... 00101010 ..-  @MMIRR_XX3
 
+PMXVBF16GER2    000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00110011 ..-  @MMIRR_XX3
+PMXVBF16GER2PP  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 00110010 ..-  @MMIRR_XX3
+PMXVBF16GER2PN  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 10110010 ..-  @MMIRR_XX3
+PMXVBF16GER2NP  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 01110010 ..-  @MMIRR_XX3
+PMXVBF16GER2NN  000001 11 1001 -- - - pmsk:2 ------ ........ \
+                111011 ... -- ..... ..... 11110010 ..-  @MMIRR_XX3
+
 PMXVF16GER2     000001 11 1001 -- - - pmsk:2 ------ ........ \
                 111011 ... -- ..... ..... 00010011 ..-  @MMIRR_XX3
 PMXVF16GER2PP   000001 11 1001 -- - - pmsk:2 ------ ........ \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 00eed2b1b9..bb5b68ee06 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2884,6 +2884,12 @@ TRANS64(PMXVI16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2PP)
 TRANS64(PMXVI16GER2S, do_ger_MMIRR_XX3, gen_helper_XVI16GER2S)
 TRANS64(PMXVI16GER2SPP, do_ger_MMIRR_XX3, gen_helper_XVI16GER2SPP)
 
+TRANS(XVBF16GER2, do_ger_XX3, gen_helper_XVBF16GER2)
+TRANS(XVBF16GER2PP, do_ger_XX3, gen_helper_XVBF16GER2PP)
+TRANS(XVBF16GER2PN, do_ger_XX3, gen_helper_XVBF16GER2PN)
+TRANS(XVBF16GER2NP, do_ger_XX3, gen_helper_XVBF16GER2NP)
+TRANS(XVBF16GER2NN, do_ger_XX3, gen_helper_XVBF16GER2NN)
+
 TRANS(XVF16GER2, do_ger_XX3, gen_helper_XVF16GER2)
 TRANS(XVF16GER2PP, do_ger_XX3, gen_helper_XVF16GER2PP)
 TRANS(XVF16GER2PN, do_ger_XX3, gen_helper_XVF16GER2PN)
@@ -2902,6 +2908,12 @@ TRANS(XVF64GERPN, do_ger_XX3, gen_helper_XVF64GERPN)
 TRANS(XVF64GERNP, do_ger_XX3, gen_helper_XVF64GERNP)
 TRANS(XVF64GERNN, do_ger_XX3, gen_helper_XVF64GERNN)
 
+TRANS64(PMXVBF16GER2, do_ger_MMIRR_XX3, gen_helper_XVBF16GER2)
+TRANS64(PMXVBF16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVBF16GER2PP)
+TRANS64(PMXVBF16GER2PN, do_ger_MMIRR_XX3, gen_helper_XVBF16GER2PN)
+TRANS64(PMXVBF16GER2NP, do_ger_MMIRR_XX3, gen_helper_XVBF16GER2NP)
+TRANS64(PMXVBF16GER2NN, do_ger_MMIRR_XX3, gen_helper_XVBF16GER2NN)
+
 TRANS64(PMXVF16GER2, do_ger_MMIRR_XX3, gen_helper_XVF16GER2)
 TRANS64(PMXVF16GER2PP, do_ger_MMIRR_XX3, gen_helper_XVF16GER2PP)
 TRANS64(PMXVF16GER2PN, do_ger_MMIRR_XX3, gen_helper_XVF16GER2PN)
-- 
2.31.1
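Reading the decode patterns above, the five XVBF16GER2 forms differ only in the 8-bit XO field (MSB-0 bits 21-28). The low bit is 1 for the non-accumulating form. The leftmost bit is set for the forms that negate the accumulator (pn, nn), and the next bit for the forms that negate the product (np, nn). Below is a minimal C sketch of decoding this by hand, assuming the XX3 layout implied by `111011 ... -- ..... ..... XO ..-`. `insn_field`, `struct bf16ger2_insn` and `decode_xvbf16ger2` are hypothetical names, not QEMU's decodetree output:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the field at MSB-0 bit positions [first, last] of a 32-bit word. */
static uint32_t insn_field(uint32_t insn, int first, int last)
{
    int width = last - first + 1;

    assert(first >= 0 && last <= 31 && width > 0 && width < 32);
    return (insn >> (31 - last)) & ((1u << width) - 1);
}

struct bf16ger2_insn {
    bool valid;       /* one of the five XO values listed in insn32.decode */
    bool accumulate;  /* XO low bit clear: the pp/pn/np/nn forms */
    bool neg_acc;     /* XO leftmost bit: pn, nn */
    bool neg_mul;     /* XO second bit: np, nn */
    unsigned at;      /* accumulator number, bits 6-8 */
    unsigned xa, xb;  /* 6-bit VSR numbers: AX/BX concatenated with A/B */
};

static struct bf16ger2_insn decode_xvbf16ger2(uint32_t insn)
{
    struct bf16ger2_insn d = { 0 };
    uint32_t xo = insn_field(insn, 21, 28);

    if (insn_field(insn, 0, 5) != 59) {   /* primary opcode 111011 */
        return d;
    }
    switch (xo) {
    case 0x33: /* 00110011 xvbf16ger2 */
    case 0x32: /* 00110010 xvbf16ger2pp */
    case 0xb2: /* 10110010 xvbf16ger2pn */
    case 0x72: /* 01110010 xvbf16ger2np */
    case 0xf2: /* 11110010 xvbf16ger2nn */
        break;
    default:
        return d;
    }
    d.valid = true;
    d.accumulate = !(xo & 0x01);
    d.neg_acc = (xo & 0x80) != 0;
    d.neg_mul = (xo & 0x40) != 0;
    d.at = insn_field(insn, 6, 8);
    d.xa = (insn_field(insn, 29, 29) << 5) | insn_field(insn, 11, 15);
    d.xb = (insn_field(insn, 30, 30) << 5) | insn_field(insn, 16, 20);
    return d;
}
```

decodetree generates the equivalent extraction from insn32.decode, so this is only an aid for reading the patterns.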




* Re: [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz
  2022-05-06 12:18 ` [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
@ 2022-05-08  3:28   ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  3:28 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> From: "Lucas Mateus Castro (alqotel)"<lucas.araujo@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> xxmfacc: VSX Move From Accumulator
> xxmtacc: VSX Move To Accumulator
> xxsetaccz: VSX Set Accumulator to Zero
> 
> The PowerISA 3.1 mentions that for the current version of the
> architecture, "the hardware implementation provides the effect of ACC[i]
> and VSRs 4*i to 4*i + 3 logically containing the same data" and "The
> Accumulators introduce no new logical state at this time" (page 501).
> For now it seems unnecessary to create new structures, so this patch
> just uses ACC[i] as VSRs 4*i to 4*i+3 and therefore move to and from
> accumulators are no-ops.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)<lucas.araujo@eldorado.org.br>
> ---
>   target/ppc/cpu.h                    |  5 +++++
>   target/ppc/insn32.decode            |  9 +++++++++
>   target/ppc/translate/vsx-impl.c.inc | 31 +++++++++++++++++++++++++++++
>   3 files changed, 45 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
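This small sketch illustrates the aliasing described in the patch, where ACC[i] is VSRs 4*i to 4*i+3. Because ACC[i] is the same storage as those VSRs, xxmfacc and xxmtacc have nothing to copy. The helper names are hypothetical. The count of eight accumulators is taken from the 3-bit AT field in the decode patterns:

```c
#include <assert.h>

enum { NUM_ACC = 8, ACC_ROWS = 4 };

/* Row `row` of ACC[acc] is VSR 4*acc + row, per the patch description. */
static int acc_row_to_vsr(int acc, int row)
{
    assert(acc >= 0 && acc < NUM_ACC);
    assert(row >= 0 && row < ACC_ROWS);
    return 4 * acc + row;
}

/* Inverse mapping: which accumulator a VSR in 0..31 overlaps. */
static int vsr_to_acc(int vsr)
{
    assert(vsr >= 0 && vsr < NUM_ACC * ACC_ROWS);
    return vsr / ACC_ROWS;
}
```

For example, an instruction targeting ACC[2] reads and writes VSR 8 to 11.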



* Re: [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions
  2022-05-06 12:18 ` [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-08  3:41   ` Richard Henderson
  2022-05-10 17:28     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  3:41 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 10c6d7ae43..348a898950 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -238,6 +238,7 @@ typedef union _ppc_vsr_t {
>   
>   typedef ppc_vsr_t ppc_avr_t;
>   typedef ppc_vsr_t ppc_fprp_t;
> +typedef ppc_vsr_t ppc_acc_t;
>   
>   #if !defined(CONFIG_USER_ONLY)
>   /* Software TLB cache */
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index aa6773c4a5..61217e0a10 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -537,6 +537,15 @@ DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
>   DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
>   DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
>   DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI4GER8PP, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI8GER4, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI8GER4PP, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI8GER4SPP, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
> +DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)

Did you intend to use "acc" here, for documentation purposes?
It's just a couple of #defines above.

Either way, much cleaner.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions
  2022-05-06 12:18 ` [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
@ 2022-05-08  3:48   ` Richard Henderson
  2022-05-12 17:38     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  3:48 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>       return true;
> +
> +}
> +static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a,

Watch the whitespace.

> +{
> +    arg_MMIRR_XX3 m;
> +    m.xa = a->xa;
> +    m.xb = a->xb;
> +    m.xt = a->xt;
> +    m.pmsk = 0xFF;
> +    m.ymsk = 0xF;
> +    m.xmsk = 0xF;
> +    return do_ger_MMIRR_XX3(ctx, &m, helper);
>   }

Is XX3 going to be used for anything else?  Is it worthwhile to move these into the 
decoder as explicit assignments?

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
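The following toy model shows what do_ger_XX3 relies on: running the masked (prefixed) path with every mask bit set gives the same result as the unmasked instruction. The sketch is a plain-C rank-1 update in the style of xvf32ger. It makes two assumptions: the leftmost mask bit selects row/column 0, and disabled elements are written as zero. It also ignores pmsk, which only selects product terms in the rank-2/4/8 forms. All names are hypothetical:

```c
#include <assert.h>

/*
 * Toy rank-1 update: acc[i][j] = a[i] * b[j].
 * Assumed mask convention: row i is enabled by xmsk bit (3 - i) and
 * column j by ymsk bit (3 - j). Disabled elements are set to zero.
 */
static void toy_ger_masked(float acc[4][4], const float a[4],
                           const float b[4], unsigned xmsk, unsigned ymsk)
{
    assert(xmsk <= 0xF && ymsk <= 0xF);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            unsigned enabled = (xmsk >> (3 - i)) & (ymsk >> (3 - j)) & 1u;
            acc[i][j] = enabled ? a[i] * b[j] : 0.0f;
        }
    }
}

/* Non-prefixed form: the masked path with every row and column enabled,
 * mirroring do_ger_XX3 passing xmsk = ymsk = 0xF. */
static void toy_ger(float acc[4][4], const float a[4], const float b[4])
{
    toy_ger_masked(acc, a, b, 0xF, 0xF);
}
```

Moving the all-ones masks into the decoder, as suggested, would remove the need for the wrapper. The helper body would stay the same.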



* Re: [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger*
  2022-05-06 12:18 ` [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-08  4:03   ` Richard Henderson
  2022-05-09 11:33     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  4:03 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> xvf32ger:   VSX Vector 32-bit Floating-Point GER (rank-1 update)
> xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
> multiply, Negative accumulate
> xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
> multiply, Positive accumulate
> xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
> multiply, Negative accumulate
> xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
> multiply, Positive accumulate
> xvf64ger:   VSX Vector 64-bit Floating-Point GER (rank-1 update)
> xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
> multiply, Negative accumulate
> xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
> multiply, Positive accumulate
> xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
> multiply, Negative accumulate
> xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
> multiply, Positive accumulate
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
> ---
>   target/ppc/cpu.h                    |   4 +
>   target/ppc/fpu_helper.c             | 178 ++++++++++++++++++++++++++++
>   target/ppc/helper.h                 |  10 ++
>   target/ppc/insn32.decode            |  13 ++
>   target/ppc/translate/vsx-impl.c.inc |  12 ++
>   5 files changed, 217 insertions(+)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 348a898950..eb50ad699e 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2639,6 +2639,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
>   #define VsrSW(i) s32[i]
>   #define VsrD(i) u64[i]
>   #define VsrSD(i) s64[i]
> +#define VsrSF(i) f32[i]
> +#define VsrDF(i) f64[i]
>   #else
>   #define VsrB(i) u8[15 - (i)]
>   #define VsrSB(i) s8[15 - (i)]
> @@ -2648,6 +2650,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
>   #define VsrSW(i) s32[3 - (i)]
>   #define VsrD(i) u64[1 - (i)]
>   #define VsrSD(i) s64[1 - (i)]
> +#define VsrSF(i) f32[3 - (i)]
> +#define VsrDF(i) f64[1 - (i)]
>   #endif
>   
>   static inline int vsr64_offset(int i, bool high)
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index f6c8318a71..138b30d08f 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -3462,3 +3462,181 @@ void helper_xssubqp(CPUPPCState *env, uint32_t opcode,
>       *xt = t;
>       do_float_check_status(env, GETPC());
>   }
> +
> +static void set_rounding_mode_rn(CPUPPCState *env)
> +{
> +    uint8_t rmode = (env->fpscr & FP_RN) >> FPSCR_RN0;
> +    switch (rmode) {
> +    case 0:
> +        set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
> +        break;
> +    case 1:
> +        set_float_rounding_mode(float_round_to_zero, &env->fp_status);
> +        break;
> +    case 2:
> +        set_float_rounding_mode(float_round_up, &env->fp_status);
> +        break;
> +    case 3:
> +        set_float_rounding_mode(float_round_down, &env->fp_status);
> +        break;
> +    default:
> +        abort();
> +    }
> +}

How is this different from fpscr_set_rounding_mode and why do you need to call it at all?


r~
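The VsrSF/VsrDF accessors in this patch follow the existing VsrW/VsrD pattern. On a little-endian host the array index is flipped, so index i always names ISA element i. The stand-alone mimic below shows that the word view and the float view agree on which element is element 0. The toy_vsr_t type and the Toy* macros are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Toy 128-bit register with integer and float views of the same storage. */
typedef union {
    uint32_t u32[4];
    float f32[4];
} toy_vsr_t;

/* Map ISA element i (0 = most significant word) to a host array index;
 * a little-endian host is assumed when the byte order is not big-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define ToyVsrW(i)  u32[i]
#define ToyVsrSF(i) f32[i]
#else
#define ToyVsrW(i)  u32[3 - (i)]
#define ToyVsrSF(i) f32[3 - (i)]
#endif

static void toy_set_word(toy_vsr_t *v, int i, uint32_t bits)
{
    assert(i >= 0 && i < 4);
    v->ToyVsrW(i) = bits;
}

static float toy_get_sf(const toy_vsr_t *v, int i)
{
    assert(i >= 0 && i < 4);
    return v->ToyVsrSF(i);   /* same index flip as ToyVsrW */
}
```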



* Re: [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger*
  2022-05-06 12:18 ` [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
@ 2022-05-08  4:24   ` Richard Henderson
  2022-05-10 14:47     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  4:24 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Aurelien Jarno, Peter Maydell, Alex Bennée,
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> +static inline float32 float32_neg(float32 a)
> +{
> +    if (((a & 0x7f800000) == 0x7f800000) && (a & 0x007fffff)) {
> +        return a;
> +    } else {
> +        return float32_chs(a);
> +    }
> +}

This is wrong -- even NaNs get their signs changed.
Negation and absolute value are non-arithmetic operations.

If you're matching hardware results, this suggests...

> +                    if (neg_mul) {
> +                        msum = float32_neg(msum);
> +                    }
> +                    if (neg_acc) {
> +                        aux_acc = float32_neg(at[i].VsrSF(j));
> +                    } else {
> +                        aux_acc = at[i].VsrSF(j);
> +                    }
> +                    at[i].VsrSF(j) = float32_add(msum, aux_acc, excp_ptr);

This "add" should be "sub" instead of using a separate negation, when required.
I do wonder about the double-negation vs nans.

It looks like this could be

   float32_muladd(float32_one, msum, aux_acc, flags, &status)

with flags set to float_muladd_negate_* for neg_mul and neg_acc.  Any NaNs would go 
through pick_nan_muladd and fail to be altered.

I'm not sure if I'm suggesting actual use of muladd, for the simplicity, or if you should 
have an inline check for nans.  I might need to think about this in the morning.


r~



* Re: [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger*
  2022-05-06 12:18 ` [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
@ 2022-05-08  4:25   ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  4:25 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> From: "Lucas Mateus Castro (alqotel)"<lucas.araujo@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> pmxvf16ger2:   Prefixed Masked VSX Vector 16-bit Floating-Point GER
> (rank-2 update)
> pmxvf16ger2nn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
> (rank-2 update) Negative multiply, Negative accumulate
> pmxvf16ger2np: Prefixed Masked VSX Vector 16-bit Floating-Point GER
> (rank-2 update) Negative multiply, Positive accumulate
> pmxvf16ger2pn: Prefixed Masked VSX Vector 16-bit Floating-Point GER
> (rank-2 update) Positive multiply, Negative accumulate
> pmxvf16ger2pp: Prefixed Masked VSX Vector 16-bit Floating-Point GER
> (rank-2 update) Positive multiply, Positive accumulate
> pmxvf32ger:    Prefixed Masked VSX Vector 32-bit Floating-Point GER
> (rank-1 update)
> pmxvf32gernn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
> (rank-1 update) Negative multiply, Negative accumulate
> pmxvf32gernp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
> (rank-1 update) Negative multiply, Positive accumulate
> pmxvf32gerpn:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
> (rank-1 update) Positive multiply, Negative accumulate
> pmxvf32gerpp:  Prefixed Masked VSX Vector 32-bit Floating-Point GER
> (rank-1 update) Positive multiply, Positive accumulate
> pmxvf64ger:    Prefixed Masked VSX Vector 64-bit Floating-Point GER
> (rank-1 update)
> pmxvf64gernn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
> (rank-1 update) Negative multiply, Negative accumulate
> pmxvf64gernp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
> (rank-1 update) Negative multiply, Positive accumulate
> pmxvf64gerpn:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
> (rank-1 update) Positive multiply, Negative accumulate
> pmxvf64gerpp:  Prefixed Masked VSX Vector 64-bit Floating-Point GER
> (rank-1 update) Positive multiply, Positive accumulate
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)<lucas.araujo@eldorado.org.br>
> ---
>   target/ppc/insn64.decode            | 38 +++++++++++++++++++++++++++++
>   target/ppc/translate/vsx-impl.c.inc | 18 ++++++++++++++
>   2 files changed, 56 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2*
  2022-05-06 12:18 ` [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
@ 2022-05-08  4:27   ` Richard Henderson
  2022-05-10 17:25     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2022-05-08  4:27 UTC (permalink / raw)
  To: Lucas Mateus Castro(alqotel), qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> xvbf16ger2:   VSX Vector bfloat16 GER (rank-2 update)
> xvbf16ger2nn: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
> Negative accumulate
> xvbf16ger2np: VSX Vector bfloat16 GER (rank-2 update) Negative multiply,
> Positive accumulate
> xvbf16ger2pn: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
> Negative accumulate
> xvbf16ger2pp: VSX Vector bfloat16 GER (rank-2 update) Positive multiply,
> Positive accumulate
> pmxvbf16ger2:   Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
> pmxvbf16ger2nn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
> Negative multiply, Negative accumulate
> pmxvbf16ger2np: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
> Negative multiply, Positive accumulate
> pmxvbf16ger2pn: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
> Positive multiply, Negative accumulate
> pmxvbf16ger2pp: Prefixed Masked VSX Vector bfloat16 GER (rank-2 update)
> Positive multiply, Positive accumulate
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
> ---
> There's a discrepancy between this implementation and mambo/the
> hardware where implementing it with float64_mul then float64r32_muladd
> sometimes results in an incorrect result after an underflow, but
> implementing with float32_mul then float32_muladd results in incorrect
> signal in some 0 or infinite results. I've not been able to solve this

I did suggest that the float64_mul needs to be done in round-to-odd.

Anyway, for this patch,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
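This is a rough model of one xvbf16ger2 element, for readers following the bf16 discussion. It assumes that each 32-bit word of XA/XB holds two bfloat16 values (high half first), and that element (i, j) is the rank-2 sum of the pairwise products. The names are hypothetical. The rounding is deliberately naive: the exact products are summed in double and then converted to float. That can double-round, and it does not reproduce the underflow behaviour discussed in this thread:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 is the upper 16 bits of an IEEE binary32 encoding. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Toy xvbf16ger2 without accumulation or masks:
 *   at[i][j] = xa[i].hi * xb[j].hi + xa[i].lo * xb[j].lo
 * Each bf16 x bf16 product is exact in double; the sum is rounded to
 * double and then to float, so this can double-round.
 */
static void toy_xvbf16ger2(float at[4][4], const uint32_t xa[4],
                           const uint32_t xb[4])
{
    assert(at && xa && xb);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            double p0 = (double)bf16_to_f32((uint16_t)(xa[i] >> 16)) *
                        bf16_to_f32((uint16_t)(xb[j] >> 16));
            double p1 = (double)bf16_to_f32((uint16_t)(xa[i] & 0xffff)) *
                        bf16_to_f32((uint16_t)(xb[j] & 0xffff));
            at[i][j] = (float)(p0 + p1);
        }
    }
}
```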



* Re: [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger*
  2022-05-08  4:03   ` Richard Henderson
@ 2022-05-09 11:33     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 0 replies; 21+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-09 11:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here



On 08/05/2022 01:03, Richard Henderson wrote:
>
>
> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
>>
>> Implement the following PowerISA v3.1 instructions:
>> xvf32ger:   VSX Vector 32-bit Floating-Point GER (rank-1 update)
>> xvf32gernn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
>> multiply, Negative accumulate
>> xvf32gernp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Negative
>> multiply, Positive accumulate
>> xvf32gerpn: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
>> multiply, Negative accumulate
>> xvf32gerpp: VSX Vector 32-bit Floating-Point GER (rank-1 update) Positive
>> multiply, Positive accumulate
>> xvf64ger:   VSX Vector 64-bit Floating-Point GER (rank-1 update)
>> xvf64gernn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
>> multiply, Negative accumulate
>> xvf64gernp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Negative
>> multiply, Positive accumulate
>> xvf64gerpn: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
>> multiply, Negative accumulate
>> xvf64gerpp: VSX Vector 64-bit Floating-Point GER (rank-1 update) Positive
>> multiply, Positive accumulate
>>
>> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
>> ---
>>   target/ppc/cpu.h                    |   4 +
>>   target/ppc/fpu_helper.c             | 178 ++++++++++++++++++++++++++++
>>   target/ppc/helper.h                 |  10 ++
>>   target/ppc/insn32.decode            |  13 ++
>>   target/ppc/translate/vsx-impl.c.inc |  12 ++
>>   5 files changed, 217 insertions(+)
>>
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 348a898950..eb50ad699e 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -2639,6 +2639,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
>>   #define VsrSW(i) s32[i]
>>   #define VsrD(i) u64[i]
>>   #define VsrSD(i) s64[i]
>> +#define VsrSF(i) f32[i]
>> +#define VsrDF(i) f64[i]
>>   #else
>>   #define VsrB(i) u8[15 - (i)]
>>   #define VsrSB(i) s8[15 - (i)]
>> @@ -2648,6 +2650,8 @@ static inline bool lsw_reg_in_range(int start, int nregs, int rx)
>>   #define VsrSW(i) s32[3 - (i)]
>>   #define VsrD(i) u64[1 - (i)]
>>   #define VsrSD(i) s64[1 - (i)]
>> +#define VsrSF(i) f32[3 - (i)]
>> +#define VsrDF(i) f64[1 - (i)]
>>   #endif
>>
>>   static inline int vsr64_offset(int i, bool high)
>> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
>> index f6c8318a71..138b30d08f 100644
>> --- a/target/ppc/fpu_helper.c
>> +++ b/target/ppc/fpu_helper.c
>> @@ -3462,3 +3462,181 @@ void helper_xssubqp(CPUPPCState *env, uint32_t opcode,
>>       *xt = t;
>>       do_float_check_status(env, GETPC());
>>   }
>> +
>> +static void set_rounding_mode_rn(CPUPPCState *env)
>> +{
>> +    uint8_t rmode = (env->fpscr & FP_RN) >> FPSCR_RN0;
>> +    switch (rmode) {
>> +    case 0:
>> +        set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
>> +        break;
>> +    case 1:
>> +        set_float_rounding_mode(float_round_to_zero, &env->fp_status);
>> +        break;
>> +    case 2:
>> +        set_float_rounding_mode(float_round_up, &env->fp_status);
>> +        break;
>> +    case 3:
>> +        set_float_rounding_mode(float_round_down, &env->fp_status);
>> +        break;
>> +    default:
>> +        abort();
>> +    }
>> +}
>
> How is this different from fpscr_set_rounding_mode and why do you need 
> to call it at all?
It isn't. I was worried that my tests weren't catching some edge case, and
when searching target/ppc/fpu_helper.c I couldn't find a function to set
the rounding mode, so I wrote this one. Looking back now, it's completely
unnecessary, so I'll remove it in v3.
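For reference, the switch above is a fixed four-entry table over the 2-bit FPSCR[RN] field. That is why reusing the existing fpscr_set_rounding_mode is enough. The host-side sketch below shows the same mapping, using C99 `<fenv.h>` modes as stand-ins for softfloat's float_round_* values. `ppc_rn_to_fenv` is a hypothetical name:

```c
#include <assert.h>
#include <fenv.h>

/*
 * FPSCR[RN]: 0 = round to nearest even, 1 = toward zero,
 * 2 = toward +infinity, 3 = toward -infinity.
 * The C99 FE_* modes stand in for softfloat's float_round_* values.
 */
static int ppc_rn_to_fenv(unsigned rn)
{
    static const int modes[4] = {
        FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD,
    };

    assert(rn < 4);   /* RN is a 2-bit field, so no default case is needed */
    return modes[rn];
}
```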
>
>
> r~
-- 
Lucas Mateus M. Araujo e Castro
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>


* Re: [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger*
  2022-05-08  4:24   ` Richard Henderson
@ 2022-05-10 14:47     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 0 replies; 21+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-10 14:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Joel Stanley, Aurelien Jarno, Peter Maydell, Alex Bennée,
	Cédric Le Goater, Daniel Henrique Barboza, David Gibson,
	Greg Kurz, open list:All patches CC here



On 08/05/2022 01:24, Richard Henderson wrote:
> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>> +static inline float32 float32_neg(float32 a)
>> +{
>> +    if (((a & 0x7f800000) == 0x7f800000) && (a & 0x007fffff)) {
>> +        return a;
>> +    } else {
>> +        return float32_chs(a);
>> +    }
>> +}
>
> This is wrong -- even NaNs get their signs changed.
> Negation and absolute value are non-arithmetic operations.

The PowerISA 3.1 (page 589) defines bfp_negate as:

bfp_NEGATE(x)
x is a binary floating-point value that is represented in the binary 
floating-point working format.
If x is not a NaN, return x with its sign complemented. Otherwise, return x

So this is what I based this function on.
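The two readings differ only for NaN inputs. The IEEE 754 negate flips the sign bit unconditionally, while bfp_NEGATE as quoted above returns a NaN unchanged. Here is a bit-level sketch on binary32 encodings; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define F32_SIGN 0x80000000u
#define F32_EXP  0x7f800000u
#define F32_FRAC 0x007fffffu

static int f32_is_nan(uint32_t x)
{
    return (x & F32_EXP) == F32_EXP && (x & F32_FRAC) != 0;
}

/* IEEE 754 negate: a non-arithmetic sign flip, applied to NaNs too. */
static uint32_t f32_negate_ieee(uint32_t x)
{
    return x ^ F32_SIGN;
}

/* bfp_NEGATE as quoted from PowerISA 3.1: a NaN is returned unchanged. */
static uint32_t f32_negate_isa(uint32_t x)
{
    return f32_is_nan(x) ? x : (x ^ F32_SIGN);
}
```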

>
> If you're matching hardware results, this suggests...
>
>> +                    if (neg_mul) {
>> +                        msum = float32_neg(msum);
>> +                    }
>> +                    if (neg_acc) {
>> +                        aux_acc = float32_neg(at[i].VsrSF(j));
>> +                    } else {
>> +                        aux_acc = at[i].VsrSF(j);
>> +                    }
>> +                    at[i].VsrSF(j) = float32_add(msum, aux_acc, excp_ptr);
>
> This "add" should be "sub" instead of using a separate negation, when 
> required.
> I do wonder about the double-negation vs nans.

But even then, some way to negate msum would still be necessary, so maybe
float32_neg should move to target/ppc/fpu_helper.c under a different name.
I used two negations to stay closer to the description, which the ISA
gives as:

if “[pm]xvf16ger2pp” then v2 ← bfp_ADD(r1, acc)
if “[pm]xvf16ger2pn” then v2 ← bfp_ADD(r1, bfp_NEGATE(acc))
if “[pm]xvf16ger2np” then v2 ← bfp_ADD(bfp_NEGATE(r1), acc)
if “[pm]xvf16ger2nn” then v2 ← bfp_ADD(bfp_NEGATE(r1), bfp_NEGATE(acc))

But it could easily be changed to an add/sub instead, as you said.
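Here is a sketch of that rewrite for non-NaN inputs, in host C with hypothetical names. pp, pn and np need no explicit negation, but nn still negates r1, which is the point made above. nn is written as -r1 - acc rather than -(r1 + acc) because the latter flips the sign of a zero result. In round-to-nearest, r1 = +0 and acc = -0 give -0 with -(r1 + acc), but +0 with the ISA form. Host negation also flips NaN signs, so this does not model bfp_NEGATE for NaNs:

```c
#include <assert.h>
#include <stdbool.h>

/* Reference: the ISA formulation with explicit negations (non-NaN only). */
static float ger_acc_ref(float r1, float acc, bool neg_mul, bool neg_acc)
{
    float p = neg_mul ? -r1 : r1;
    float c = neg_acc ? -acc : acc;
    return p + c;
}

/* Same results using add/sub; only the nn form still negates r1. */
static float ger_acc_addsub(float r1, float acc, bool neg_mul, bool neg_acc)
{
    if (!neg_mul) {
        return neg_acc ? r1 - acc : r1 + acc;   /* pn, pp */
    }
    return neg_acc ? -r1 - acc : acc - r1;      /* nn, np */
}
```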

>
> It looks like this could be
>
>   float32_muladd(float32_one, msum, aux_acc, flags, &status)
>
> with flags set to float_muladd_negate_* for neg_mul and neg_acc. Any 
> NaNs would go
> through pick_nan_muladd and fail to be altered.

It would have to be float32_muladd(msum, float32_one, aux_acc, ...) to
match the hardware result. It looks like QEMU's preference for target PPC
is to return A over C, and C over B, when all operands of a muladd are
NaN, but A over B in an add/sub when both are NaN, so the equivalent of
add(A, B) is muladd(A, 1, B).

That aside, using muladd would bring it closer to vsxger than negate +
add/sub does, but personally I prefer the latter, to avoid an unnecessary
muladd. Any opinions?

>
> I'm not sure if I'm suggesting actual use of muladd, for the 
> simplicity, or if you should
> have an inline check for nans.  I might need to think about this in 
> the morning.
>
>
> r~

* Re: [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2*
  2022-05-08  4:27   ` Richard Henderson
@ 2022-05-10 17:25     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 0 replies; 21+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-10 17:25 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here



On 08/05/2022 01:27, Richard Henderson wrote:
> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>>
>> There's a discrepancy between this implementation and mambo/the
>> hardware where implementing it with float64_mul then float64r32_muladd
>> sometimes results in an incorrect result after an underflow, but
>> implementing with float32_mul then float32_muladd results in incorrect
>> signal in some 0 or infinite results. I've not been able to solve this
>
> I did suggest that the float64_mul needs to be done in round-to-odd.

From what I understood, you meant:

     rmode = get_float_rounding_mode(&status);
     set_float_rounding_mode(float_round_to_odd, &status);
     psum = float64_mul(va, vb, &status);
     set_float_rounding_mode(rmode, &status);
     psum = float64r32_muladd(vc, vd, psum, 0, &status);

That doesn't solve the problem. I tried other solutions, but overall I
found 3 test cases that no single solution could pass, those being:

xa = 0x 000923da 28c31f00 00018540 XXXXXXXX
xb = 0x 9d080000 000f97ac b7092f00 XXXXXXXX
xvbf16ger2 at, xa, xb
at = 0x 80000000 XXXXXXXX XXXXXXXX XXXXXXXX
         0xXXXXXXXX 80000016 XXXXXXXX XXXXXXXX
         0xXXXXXXXX XXXXXXXX 80000001 XXXXXXXX
         0xXXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Doing the operation with float64 (with or without round_to_odd), or with
a new softfloat operation that uses FloatParts64, gives 0x80000015
instead of 0x80000016, while doing it with float32 gives 0x00000000
instead of 0x80000000 and 0x80000002 instead of 0x80000001.

Between those choices I'd go with float64, since it keeps the result
numerically close to the actual value if the next operation treats those
words as integers (with float32 you can end up with 0 instead of
INT32_MIN), and the results are still close if they're treated as
floating-point.
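The round-to-odd suggestion is the standard remedy for double rounding. If the intermediate result keeps at least two more bits than the final format and is rounded to odd, a second round-to-nearest gives the same result as rounding once. The integer sketch below illustrates only that principle, with hypothetical names. It does not model the bf16 underflow cases above, which round-to-odd alone reportedly did not fix:

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low `shift` bits of x, rounding to nearest with ties to even. */
static uint64_t rne_shift(uint64_t x, unsigned shift)
{
    assert(shift < 64);
    if (shift == 0) {
        return x;
    }
    uint64_t q = x >> shift;
    uint64_t rem = x & ((UINT64_C(1) << shift) - 1);
    uint64_t half = UINT64_C(1) << (shift - 1);
    if (rem > half || (rem == half && (q & 1))) {
        q++;
    }
    return q;
}

/* Drop the low `shift` bits of x with round-to-odd: truncate, then force
 * the last kept bit to 1 if any discarded bit was nonzero (sticky bit). */
static uint64_t rto_shift(uint64_t x, unsigned shift)
{
    assert(shift < 64);
    if (shift == 0) {
        return x;
    }
    uint64_t q = x >> shift;
    if (x & ((UINT64_C(1) << shift) - 1)) {
        q |= 1;
    }
    return q;
}
```

Take x = 23 (0b10111) with a final width that drops 4 bits. Rounding once gives rne_shift(23, 4) == 1. Rounding to nearest twice, through an intermediate that drops 2 bits, gives rne_shift(rne_shift(23, 2), 2) == 2. Rounding to odd first gives rne_shift(rto_shift(23, 2), 2) == 1 again.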

>
> Anyway, for this patch,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~

* Re: [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions
  2022-05-08  3:41   ` Richard Henderson
@ 2022-05-10 17:28     ` Lucas Mateus Martins Araujo e Castro
  2022-05-11  0:00       ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-10 17:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here



On 08/05/2022 00:41, Richard Henderson wrote:
> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index 10c6d7ae43..348a898950 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -238,6 +238,7 @@ typedef union _ppc_vsr_t {
>>
>>   typedef ppc_vsr_t ppc_avr_t;
>>   typedef ppc_vsr_t ppc_fprp_t;
>> +typedef ppc_vsr_t ppc_acc_t;
>>
>>   #if !defined(CONFIG_USER_ONLY)
>>   /* Software TLB cache */
>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>> index aa6773c4a5..61217e0a10 100644
>> --- a/target/ppc/helper.h
>> +++ b/target/ppc/helper.h
>> @@ -537,6 +537,15 @@ DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
>>   DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
>>   DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
>>   DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI4GER8PP, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI8GER4, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI8GER4PP, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI8GER4SPP, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
>> +DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)
>
> Did you intend to use "acc" here, for documentation purposes?
> It's just a couple of #defines above.

Yes, I'll change that in the next version. Should I keep the Reviewed-by,
or send the patch without it so you can review the changes?

It will just be a matter of adding:

     #define dh_alias_acc ptr
     #define dh_ctype_acc ppc_acc_t *
     #define dh_typecode_acc dh_typecode_ptr

and changing the DEF_HELPER_5 declarations to use acc instead of the third vsr.
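A minimal, self-contained sketch of why this change is documentation-only. It imitates QEMU's `dh_ctype_*` convention with simplified stand-ins: `CPUPPCState`, the `ppc_vsr_t` layout and the local `DEF_HELPER_5` macro are hypothetical reductions, not the real QEMU definitions, and only the C-prototype side (`dh_ctype_acc`) is modelled; the `dh_alias_acc`/`dh_typecode_acc` lines, which concern how TCG passes the argument, are left out. Because `ppc_acc_t` is a typedef of `ppc_vsr_t`, the generated prototype is type-identical to the all-`vsr` one.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real QEMU types. */
typedef struct CPUPPCState { int unused; } CPUPPCState;
typedef union {
    uint32_t u32[4];
    uint64_t u64[2];
} ppc_vsr_t;
typedef ppc_vsr_t ppc_acc_t;        /* same typedef as in the patch */

/* Map helper argument type names to C types, in the style of dh_ctype_*. */
#define dh_ctype_void void
#define dh_ctype_env  CPUPPCState *
#define dh_ctype_vsr  ppc_vsr_t *
#define dh_ctype_acc  ppc_acc_t *   /* the new documentation-only alias */
#define dh_ctype_i32  uint32_t
#define dh_ctype(t)   dh_ctype_##t

/* Simplified prototype generator (hypothetical; not QEMU's real macro). */
#define DEF_HELPER_5(name, ret, t1, t2, t3, t4, t5)                        \
    dh_ctype(ret) helper_##name(dh_ctype(t1), dh_ctype(t2), dh_ctype(t3), \
                                dh_ctype(t4), dh_ctype(t5));

/* Before: DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, vsr, i32) */
DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, acc, i32)

/* Placeholder body, NOT the real GER computation: it only shows that a
 * definition taking ppc_acc_t * matches the generated prototype. */
void helper_XVI4GER8(CPUPPCState *env, ppc_vsr_t *a, ppc_vsr_t *b,
                     ppc_acc_t *at, uint32_t mask)
{
    (void)env;
    (void)a;
    (void)b;
    at->u32[0] = mask;
}
```

Calling `helper_XVI4GER8(&env, &xa, &xb, &acc, 0xFF)` with a plain `ppc_vsr_t acc` compiles without a cast: the `acc` name changes what the declaration documents, not what it accepts.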

>
> Either way, much cleaner.
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~
-- 
Lucas Mateus M. Araujo e Castro
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Embedded Computing Department
Software Analyst Trainee
Legal Notice - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

* Re: [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions
  2022-05-10 17:28     ` Lucas Mateus Martins Araujo e Castro
@ 2022-05-11  0:00       ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-05-11  0:00 UTC (permalink / raw)
  To: Lucas Mateus Martins Araujo e Castro, qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 5/10/22 10:28, Lucas Mateus Martins Araujo e Castro wrote:
> 
> On 08/05/2022 00:41, Richard Henderson wrote:
>> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>>> index 10c6d7ae43..348a898950 100644
>>> --- a/target/ppc/cpu.h
>>> +++ b/target/ppc/cpu.h
>>> @@ -238,6 +238,7 @@ typedef union _ppc_vsr_t {
>>>
>>>   typedef ppc_vsr_t ppc_avr_t;
>>>   typedef ppc_vsr_t ppc_fprp_t;
>>> +typedef ppc_vsr_t ppc_acc_t;
>>>
>>>   #if !defined(CONFIG_USER_ONLY)
>>>   /* Software TLB cache */
>>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>>> index aa6773c4a5..61217e0a10 100644
>>> --- a/target/ppc/helper.h
>>> +++ b/target/ppc/helper.h
>>> @@ -537,6 +537,15 @@ DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
>>>   DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
>>>   DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
>>>   DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI4GER8, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI4GER8PP, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI8GER4, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI8GER4PP, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI8GER4SPP, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI16GER2, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI16GER2S, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI16GER2PP, void, env, vsr, vsr, vsr, i32)
>>> +DEF_HELPER_5(XVI16GER2SPP, void, env, vsr, vsr, vsr, i32)
>>
>> Did you intend to use "acc" here, for documentation purposes?
>> It's just a couple of #defines above.
> 
> Yes, I'll change that in the next version. Should I keep the Reviewed-by, or send the patch
> without it so you can review the changes?

Keep the r-b.

> It will just be a matter of adding:
> 
>      #define dh_alias_acc ptr
>      #define dh_ctype_acc ppc_acc_t *
>      #define dh_typecode_acc dh_typecode_ptr
> 
> and changing the DEF_HELPER_5 declarations to use acc instead of the third vsr.

Yep.


r~

* Re: [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions
  2022-05-08  3:48   ` Richard Henderson
@ 2022-05-12 17:38     ` Lucas Mateus Martins Araujo e Castro
  0 siblings, 0 replies; 21+ messages in thread
From: Lucas Mateus Martins Araujo e Castro @ 2022-05-12 17:38 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc
  Cc: Joel Stanley, Cédric Le Goater, Daniel Henrique Barboza,
	David Gibson, Greg Kurz, open list:All patches CC here

On 08/05/2022 00:48, Richard Henderson wrote:
> On 5/6/22 07:18, Lucas Mateus Castro(alqotel) wrote:
>
>> +{
>> +    arg_MMIRR_XX3 m;
>> +    m.xa = a->xa;
>> +    m.xb = a->xb;
>> +    m.xt = a->xt;
>> +    m.pmsk = 0xFF;
>> +    m.ymsk = 0xF;
>> +    m.xmsk = 0xF;
>> +    return do_ger_MMIRR_XX3(ctx, &m, helper);
>>   }
>
> Is XX3 going to be used for anything else?  Is it worthwhile to move these
> into the decoder as explicit assignments?
XX3 and MMIRR_XX3 are in different decodetree files. For the next version I'll
change all instructions to use MMIRR_XX3, but that means adding !extern to the
MMIRR_XX3 declaration in target/ppc/insn64.decode, since the argument set will
also need to be declared in target/ppc/insn32.decode.
>
> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> r~

Thread overview: 21+ messages
2022-05-06 12:18 [RFC PATCH v2 0/7] VSX MMA Implementation Lucas Mateus Castro(alqotel)
2022-05-06 12:18 ` [RFC PATCH v2 1/7] target/ppc: Implement xxm[tf]acc and xxsetaccz Lucas Mateus Castro(alqotel)
2022-05-08  3:28   ` Richard Henderson
2022-05-06 12:18 ` [RFC PATCH v2 2/7] target/ppc: Implemented xvi*ger* instructions Lucas Mateus Castro(alqotel)
2022-05-08  3:41   ` Richard Henderson
2022-05-10 17:28     ` Lucas Mateus Martins Araujo e Castro
2022-05-11  0:00       ` Richard Henderson
2022-05-06 12:18 ` [RFC PATCH v2 3/7] target/ppc: Implemented pmxvi*ger* instructions Lucas Mateus Castro(alqotel)
2022-05-08  3:48   ` Richard Henderson
2022-05-12 17:38     ` Lucas Mateus Martins Araujo e Castro
2022-05-06 12:18 ` [RFC PATCH v2 4/7] target/ppc: Implemented xvf*ger* Lucas Mateus Castro(alqotel)
2022-05-08  4:03   ` Richard Henderson
2022-05-09 11:33     ` Lucas Mateus Martins Araujo e Castro
2022-05-06 12:18 ` [RFC PATCH v2 5/7] target/ppc: Implemented xvf16ger* Lucas Mateus Castro(alqotel)
2022-05-08  4:24   ` Richard Henderson
2022-05-10 14:47     ` Lucas Mateus Martins Araujo e Castro
2022-05-06 12:18 ` [RFC PATCH v2 6/7] target/ppc: Implemented pmxvf*ger* Lucas Mateus Castro(alqotel)
2022-05-08  4:25   ` Richard Henderson
2022-05-06 12:18 ` [RFC PATCH v2 7/7] target/ppc: Implemented [pm]xvbf16ger2* Lucas Mateus Castro(alqotel)
2022-05-08  4:27   ` Richard Henderson
2022-05-10 17:25     ` Lucas Mateus Martins Araujo e Castro