On 26/04/2022 20:40, Richard Henderson wrote: > > On 4/26/22 05:50, Lucas Mateus Castro(alqotel) wrote: >> +%xx_at          23:3 !function=times_4 >> +@XX3_at         ...... ... .. ..... ..... ........ ... &XX3 >> xt=%xx_at xb=%xx_xb > > Hmm.  Depends, I suppose on whether you want acc[0-7] or vsr[0-28] I mostly used VSR function here, but since I'll change the patch 1 to your suggestion (which will require creating acc_full_offset) I'll make a few changes to create some functions for the accumulator > >> +/* >> + * Packed VSX Integer GER Flags >> + * 00 - no accumulation no saturation >> + * 01 - accumulate but no saturation >> + * 10 - no accumulation but with saturation >> + * 11 - accumulate with saturation >> + */ >> +static inline bool get_sat(uint32_t flags) >> +{ >> +    return flags & 0x2; >> +} >> + >> +static inline bool get_acc(uint32_t flags) >> +{ >> +    return flags & 0x1; >> +} > > Better to have separate helpers for these?  They'd be immediate > operands to the function > replacing XVIGER (see below) and thus optimize well. Do you mean different functions or a function that receives packed_flags along with the callback functions? > >> +#define GET_VsrN(a, i) (extract32(a->VsrB((i) / 2), (i) % 2 ? 4 : 0, >> 4)) >> +#define GET_VsrB(a, i) a->VsrB(i) >> +#define GET_VsrH(a, i) a->VsrH(i) >> + >> +#define GET_VsrSN(a, i) (sextract32(a->VsrSB((i) / 2), (i) % 2 ? 4 : >> 0, 4)) >> +#define GET_VsrSB(a, i) a->VsrSB(i) >> +#define GET_VsrSH(a, i) a->VsrSH(i) > > These can be made into functions of the form > >     typedef int32_t xviger_extract(ppc_vsr_t *a, int i); > In this case it'd be necessary to receive 2 xviger_extract functions since XVI8GER4* multiply one value as signed and the other as unsigned (and other integer GER treat both as signed). An alternative would be to isolate the innermost loop into a different function, like:     typedef int64_t do_ger(int32_t a, int32_t b, int32_t at, int32_t pmsk);     static int64_t ger_rank4(int32_t a, int32_t b, int32_t at, int32_t mask)     {         int64_t psum = 0, i;         for (i = 0; i < 4; i++, mask >>= 1) {             if (mask & 1) {                 psum += (sextract32(a, i * 8, 8)) * (extract32(b, i * 8, 8));            }         }         return psum;     } That way we could avoid having 'rank' as a parameter, what do you think? > > >> diff --git a/target/ppc/internal.h b/target/ppc/internal.h >> index 8094e0b033..a994d98238 100644 >> --- a/target/ppc/internal.h >> +++ b/target/ppc/internal.h >> @@ -291,4 +291,32 @@ G_NORETURN void >> ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr, >>                                               uintptr_t retaddr); >>   #endif >> >> +/* >> + * Auxiliary functions to pack/unpack masks for GER instructions. >> + * >> + * Packed format: >> + *  Bits 0-3: xmsk >> + *  Bits 4-7: ymsk >> + *  Bits 8-15: pmsk >> + */ >> +static inline uint8_t ger_get_xmsk(uint32_t packed_masks) >> +{ >> +    return packed_masks & 0xF; >> +} >> + >> +static inline uint8_t ger_get_ymsk(uint32_t packed_masks) >> +{ >> +    return (packed_masks >> 4) & 0xF; >> +} >> + >> +static inline uint8_t ger_get_pmsk(uint32_t packed_masks) >> +{ >> +    return (packed_masks >> 8) & 0xFF; >> +} >> + >> +static inline int ger_pack_masks(int pmsk, int ymsk, int xmsk) >> +{ >> +    return (pmsk & 0xFF) << 8 | (ymsk & 0xF) << 4 | (xmsk & 0xF); >> +} > > Use hw/registerfields.h.  C.f. PREDDESC in target/arm/internals.h. Ok, will do > >> +static bool do_ger_XX3(DisasContext *ctx, arg_XX3 *a, uint32_t op, >> +                             void (*helper)(TCGv_env, TCGv_i32, >> TCGv_i32, >> +                                            TCGv_i32, TCGv_i32, >> TCGv_i32)) >> +{ >> +    uint32_t mask; >> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310); >> +    REQUIRE_VSX(ctx); >> +    if (unlikely((a->xa / 4 == a->xt / 4) || (a->xb / 4 == a->xt / >> 4))) { >> +        gen_invalid(ctx); >> +        return true; >> +    } >> + >> +    mask = 0xFFFFFFFF; >> +    helper(cpu_env, tcg_constant_i32(a->xa), tcg_constant_i32(a->xb), >> +           tcg_constant_i32(a->xt), tcg_constant_i32(mask), >> +           tcg_constant_i32(op)); >> +    return true; >> +} > > Why are you passing register numbers instead of pointers, like > everywhere else? Because here we are not working only with 1 register per register number, the ACC uses 4 and the XVF64GER* needs to use XA and XA+1, and while VSR is an array so I could do ppc_vsr_ptr+1 I thought it was better not to access memory I was not given a pointer to, so I passed XA so I can request cpu_vsr_ptr(env, xa) and cpu_vsr_ptr(env, xa + 1) > > > r~ -- Lucas Mateus M. Araujo e Castro Instituto de Pesquisas ELDORADO Departamento Computação Embarcada Analista de Software Trainee Aviso Legal - Disclaimer