qemu-devel.nongnu.org archive mirror
From: Peter Maydell <peter.maydell@linaro.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-arm <qemu-arm@nongnu.org>, QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [PATCH v1 08/11] target/arm: Implement bfloat16 matrix multiply accumulate
Date: Tue, 18 May 2021 13:37:51 +0100
Message-ID: <CAFEAcA_T9f47uZSEt9BRsThxLsvauTPMiDSNM8B5=Dk5xRQ+wg@mail.gmail.com>
In-Reply-To: <20210416235928.1631788-9-richard.henderson@linaro.org>

On Sat, 17 Apr 2021 at 01:00, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is BFMMLA for both AArch64 AdvSIMD and SVE,
> and VMMLA.BF16 for AArch32 NEON.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> +void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
> +{
> +    intptr_t s, opr_sz = simd_oprsz(desc);
> +    float32 *d = vd, *a = va;
> +    uint32_t *n = vn, *m = vm;
> +
> +    for (s = 0; s < opr_sz / 4; s += 4) {
> +        float32 sum00, sum01, sum10, sum11;
> +
> +        /*
> +         * Process the entire segment at once, writing back the
> +         * results only after we've consumed all of the inputs.
> +         *
> +         * Key to indicies by column:

"indices"

> +         *               i   j           i   k             j   k
> +         */
> +        sum00 = a[s + H4(0 + 0)];
> +        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
> +        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);

I can't make these indices match up with the ones in the Arm ARM
pseudocode, which index by "4*i + 2*k + 0" and "4*i + 2*k + 1",
not "2*i + k"; are we hiding a division by 2 somewhere?
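
One possible reading, sketched below purely as an assumption on my
part: n and m are declared as uint32_t *, so if each 32-bit element
packs a pair of bfloat16 values and bfdotadd() consumes both halves at
once, the division by 2 would be hidden in the pointer type. The helper
names bf16_index(), pair_index() and check_pair_indexing() are
hypothetical and are not part of the patch; they only check the
arithmetic for the 2x2 case.

```c
#include <assert.h>

/* Hypothetical: index of one 16-bit element, as the pseudocode writes it. */
static unsigned bf16_index(unsigned i, unsigned k, unsigned half)
{
    return 4 * i + 2 * k + half;
}

/* Hypothetical: index of a 32-bit word, as the helper writes it. */
static unsigned pair_index(unsigned i, unsigned k)
{
    return 2 * i + k;
}

/*
 * Assumption: two bfloat16 values share one uint32_t. Under that
 * assumption both 16-bit elements of a pair must land in the same
 * 32-bit word, i.e. (16-bit index) / 2 == (32-bit index).
 */
static void check_pair_indexing(void)
{
    for (unsigned i = 0; i < 2; i++) {
        for (unsigned k = 0; k < 2; k++) {
            assert(bf16_index(i, k, 0) / 2 == pair_index(i, k));
            assert(bf16_index(i, k, 1) / 2 == pair_index(i, k));
        }
    }
}
```

If that reading is right, "H4(2 + 1)" in the patch would cover the
pseudocode's 16-bit elements 6 and 7. This says nothing about host
endianness, which H4() handles separately.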

> +
> +        sum01 = a[s + H4(0 + 1)];
> +        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
> +        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
> +
> +        sum10 = a[s + H4(2 + 0)];
> +        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
> +        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
> +
> +        sum11 = a[s + H4(2 + 1)];
> +        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
> +        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
> +
> +        d[s + H4(0 + 0)] = sum00;
> +        d[s + H4(0 + 1)] = sum01;
> +        d[s + H4(2 + 0)] = sum10;
> +        d[s + H4(2 + 1)] = sum11;
> +    }
> +    clear_tail(d, opr_sz, simd_maxsz(desc));

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


