From: Peter Maydell <peter.maydell@linaro.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-arm <qemu-arm@nongnu.org>, QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [PATCH v1 08/11] target/arm: Implement bfloat16 matrix multiply accumulate
Date: Tue, 18 May 2021 13:37:51 +0100
Message-ID: <CAFEAcA_T9f47uZSEt9BRsThxLsvauTPMiDSNM8B5=Dk5xRQ+wg@mail.gmail.com>
In-Reply-To: <20210416235928.1631788-9-richard.henderson@linaro.org>
On Sat, 17 Apr 2021 at 01:00, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is BFMMLA for both AArch64 AdvSIMD and SVE,
> and VMMLA.BF16 for AArch32 NEON.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> +void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
> +{
> +    intptr_t s, opr_sz = simd_oprsz(desc);
> +    float32 *d = vd, *a = va;
> +    uint32_t *n = vn, *m = vm;
> +
> +    for (s = 0; s < opr_sz / 4; s += 4) {
> +        float32 sum00, sum01, sum10, sum11;
> +
> +        /*
> +         * Process the entire segment at once, writing back the
> +         * results only after we've consumed all of the inputs.
> +         *
> +         * Key to indicies by column:
"indices"
> +         *               i   j           i   k             j   k
> +         */
> +        sum00 = a[s + H4(0 + 0)];
> +        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
> +        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
I can't make these indices match up with the Arm ARM pseudocode ones,
which index by "4*i + 2*k + 0" and "4*i + 2*k + 1", not "2*i + k";
are we hiding a division by 2 somewhere?
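
One possible reading, offered as a sketch rather than a confirmed answer:
n and m are declared as uint32_t pointers, so if each 32-bit element packs
two adjacent bfloat16 values and bfdotadd consumes both halves of it, the
pseudocode's 16-bit indices "4*i + 2*k + 0" and "4*i + 2*k + 1" are the
low and high halves of 32-bit element "2*i + k". The helper names below
(pair_index32, elem_index16_lo, elem_index16_hi, views_agree) are
hypothetical and exist only to check that arithmetic; they are not part
of the patch.

```c
#include <assert.h>

/* 32-bit element index as written in the helper: 2*i + k. */
int pair_index32(int i, int k)
{
    return 2 * i + k;
}

/* 16-bit element indices as written in the pseudocode. */
int elem_index16_lo(int i, int k)
{
    return 4 * i + 2 * k + 0;
}

int elem_index16_hi(int i, int k)
{
    return 4 * i + 2 * k + 1;
}

/*
 * Returns 1 if, for every row i and pair k of one segment, the two
 * 16-bit indices are exactly the two halves of 32-bit element 2*i + k.
 * Assumes two bfloat16 values per 32-bit element.
 */
int views_agree(void)
{
    for (int i = 0; i < 2; i++) {
        for (int k = 0; k < 2; k++) {
            int w = pair_index32(i, k);
            if (elem_index16_lo(i, k) != 2 * w ||
                elem_index16_hi(i, k) != 2 * w + 1) {
                return 0;
            }
        }
    }
    return 1;
}
```

Under that assumption the division by 2 would come from the element
width alone, with no explicit divide anywhere in the helper.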
> +
> +        sum01 = a[s + H4(0 + 1)];
> +        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
> +        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
> +
> +        sum10 = a[s + H4(2 + 0)];
> +        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
> +        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
> +
> +        sum11 = a[s + H4(2 + 1)];
> +        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
> +        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
> +
> +        d[s + H4(0 + 0)] = sum00;
> +        d[s + H4(0 + 1)] = sum01;
> +        d[s + H4(2 + 0)] = sum10;
> +        d[s + H4(2 + 1)] = sum11;
> +    }
> +    clear_tail(d, opr_sz, simd_maxsz(desc));
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM