From: Nick Desaulniers <ndesaulniers@google.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-crypto@vger.kernel.org,
Herbert Xu <herbert@gondor.apana.org.au>,
Eric Biggers <ebiggers@kernel.org>
Subject: Re: [PATCH v2 3/3] crypto: arm64/aegis128 - implement plain NEON version
Date: Mon, 12 Aug 2019 09:50:33 -0700 [thread overview]
Message-ID: <CAKwvOd=uxi8qmQEjOudvSUVW6vc42b-SmoV91DeWfBkp3kOJcQ@mail.gmail.com> (raw)
In-Reply-To: <20190811225912.19412-4-ard.biesheuvel@linaro.org>
On Sun, Aug 11, 2019 at 3:59 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> Provide a version of the core AES transform to the aegis128 SIMD
> code that does not rely on the special AES instructions, but uses
> plain NEON instructions instead. This allows the SIMD version of
> the aegis128 driver to be used on arm64 systems that do not
> implement those instructions (which are not mandatory in the
> architecture), such as the Raspberry Pi 3.
>
> Since GCC makes a mess of this when using the tbl/tbx intrinsics
> to perform the sbox substitution, preload the Sbox into v16..v31
> in this case and use inline asm to emit the tbl/tbx instructions.
> Clang does not support this approach, nor does it require it, since
> it does a much better job at code generation, so there we use the
> intrinsics as usual.
Oh, great job getting it working with Clang, too. I appreciate that.
Certainly getting SIMD working exactly how you want across compilers
can be tricky.
>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> crypto/Makefile | 9 ++-
> crypto/aegis128-neon-inner.c | 65 ++++++++++++++++++++
> crypto/aegis128-neon.c | 8 ++-
> 3 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/crypto/Makefile b/crypto/Makefile
> index 99a9fa9087d1..0d2cdd523fd9 100644
> --- a/crypto/Makefile
> +++ b/crypto/Makefile
> @@ -98,7 +98,14 @@ CFLAGS_aegis128-neon-inner.o += -mfpu=crypto-neon-fp-armv8
> aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o
> endif
> ifeq ($(ARCH),arm64)
> -CFLAGS_aegis128-neon-inner.o += -ffreestanding -mcpu=generic+crypto
> +aegis128-cflags-y := -ffreestanding -mcpu=generic+crypto
> +aegis128-cflags-$(CONFIG_CC_IS_GCC) += -ffixed-q16 -ffixed-q17 -ffixed-q18 \
> + -ffixed-q19 -ffixed-q20 -ffixed-q21 \
> + -ffixed-q22 -ffixed-q23 -ffixed-q24 \
> + -ffixed-q25 -ffixed-q26 -ffixed-q27 \
> + -ffixed-q28 -ffixed-q29 -ffixed-q30 \
> + -ffixed-q31
I've filed https://bugs.llvm.org/show_bug.cgi?id=42974 for a feature
request for this in Clang.
> +CFLAGS_aegis128-neon-inner.o += $(aegis128-cflags-y)
> CFLAGS_REMOVE_aegis128-neon-inner.o += -mgeneral-regs-only
> aegis128-$(CONFIG_CRYPTO_AEGIS128_SIMD) += aegis128-neon.o aegis128-neon-inner.o
> endif
> diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c
> index 3d8043c4832b..ed55568afd1b 100644
> --- a/crypto/aegis128-neon-inner.c
> +++ b/crypto/aegis128-neon-inner.c
> @@ -17,6 +17,8 @@
>
> #include <stddef.h>
>
> +extern int aegis128_have_aes_insn;
> +
> void *memcpy(void *dest, const void *src, size_t n);
> void *memset(void *s, int c, size_t n);
>
> @@ -24,6 +26,8 @@ struct aegis128_state {
> uint8x16_t v[5];
> };
>
> +extern const uint8x16x4_t crypto_aes_sbox[];
extern const uint8x16x4_t *crypto_aes_sbox;
> +
> static struct aegis128_state aegis128_load_state_neon(const void *state)
> {
> return (struct aegis128_state){ {
> @@ -49,6 +53,46 @@ uint8x16_t aegis_aes_round(uint8x16_t w)
> {
> uint8x16_t z = {};
>
> +#ifdef CONFIG_ARM64
> + if (!__builtin_expect(aegis128_have_aes_insn, 1)) {
> + static const uint8x16_t shift_rows = {
> + 0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3,
> + 0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb,
> + };
> + static const uint8x16_t ror32by8 = {
> + 0x1, 0x2, 0x3, 0x0, 0x5, 0x6, 0x7, 0x4,
> + 0x9, 0xa, 0xb, 0x8, 0xd, 0xe, 0xf, 0xc,
> + };
> + uint8x16_t v;
> +
> + // shift rows
> + w = vqtbl1q_u8(w, shift_rows);
> +
> + // sub bytes
> + if (!IS_ENABLED(CONFIG_CC_IS_GCC)) {
> + v = vqtbl4q_u8(crypto_aes_sbox[0], w);
> + v = vqtbx4q_u8(v, crypto_aes_sbox[1], w - 0x40);
> + v = vqtbx4q_u8(v, crypto_aes_sbox[2], w - 0x80);
> + v = vqtbx4q_u8(v, crypto_aes_sbox[3], w - 0xc0);
> + } else {
> + asm("tbl %0.16b, {v16.16b-v19.16b}, %1.16b" : "=w"(v) : "w"(w));
> + w -= 0x40;
> + asm("tbx %0.16b, {v20.16b-v23.16b}, %1.16b" : "+w"(v) : "w"(w));
> + w -= 0x40;
> + asm("tbx %0.16b, {v24.16b-v27.16b}, %1.16b" : "+w"(v) : "w"(w));
> + w -= 0x40;
> + asm("tbx %0.16b, {v28.16b-v31.16b}, %1.16b" : "+w"(v) : "w"(w));
> + }
I find negation in a if condition that also has an else to be a code
smell. Consider replacing:
if !foo:
bar()
else:
baz()
with:
if foo:
baz()
else:
bar()
(CONFIG_CC_IS_CLANG may be helpful here, too).
With those 2 recommendations:
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
in regards to compiling w/ Clang. Someone else should review the
implementation of this crypto routine.
> +
> + // mix columns
> + w = (v << 1) ^ (uint8x16_t)(((int8x16_t)v >> 7) & 0x1b);
> + w ^= (uint8x16_t)vrev32q_u16((uint16x8_t)v);
> + w ^= vqtbl1q_u8(v ^ w, ror32by8);
> +
> + return w;
> + }
> +#endif
> +
> /*
> * We use inline asm here instead of the vaeseq_u8/vaesmcq_u8 intrinsics
> * to force the compiler to issue the aese/aesmc instructions in pairs.
> @@ -73,10 +117,27 @@ struct aegis128_state aegis128_update_neon(struct aegis128_state st,
> return st;
> }
>
> +static inline __attribute__((always_inline))
> +void preload_sbox(void)
> +{
> + if (!IS_ENABLED(CONFIG_ARM64) ||
> + !IS_ENABLED(CONFIG_CC_IS_GCC) ||
> + __builtin_expect(aegis128_have_aes_insn, 1))
> + return;
> +
> + asm("ld1 {v16.16b-v19.16b}, [%0], #64 \n\t"
> + "ld1 {v20.16b-v23.16b}, [%0], #64 \n\t"
> + "ld1 {v24.16b-v27.16b}, [%0], #64 \n\t"
> + "ld1 {v28.16b-v31.16b}, [%0] \n\t"
> + :: "r"(crypto_aes_sbox));
> +}
> +
> void crypto_aegis128_update_neon(void *state, const void *msg)
> {
> struct aegis128_state st = aegis128_load_state_neon(state);
>
> + preload_sbox();
> +
> st = aegis128_update_neon(st, vld1q_u8(msg));
>
> aegis128_save_state_neon(st, state);
> @@ -88,6 +149,8 @@ void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
> struct aegis128_state st = aegis128_load_state_neon(state);
> uint8x16_t msg;
>
> + preload_sbox();
> +
> while (size >= AEGIS_BLOCK_SIZE) {
> uint8x16_t s = st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
>
> @@ -120,6 +183,8 @@ void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
> struct aegis128_state st = aegis128_load_state_neon(state);
> uint8x16_t msg;
>
> + preload_sbox();
> +
> while (size >= AEGIS_BLOCK_SIZE) {
> msg = vld1q_u8(src) ^ st.v[1] ^ (st.v[2] & st.v[3]) ^ st.v[4];
> st = aegis128_update_neon(st, msg);
> diff --git a/crypto/aegis128-neon.c b/crypto/aegis128-neon.c
> index c1c0a1686f67..751f9c195aa4 100644
> --- a/crypto/aegis128-neon.c
> +++ b/crypto/aegis128-neon.c
> @@ -14,9 +14,15 @@ void crypto_aegis128_encrypt_chunk_neon(void *state, void *dst, const void *src,
> void crypto_aegis128_decrypt_chunk_neon(void *state, void *dst, const void *src,
> unsigned int size);
>
> +int aegis128_have_aes_insn __ro_after_init;
> +
> bool crypto_aegis128_have_simd(void)
> {
> - return cpu_have_feature(cpu_feature(AES));
> + if (cpu_have_feature(cpu_feature(AES))) {
> + aegis128_have_aes_insn = 1;
If aegis128_have_aes_insn is __ro_after_init, is
crypto_aegis128_have_simd() called exclusively from .init sectioned
code?
> + return true;
> + }
> + return IS_ENABLED(CONFIG_ARM64);
> }
>
> void crypto_aegis128_update_simd(union aegis_block *state, const void *msg)
> --
> 2.17.1
>
--
Thanks,
~Nick Desaulniers
next prev parent reply other threads:[~2019-08-12 16:50 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-11 22:59 [PATCH v2 0/3] crypto: aegis128 followup Ard Biesheuvel
2019-08-11 22:59 ` [PATCH v2 1/3] crypto: aegis128 - add support for SIMD acceleration Ard Biesheuvel
2019-08-11 22:59 ` [PATCH v2 2/3] crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Ard Biesheuvel
2019-08-11 22:59 ` [PATCH v2 3/3] crypto: arm64/aegis128 - implement plain NEON version Ard Biesheuvel
2019-08-12 16:50 ` Nick Desaulniers [this message]
2019-08-12 17:22 ` Ard Biesheuvel
2019-08-12 17:31 ` Nick Desaulniers
2019-08-12 18:14 ` Ard Biesheuvel
2019-08-15 12:08 ` [PATCH v2 0/3] crypto: aegis128 followup Herbert Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKwvOd=uxi8qmQEjOudvSUVW6vc42b-SmoV91DeWfBkp3kOJcQ@mail.gmail.com' \
--to=ndesaulniers@google.com \
--cc=ard.biesheuvel@linaro.org \
--cc=ebiggers@kernel.org \
--cc=herbert@gondor.apana.org.au \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).