From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Dave Martin <Dave.Martin@arm.com>
Cc: Greg Kaiser <gkaiser@google.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
Eric Biggers <ebiggers@google.com>,
Patrik Torstensson <totte@google.com>,
Michael Halcrow <mhalcrow@google.com>,
Paul Lawrence <paullawrence@google.com>,
linux-fscrypt@vger.kernel.org,
"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
<linux-crypto@vger.kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Paul Crowley <paulcrowley@google.com>
Subject: Re: [RFC PATCH] crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS
Date: Tue, 6 Mar 2018 12:47:45 +0000 [thread overview]
Message-ID: <CAKv+Gu9bgJ_zW30Q=nFcof_xhQzno4WvtNbpweav=22B6ef5GA@mail.gmail.com> (raw)
In-Reply-To: <20180306123505.GK32331@e103592.cambridge.arm.com>
On 6 March 2018 at 12:35, Dave Martin <Dave.Martin@arm.com> wrote:
> On Mon, Mar 05, 2018 at 11:17:07AM -0800, Eric Biggers wrote:
>> Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
>> for ARM64. This is ported from the 32-bit version. It may be useful on
>> devices with 64-bit ARM CPUs that don't have the Cryptography
>> Extensions and so cannot do AES efficiently -- e.g. the Cortex-A53
>> processor on the Raspberry Pi 3.
>>
>> It generally works the same way as the 32-bit version, but there are
>> some slight differences due to the different instructions, registers,
>> and syntax available in ARM64 vs. in ARM32. For example, in the 64-bit
>> version there are enough registers to hold the XTS tweaks for each
>> 128-byte chunk, so they don't need to be saved on the stack.
>>
>> Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
>>
>>     Algorithm                      Encryption    Decryption
>>     ---------                      ----------    ----------
>>     Speck64/128-XTS (NEON)          92.2 MB/s     92.2 MB/s
>>     Speck128/256-XTS (NEON)         75.0 MB/s     75.0 MB/s
>>     Speck128/256-XTS (generic)      47.4 MB/s     35.6 MB/s
>>     AES-128-XTS (NEON bit-sliced)   33.4 MB/s     29.6 MB/s
>>     AES-256-XTS (NEON bit-sliced)   24.6 MB/s     21.7 MB/s
>>
>> The code performs well on higher-end ARM64 processors as well, though
>> such processors tend to have the Crypto Extensions which make AES
>> preferred. For example, here are the same benchmarks run on a HiKey960
>> (with CPU affinity set for the A73 cores), with the Crypto Extensions
>> implementation of AES-256-XTS added:
>>
>>     Algorithm                        Encryption     Decryption
>>     ---------                        ----------     ----------
>>     AES-256-XTS (Crypto Extensions)  1273.3 MB/s    1274.7 MB/s
>>     Speck64/128-XTS (NEON)            359.8 MB/s     348.0 MB/s
>>     Speck128/256-XTS (NEON)           292.5 MB/s     286.1 MB/s
>>     Speck128/256-XTS (generic)        186.3 MB/s     181.8 MB/s
>>     AES-128-XTS (NEON bit-sliced)     142.0 MB/s     124.3 MB/s
>>     AES-256-XTS (NEON bit-sliced)     104.7 MB/s      91.1 MB/s
>>
>> Signed-off-by: Eric Biggers <ebiggers@google.com>
>> ---
>>  arch/arm64/crypto/Kconfig           |   6 +
>>  arch/arm64/crypto/Makefile          |   3 +
>>  arch/arm64/crypto/speck-neon-core.S | 352 ++++++++++++++++++++++++++++
>>  arch/arm64/crypto/speck-neon-glue.c | 282 ++++++++++++++++++++++
>> 4 files changed, 643 insertions(+)
>> create mode 100644 arch/arm64/crypto/speck-neon-core.S
>> create mode 100644 arch/arm64/crypto/speck-neon-glue.c
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index 285c36c7b408..cb5a243110c4 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -113,4 +113,10 @@ config CRYPTO_AES_ARM64_BS
>> select CRYPTO_AES_ARM64
>> select CRYPTO_SIMD
>>
>> +config CRYPTO_SPECK_NEON
>> + tristate "NEON accelerated Speck cipher algorithms"
>> + depends on KERNEL_MODE_NEON
>> + select CRYPTO_BLKCIPHER
>> + select CRYPTO_SPECK
>> +
>> endif
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index cee9b8d9830b..d94ebd15a859 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -53,6 +53,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
>> obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
>> chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
>>
>> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
>> +
>> obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
>> aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
>>
>> diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S
>> new file mode 100644
>> index 000000000000..b14463438b09
>> --- /dev/null
>> +++ b/arch/arm64/crypto/speck-neon-core.S
>> @@ -0,0 +1,352 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
>> + *
>> + * Copyright (c) 2018 Google, Inc
>> + *
>> + * Author: Eric Biggers <ebiggers@google.com>
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> + .text
>> +
>> + // arguments
>> + ROUND_KEYS .req x0 // const {u64,u32} *round_keys
>> + NROUNDS .req w1 // int nrounds
>> + NROUNDS_X .req x1
>> + DST .req x2 // void *dst
>> + SRC .req x3 // const void *src
>> + NBYTES .req w4 // unsigned int nbytes
>> + TWEAK .req x5 // void *tweak
>> +
>> + // registers which hold the data being encrypted/decrypted
>> + // (underscores avoid a naming collision with ARM64 registers x0-x3)
>> + X_0 .req v0
>> + Y_0 .req v1
>> + X_1 .req v2
>> + Y_1 .req v3
>> + X_2 .req v4
>> + Y_2 .req v5
>> + X_3 .req v6
>> + Y_3 .req v7
>> +
>> + // the round key, duplicated in all lanes
>> + ROUND_KEY .req v8
>> +
>> + // index vector for tbl-based 8-bit rotates
>> + ROTATE_TABLE .req v9
>> + ROTATE_TABLE_Q .req q9
>> +
>> + // temporary registers
>> + TMP0 .req v10
>> + TMP1 .req v11
>> + TMP2 .req v12
>> + TMP3 .req v13
>> +
>> + // multiplication table for updating XTS tweaks
>> + GFMUL_TABLE .req v14
>> + GFMUL_TABLE_Q .req q14
>> +
>> + // next XTS tweak value(s)
>> + TWEAKV_NEXT .req v15
>> +
>> + // XTS tweaks for the blocks currently being encrypted/decrypted
>> + TWEAKV0 .req v16
>> + TWEAKV1 .req v17
>> + TWEAKV2 .req v18
>> + TWEAKV3 .req v19
>> + TWEAKV4 .req v20
>> + TWEAKV5 .req v21
>> + TWEAKV6 .req v22
>> + TWEAKV7 .req v23
>> +
>> + .align 4
>> +.Lror64_8_table:
>> + .octa 0x080f0e0d0c0b0a090007060504030201
>> +.Lror32_8_table:
>> + .octa 0x0c0f0e0d080b0a090407060500030201
>> +.Lrol64_8_table:
>> + .octa 0x0e0d0c0b0a09080f0605040302010007
>> +.Lrol32_8_table:
>> + .octa 0x0e0d0c0f0a09080b0605040702010003
>> +.Lgf128mul_table:
>> + .octa 0x00000000000000870000000000000001
>> +.Lgf64mul_table:
>> + .octa 0x0000000000000000000000002d361b00
>
> Won't this put the data in the image in an endianness-dependent layout?
> Alternatively, if this doesn't matter, then why doesn't it matter?
>
> (I don't claim to understand the code fully here...)
>
Since these constants are loaded using 'ldr q#, .Lxxxx' instructions,
this arrangement is actually endian-agnostic.
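As a sanity check on those index vectors (my sketch, not part of the
patch): simulating the NEON 'tbl' byte permute shows that
.Lror64_8_table does rotate each 64-bit lane of a q register right by
8 bits, given that 'ldr q' presents byte 0 of memory as the least
significant byte of lane 0 on a little-endian kernel:

```python
# Simulate the AArch64 'tbl' byte permute and check .Lror64_8_table.
ROR64_8_TABLE = 0x080f0e0d0c0b0a090007060504030201
table = ROR64_8_TABLE.to_bytes(16, "little")  # bytes as .octa lays them out (LE)

def tbl(data: bytes, idx: bytes) -> bytes:
    """tbl vd, {data}, idx: result[i] = data[idx[i]] for idx[i] < 16."""
    return bytes(data[i] for i in idx)

def ror64(x: int, n: int) -> int:
    return ((x >> n) | (x << (64 - n))) & ((1 << 64) - 1)

# Pack two arbitrary 64-bit lanes little-endian into a 128-bit vector.
lo, hi = 0x0123456789ABCDEF, 0xFEDCBA9876543210
vec = lo.to_bytes(8, "little") + hi.to_bytes(8, "little")

out = tbl(vec, table)
assert int.from_bytes(out[:8], "little") == ror64(lo, 8)
assert int.from_bytes(out[8:], "little") == ror64(hi, 8)
```

The per-lane index pattern [1,2,3,4,5,6,7,0] is exactly "shift every
byte down one position", which is a rotate right by 8 when byte 0 is
the LSB.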
...
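For reference, the 0x87 feedback byte in .Lgf128mul_table is the
standard XTS multiply-by-x tweak update; here is a scalar Python
sketch (mine, not from the patch) of what the vectorized tbl-based
version computes per 128-bit tweak:

```python
def xts_mul_x(tweak: bytes) -> bytes:
    """Multiply a 128-bit XTS tweak by x in GF(2^128), reducing by
    x^128 + x^7 + x^2 + x + 1 -- the 0x87 feedback byte seen in
    .Lgf128mul_table.  Tweak bytes are little-endian, per XTS."""
    t = int.from_bytes(tweak, "little")
    carry = t >> 127                           # bit shifted off the top
    t = ((t << 1) & ((1 << 128) - 1)) ^ (0x87 * carry)
    return t.to_bytes(16, "little")

# Doubling tweak 1 gives 2; a set top bit folds back in as 0x87.
assert xts_mul_x((1).to_bytes(16, "little")) == (2).to_bytes(16, "little")
assert xts_mul_x((1 << 127).to_bytes(16, "little")) == (0x87).to_bytes(16, "little")
```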
>> +static int __init speck_neon_module_init(void)
>> +{
>> + if (!(elf_hwcap & HWCAP_ASIMD))
>> + return -ENODEV;
>> + return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
>
> I haven't tried to understand everything here, but the kernel-mode NEON
> integration looks OK to me.
>
I agree that the conditional use of the NEON looks fine here. The RT
folks will frown at handling all input inside a single
kernel_mode_neon_begin/_end pair, but we can fix that later once my
changes for yielding the NEON get merged (which may take a while).
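To make the RT concern concrete: the eventual fix bounds how long a
task holds the NEON unit by taking a begin/end pair per bounded chunk
of input rather than one pair around the whole request. A hedged
Python sketch of that pattern (the begin/end functions and the chunk
size are stand-ins for illustration, not the kernel primitives):

```python
# Sketch of the per-chunk pattern, not the patch's code.
CHUNK = 4096                  # hypothetical per-section byte bound

sections = []                 # bytes processed per non-preemptible section

def kernel_neon_begin():      # stand-in: NEON usable, preemption off
    sections.append(0)

def kernel_neon_end():        # stand-in: scheduler may preempt again
    pass

def process_chunk(n):
    sections[-1] += n         # "encrypt" n bytes inside this section

def crypt_all(nbytes):
    while nbytes:
        n = min(nbytes, CHUNK)
        kernel_neon_begin()
        process_chunk(n)
        kernel_neon_end()     # latency bounded to one chunk's work
        nbytes -= n

crypt_all(3 * 4096 + 100)
assert sections == [4096, 4096, 4096, 100]
```

Each NEON section now covers at most CHUNK bytes, so the scheduler
gets a chance to run between chunks instead of waiting out the whole
request.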