* [PATCH] arm64: crypto: increase AES interleave to 4x
From: Ard Biesheuvel @ 2015-02-19 17:25 UTC
To: will.deacon, linux-arm-kernel
Cc: steve.capper, herbert, linux-crypto, Ard Biesheuvel

This patch increases the interleave factor for parallel AES modes
to 4x. This improves performance on Cortex-A57 by ~35%. This is
due to the 3-cycle latency of AES instructions on the A57's
relatively deep pipeline (compared to Cortex-A53 where the AES
instruction latency is only 2 cycles).
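
The latency argument above can be illustrated with a deliberately simplified model. This is a sketch and not part of the patch: it assumes a single AES unit that issues one instruction per cycle, that each block's AES instructions form a dependent chain, and that nothing else (loads, stores, CBC XORs) costs time. The function name `cycles_per_instruction` is hypothetical.

```python
def cycles_per_instruction(interleave: int, latency: int) -> float:
    """Amortized cycles per AES instruction under the simplified model.

    With `interleave` independent blocks processed round-robin, one pass
    over all blocks takes at least `interleave` issue slots, and a block
    cannot receive its next instruction until `latency` cycles have passed.
    """
    if interleave < 1 or latency < 1:
        raise ValueError("interleave and latency must be positive")
    return max(interleave, latency) / interleave


if __name__ == "__main__":
    # Latencies taken from the commit message: 3 cycles (A57), 2 cycles (A53).
    for core, latency in (("Cortex-A57", 3), ("Cortex-A53", 2)):
        for interleave in (2, 4):
            cost = cycles_per_instruction(interleave, latency)
            print(f"{core}: {interleave}x interleave -> {cost:.2f} cycles/insn")
```

Under these assumptions a 2x interleave leaves one idle issue slot in every three on a 3-cycle-latency core, while 4x fills the pipeline; on a 2-cycle-latency core 2x is already enough. The model only bounds the possible gain, so the measured improvement would be expected to come out lower.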

At the same time, disable inline expansion of the core AES functions,
as the performance benefit of this feature is negligible.

Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1):

Baseline (2x interleave, inline expansion)
------------------------------------------
testing speed of async cbc(aes) (cbc-aes-ce) decryption
test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds
test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds

This patch (4x interleave, no inline expansion)
-----------------------------------------------
testing speed of async cbc(aes) (cbc-aes-ce) decryption
test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds
test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds
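
For reference, the relative gain and the implied throughput can be derived from the figures above. The following is a small sketch, not part of the patch; the helper names are hypothetical, and the only inputs are the operation counts and the 8192-byte block size reported by tcrypt.

```python
BLOCK_BYTES = 8192  # block size used in the tcrypt runs above

# Operations completed in one second, copied from the results above.
RESULTS = {
    "128 bit key": {"baseline": 95545, "patched": 124735},
    "256 bit key": {"baseline": 68496, "patched": 92328},
}


def speedup_percent(baseline: int, patched: int) -> float:
    """Relative improvement of `patched` over `baseline`, in percent."""
    return (patched / baseline - 1.0) * 100.0


def throughput_mib_per_s(ops_per_second: int) -> float:
    """Throughput in MiB/s for 8192-byte operations."""
    return ops_per_second * BLOCK_BYTES / (1024 * 1024)


if __name__ == "__main__":
    for name, r in RESULTS.items():
        gain = speedup_percent(r["baseline"], r["patched"])
        before = throughput_mib_per_s(r["baseline"])
        after = throughput_mib_per_s(r["patched"])
        print(f"{name}: {before:.1f} -> {after:.1f} MiB/s (+{gain:.1f}%)")
```

This works out to roughly 31% for 128-bit keys and roughly 35% for 256-bit keys.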

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 5720608c50b1..abb79b3cfcfe 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,7 +29,7 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
 aes-neon-blk-y := aes-glue-neon.o aes-neon.o
 
-AFLAGS_aes-ce.o := -DINTERLEAVE=2 -DINTERLEAVE_INLINE
+AFLAGS_aes-ce.o := -DINTERLEAVE=4
 AFLAGS_aes-neon.o := -DINTERLEAVE=4
 
 CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
--
1.8.3.2
* Re: [PATCH] arm64: crypto: increase AES interleave to 4x
From: Will Deacon @ 2015-02-20 15:55 UTC
To: Ard Biesheuvel; +Cc: linux-arm-kernel, steve.capper, herbert, linux-crypto
On Thu, Feb 19, 2015 at 05:25:16PM +0000, Ard Biesheuvel wrote:
> This patch increases the interleave factor for parallel AES modes
> to 4x. This improves performance on Cortex-A57 by ~35%. This is
> due to the 3-cycle latency of AES instructions on the A57's
> relatively deep pipeline (compared to Cortex-A53 where the AES
> instruction latency is only 2 cycles).
>
> At the same time, disable inline expansion of the core AES functions,
> as the performance benefit of this feature is negligible.
>
> Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1):
>
> Baseline (2x interleave, inline expansion)
> ------------------------------------------
> testing speed of async cbc(aes) (cbc-aes-ce) decryption
> test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds
> test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds
>
> This patch (4x interleave, no inline expansion)
> -----------------------------------------------
> testing speed of async cbc(aes) (cbc-aes-ce) decryption
> test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds
> test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds

Fine by me. Shall I queue this via the arm64 tree?

Will

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> arch/arm64/crypto/Makefile | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index 5720608c50b1..abb79b3cfcfe 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,7 +29,7 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
> obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
> aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>
> -AFLAGS_aes-ce.o := -DINTERLEAVE=2 -DINTERLEAVE_INLINE
> +AFLAGS_aes-ce.o := -DINTERLEAVE=4
> AFLAGS_aes-neon.o := -DINTERLEAVE=4
>
> CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
> --
> 1.8.3.2
>
>
* Re: [PATCH] arm64: crypto: increase AES interleave to 4x
From: Ard Biesheuvel @ 2015-02-20 16:16 UTC
To: Will Deacon; +Cc: linux-arm-kernel, steve.capper, herbert, linux-crypto
On 20 February 2015 at 15:55, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Feb 19, 2015 at 05:25:16PM +0000, Ard Biesheuvel wrote:
>> This patch increases the interleave factor for parallel AES modes
>> to 4x. This improves performance on Cortex-A57 by ~35%. This is
>> due to the 3-cycle latency of AES instructions on the A57's
>> relatively deep pipeline (compared to Cortex-A53 where the AES
>> instruction latency is only 2 cycles).
>>
>> At the same time, disable inline expansion of the core AES functions,
>> as the performance benefit of this feature is negligible.
>>
>> Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1):
>>
>> Baseline (2x interleave, inline expansion)
>> ------------------------------------------
>> testing speed of async cbc(aes) (cbc-aes-ce) decryption
>> test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds
>> test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds
>>
>> This patch (4x interleave, no inline expansion)
>> -----------------------------------------------
>> testing speed of async cbc(aes) (cbc-aes-ce) decryption
>> test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds
>> test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds
>
> Fine by me. Shall I queue this via the arm64 tree?
>

Yes, please.

>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>> arch/arm64/crypto/Makefile | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index 5720608c50b1..abb79b3cfcfe 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -29,7 +29,7 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>> obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>> aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>>
>> -AFLAGS_aes-ce.o := -DINTERLEAVE=2 -DINTERLEAVE_INLINE
>> +AFLAGS_aes-ce.o := -DINTERLEAVE=4
>> AFLAGS_aes-neon.o := -DINTERLEAVE=4
>>
>> CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
>> --
>> 1.8.3.2
>>
>>