* [PATCH 0/6] crypto: arm64 - big endian fixes
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
As it turns out, none of the accelerated crypto routines under arch/arm64/crypto
currently work, or have ever worked correctly when built for big endian. So this
series fixes all of them.
Each of these patches carries a fixes tag, and could be backported to stable.
However, for patches #1 and #5, the fixes tag denotes the oldest commit that the
fix is compatible with, not the patch that introduced the algorithm. This is due
to the fact that the key schedules are incompatible between generic AES and the
arm64 Crypto Extensions implementation (but only when building for big
endian). This is not a problem in practice, but it does mean that the AES-CCM
and AES in ECB/CBC/CTR/XTS mode implementations before v3.19 require a
different fix, i.e., one that is compatible with the generic AES key schedule
generation code (which the arm64 code currently no longer uses).
In any case, please apply with cc to stable.
Ard Biesheuvel (6):
crypto: arm64/aes-ce - fix for big endian
crypto: arm64/ghash-ce - fix for big endian
crypto: arm64/sha1-ce - fix for big endian
crypto: arm64/sha2-ce - fix for big endian
crypto: arm64/aes-ccm-ce: fix for big endian
crypto: arm64/aes-neon - fix for big endian
arch/arm64/crypto/aes-ce-ccm-core.S | 53 ++++++++++----------
arch/arm64/crypto/aes-ce-cipher.c | 25 +++++----
arch/arm64/crypto/aes-neon.S | 25 +++++----
arch/arm64/crypto/ghash-ce-core.S | 6 +--
arch/arm64/crypto/sha1-ce-core.S | 4 +-
arch/arm64/crypto/sha2-ce-core.S | 4 +-
6 files changed, 64 insertions(+), 53 deletions(-)
--
2.7.4
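All six patches repair the same class of bug: NEON loads and stores whose
element size does not match the layout of the data in memory. ld1/st1 perform
an endianness conversion per element, so only the byte-wise (.16b) forms are
layout-neutral; on big endian, loading byte data as .2d (or u32 data as .16b)
presents it to the vector unit with the bytes shuffled. A rough C model of the
lane semantics follows; it illustrates the architecture rule and is not kernel
code. Lane 0 is taken to hold the least significant byte of element 0, as in
the AArch64 register layout.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* ld1 {v.16b}: memory byte i lands in lane i on either endianness */
    static void ld1_16b(uint8_t lane[16], const uint8_t *mem)
    {
            memcpy(lane, mem, 16);
    }

    /* ld1 {v.2d} on big endian: each 8-byte element is read as a
     * big-endian integer, so its bytes land in the lanes reversed */
    static void ld1_2d_be(uint8_t lane[16], const uint8_t *mem)
    {
            for (int e = 0; e < 2; e++)
                    for (int i = 0; i < 8; i++)
                            lane[8 * e + i] = mem[8 * e + (7 - i)];
    }

    int main(void)
    {
            const uint8_t round_key[16] = {  0,  1,  2,  3,  4,  5,  6,  7,
                                             8,  9, 10, 11, 12, 13, 14, 15 };
            uint8_t a[16], b[16];

            ld1_16b(a, round_key);
            ld1_2d_be(b, round_key);
            printf("same lanes? %s\n", memcmp(a, b, 16) ? "no" : "yes");
            return 0;
    }

This prints "same lanes? no": loaded as .2d on big endian, each 64-bit half of
a byte-oriented round key reaches AESE byte-reversed. The series fixes this by
picking the element size that matches the data: 16b for byte arrays, 4s for
u32 arrays, 2d for u64 pairs.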
* [PATCH 1/6] crypto: arm64/aes-ce - fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The core AES cipher implementation that uses ARMv8 Crypto Extensions
instructions erroneously loads the round keys as 64-bit quantities,
which causes the algorithm to fail when built for big endian. In
addition, the key schedule generation routine fails to take endianness
into account when combining the input key with the round constants. So fix
both issues.
Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/aes-ce-cipher.c | 25 ++++++++++++--------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
index f7bd9bf0bbb3..50d9fe11d0c8 100644
--- a/arch/arm64/crypto/aes-ce-cipher.c
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -47,24 +47,24 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
kernel_neon_begin_partial(4);
__asm__(" ld1 {v0.16b}, %[in] ;"
- " ld1 {v1.2d}, [%[key]], #16 ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;"
" bmi 0f ;"
" bne 3f ;"
" mov v3.16b, v1.16b ;"
" b 2f ;"
"0: mov v2.16b, v1.16b ;"
- " ld1 {v3.2d}, [%[key]], #16 ;"
+ " ld1 {v3.16b}, [%[key]], #16 ;"
"1: aese v0.16b, v2.16b ;"
" aesmc v0.16b, v0.16b ;"
- "2: ld1 {v1.2d}, [%[key]], #16 ;"
+ "2: ld1 {v1.16b}, [%[key]], #16 ;"
" aese v0.16b, v3.16b ;"
" aesmc v0.16b, v0.16b ;"
- "3: ld1 {v2.2d}, [%[key]], #16 ;"
+ "3: ld1 {v2.16b}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;"
" aese v0.16b, v1.16b ;"
" aesmc v0.16b, v0.16b ;"
- " ld1 {v3.2d}, [%[key]], #16 ;"
+ " ld1 {v3.16b}, [%[key]], #16 ;"
" bpl 1b ;"
" aese v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;"
@@ -92,24 +92,24 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
kernel_neon_begin_partial(4);
__asm__(" ld1 {v0.16b}, %[in] ;"
- " ld1 {v1.2d}, [%[key]], #16 ;"
+ " ld1 {v1.16b}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;"
" bmi 0f ;"
" bne 3f ;"
" mov v3.16b, v1.16b ;"
" b 2f ;"
"0: mov v2.16b, v1.16b ;"
- " ld1 {v3.2d}, [%[key]], #16 ;"
+ " ld1 {v3.16b}, [%[key]], #16 ;"
"1: aesd v0.16b, v2.16b ;"
" aesimc v0.16b, v0.16b ;"
- "2: ld1 {v1.2d}, [%[key]], #16 ;"
+ "2: ld1 {v1.16b}, [%[key]], #16 ;"
" aesd v0.16b, v3.16b ;"
" aesimc v0.16b, v0.16b ;"
- "3: ld1 {v2.2d}, [%[key]], #16 ;"
+ "3: ld1 {v2.16b}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;"
" aesd v0.16b, v1.16b ;"
" aesimc v0.16b, v0.16b ;"
- " ld1 {v3.2d}, [%[key]], #16 ;"
+ " ld1 {v3.16b}, [%[key]], #16 ;"
" bpl 1b ;"
" aesd v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;"
@@ -173,7 +173,12 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
u32 *rki = ctx->key_enc + (i * kwords);
u32 *rko = rki + kwords;
+#ifndef CONFIG_CPU_BIG_ENDIAN
rko[0] = ror32(aes_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0];
+#else
+ rko[0] = rol32(aes_sub(rki[kwords - 1]), 8) ^ (rcon[i] << 24) ^
+ rki[0];
+#endif
rko[1] = rko[0] ^ rki[1];
rko[2] = rko[1] ^ rki[2];
rko[3] = rko[2] ^ rki[3];
--
2.7.4
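A note on the ce_aes_expandkey() hunk above: the big-endian variant follows
from a rotate identity. aes_sub() goes through the vector unit, so on big
endian the 32-bit words it produces appear byte-swapped to the integer code;
rotating such a swapped word left by 8 matches rotating the original right by
8, and the round constant moves from the low byte to the high byte. The
identity can be checked standalone in C (an illustration, not kernel code):

    #include <assert.h>
    #include <stdint.h>

    static uint32_t ror32(uint32_t w, unsigned n) { return w >> n | w << (32 - n); }
    static uint32_t rol32(uint32_t w, unsigned n) { return w << n | w >> (32 - n); }

    int main(void)
    {
            const uint32_t words[] = { 0x00000000, 0x01020304, 0xdeadbeef,
                                       0x80000001, 0xffffffff };

            for (unsigned i = 0; i < sizeof(words) / sizeof(words[0]); i++)
                    for (uint32_t rcon = 0; rcon < 0x100; rcon++) {
                            uint32_t w = words[i];
                            /* the little-endian form of the rcon step ... */
                            uint32_t le = ror32(w, 8) ^ rcon;
                            /* ... agrees with the big-endian form applied
                             * to the byte-swapped word */
                            uint32_t be = rol32(__builtin_bswap32(w), 8) ^
                                          (rcon << 24);
                            assert(__builtin_bswap32(be) == le);
                    }
            return 0;
    }

The program runs to completion without tripping the assert, mirroring why the
big-endian branch of the #ifdef is the correct counterpart of the
little-endian one once every 32-bit word is viewed byte-swapped.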
* [PATCH 2/6] crypto: arm64/ghash-ce - fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The GHASH key and digest are both pairs of 64-bit quantities, but the
GHASH code does not always refer to them as such, causing failures when
built for big endian. So replace the 16x1 byte loads and stores with 2x64-bit ones.
Fixes: b913a6404ce2 ("arm64/crypto: improve performance of GHASH algorithm")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/ghash-ce-core.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index dc457015884e..f0bb9f0b524f 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -29,8 +29,8 @@
* struct ghash_key const *k, const char *head)
*/
ENTRY(pmull_ghash_update)
- ld1 {SHASH.16b}, [x3]
- ld1 {XL.16b}, [x1]
+ ld1 {SHASH.2d}, [x3]
+ ld1 {XL.2d}, [x1]
movi MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
shl MASK.2d, MASK.2d, #57
@@ -74,6 +74,6 @@ CPU_LE( rev64 T1.16b, T1.16b )
cbnz w0, 0b
- st1 {XL.16b}, [x1]
+ st1 {XL.2d}, [x1]
ret
ENDPROC(pmull_ghash_update)
--
2.7.4
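This fix runs in the opposite direction from the aes-ce one: struct ghash_key
and the XL digest genuinely are pairs of native 64-bit integers, so the
element-size load is the right one. ld1 {.2d} converts each element to the
CPU's byte order and hands PMULL the intended values on either endianness,
whereas .16b would present them byte-reversed on big endian. A small C check
of that round trip, reusing the lane model from the cover-letter sketch
(illustrative only, not kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* read back the integer in lanes [8e .. 8e+7]; lane 0 is the least
     * significant byte of element 0 */
    static uint64_t element_u64(const uint8_t lane[16], int e)
    {
            uint64_t v = 0;

            for (int i = 7; i >= 0; i--)
                    v = v << 8 | lane[8 * e + i];
            return v;
    }

    int main(void)
    {
            const uint64_t key[2] = { 0x0123456789abcdefULL,
                                      0xfedcba9876543210ULL };
            uint8_t mem[16], lane[16];

            /* the key as it sits in memory on a big-endian host */
            for (int e = 0; e < 2; e++)
                    for (int i = 0; i < 8; i++)
                            mem[8 * e + i] = key[e] >> (8 * (7 - i));

            /* ld1 {.2d} on big endian: per-element byte reversal */
            for (int e = 0; e < 2; e++)
                    for (int i = 0; i < 8; i++)
                            lane[8 * e + i] = mem[8 * e + (7 - i)];

            printf("%d %d\n", element_u64(lane, 0) == key[0],
                              element_u64(lane, 1) == key[1]);
            return 0;
    }

It prints "1 1": the numeric values survive the load, which is exactly what
the PMULL-based multiply needs.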
* [PATCH 3/6] crypto: arm64/sha1-ce - fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The SHA-1 digest is an array of five 32-bit quantities, so we should refer
to them as such in order for this code to work correctly when built for
big endian. So replace the 16-byte scalar loads and stores with 4x32-bit
vector ones where appropriate.
Fixes: 2c98833a42cd ("arm64/crypto: SHA-1 using ARMv8 Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/sha1-ce-core.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 033aae6d732a..c98e7e849f06 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -78,7 +78,7 @@ ENTRY(sha1_ce_transform)
ld1r {k3.4s}, [x6]
/* load state */
- ldr dga, [x0]
+ ld1 {dgav.4s}, [x0]
ldr dgb, [x0, #16]
/* load sha1_ce_state::finalize */
@@ -144,7 +144,7 @@ CPU_LE( rev32 v11.16b, v11.16b )
b 1b
/* store new state */
-3: str dga, [x0]
+3: st1 {dgav.4s}, [x0]
str dgb, [x0, #16]
ret
ENDPROC(sha1_ce_transform)
--
2.7.4
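dga in this file is an alias for a 128-bit q register, and a 16-byte scalar
ldr/str on big endian reverses the whole quadword, so the first four u32 state
words came back both reordered and byte-swapped. ld1/st1 {.4s} convert per
32-bit element instead, matching how the digest is stored. A sketch of the
difference (a model of the two load flavours, not kernel code):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* 128-bit scalar ldr on big endian: the whole quadword is reversed */
    static void ldr_q_be(uint8_t lane[16], const uint8_t *mem)
    {
            for (int i = 0; i < 16; i++)
                    lane[i] = mem[15 - i];
    }

    /* ld1 {.4s} on big endian: each 32-bit element converts separately,
     * so lane group e still holds state word e */
    static void ld1_4s_be(uint8_t lane[16], const uint8_t *mem)
    {
            for (int e = 0; e < 4; e++)
                    for (int i = 0; i < 4; i++)
                            lane[4 * e + i] = mem[4 * e + (3 - i)];
    }

    int main(void)
    {
            const uint8_t state[16] = {  0,  1,  2,  3,  4,  5,  6,  7,
                                         8,  9, 10, 11, 12, 13, 14, 15 };
            uint8_t a[16], b[16];

            ldr_q_be(a, state);
            ld1_4s_be(b, state);
            printf("agree on big endian? %s\n",
                   memcmp(a, b, 16) ? "no" : "yes");
            return 0;
    }

On little endian the two loads agree, which is why the bug stayed hidden. The
same reasoning applies verbatim to the ldp/stp pair replaced in the sha2-ce
patch that follows.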
* [PATCH 4/6] crypto: arm64/sha2-ce - fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The SHA-256 digest is an array of eight 32-bit quantities, so we should
refer to them as such in order for this code to work correctly when built
for big endian. So replace the 16-byte scalar loads and stores with
4x32-bit vector ones where appropriate.
Fixes: 6ba6c74dfc6b ("arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/sha2-ce-core.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 5df9d9d470ad..01cfee066837 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -85,7 +85,7 @@ ENTRY(sha2_ce_transform)
ld1 {v12.4s-v15.4s}, [x8]
/* load state */
- ldp dga, dgb, [x0]
+ ld1 {dgav.4s, dgbv.4s}, [x0]
/* load sha256_ce_state::finalize */
ldr w4, [x0, #:lo12:sha256_ce_offsetof_finalize]
@@ -148,6 +148,6 @@ CPU_LE( rev32 v19.16b, v19.16b )
b 1b
/* store new state */
-3: stp dga, dgb, [x0]
+3: st1 {dgav.4s, dgbv.4s}, [x0]
ret
ENDPROC(sha2_ce_transform)
--
2.7.4
* [PATCH 5/6] crypto: arm64/aes-ccm-ce: fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The AES-CCM implementation that uses ARMv8 Crypto Extensions instructions
refers to the AES round keys as pairs of 64-bit quantities, which causes
failures when building the code for big endian. In addition, it byte swaps
the input counter unconditionally, while this is only required for little
endian builds. So fix both issues.
Fixes: 12ac3efe74f8 ("arm64/crypto: use crypto instructions to generate AES key schedule")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/aes-ce-ccm-core.S | 53 ++++++++++----------
1 file changed, 27 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index a2a7fbcacc14..3363560c79b7 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -9,6 +9,7 @@
*/
#include <linux/linkage.h>
+#include <asm/assembler.h>
.text
.arch armv8-a+crypto
@@ -19,7 +20,7 @@
*/
ENTRY(ce_aes_ccm_auth_data)
ldr w8, [x3] /* leftover from prev round? */
- ld1 {v0.2d}, [x0] /* load mac */
+ ld1 {v0.16b}, [x0] /* load mac */
cbz w8, 1f
sub w8, w8, #16
eor v1.16b, v1.16b, v1.16b
@@ -31,7 +32,7 @@ ENTRY(ce_aes_ccm_auth_data)
beq 8f /* out of input? */
cbnz w8, 0b
eor v0.16b, v0.16b, v1.16b
-1: ld1 {v3.2d}, [x4] /* load first round key */
+1: ld1 {v3.16b}, [x4] /* load first round key */
prfm pldl1strm, [x1]
cmp w5, #12 /* which key size? */
add x6, x4, #16
@@ -41,17 +42,17 @@ ENTRY(ce_aes_ccm_auth_data)
mov v5.16b, v3.16b
b 4f
2: mov v4.16b, v3.16b
- ld1 {v5.2d}, [x6], #16 /* load 2nd round key */
+ ld1 {v5.16b}, [x6], #16 /* load 2nd round key */
3: aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
-4: ld1 {v3.2d}, [x6], #16 /* load next round key */
+4: ld1 {v3.16b}, [x6], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
-5: ld1 {v4.2d}, [x6], #16 /* load next round key */
+5: ld1 {v4.16b}, [x6], #16 /* load next round key */
subs w7, w7, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
- ld1 {v5.2d}, [x6], #16 /* load next round key */
+ ld1 {v5.16b}, [x6], #16 /* load next round key */
bpl 3b
aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */
@@ -60,7 +61,7 @@ ENTRY(ce_aes_ccm_auth_data)
ld1 {v1.16b}, [x1], #16 /* load next input block */
eor v0.16b, v0.16b, v1.16b /* xor with mac */
bne 1b
-6: st1 {v0.2d}, [x0] /* store mac */
+6: st1 {v0.16b}, [x0] /* store mac */
beq 10f
adds w2, w2, #16
beq 10f
@@ -79,7 +80,7 @@ ENTRY(ce_aes_ccm_auth_data)
adds w7, w7, #1
bne 9b
eor v0.16b, v0.16b, v1.16b
- st1 {v0.2d}, [x0]
+ st1 {v0.16b}, [x0]
10: str w8, [x3]
ret
ENDPROC(ce_aes_ccm_auth_data)
@@ -89,27 +90,27 @@ ENDPROC(ce_aes_ccm_auth_data)
* u32 rounds);
*/
ENTRY(ce_aes_ccm_final)
- ld1 {v3.2d}, [x2], #16 /* load first round key */
- ld1 {v0.2d}, [x0] /* load mac */
+ ld1 {v3.16b}, [x2], #16 /* load first round key */
+ ld1 {v0.16b}, [x0] /* load mac */
cmp w3, #12 /* which key size? */
sub w3, w3, #2 /* modified # of rounds */
- ld1 {v1.2d}, [x1] /* load 1st ctriv */
+ ld1 {v1.16b}, [x1] /* load 1st ctriv */
bmi 0f
bne 3f
mov v5.16b, v3.16b
b 2f
0: mov v4.16b, v3.16b
-1: ld1 {v5.2d}, [x2], #16 /* load next round key */
+1: ld1 {v5.16b}, [x2], #16 /* load next round key */
aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
aese v1.16b, v4.16b
aesmc v1.16b, v1.16b
-2: ld1 {v3.2d}, [x2], #16 /* load next round key */
+2: ld1 {v3.16b}, [x2], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
aese v1.16b, v5.16b
aesmc v1.16b, v1.16b
-3: ld1 {v4.2d}, [x2], #16 /* load next round key */
+3: ld1 {v4.16b}, [x2], #16 /* load next round key */
subs w3, w3, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
@@ -120,47 +121,47 @@ ENTRY(ce_aes_ccm_final)
aese v1.16b, v4.16b
/* final round key cancels out */
eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */
- st1 {v0.2d}, [x0] /* store result */
+ st1 {v0.16b}, [x0] /* store result */
ret
ENDPROC(ce_aes_ccm_final)
.macro aes_ccm_do_crypt,enc
ldr x8, [x6, #8] /* load lower ctr */
- ld1 {v0.2d}, [x5] /* load mac */
- rev x8, x8 /* keep swabbed ctr in reg */
+ ld1 {v0.16b}, [x5] /* load mac */
+CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
0: /* outer loop */
- ld1 {v1.1d}, [x6] /* load upper ctr */
+ ld1 {v1.8b}, [x6] /* load upper ctr */
prfm pldl1strm, [x1]
add x8, x8, #1
rev x9, x8
cmp w4, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */
- ld1 {v3.2d}, [x3] /* load first round key */
+ ld1 {v3.16b}, [x3] /* load first round key */
add x10, x3, #16
bmi 1f
bne 4f
mov v5.16b, v3.16b
b 3f
1: mov v4.16b, v3.16b
- ld1 {v5.2d}, [x10], #16 /* load 2nd round key */
+ ld1 {v5.16b}, [x10], #16 /* load 2nd round key */
2: /* inner loop: 3 rounds, 2x interleaved */
aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
aese v1.16b, v4.16b
aesmc v1.16b, v1.16b
-3: ld1 {v3.2d}, [x10], #16 /* load next round key */
+3: ld1 {v3.16b}, [x10], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
aese v1.16b, v5.16b
aesmc v1.16b, v1.16b
-4: ld1 {v4.2d}, [x10], #16 /* load next round key */
+4: ld1 {v4.16b}, [x10], #16 /* load next round key */
subs w7, w7, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
aese v1.16b, v3.16b
aesmc v1.16b, v1.16b
- ld1 {v5.2d}, [x10], #16 /* load next round key */
+ ld1 {v5.16b}, [x10], #16 /* load next round key */
bpl 2b
aese v0.16b, v4.16b
aese v1.16b, v4.16b
@@ -177,14 +178,14 @@ ENDPROC(ce_aes_ccm_final)
eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
st1 {v1.16b}, [x0], #16 /* write output block */
bne 0b
- rev x8, x8
- st1 {v0.2d}, [x5] /* store mac */
+CPU_LE( rev x8, x8 )
+ st1 {v0.16b}, [x5] /* store mac */
str x8, [x6, #8] /* store lsb end of ctr (BE) */
5: ret
6: eor v0.16b, v0.16b, v5.16b /* final round mac */
eor v1.16b, v1.16b, v5.16b /* final round enc */
- st1 {v0.2d}, [x5] /* store mac */
+ st1 {v0.16b}, [x5] /* store mac */
add w2, w2, #16 /* process partial tail block */
7: ldrb w9, [x1], #1 /* get 1 byte of input */
umov w6, v1.b[0] /* get top crypted ctr byte */
--
2.7.4
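The CPU_LE() changes deserve a remark: the CTR counter block is big-endian by
definition, so only little-endian builds need to swab the low 64 bits before
incrementing them in a general-purpose register. The old unconditional rev
corrupted the counter on big endian, where the loaded value is already in
native order. A host-side C sketch of the intended handling follows;
load_ctr_lo/store_ctr_lo are made-up helper names for illustration, not
kernel functions:

    #include <stdint.h>
    #include <string.h>

    /* fetch the low 64 bits of the 16-byte counter block in native order,
     * mirroring "ldr x8, [x6, #8]" followed by "CPU_LE( rev x8, x8 )" */
    static uint64_t load_ctr_lo(const uint8_t iv[16])
    {
            uint64_t v;

            memcpy(&v, iv + 8, sizeof(v));
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
            v = __builtin_bswap64(v);       /* swab needed on LE only */
    #endif
            return v;
    }

    /* write the incremented counter back big-endian, as CTR requires,
     * mirroring the rev/str pair at the end of the macro */
    static void store_ctr_lo(uint8_t iv[16], uint64_t v)
    {
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
            v = __builtin_bswap64(v);
    #endif
            memcpy(iv + 8, &v, sizeof(v));
    }

    int main(void)
    {
            uint8_t iv[16] = { 0 };

            store_ctr_lo(iv, load_ctr_lo(iv) + 1);
            return iv[15] == 1 ? 0 : 1; /* counter is big-endian in memory */
    }

On big endian both #if branches compile away to nothing, which is exactly what
wrapping the rev instructions in CPU_LE() achieves in the assembly.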
* [PATCH 6/6] crypto: arm64/aes-neon - fix for big endian
From: Ard Biesheuvel @ 2016-10-09 17:42 UTC
To: linux-crypto, linux-arm-kernel, herbert
Cc: catalin.marinas, will.deacon, Ard Biesheuvel
The AES implementation using pure NEON instructions relies on the generic
AES key schedule generation routines, which store the round keys as arrays
of 32-bit quantities in memory using native endianness. This means we should
refer to these round keys using 4x32-bit loads rather than 16x1 byte loads.
In addition, the ShiftRows tables are loaded using a single scalar load,
which is also affected by endianness, so emit these tables in the correct
order depending on whether we are building for big endian or not.
Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/aes-neon.S | 25 ++++++++++++--------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S
index b93170e1cc93..85f07ead7c5c 100644
--- a/arch/arm64/crypto/aes-neon.S
+++ b/arch/arm64/crypto/aes-neon.S
@@ -9,6 +9,7 @@
*/
#include <linux/linkage.h>
+#include <asm/assembler.h>
#define AES_ENTRY(func) ENTRY(neon_ ## func)
#define AES_ENDPROC(func) ENDPROC(neon_ ## func)
@@ -83,13 +84,13 @@
.endm
.macro do_block, enc, in, rounds, rk, rkp, i
- ld1 {v15.16b}, [\rk]
+ ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16
mov \i, \rounds
1111: eor \in\().16b, \in\().16b, v15.16b /* ^round key */
tbl \in\().16b, {\in\().16b}, v13.16b /* ShiftRows */
sub_bytes \in
- ld1 {v15.16b}, [\rkp], #16
+ ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1
beq 2222f
.if \enc == 1
@@ -229,7 +230,7 @@
.endm
.macro do_block_2x, enc, in0, in1 rounds, rk, rkp, i
- ld1 {v15.16b}, [\rk]
+ ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16
mov \i, \rounds
1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */
@@ -237,7 +238,7 @@
sub_bytes_2x \in0, \in1
tbl \in0\().16b, {\in0\().16b}, v13.16b /* ShiftRows */
tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */
- ld1 {v15.16b}, [\rkp], #16
+ ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1
beq 2222f
.if \enc == 1
@@ -254,7 +255,7 @@
.endm
.macro do_block_4x, enc, in0, in1, in2, in3, rounds, rk, rkp, i
- ld1 {v15.16b}, [\rk]
+ ld1 {v15.4s}, [\rk]
add \rkp, \rk, #16
mov \i, \rounds
1111: eor \in0\().16b, \in0\().16b, v15.16b /* ^round key */
@@ -266,7 +267,7 @@
tbl \in1\().16b, {\in1\().16b}, v13.16b /* ShiftRows */
tbl \in2\().16b, {\in2\().16b}, v13.16b /* ShiftRows */
tbl \in3\().16b, {\in3\().16b}, v13.16b /* ShiftRows */
- ld1 {v15.16b}, [\rkp], #16
+ ld1 {v15.4s}, [\rkp], #16
subs \i, \i, #1
beq 2222f
.if \enc == 1
@@ -306,12 +307,16 @@
.text
.align 4
.LForward_ShiftRows:
- .byte 0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3
- .byte 0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb
+CPU_LE( .byte 0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3 )
+CPU_LE( .byte 0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb )
+CPU_BE( .byte 0xb, 0x6, 0x1, 0xc, 0x7, 0x2, 0xd, 0x8 )
+CPU_BE( .byte 0x3, 0xe, 0x9, 0x4, 0xf, 0xa, 0x5, 0x0 )
.LReverse_ShiftRows:
- .byte 0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb
- .byte 0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3
+CPU_LE( .byte 0x0, 0xd, 0xa, 0x7, 0x4, 0x1, 0xe, 0xb )
+CPU_LE( .byte 0x8, 0x5, 0x2, 0xf, 0xc, 0x9, 0x6, 0x3 )
+CPU_BE( .byte 0x3, 0x6, 0x9, 0xc, 0xf, 0x2, 0x5, 0x8 )
+CPU_BE( .byte 0xb, 0xe, 0x1, 0x4, 0x7, 0xa, 0xd, 0x0 )
.LForward_Sbox:
.byte 0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5
--
2.7.4
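The CPU_BE() table rows are the little-endian rows reversed as one 16-byte
quadword, compensating for the byte reversal that the single scalar load
mentioned in the commit message performs on big endian. A throwaway C
generator makes the relationship explicit (a sketch; the byte values are
copied from the hunk above):

    #include <stdio.h>

    int main(void)
    {
            /* .LForward_ShiftRows as emitted for little endian */
            const unsigned char le[16] = {
                    0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3,
                    0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb,
            };

            /* reversing the quadword reproduces the CPU_BE() rows */
            printf(".byte ");
            for (int i = 15; i >= 0; i--)
                    printf("0x%x%s", le[i], i ? ", " : "\n");
            return 0;
    }

Its output starts 0xb, 0x6, 0x1, 0xc, matching the first CPU_BE() line, and
the same reversal applied to .LReverse_ShiftRows reproduces the other pair of
rows.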
* Re: [PATCH 0/6] crypto: arm64 - big endian fixes
From: Ard Biesheuvel @ 2016-10-10 11:26 UTC
To: linux-crypto, linux-arm-kernel, Herbert Xu
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel
On 9 October 2016 at 18:42, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> As it turns out, none of the accelerated crypto routines under arch/arm64/crypto
> currently work, or have ever worked correctly when built for big endian. So this
> series fixes all of them.
>
> Each of these patches carries a fixes tag, and could be backported to stable.
> However, for patches #1 and #5, the fixes tag denotes the oldest commit that the
> fix is compatible with, not the patch that introduced the algorithm. This is due
> to the fact that the key schedules are incompatible between generic AES and the
> arm64 Crypto Extensions implementation (but only when building for big
> endian). This is not a problem in practice, but it does mean that the AES-CCM
> and AES in ECB/CBC/CTR/XTS mode implementations before v3.19 require a
> different fix, i.e., one that is compatible with the generic AES key schedule
> generation code (which the arm64 code currently no longer uses).
>
> In any case, please apply with cc to stable.
>
Herbert,
I have an additional fix to add to this series, and a couple for
32-bit ARM as well. They escaped my attention due to this code (in
algboss.c:250)
	/* This piece of crap needs to disappear into per-type test hooks. */
	if (!((type ^ CRYPTO_ALG_TYPE_BLKCIPHER) &
	      CRYPTO_ALG_TYPE_BLKCIPHER_MASK) && !(type & CRYPTO_ALG_GENIV) &&
	    ((alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
	      CRYPTO_ALG_TYPE_BLKCIPHER ? alg->cra_blkcipher.ivsize :
					  alg->cra_ablkcipher.ivsize))
		type |= CRYPTO_ALG_TESTED;
This causes cbc(aes), ctr(aes) and xts(aes) to remain untested, unless
I add CRYPTO_ALG_GENIV to their cra_flags. Is this expected behavior?
What would be your recommended way to ensure these algos are covered
by the boottime tests?
Thanks,
Ard.
* Re: [PATCH 0/6] crypto: arm64 - big endian fixes
From: Herbert Xu @ 2016-10-11 2:12 UTC
To: Ard Biesheuvel
Cc: linux-crypto, linux-arm-kernel, Catalin Marinas, Will Deacon
On Mon, Oct 10, 2016 at 12:26:00PM +0100, Ard Biesheuvel wrote:
>
>	/* This piece of crap needs to disappear into per-type test hooks. */
>	if (!((type ^ CRYPTO_ALG_TYPE_BLKCIPHER) &
>	      CRYPTO_ALG_TYPE_BLKCIPHER_MASK) && !(type & CRYPTO_ALG_GENIV) &&
>	    ((alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
>	      CRYPTO_ALG_TYPE_BLKCIPHER ? alg->cra_blkcipher.ivsize :
>					  alg->cra_ablkcipher.ivsize))
>		type |= CRYPTO_ALG_TESTED;
>
> This causes cbc(aes), ctr(aes) and xts(aes) to remain untested, unless
> I add CRYPTO_ALG_GENIV to their cra_flags. Is this expected behavior?
> What would be your recommended way to ensure these algos are covered
> by the boottime tests?
This is a leftover from the old blkcipher/ablkcipher interface.
I've got a patch pending which will remove this if clause.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 0/6] crypto: arm64 - big endian fixes
From: Ard Biesheuvel @ 2016-10-11 11:01 UTC
To: Herbert Xu; +Cc: Catalin Marinas, Will Deacon, linux-crypto, linux-arm-kernel
On 11 October 2016 at 03:12, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Oct 10, 2016 at 12:26:00PM +0100, Ard Biesheuvel wrote:
>>
>>	/* This piece of crap needs to disappear into per-type test hooks. */
>>	if (!((type ^ CRYPTO_ALG_TYPE_BLKCIPHER) &
>>	      CRYPTO_ALG_TYPE_BLKCIPHER_MASK) && !(type & CRYPTO_ALG_GENIV) &&
>>	    ((alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
>>	      CRYPTO_ALG_TYPE_BLKCIPHER ? alg->cra_blkcipher.ivsize :
>>					  alg->cra_ablkcipher.ivsize))
>>		type |= CRYPTO_ALG_TESTED;
>>
>> This causes cbc(aes), ctr(aes) and xts(aes) to remain untested, unless
>> I add CRYPTO_ALG_GENIV to their cra_flags. Is this expected behavior?
>> What would be your recommended way to ensure these algos are covered
>> by the boottime tests?
>
> This is a leftover from the old blkcipher/ablkcipher interface.
> I've got a patch pending which will remove this if clause.
>
OK, thanks.
So I will follow up with a v2 of this series with two additional fixes.