linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64
@ 2019-01-25  8:49 Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 1/4] crypto: arm/crct10dif - revert to C code for short inputs Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-25  8:49 UTC (permalink / raw)
  To: linux-crypto; +Cc: ebiggers, herbert, linux-arm-kernel, Ard Biesheuvel

Fix the issues in both NEON implementations of the CRC-T10DIF routines,
that were reported by Eric's new testing code.

v2:
- keep alignment specifiers where possible (#1)
- clarify/fix commit log (#2)
- add fixes/cc-stable/etc tags
- add patches that drop the now unreacheable code (#3 - #4)

Ard Biesheuvel (4):
  crypto: arm/crct10dif - revert to C code for short inputs
  crypto: arm64/crct10dif - revert to C code for short inputs
  crypto: arm/crct10dif - remove dead code
  crypto: arm64/crct10dif - remove dead code

 arch/arm/crypto/crct10dif-ce-core.S   | 27 +++++---------------
 arch/arm/crypto/crct10dif-ce-glue.c   | 23 +++++------------
 arch/arm64/crypto/crct10dif-ce-core.S | 11 --------
 arch/arm64/crypto/crct10dif-ce-glue.c | 25 +++++-------------
 4 files changed, 19 insertions(+), 67 deletions(-)

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/4] crypto: arm/crct10dif - revert to C code for short inputs
  2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
@ 2019-01-25  8:49 ` Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 2/4] crypto: arm64/crct10dif " Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-25  8:49 UTC (permalink / raw)
  To: linux-crypto; +Cc: ebiggers, stable, herbert, linux-arm-kernel, Ard Biesheuvel

The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to permit
the NEON routine to use special versions of the vld1 instructions
that assume 16 byte alignment, but this could result in inputs of
less than 16 bytes to be passed in. This not only fails the new
extended tests that Eric has implemented, it also results in the
code reading past the end of the input, which could potentially
result in crashes when dealing with less than 16 bytes of input
at the end of a page which is followed by an unmapped page.

So update the glue code to only invoke the NEON routine if the
input is more than 16 bytes.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 1d481f1cd892 ("crypto: arm/crct10dif - port x86 SSE implementation to ARM")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/crct10dif-ce-core.S | 14 ++++++------
 arch/arm/crypto/crct10dif-ce-glue.c | 23 +++++---------------
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/arm/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
index ce45ba0c0687..16019b5961e7 100644
--- a/arch/arm/crypto/crct10dif-ce-core.S
+++ b/arch/arm/crypto/crct10dif-ce-core.S
@@ -124,10 +124,10 @@ ENTRY(crc_t10dif_pmull)
 	vext.8		q10, qzr, q0, #4
 
 	// receive the initial 64B data, xor the initial crc value
-	vld1.64		{q0-q1}, [arg2, :128]!
-	vld1.64		{q2-q3}, [arg2, :128]!
-	vld1.64		{q4-q5}, [arg2, :128]!
-	vld1.64		{q6-q7}, [arg2, :128]!
+	vld1.64		{q0-q1}, [arg2]!
+	vld1.64		{q2-q3}, [arg2]!
+	vld1.64		{q4-q5}, [arg2]!
+	vld1.64		{q6-q7}, [arg2]!
 CPU_LE(	vrev64.8	q0, q0			)
 CPU_LE(	vrev64.8	q1, q1			)
 CPU_LE(	vrev64.8	q2, q2			)
@@ -167,7 +167,7 @@ CPU_LE(	vrev64.8	q7, q7			)
 _fold_64_B_loop:
 
 	.macro		fold64, reg1, reg2
-	vld1.64		{q11-q12}, [arg2, :128]!
+	vld1.64		{q11-q12}, [arg2]!
 
 	vmull.p64	q8, \reg1\()h, d21
 	vmull.p64	\reg1, \reg1\()l, d20
@@ -238,7 +238,7 @@ _16B_reduction_loop:
 	vmull.p64	q7, d15, d21
 	veor.8		q7, q7, q8
 
-	vld1.64		{q0}, [arg2, :128]!
+	vld1.64		{q0}, [arg2]!
 CPU_LE(	vrev64.8	q0, q0		)
 	vswp		d0, d1
 	veor.8		q7, q7, q0
@@ -335,7 +335,7 @@ _less_than_128:
 	vmov.i8		q0, #0
 	vmov		s3, arg1_low32		// get the initial crc value
 
-	vld1.64		{q7}, [arg2, :128]!
+	vld1.64		{q7}, [arg2]!
 CPU_LE(	vrev64.8	q7, q7		)
 	vswp		d14, d15
 	veor.8		q7, q7, q0
diff --git a/arch/arm/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
index d428355cf38d..14c19c70a841 100644
--- a/arch/arm/crypto/crct10dif-ce-glue.c
+++ b/arch/arm/crypto/crct10dif-ce-glue.c
@@ -35,26 +35,15 @@ static int crct10dif_update(struct shash_desc *desc, const u8 *data,
 			    unsigned int length)
 {
 	u16 *crc = shash_desc_ctx(desc);
-	unsigned int l;
 
-	if (!may_use_simd()) {
-		*crc = crc_t10dif_generic(*crc, data, length);
+	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && may_use_simd()) {
+		kernel_neon_begin();
+		*crc = crc_t10dif_pmull(*crc, data, length);
+		kernel_neon_end();
 	} else {
-		if (unlikely((u32)data % CRC_T10DIF_PMULL_CHUNK_SIZE)) {
-			l = min_t(u32, length, CRC_T10DIF_PMULL_CHUNK_SIZE -
-				  ((u32)data % CRC_T10DIF_PMULL_CHUNK_SIZE));
-
-			*crc = crc_t10dif_generic(*crc, data, l);
-
-			length -= l;
-			data += l;
-		}
-		if (length > 0) {
-			kernel_neon_begin();
-			*crc = crc_t10dif_pmull(*crc, data, length);
-			kernel_neon_end();
-		}
+		*crc = crc_t10dif_generic(*crc, data, length);
 	}
+
 	return 0;
 }
 
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/4] crypto: arm64/crct10dif - revert to C code for short inputs
  2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 1/4] crypto: arm/crct10dif - revert to C code for short inputs Ard Biesheuvel
@ 2019-01-25  8:49 ` Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 3/4] crypto: arm/crct10dif - remove dead code Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-25  8:49 UTC (permalink / raw)
  To: linux-crypto; +Cc: ebiggers, stable, herbert, linux-arm-kernel, Ard Biesheuvel

The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to 16 bytes,
which is not really necessary on this architecture (although it
could be beneficial to performance to expose aligned data to the
the NEON routine), but this could result in inputs of less than
16 bytes to be passed in. This not only fails the new extended
tests that Eric has implemented, it also results in the code
reading past the end of the input, which could potentially result
in crashes when dealing with less than 16 bytes of input at the
end of a page which is followed by an unmapped page.

So update the glue code to only invoke the NEON routine if the
input is more than 16 bytes.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 6ef5737f3931 ("crypto: arm64/crct10dif - port x86 SSE implementation to arm64")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/crct10dif-ce-glue.c | 25 +++++---------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
index b461d62023f2..567c24f3d224 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -39,26 +39,13 @@ static int crct10dif_update(struct shash_desc *desc, const u8 *data,
 			    unsigned int length)
 {
 	u16 *crc = shash_desc_ctx(desc);
-	unsigned int l;
 
-	if (unlikely((u64)data % CRC_T10DIF_PMULL_CHUNK_SIZE)) {
-		l = min_t(u32, length, CRC_T10DIF_PMULL_CHUNK_SIZE -
-			  ((u64)data % CRC_T10DIF_PMULL_CHUNK_SIZE));
-
-		*crc = crc_t10dif_generic(*crc, data, l);
-
-		length -= l;
-		data += l;
-	}
-
-	if (length > 0) {
-		if (may_use_simd()) {
-			kernel_neon_begin();
-			*crc = crc_t10dif_pmull(*crc, data, length);
-			kernel_neon_end();
-		} else {
-			*crc = crc_t10dif_generic(*crc, data, length);
-		}
+	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && may_use_simd()) {
+		kernel_neon_begin();
+		*crc = crc_t10dif_pmull(*crc, data, length);
+		kernel_neon_end();
+	} else {
+		*crc = crc_t10dif_generic(*crc, data, length);
 	}
 
 	return 0;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/4] crypto: arm/crct10dif - remove dead code
  2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 1/4] crypto: arm/crct10dif - revert to C code for short inputs Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 2/4] crypto: arm64/crct10dif " Ard Biesheuvel
@ 2019-01-25  8:49 ` Ard Biesheuvel
  2019-01-25  8:49 ` [PATCH v2 4/4] crypto: arm64/crct10dif " Ard Biesheuvel
  2019-01-27  0:07 ` [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Eric Biggers
  4 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-25  8:49 UTC (permalink / raw)
  To: linux-crypto; +Cc: ebiggers, herbert, linux-arm-kernel, Ard Biesheuvel

Remove some code that is no longer called now that we make sure never
to invoke the SIMD routine with less that 16 bytes of input.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/crct10dif-ce-core.S | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/arch/arm/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
index 16019b5961e7..d058fad423c2 100644
--- a/arch/arm/crypto/crct10dif-ce-core.S
+++ b/arch/arm/crypto/crct10dif-ce-core.S
@@ -342,7 +342,6 @@ CPU_LE(	vrev64.8	q7, q7		)
 
 	cmp		arg3, #16
 	beq		_128_done		// exactly 16 left
-	blt		_less_than_16_left
 
 	// now if there is, load the constants
 	vldr		d20, rk1
@@ -353,18 +352,6 @@ CPU_LE(	vrev64.8	q7, q7		)
 	addlt		arg3, arg3, #16
 	blt		_get_last_two_regs
 	b		_16B_reduction_loop
-
-_less_than_16_left:
-	// shl r9, 4
-	adr		ip, tbl_shf_table + 16
-	sub		ip, ip, arg3
-	vld1.8		{q0}, [ip]
-	vmov.i8		q9, #0x80
-	veor.8		q0, q0, q9
-	vtbl.8		d18, {d14-d15}, d0
-	vtbl.8		d15, {d14-d15}, d1
-	vmov		d14, d18
-	b		_128_done
 ENDPROC(crc_t10dif_pmull)
 
 // precomputed constants
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 4/4] crypto: arm64/crct10dif - remove dead code
  2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2019-01-25  8:49 ` [PATCH v2 3/4] crypto: arm/crct10dif - remove dead code Ard Biesheuvel
@ 2019-01-25  8:49 ` Ard Biesheuvel
  2019-01-27  0:07 ` [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Eric Biggers
  4 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-25  8:49 UTC (permalink / raw)
  To: linux-crypto; +Cc: ebiggers, herbert, linux-arm-kernel, Ard Biesheuvel

Remove some code that is no longer called now that we make sure never
to invoke the SIMD routine with less than 16 bytes of input.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/crct10dif-ce-core.S | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
index 9e82e8e8ed05..f7326259c40d 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -497,7 +497,6 @@ CPU_LE(	ext		v7.16b, v7.16b, v7.16b, #8	)
 
 	cmp		arg3, #16
 	b.eq		.L_128_done_\@		// exactly 16 left
-	b.lt		.L_less_than_16_left_\@
 
 	ldr_l		q10, rk1, x8		// rk1 and rk2 in xmm10
 	__pmull_pre_\p	v10
@@ -509,16 +508,6 @@ CPU_LE(	ext		v7.16b, v7.16b, v7.16b, #8	)
 
 	add		arg3, arg3, #16
 	b		.L_get_last_two_regs_\@
-
-.L_less_than_16_left_\@:
-	// shl r9, 4
-	adr_l		x0, tbl_shf_table + 16
-	sub		x0, x0, arg3
-	ld1		{v0.16b}, [x0]
-	movi		v9.16b, #0x80
-	eor		v0.16b, v0.16b, v9.16b
-	tbl		v7.16b, {v7.16b}, v0.16b
-	b		.L_128_done_\@
 	.endm
 
 ENTRY(crc_t10dif_pmull_p8)
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64
  2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2019-01-25  8:49 ` [PATCH v2 4/4] crypto: arm64/crct10dif " Ard Biesheuvel
@ 2019-01-27  0:07 ` Eric Biggers
  2019-01-27  9:10   ` Ard Biesheuvel
  4 siblings, 1 reply; 7+ messages in thread
From: Eric Biggers @ 2019-01-27  0:07 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel, herbert

On Fri, Jan 25, 2019 at 09:49:11AM +0100, Ard Biesheuvel wrote:
> Fix the issues in both NEON implementations of the CRC-T10DIF routines,
> that were reported by Eric's new testing code.
> 
> v2:
> - keep alignment specifiers where possible (#1)
> - clarify/fix commit log (#2)
> - add fixes/cc-stable/etc tags
> - add patches that drop the now unreacheable code (#3 - #4)
> 
> Ard Biesheuvel (4):
>   crypto: arm/crct10dif - revert to C code for short inputs
>   crypto: arm64/crct10dif - revert to C code for short inputs
>   crypto: arm/crct10dif - remove dead code
>   crypto: arm64/crct10dif - remove dead code
> 
>  arch/arm/crypto/crct10dif-ce-core.S   | 27 +++++---------------
>  arch/arm/crypto/crct10dif-ce-glue.c   | 23 +++++------------
>  arch/arm64/crypto/crct10dif-ce-core.S | 11 --------
>  arch/arm64/crypto/crct10dif-ce-glue.c | 25 +++++-------------
>  4 files changed, 19 insertions(+), 67 deletions(-)
> 

In the commit message of patches 1-2, "more than 16 bytes" should be
"at least 16 bytes".

Otherwise for all 4 patches:

	Reviewed-by: Eric Biggers <ebiggers@kernel.org>

- Eric

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64
  2019-01-27  0:07 ` [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Eric Biggers
@ 2019-01-27  9:10   ` Ard Biesheuvel
  0 siblings, 0 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2019-01-27  9:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	linux-arm-kernel, Herbert Xu

On Sun, 27 Jan 2019 at 01:07, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Jan 25, 2019 at 09:49:11AM +0100, Ard Biesheuvel wrote:
> > Fix the issues in both NEON implementations of the CRC-T10DIF routines,
> > that were reported by Eric's new testing code.
> >
> > v2:
> > - keep alignment specifiers where possible (#1)
> > - clarify/fix commit log (#2)
> > - add fixes/cc-stable/etc tags
> > - add patches that drop the now unreacheable code (#3 - #4)
> >
> > Ard Biesheuvel (4):
> >   crypto: arm/crct10dif - revert to C code for short inputs
> >   crypto: arm64/crct10dif - revert to C code for short inputs
> >   crypto: arm/crct10dif - remove dead code
> >   crypto: arm64/crct10dif - remove dead code
> >
> >  arch/arm/crypto/crct10dif-ce-core.S   | 27 +++++---------------
> >  arch/arm/crypto/crct10dif-ce-glue.c   | 23 +++++------------
> >  arch/arm64/crypto/crct10dif-ce-core.S | 11 --------
> >  arch/arm64/crypto/crct10dif-ce-glue.c | 25 +++++-------------
> >  4 files changed, 19 insertions(+), 67 deletions(-)
> >
>
> In the commit message of patches 1-2, "more than 16 bytes" should be
> "at least 16 bytes".
>
> Otherwise for all 4 patches:
>
>         Reviewed-by: Eric Biggers <ebiggers@kernel.org>
>

Thanks Eric. I'll update and resend.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-27  9:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25  8:49 [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Ard Biesheuvel
2019-01-25  8:49 ` [PATCH v2 1/4] crypto: arm/crct10dif - revert to C code for short inputs Ard Biesheuvel
2019-01-25  8:49 ` [PATCH v2 2/4] crypto: arm64/crct10dif " Ard Biesheuvel
2019-01-25  8:49 ` [PATCH v2 3/4] crypto: arm/crct10dif - remove dead code Ard Biesheuvel
2019-01-25  8:49 ` [PATCH v2 4/4] crypto: arm64/crct10dif " Ard Biesheuvel
2019-01-27  0:07 ` [PATCH v2 0/4] crypto: fix crct10dif for ARM and arm64 Eric Biggers
2019-01-27  9:10   ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).