* [PATCH RFC 0/3] arm64: NEON crypto under CONFIG_PREEMPT
From: Ard Biesheuvel @ 2014-03-28 11:05 UTC
  To: linux-arm-kernel

This series is an attempt to reduce latency under CONFIG_PREEMPT while
maintaining optimal throughput otherwise, i.e., under !CONFIG_PREEMPT or
while running outside of process context.

In the in_interrupt() case, the calls to kernel_neon_begin and kernel_neon_end
incur a fixed penalty (i.e., each call needs to stack/unstack a fixed number of
registers), and preemption is not possible anyway, so the call into the crypto
algorithm should just complete as fast as possible, ideally by processing all
of the input in the core loop without having to spill state to memory or reload
round keys (e.g., SHA-256 uses 64 32-bit round keys to process each input block
of 64 bytes).

In contrast, when running in process context, we should avoid hogging the CPU by
spending unreasonable amounts of time inside a kernel_neon_begin/kernel_neon_end
section. However, reloading those 64 32-bit round keys to process each 64-byte
block one by one is far from optimal.

The solution proposed here is to allow the inner loops of the crypto algorithms
to test the TIF_NEED_RESCHED flag, and terminate early if it is set. This is
essentially CONFIG_PREEMPT_VOLUNTARY, even under CONFIG_PREEMPT, but it is the
best we can do when running with preemption disabled.

Patch #1 introduces the shared asm macro; patches #2 and #3 are the SHA-1 and
SHA-224/SHA-256 implementations I posted earlier, reworked to make use of
voluntary preemption.
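
For reference, the resulting glue code pattern (condensed from the SHA-1 glue
code in patch #2; the SHA-224/SHA-256 glue in patch #3 is the same apart from
the block size and the number of NEON registers preserved) looks roughly like
this:

	do {
		int rem;

		kernel_neon_begin_partial(16);		/* preemption disabled */
		rem = sha1_ce_transform(blocks, data, sctx->state,
					NULL, 0, ti);	/* may return early */
		kernel_neon_end();			/* preemption point */

		data += (blocks - rem) * SHA1_BLOCK_SIZE;
		blocks = rem;
	} while (ti && blocks > 0);

The transform returns the number of blocks it did not process; when 'ti' is
NULL (e.g., in interrupt context), it consumes all of its input and the loop
executes exactly once.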

Note that this series depends on my kernel mode NEON optimization patches posted
a while ago.

Ard Biesheuvel (3):
  arm64/crypto: add shared macro to test for NEED_RESCHED
  arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
  arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions

 arch/arm64/Kconfig               |   3 +
 arch/arm64/Makefile              |   1 +
 arch/arm64/crypto/Kconfig        |  18 +++
 arch/arm64/crypto/Makefile       |  15 +++
 arch/arm64/crypto/preempt.h      |  28 ++++
 arch/arm64/crypto/sha1-ce-core.S | 156 ++++++++++++++++++++++
 arch/arm64/crypto/sha1-ce-glue.c | 201 ++++++++++++++++++++++++++++
 arch/arm64/crypto/sha2-ce-core.S | 161 ++++++++++++++++++++++
 arch/arm64/crypto/sha2-ce-glue.c | 280 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 863 insertions(+)
 create mode 100644 arch/arm64/crypto/Kconfig
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/preempt.h
 create mode 100644 arch/arm64/crypto/sha1-ce-core.S
 create mode 100644 arch/arm64/crypto/sha1-ce-glue.c
 create mode 100644 arch/arm64/crypto/sha2-ce-core.S
 create mode 100644 arch/arm64/crypto/sha2-ce-glue.c

-- 
1.8.3.2

* [PATCH RFC 1/3] arm64/crypto: add shared macro to test for NEED_RESCHED
From: Ard Biesheuvel @ 2014-03-28 11:05 UTC
  To: linux-arm-kernel

This adds arch/arm64/crypto/preempt.h, currently containing just a single
asm macro definition 'b_if_no_resched' that will be shared between multiple
crypto algorithm implementations that need to test for preemption in the
inner loop.
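
For context, the intended use in a core loop (as in patches #2 and #3) is
roughly as follows, where x5 holds the thread_info pointer passed in by the C
glue code (or NULL) and x8 is a scratch register:

	/* at the end of the per-block loop in the .S core routine */
	cbz		w0, 4f			// all blocks done: finalize
	b_if_no_resched	x5, x8, 0b		// no resched pending: next block
	/* fall through: store the state and return the remaining block count */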

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/preempt.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 arch/arm64/crypto/preempt.h

diff --git a/arch/arm64/crypto/preempt.h b/arch/arm64/crypto/preempt.h
new file mode 100644
index 000000000000..94302d5b5ae9
--- /dev/null
+++ b/arch/arm64/crypto/preempt.h
@@ -0,0 +1,28 @@
+/*
+ * preempt.h - shared macros to check preempt state
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/thread_info.h>
+
+	/*
+	 * Branch to 'lb' but only if we have not been tagged for preemption.
+	 *
+	 * Expects current->thread_info in ti, or NULL if running in interrupt
+	 * context. reg is a scratch x register.
+	 */
+	.macro		b_if_no_resched, ti, reg, lb
+#if defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_VOLUNTARY)
+	cbz		\ti, \lb			// have thread_info?
+	ldr		\reg, [\ti, #TI_FLAGS]		// get flags
+	tbz	 	\reg, #TIF_NEED_RESCHED, \lb	// needs rescheduling?
+#else
+	b		\lb
+#endif
+	.endm
-- 
1.8.3.2

* [PATCH RFC 2/3] arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
From: Ard Biesheuvel @ 2014-03-28 11:05 UTC
  To: linux-arm-kernel

This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that
have support for the SHA-1 part of the ARM v8 Crypto Extensions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig               |   3 +
 arch/arm64/Makefile              |   1 +
 arch/arm64/crypto/Kconfig        |  13 +++
 arch/arm64/crypto/Makefile       |  12 +++
 arch/arm64/crypto/sha1-ce-core.S | 156 ++++++++++++++++++++++++++++++
 arch/arm64/crypto/sha1-ce-glue.c | 201 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 386 insertions(+)
 create mode 100644 arch/arm64/crypto/Kconfig
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/sha1-ce-core.S
 create mode 100644 arch/arm64/crypto/sha1-ce-glue.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3d27a1c4e4ad..11f366a6f09d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -317,5 +317,8 @@ source "arch/arm64/Kconfig.debug"
 source "security/Kconfig"
 
 source "crypto/Kconfig"
+if CRYPTO
+source "arch/arm64/crypto/Kconfig"
+endif
 
 source "lib/Kconfig"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index e0b75464b7f1..a4b3e253557d 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -46,6 +46,7 @@ core-y		+= arch/arm64/emu/
 core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
 libs-y		:= arch/arm64/lib/ $(libs-y)
 libs-y		+= $(LIBGCC)
 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
new file mode 100644
index 000000000000..af378bb608e8
--- /dev/null
+++ b/arch/arm64/crypto/Kconfig
@@ -0,0 +1,13 @@
+
+menuconfig ARM64_CRYPTO
+	bool "ARM64 Accelerated Cryptographic Algorithms"
+	depends on ARM64
+
+if ARM64_CRYPTO
+
+config CRYPTO_SHA1_ARM64_CE
+	tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
+endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 000000000000..0ed3caaec81b
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,12 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
+sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
new file mode 100644
index 000000000000..019808854d10
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -0,0 +1,156 @@
+/*
+ * sha1-ce-core.S - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+#include "preempt.h"
+
+	.text
+	.arch		armv8-a+crypto
+
+	k0		.req	v0
+	k1		.req	v1
+	k2		.req	v2
+	k3		.req	v3
+
+	t0		.req	v4
+	t1		.req	v5
+
+	dga		.req	q6
+	dgav		.req	v6
+	dgb		.req	s7
+	dgbv		.req	v7
+
+	dg0q		.req	q12
+	dg0s		.req	s12
+	dg0v		.req	v12
+	dg1s		.req	s13
+	dg1v		.req	v13
+	dg2s		.req	s14
+
+	.macro		add_only, op, ev, rc, s0, dg1
+	.ifc		\ev, ev
+	add		t1.4s, v\s0\().4s, \rc\().4s
+	sha1h		dg2s, dg0s
+	.ifnb		\dg1
+	sha1\op		dg0q, \dg1, t0.4s
+	.else
+	sha1\op		dg0q, dg1s, t0.4s
+	.endif
+	.else
+	.ifnb		\s0
+	add		t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha1h		dg1s, dg0s
+	sha1\op		dg0q, dg2s, t1.4s
+	.endif
+	.endm
+
+	.macro		add_update, op, ev, rc, s0, s1, s2, s3, dg1
+	sha1su0		v\s0\().4s, v\s1\().4s, v\s2\().4s
+	sha1su1		v\s0\().4s, v\s3\().4s
+	add_only	\op, \ev, \rc, \s1, \dg1
+	.endm
+
+	/*
+	 * The SHA1 round constants
+	 */
+	.align		4
+.Lsha1_rcon:
+	.word		0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
+
+	/*
+	 * int sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+	 * 			 u8 *head, long bytes, struct thread_info *ti)
+	 */
+ENTRY(sha1_ce_transform)
+	/* load round constants */
+	adr		x6, .Lsha1_rcon
+	ld1r		{k0.4s}, [x6], #4
+	ld1r		{k1.4s}, [x6], #4
+	ld1r		{k2.4s}, [x6], #4
+	ld1r		{k3.4s}, [x6]
+
+	/* load state */
+	ldr		dga, [x2]
+	ldr		dgb, [x2, #16]
+
+	/* load partial state (if supplied) */
+	cbz		x3, 0f
+	ld1		{v8.4s-v11.4s}, [x3]
+	b		1f
+
+	/* load input */
+0:	ld1		{v8.4s-v11.4s}, [x1], #64
+	sub		w0, w0, #1
+
+1:
+CPU_LE(	rev32		v8.16b, v8.16b		)
+CPU_LE(	rev32		v9.16b, v9.16b		)
+CPU_LE(	rev32		v10.16b, v10.16b	)
+CPU_LE(	rev32		v11.16b, v11.16b	)
+
+2:	add		t0.4s, v8.4s, k0.4s
+	mov		dg0v.16b, dgav.16b
+
+	add_update	c, ev, k0,  8,  9, 10, 11, dgb
+	add_update	c, od, k0,  9, 10, 11,  8
+	add_update	c, ev, k0, 10, 11,  8,  9
+	add_update	c, od, k0, 11,  8,  9, 10
+	add_update	c, ev, k1,  8,  9, 10, 11
+
+	add_update	p, od, k1,  9, 10, 11,  8
+	add_update	p, ev, k1, 10, 11,  8,  9
+	add_update	p, od, k1, 11,  8,  9, 10
+	add_update	p, ev, k1,  8,  9, 10, 11
+	add_update	p, od, k2,  9, 10, 11,  8
+
+	add_update	m, ev, k2, 10, 11,  8,  9
+	add_update	m, od, k2, 11,  8,  9, 10
+	add_update	m, ev, k2,  8,  9, 10, 11
+	add_update	m, od, k2,  9, 10, 11,  8
+	add_update	m, ev, k3, 10, 11,  8,  9
+
+	add_update	p, od, k3, 11,  8,  9, 10
+	add_only	p, ev, k3,  9
+	add_only	p, od, k3, 10
+	add_only	p, ev, k3, 11
+	add_only	p, od
+
+	/* update state */
+	add		dgbv.2s, dgbv.2s, dg1v.2s
+	add		dgav.4s, dgav.4s, dg0v.4s
+
+	cbz		w0, 4f
+	b_if_no_resched	x5, x8, 0b
+
+	/* store new state */
+3:	str		dga, [x2]
+	str		dgb, [x2, #16]
+	ret
+
+	/*
+	 * Final block: add padding and total bit count.
+	 * Skip if we have no total byte count in x4. In that case, the input
+	 * size was not a round multiple of the block size, and the padding is
+	 * handled by the C code.
+	 */
+4:	cbz		x4, 3b
+	movi		v9.2d, #0
+	mov		x8, #0x80000000
+	movi		v10.2d, #0
+	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	fmov		d8, x8
+	mov		x4, #0
+	mov		v11.d[0], xzr
+	mov		v11.d[1], x7
+	b		2b
+ENDPROC(sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
new file mode 100644
index 000000000000..69850a163668
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -0,0 +1,201 @@
+/*
+ * sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage int sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+				 u8 *head, long bytes, struct thread_info *ti);
+
+static int sha1_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+	return 0;
+}
+
+static int sha1_update(struct shash_desc *desc, const u8 *data,
+		       unsigned int len)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if ((partial + len) >= SHA1_BLOCK_SIZE) {
+		struct thread_info *ti = NULL;
+		int blocks;
+
+		if (partial) {
+			int p = SHA1_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buffer + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		/*
+		 * Pass current's thread info pointer to sha1_ce_transform()
+		 * below if we want it to play nice under preemption.
+		 */
+		if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
+		     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
+			ti = current_thread_info();
+
+		blocks = len / SHA1_BLOCK_SIZE;
+		len %= SHA1_BLOCK_SIZE;
+
+		do {
+			int rem;
+
+			kernel_neon_begin_partial(16);
+			rem = sha1_ce_transform(blocks, data, sctx->state,
+						partial ? sctx->buffer : NULL,
+						0, ti);
+			kernel_neon_end();
+
+			data += (blocks - rem) * SHA1_BLOCK_SIZE;
+			blocks = rem;
+			partial = 0;
+		} while (unlikely(ti && blocks > 0));
+	}
+	if (len)
+		memcpy(sctx->buffer + partial, data, len);
+	return 0;
+}
+
+static int sha1_final(struct shash_desc *desc, u8 *out)
+{
+	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
+
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be64 bits = cpu_to_be64(sctx->count << 3);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	u32 padlen = SHA1_BLOCK_SIZE
+		     - ((sctx->count + sizeof(bits)) % SHA1_BLOCK_SIZE);
+
+	sha1_update(desc, padding, padlen);
+	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
+
+static int sha1_finup(struct shash_desc *desc, const u8 *data,
+		      unsigned int len, u8 *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct thread_info *ti = NULL;
+	__be32 *dst = (__be32 *)out;
+	int blocks;
+	int i;
+
+	if (sctx->count || !len || (len % SHA1_BLOCK_SIZE)) {
+		sha1_update(desc, data, len);
+		return sha1_final(desc, out);
+	}
+
+	/*
+	 * Use a fast path if the input is a multiple of 64 bytes. In
+	 * this case, there is no need to copy data around, and we can
+	 * perform the entire digest calculation in a single invocation
+	 * of sha1_ce_transform()
+	 */
+	blocks = len / SHA1_BLOCK_SIZE;
+
+	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
+	     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
+		ti = current_thread_info();
+
+	do {
+		int rem;
+
+		kernel_neon_begin_partial(16);
+		rem = sha1_ce_transform(blocks, data, sctx->state,
+					NULL, len, ti);
+		kernel_neon_end();
+		data += (blocks - rem) * SHA1_BLOCK_SIZE;
+		blocks = rem;
+	} while (unlikely(ti && blocks > 0));
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
+
+static int sha1_export(struct shash_desc *desc, void *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state *dst = out;
+
+	*dst = *sctx;
+	return 0;
+}
+
+static int sha1_import(struct shash_desc *desc, const void *in)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state const *src = in;
+
+	*sctx = *src;
+	return 0;
+}
+
+static struct shash_alg alg = {
+	.init			= sha1_init,
+	.update			= sha1_update,
+	.final			= sha1_final,
+	.finup			= sha1_finup,
+	.export			= sha1_export,
+	.import			= sha1_import,
+	.descsize		= sizeof(struct sha1_state),
+	.digestsize		= SHA1_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha1_state),
+	.base			= {
+		.cra_name		= "sha1",
+		.cra_driver_name	= "sha1-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA1_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+};
+
+static int __init sha1_ce_mod_init(void)
+{
+	return crypto_register_shash(&alg);
+}
+
+static void __exit sha1_ce_mod_fini(void)
+{
+	crypto_unregister_shash(&alg);
+}
+
+module_cpu_feature_match(SHA1, sha1_ce_mod_init);
+module_exit(sha1_ce_mod_fini);
-- 
1.8.3.2

* [PATCH RFC 3/3] arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
From: Ard Biesheuvel @ 2014-03-28 11:05 UTC
  To: linux-arm-kernel

This patch adds support for the SHA-224 and SHA-256 Secure Hash Algorithms
for CPUs that have support for the SHA-2 part of the ARM v8 Crypto Extensions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig        |   5 +
 arch/arm64/crypto/Makefile       |   3 +
 arch/arm64/crypto/sha2-ce-core.S | 161 ++++++++++++++++++++++
 arch/arm64/crypto/sha2-ce-glue.c | 280 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 449 insertions(+)
 create mode 100644 arch/arm64/crypto/sha2-ce-core.S
 create mode 100644 arch/arm64/crypto/sha2-ce-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index af378bb608e8..c14e2ae98193 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -10,4 +10,9 @@ config CRYPTO_SHA1_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_SHA2_ARM64_CE
+	tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 0ed3caaec81b..0b3885a60d43 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -10,3 +10,6 @@
 
 obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
 sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
+
+obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
+sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
new file mode 100644
index 000000000000..40556fe1b06e
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -0,0 +1,161 @@
+/*
+ * sha2-ce-core.S - core SHA-224/SHA-256 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+#include "preempt.h"
+
+	.text
+	.arch		armv8-a+crypto
+
+	dga		.req	q20
+	dgav		.req	v20
+	dgb		.req	q21
+	dgbv		.req	v21
+
+	t0		.req	v22
+	t1		.req	v23
+
+	dg0q		.req	q24
+	dg0v		.req	v24
+	dg1q		.req	q25
+	dg1v		.req	v25
+	dg2q		.req	q26
+	dg2v		.req	v26
+
+	.macro		add_only, ev, rc, s0
+	mov		dg2v.16b, dg0v.16b
+	.ifeq		\ev
+	add		t1.4s, v\s0\().4s, \rc\().4s
+	sha256h		dg0q, dg1q, t0.4s
+	sha256h2	dg1q, dg2q, t0.4s
+	.else
+	.ifnb		\s0
+	add		t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha256h		dg0q, dg1q, t1.4s
+	sha256h2	dg1q, dg2q, t1.4s
+	.endif
+	.endm
+
+	.macro		add_update, ev, rc, s0, s1, s2, s3
+	sha256su0	v\s0\().4s, v\s1\().4s
+	sha256su1	v\s0\().4s, v\s2\().4s, v\s3\().4s
+	add_only	\ev, \rc, \s1
+	.endm
+
+	/*
+	 * The SHA-256 round constants
+	 */
+	.align		4
+.Lsha2_rcon:
+	.word		0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+	.word		0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+	.word		0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+	.word		0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+	.word		0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+	.word		0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+	.word		0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+	.word		0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+	.word		0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+	.word		0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+	.word		0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+	.word		0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+	.word		0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+	.word		0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+	.word		0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+	.word		0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+
+	/*
+	 * int sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+	 *                       u8 *head, long bytes, struct thread_info *ti)
+	 */
+ENTRY(sha2_ce_transform)
+	/* load round constants */
+	adr		x8, .Lsha2_rcon
+	ld1		{ v0.4s- v3.4s}, [x8], #64
+	ld1		{ v4.4s- v7.4s}, [x8], #64
+	ld1		{ v8.4s-v11.4s}, [x8], #64
+	ld1		{v12.4s-v15.4s}, [x8]
+
+	/* load state */
+	ldp		dga, dgb, [x2]
+
+	/* load partial input (if supplied) */
+	cbz		x3, 0f
+	ld1		{v16.4s-v19.4s}, [x3]
+	b		1f
+
+	/* load input */
+0:	ld1		{v16.4s-v19.4s}, [x1], #64
+	sub		w0, w0, #1
+
+1:
+CPU_LE(	rev32		v16.16b, v16.16b	)
+CPU_LE(	rev32		v17.16b, v17.16b	)
+CPU_LE(	rev32		v18.16b, v18.16b	)
+CPU_LE(	rev32		v19.16b, v19.16b	)
+
+2:	add		t0.4s, v16.4s, v0.4s
+	mov		dg0v.16b, dgav.16b
+	mov		dg1v.16b, dgbv.16b
+
+	add_update	0,  v1, 16, 17, 18, 19
+	add_update	1,  v2, 17, 18, 19, 16
+	add_update	0,  v3, 18, 19, 16, 17
+	add_update	1,  v4, 19, 16, 17, 18
+
+	add_update	0,  v5, 16, 17, 18, 19
+	add_update	1,  v6, 17, 18, 19, 16
+	add_update	0,  v7, 18, 19, 16, 17
+	add_update	1,  v8, 19, 16, 17, 18
+
+	add_update	0,  v9, 16, 17, 18, 19
+	add_update	1, v10, 17, 18, 19, 16
+	add_update	0, v11, 18, 19, 16, 17
+	add_update	1, v12, 19, 16, 17, 18
+
+	add_only	0, v13, 17
+	add_only	1, v14, 18
+	add_only	0, v15, 19
+	add_only	1
+
+	/* update state */
+	add		dgav.4s, dgav.4s, dg0v.4s
+	add		dgbv.4s, dgbv.4s, dg1v.4s
+
+	/* handled all input blocks? */
+	cbz		w0, 4f
+
+	/* should we exit early? */
+	b_if_no_resched	x5, x8, 0b
+
+	/* store new state */
+3:	stp		dga, dgb, [x2]
+	ret
+
+	/*
+	 * Final block: add padding and total bit count.
+	 * Skip if we have no total byte count in x4. In that case, the input
+	 * size was not a round multiple of the block size, and the padding is
+	 * handled by the C code.
+	 */
+4:	cbz		x4, 3b
+	movi		v17.2d, #0
+	mov		x8, #0x80000000
+	movi		v18.2d, #0
+	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	fmov		d16, x8
+	mov		x4, #0
+	mov		v19.d[0], xzr
+	mov		v19.d[1], x7
+	b		2b
+ENDPROC(sha2_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
new file mode 100644
index 000000000000..df44d4b5c6c0
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -0,0 +1,280 @@
+/*
+ * sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage int sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+				 u8 *head, long bytes, struct thread_info *ti);
+
+static int sha224_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = {
+			SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
+			SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
+		}
+	};
+	return 0;
+}
+
+static int sha256_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = {
+			SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+			SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
+		}
+	};
+	return 0;
+}
+
+static int sha2_update(struct shash_desc *desc, const u8 *data,
+		       unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if ((partial + len) >= SHA256_BLOCK_SIZE) {
+		struct thread_info *ti = NULL;
+		int blocks;
+
+		if (partial) {
+			int p = SHA256_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		/*
+		 * Pass current's thread info pointer to sha2_ce_transform()
+		 * below if we want it to play nice under preemption.
+		 */
+		if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
+		     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
+			ti = current_thread_info();
+
+		blocks = len / SHA256_BLOCK_SIZE;
+		len %= SHA256_BLOCK_SIZE;
+
+		do {
+			int rem;
+
+			kernel_neon_begin_partial(28);
+			rem = sha2_ce_transform(blocks, data, sctx->state,
+						partial ? sctx->buf : NULL,
+						0, ti);
+			kernel_neon_end();
+
+			data += (blocks - rem) * SHA256_BLOCK_SIZE;
+			blocks = rem;
+			partial = 0;
+		} while (unlikely(ti && blocks > 0));
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+	return 0;
+}
+
+static void sha2_final(struct shash_desc *desc)
+{
+	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
+
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be64 bits = cpu_to_be64(sctx->count << 3);
+	u32 padlen = SHA256_BLOCK_SIZE
+		     - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
+
+	sha2_update(desc, padding, padlen);
+	sha2_update(desc, (const u8 *)&bits, sizeof(bits));
+}
+
+static int sha224_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_final(desc);
+
+	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_final(desc);
+
+	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static void sha2_finup(struct shash_desc *desc, const u8 *data, unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct thread_info *ti = NULL;
+	int blocks;
+
+	if (sctx->count || !len || (len % SHA256_BLOCK_SIZE)) {
+		sha2_update(desc, data, len);
+		sha2_final(desc);
+		return;
+	}
+
+	/*
+	 * Use a fast path if the input is a multiple of 64 bytes. In
+	 * this case, there is no need to copy data around, and we can
+	 * perform the entire digest calculation in a single invocation
+	 * of sha2_ce_transform()
+	 */
+	blocks = len / SHA256_BLOCK_SIZE;
+
+	if ((IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) ||
+	     IS_ENABLED(CONFIG_PREEMPT)) && !in_interrupt())
+		ti = current_thread_info();
+
+	do {
+		int rem;
+
+		kernel_neon_begin_partial(28);
+		rem = sha2_ce_transform(blocks, data, sctx->state,
+					NULL, len, ti);
+		kernel_neon_end();
+		data += (blocks - rem) * SHA256_BLOCK_SIZE;
+		blocks = rem;
+	} while (unlikely(ti && blocks > 0));
+}
+
+static int sha224_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_finup(desc, data, len);
+
+	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_finup(desc, data, len);
+
+	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha2_export(struct shash_desc *desc, void *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state *dst = out;
+
+	*dst = *sctx;
+	return 0;
+}
+
+static int sha2_import(struct shash_desc *desc, const void *in)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state const *src = in;
+
+	*sctx = *src;
+	return 0;
+}
+
+static struct shash_alg algs[] = { {
+	.init			= sha224_init,
+	.update			= sha2_update,
+	.final			= sha224_final,
+	.finup			= sha224_finup,
+	.export			= sha2_export,
+	.import			= sha2_import,
+	.descsize		= sizeof(struct sha256_state),
+	.digestsize		= SHA224_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha256_state),
+	.base			= {
+		.cra_name		= "sha224",
+		.cra_driver_name	= "sha224-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+}, {
+	.init			= sha256_init,
+	.update			= sha2_update,
+	.final			= sha256_final,
+	.finup			= sha256_finup,
+	.export			= sha2_export,
+	.import			= sha2_import,
+	.descsize		= sizeof(struct sha256_state),
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha256_state),
+	.base			= {
+		.cra_name		= "sha256",
+		.cra_driver_name	= "sha256-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+} };
+
+static int __init sha2_ce_mod_init(void)
+{
+	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha2_ce_mod_fini(void)
+{
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_cpu_feature_match(SHA2, sha2_ce_mod_init);
+module_exit(sha2_ce_mod_fini);
-- 
1.8.3.2

* [PATCH RFC 1/3] arm64/crypto: add shared macro to test for NEED_RESCHED
From: Nicolas Pitre @ 2014-03-29  1:53 UTC
  To: linux-arm-kernel

On Fri, 28 Mar 2014, Ard Biesheuvel wrote:

> This adds arch/arm64/crypto/preempt.h, currently containing just a single
> asm macro definition 'b_if_no_resched' that will be shared between multiple
> crypto algorithm implementations that need to test for preemption in the
> inner loop.

This file is a rather bad choice for this pretty generic macro.  There
is nothing crypto-specific about it, even if crypto might be the only
user for now.  This should live in include/asm/assembler.h, or in a
separate file in that directory if including <asm/asm-offsets.h>
from assembler.h is considered a nuisance.


> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/crypto/preempt.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>  create mode 100644 arch/arm64/crypto/preempt.h
> 
> diff --git a/arch/arm64/crypto/preempt.h b/arch/arm64/crypto/preempt.h
> new file mode 100644
> index 000000000000..94302d5b5ae9
> --- /dev/null
> +++ b/arch/arm64/crypto/preempt.h
> @@ -0,0 +1,28 @@
> +/*
> + * preempt.h - shared macros to check preempt state
> + *
> + * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <asm/asm-offsets.h>
> +#include <asm/thread_info.h>
> +
> +	/*
> +	 * Branch to 'lb' but only if we have not been tagged for preemption.
> +	 *
> +	 * Expects current->thread_info in ti, or NULL if running in interrupt
> +	 * context. reg is a scratch x register.
> +	 */
> +	.macro		b_if_no_resched, ti, reg, lb
> +#if defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_VOLUNTARY)
> +	cbz		\ti, \lb			// have thread_info?
> +	ldr		\reg, [\ti, #TI_FLAGS]		// get flags
> +	tbz	 	\reg, #TIF_NEED_RESCHED, \lb	// needs rescheduling?
> +#else
> +	b		\lb
> +#endif
> +	.endm
> -- 
> 1.8.3.2
> 

* [PATCH RFC 0/3] arm64: NEON crypto under CONFIG_PREEMPT
From: Nicolas Pitre @ 2014-03-29  2:03 UTC
  To: linux-arm-kernel

On Fri, 28 Mar 2014, Ard Biesheuvel wrote:

> This series is an attempt to reduce latency under CONFIG_PREEMPT while
> maintaining optimal throughput otherwise, i.e., under !CONFIG_PREEMPT or
> while running outside of process context.
> 
> In the in_interrupt() case, the calls to kernel_neon_begin and kernel_neon_end
> incur a fixed penalty (i.e., each call needs to stack/unstack a fixed number of
> registers), and preemption is not possible anyway, so the call into the crypto
> algorithm should just complete as fast as possible, ideally by processing all
> of the input in the core loop without having to spill state to memory or reload
> round keys (e.g., SHA-256 uses 64 32-bit round keys to process each input block
> of 64 bytes)
> 
> In contrast, when running in process context, we should avoid hogging the CPU by
> spending unreasonable amounts of time inside a kernel_neon_begin/kernel_neon_end
> section. However, reloading those 64 32-bit round keys to process each 64-byte
> block one by one is far from optimal.
> 
> The solution proposed here is to allow the inner loops of the crypto algorithms
> to test the TIF_NEED_RESCHED flag, and terminate early if it is set. This is
> essentially CONFIG_PREEMPT_VOLUNTARY, even under CONFIG_PREEMPT, but it is the
> best we can do when running with preemption disabled.
> 
> Patch #1 introduces the shared asm macro, patches #2 and #3 are the SHA-1 and
> SHA-224/SHA-256 implementations I posted earlier, but reworked to utilize
> voluntary preemption.

How extensive is the required rework?  If it is reasonably small, I think it
would be better to have #2 and #3 as patches applied on top of your initial
implementations instead.  That helps with patch review, and if a problem turns
up it makes it easier to tell users to just revert commit xyz in order to get
the SHA code without voluntary preemption for testing.


Nicolas

* [PATCH RFC 0/3] arm64: NEON crypto under CONFIG_PREEMPT
From: Ard Biesheuvel @ 2014-03-31 19:04 UTC
  To: linux-arm-kernel

On 29 March 2014 03:03, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Fri, 28 Mar 2014, Ard Biesheuvel wrote:
>
>> This series is an attempt to reduce latency under CONFIG_PREEMPT while
>> maintaining optimal throughput otherwise, i.e., under !CONFIG_PREEMPT or
>> while running outside of process context.
>>
>> In the in_interrupt() case, the calls to kernel_neon_begin and kernel_neon_end
>> incur a fixed penalty (i.e., each call needs to stack/unstack a fixed number of
>> registers), and preemption is not possible anyway, so the call into the crypto
>> algorithm should just complete as fast as possible, ideally by processing all
>> of the input in the core loop without having to spill state to memory or reload
>> round keys (e.g., SHA-256 uses 64 32-bit round keys to process each input block
>> of 64 bytes)
>>
>> In contrast, when running in process context, we should avoid hogging the CPU by
>> spending unreasonable amounts of time inside a kernel_neon_begin/kernel_neon_end
>> section. However, reloading those 64 32-bit round keys to process each 64-byte
>> block one by one is far from optimal.
>>
>> The solution proposed here is to allow the inner loops of the crypto algorithms
>> to test the TIF_NEED_RESCHED flag, and terminate early if it is set. This is
>> essentially CONFIG_PREEMPT_VOLUNTARY, even under CONFIG_PREEMPT, but it is the
>> best we can do when running with preemption disabled.
>>
>> Patch #1 introduces the shared asm macro, patches #2 and #3 are the SHA-1 and
>> SHA-224/SHA-256 implementations I posted earlier, but reworked to utilize
>> voluntary preemption.
>
> How extensive is the required rework?  If it is reasonably small, I think it
> would be better to have #2 and #3 as patches applied on top of your initial
> implementations instead.  That helps with patch review, and if a problem turns
> up it makes it easier to tell users to just revert commit xyz in order to get
> the SHA code without voluntary preemption for testing.
>

I can do that. I will also put the preempt.h include file elsewhere, as you
suggested in the other thread, and post back with a v2 series tomorrow.

Regards,
Ard.

* [PATCH RFC 1/3] arm64/crypto: add shared macro to test for NEED_RESCHED
From: Ard Biesheuvel @ 2014-03-31 19:07 UTC
  To: linux-arm-kernel

On 29 March 2014 02:53, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Fri, 28 Mar 2014, Ard Biesheuvel wrote:
>
>> This adds arch/arm64/crypto/preempt.h, currently containing just a single
>> asm macro definition 'b_if_no_resched' that will be shared between multiple
>> crypto algorithm implementations that need to test for preemption in the
>> inner loop.
>
> This file is a rather bad choice for this pretty generic macro.  There
> is nothing crypto-specific about it, even if crypto might be the only
> user for now.  This should live in include/asm/assembler.h, or in a
> separate file in that directory if including <asm/asm-offsets.h>
> from assembler.h is considered a nuisance.
>

True, there is nothing crypto-specific about it.

@Catalin: would you object to adding this macro (and the #include of
<asm/asm-offsets.h>) to assembler.h?
Or would you prefer to have it in a separate file?

-- 
Ard.



>
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/crypto/preempt.h | 28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>  create mode 100644 arch/arm64/crypto/preempt.h
>>
>> diff --git a/arch/arm64/crypto/preempt.h b/arch/arm64/crypto/preempt.h
>> new file mode 100644
>> index 000000000000..94302d5b5ae9
>> --- /dev/null
>> +++ b/arch/arm64/crypto/preempt.h
>> @@ -0,0 +1,28 @@
>> +/*
>> + * preempt.h - shared macros to check preempt state
>> + *
>> + * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <asm/asm-offsets.h>
>> +#include <asm/thread_info.h>
>> +
>> +     /*
>> +      * Branch to 'lb' but only if we have not been tagged for preemption.
>> +      *
>> +      * Expects current->thread_info in ti, or NULL if running in interrupt
>> +      * context. reg is a scratch x register.
>> +      */
>> +     .macro          b_if_no_resched, ti, reg, lb
>> +#if defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_VOLUNTARY)
>> +     cbz             \ti, \lb                        // have thread_info?
>> +     ldr             \reg, [\ti, #TI_FLAGS]          // get flags
>> +     tbz             \reg, #TIF_NEED_RESCHED, \lb    // needs rescheduling?
>> +#else
>> +     b               \lb
>> +#endif
>> +     .endm
>> --
>> 1.8.3.2
>>
