* [PATCH resend 00/15] arm64 crypto roundup
@ 2014-05-01 15:49 ` Ard Biesheuvel
  0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This is a repost of the arm64 crypto patches that I have posted to the LAKML
over the past few months. They have now been verified on actual hardware
(Cortex-A57), so if there are no remaining issues I would like to propose them
for 3.16.

Ard Biesheuvel (15):
  asm-generic: allow generic unaligned access if the arch supports it
  arm64: add abstractions for FPSIMD state manipulation
  arm64: defer reloading a task's FPSIMD state to userland resume
  arm64: add support for kernel mode NEON in interrupt context
  arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
  arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
  arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
  arm64/crypto: AES using ARMv8 Crypto Extensions
  arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
  arm64: pull in <asm/simd.h> from asm-generic
  arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
    Extensions
  arm64/crypto: add shared macro to test for NEED_RESCHED
  arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
  arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
  arm64/crypto: add voluntary preemption to Crypto Extensions GHASH

 arch/arm64/Kconfig                    |   3 +
 arch/arm64/Makefile                   |   1 +
 arch/arm64/crypto/Kconfig             |  53 ++++
 arch/arm64/crypto/Makefile            |  38 +++
 arch/arm64/crypto/aes-ce-ccm-core.S   | 222 ++++++++++++++
 arch/arm64/crypto/aes-ce-ccm-glue.c   | 297 ++++++++++++++++++
 arch/arm64/crypto/aes-ce-cipher.c     | 155 ++++++++++
 arch/arm64/crypto/aes-ce.S            | 147 +++++++++
 arch/arm64/crypto/aes-glue.c          | 446 +++++++++++++++++++++++++++
 arch/arm64/crypto/aes-modes.S         | 548 ++++++++++++++++++++++++++++++++++
 arch/arm64/crypto/aes-neon.S          | 382 ++++++++++++++++++++++++
 arch/arm64/crypto/ghash-ce-core.S     |  97 ++++++
 arch/arm64/crypto/ghash-ce-glue.c     | 172 +++++++++++
 arch/arm64/crypto/sha1-ce-core.S      | 154 ++++++++++
 arch/arm64/crypto/sha1-ce-glue.c      | 201 +++++++++++++
 arch/arm64/crypto/sha2-ce-core.S      | 159 ++++++++++
 arch/arm64/crypto/sha2-ce-glue.c      | 281 +++++++++++++++++
 arch/arm64/include/asm/Kbuild         |   1 +
 arch/arm64/include/asm/assembler.h    |  21 ++
 arch/arm64/include/asm/fpsimd.h       |  23 ++
 arch/arm64/include/asm/fpsimdmacros.h |  35 +++
 arch/arm64/include/asm/neon.h         |   6 +-
 arch/arm64/include/asm/thread_info.h  |   4 +-
 arch/arm64/kernel/entry-fpsimd.S      |  24 ++
 arch/arm64/kernel/entry.S             |   2 +-
 arch/arm64/kernel/fpsimd.c            | 187 ++++++++++--
 arch/arm64/kernel/process.c           |   2 +-
 arch/arm64/kernel/ptrace.c            |   2 +
 arch/arm64/kernel/signal.c            |  13 +-
 arch/arm64/kernel/signal32.c          |   9 +-
 include/asm-generic/unaligned.h       |  21 +-
 31 files changed, 3662 insertions(+), 44 deletions(-)
 create mode 100644 arch/arm64/crypto/Kconfig
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-core.S
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-glue.c
 create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
 create mode 100644 arch/arm64/crypto/aes-ce.S
 create mode 100644 arch/arm64/crypto/aes-glue.c
 create mode 100644 arch/arm64/crypto/aes-modes.S
 create mode 100644 arch/arm64/crypto/aes-neon.S
 create mode 100644 arch/arm64/crypto/ghash-ce-core.S
 create mode 100644 arch/arm64/crypto/ghash-ce-glue.c
 create mode 100644 arch/arm64/crypto/sha1-ce-core.S
 create mode 100644 arch/arm64/crypto/sha1-ce-glue.c
 create mode 100644 arch/arm64/crypto/sha2-ce-core.S
 create mode 100644 arch/arm64/crypto/sha2-ce-glue.c

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

Switch the default unaligned access method to 'hardware implemented'
if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
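
For illustration only (not part of this patch), a minimal sketch of the
consumer-side API, which stays the same either way; the helper name
read_len_field() and the packet layout are made up:

  #include <linux/types.h>
  #include <asm/unaligned.h>

  /*
   * get_unaligned_le32() resolves to a plain 32-bit load on architectures
   * that select HAVE_EFFICIENT_UNALIGNED_ACCESS, and to a byte-wise access
   * otherwise; callers are unaffected by this patch.
   */
  static u32 read_len_field(const u8 *pkt)
  {
          /* pkt + 2 is not guaranteed to be 4-byte aligned */
          return get_unaligned_le32(pkt + 2);
  }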

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/unaligned.h | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 03cf5936bad6..1ac097279db1 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -4,22 +4,27 @@
 /*
  * This is the most generic implementation of unaligned accesses
  * and should work almost anywhere.
- *
- * If an architecture can handle unaligned accesses in hardware,
- * it may want to use the linux/unaligned/access_ok.h implementation
- * instead.
  */
 #include <asm/byteorder.h>
 
+/* Set by the arch if it can handle unaligned accesses in hardware. */
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+# include <linux/unaligned/access_ok.h>
+#endif
+
 #if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/be_byteshift.h>
+# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+#  include <linux/unaligned/le_struct.h>
+#  include <linux/unaligned/be_byteshift.h>
+# endif
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
 # define put_unaligned	__put_unaligned_le
 #elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/le_byteshift.h>
+# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+#  include <linux/unaligned/be_struct.h>
+#  include <linux/unaligned/le_byteshift.h>
+# endif
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
 # define put_unaligned	__put_unaligned_be
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

There are two tacit assumptions in the FPSIMD handling code that will no longer
hold after the next patch, which optimizes away some FPSIMD state restores:
. the FPSIMD registers of this CPU contain the userland FPSIMD state of
  task 'current';
. when switching to a task, its FPSIMD state will always be restored from
  memory.

This patch adds the following functions to abstract away from straight FPSIMD
register file saves and restores:
- fpsimd_preserve_current_state -> ensure current's FPSIMD state is saved
- fpsimd_restore_current_state -> ensure current's FPSIMD state is loaded
- fpsimd_update_current_state -> replace current's FPSIMD state
- fpsimd_flush_task_state -> invalidate live copies of a task's FPSIMD state

Where necessary, the ptrace, signal handling and fork code are updated to use
the above wrappers instead of poking into the FPSIMD registers directly.
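
As a hedged illustration of the intended calling convention (the real call
sites are in the diff below; this particular helper is hypothetical):

  #include <linux/errno.h>
  #include <linux/sched.h>
  #include <linux/uaccess.h>
  #include <asm/fpsimd.h>

  /* copy current's FPSIMD registers to a user buffer via the new wrapper */
  static int dump_current_vregs(void __user *buf)
  {
          struct fpsimd_state *st = &current->thread.fpsimd_state;

          /* ensure the in-memory copy is up to date before copying it out */
          fpsimd_preserve_current_state();

          return copy_to_user(buf, st->vregs, sizeof(st->vregs)) ? -EFAULT : 0;
  }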

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h |  6 ++++++
 arch/arm64/kernel/fpsimd.c      | 33 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c     |  2 +-
 arch/arm64/kernel/ptrace.c      |  2 ++
 arch/arm64/kernel/signal.c      |  9 +++------
 arch/arm64/kernel/signal32.c    |  9 +++------
 6 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac13008..9f9b8e438546 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -58,6 +58,12 @@ extern void fpsimd_load_state(struct fpsimd_state *state);
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
+extern void fpsimd_preserve_current_state(void);
+extern void fpsimd_restore_current_state(void);
+extern void fpsimd_update_current_state(struct fpsimd_state *state);
+
+extern void fpsimd_flush_task_state(struct task_struct *target);
+
 #endif
 
 #endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 4aef42a04bdc..86ac6a9bc86a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
 	preempt_enable();
 }
 
+/*
+ * Save the userland FPSIMD state of 'current' to memory
+ */
+void fpsimd_preserve_current_state(void)
+{
+	fpsimd_save_state(&current->thread.fpsimd_state);
+}
+
+/*
+ * Load the userland FPSIMD state of 'current' from memory
+ */
+void fpsimd_restore_current_state(void)
+{
+	fpsimd_load_state(&current->thread.fpsimd_state);
+}
+
+/*
+ * Load an updated userland FPSIMD state for 'current' from memory
+ */
+void fpsimd_update_current_state(struct fpsimd_state *state)
+{
+	preempt_disable();
+	fpsimd_load_state(state);
+	preempt_enable();
+}
+
+/*
+ * Invalidate live CPU copies of task t's FPSIMD state
+ */
+void fpsimd_flush_task_state(struct task_struct *t)
+{
+}
+
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6391485f342d..c5693163408c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -205,7 +205,7 @@ void release_thread(struct task_struct *dead_task)
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	fpsimd_save_state(&current->thread.fpsimd_state);
+	fpsimd_preserve_current_state();
 	*dst = *src;
 	return 0;
 }
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 6a8928bba03c..f8700eca24e7 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -517,6 +517,7 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		return ret;
 
 	target->thread.fpsimd_state.user_fpsimd = newstate;
+	fpsimd_flush_task_state(target);
 	return ret;
 }
 
@@ -764,6 +765,7 @@ static int compat_vfp_set(struct task_struct *target,
 		uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
 	}
 
+	fpsimd_flush_task_state(target);
 	return ret;
 }
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591f75dd..06448a77ff53 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -51,7 +51,7 @@ static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
 	int err;
 
 	/* dump the hardware registers to the fpsimd_state structure */
-	fpsimd_save_state(fpsimd);
+	fpsimd_preserve_current_state();
 
 	/* copy the FP and status/control registers */
 	err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
@@ -86,11 +86,8 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 	__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
 
 	/* load the hardware registers from the fpsimd_state structure */
-	if (!err) {
-		preempt_disable();
-		fpsimd_load_state(&fpsimd);
-		preempt_enable();
-	}
+	if (!err)
+		fpsimd_update_current_state(&fpsimd);
 
 	return err ? -EFAULT : 0;
 }
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index b3fc9f5ec6d3..ac7e237d0bda 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -219,7 +219,7 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
 	 * Note that this also saves V16-31, which aren't visible
 	 * in AArch32.
 	 */
-	fpsimd_save_state(fpsimd);
+	fpsimd_preserve_current_state();
 
 	/* Place structure header on the stack */
 	__put_user_error(magic, &frame->magic, err);
@@ -282,11 +282,8 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
 	 * We don't need to touch the exception register, so
 	 * reload the hardware state.
 	 */
-	if (!err) {
-		preempt_disable();
-		fpsimd_load_state(&fpsimd);
-		preempt_enable();
-	}
+	if (!err)
+		fpsimd_update_current_state(&fpsimd);
 
 	return err ? -EFAULT : 0;
 }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the meantime, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.

This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task re-enters userland. If a task does not enter
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.
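
For illustration, a minimal sketch (not part of the patch, and assumed to
live in fpsimd.c next to the fpsimd_last_state per-cpu variable) of the
check that decides whether the reload can be skipped; the helper name is
made up, but the two conditions mirror the ones added to
fpsimd_thread_switch() below:

  /*
   * True only if this CPU's registers were the last to hold 'st' and 'st'
   * was last loaded on this CPU, i.e. both links still agree.
   */
  static bool fpsimd_state_is_live_here(struct fpsimd_state *st)
  {
          return __this_cpu_read(fpsimd_last_state) == st &&
                 st->cpu == smp_processor_id();
  }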

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h      |   2 +
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/entry.S            |   2 +-
 arch/arm64/kernel/fpsimd.c           | 136 ++++++++++++++++++++++++++++++-----
 arch/arm64/kernel/signal.c           |   4 ++
 5 files changed, 127 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9f9b8e438546..7a900142dbc8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,6 +37,8 @@ struct fpsimd_state {
 			u32 fpcr;
 		};
 	};
+	/* the id of the last cpu to have restored this state */
+	unsigned int cpu;
 };
 
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b66ffd..4a1ca1cfb2f8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
+#define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_SYSCALL_TRACE	8
 #define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
@@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
+#define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-				 _TIF_NOTIFY_RESUME)
+				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 39ac630d83de..80464e2fb1a5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -576,7 +576,7 @@ fast_work_pending:
 	str	x0, [sp, #S_X0]			// returned x0
 work_pending:
 	tbnz	x1, #TIF_NEED_RESCHED, work_resched
-	/* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+	/* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
 	ldr	x2, [sp, #S_PSTATE]
 	mov	x0, sp				// 'regs'
 	tst	x2, #PSR_MODE_MASK		// user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 86ac6a9bc86a..6cfbb4ef27d7 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -35,6 +35,60 @@
 #define FPEXC_IDF	(1 << 7)
 
 /*
+ * In order to reduce the number of times the FPSIMD state is needlessly saved
+ * and restored, we need to keep track of two things:
+ * (a) for each task, we need to remember which CPU was the last one to have
+ *     the task's FPSIMD state loaded into its FPSIMD registers;
+ * (b) for each CPU, we need to remember which task's userland FPSIMD state has
+ *     been loaded into its FPSIMD registers most recently, or whether it has
+ *     been used to perform kernel mode NEON in the meantime.
+ *
+ * For (a), we add a 'cpu' field to struct fpsimd_state, which gets updated to
+ * the id of the current CPU every time the state is loaded onto a CPU. For (b),
+ * we add the per-cpu variable 'fpsimd_last_state' (below), which contains the
+ * address of the userland FPSIMD state of the task that was loaded onto the CPU
+ * the most recently, or NULL if kernel mode NEON has been performed after that.
+ *
+ * With this in place, we no longer have to restore the next FPSIMD state right
+ * when switching between tasks. Instead, we can defer this check to userland
+ * resume, at which time we verify whether the CPU's fpsimd_last_state and the
+ * task's fpsimd_state.cpu are still mutually in sync. If this is the case, we
+ * can omit the FPSIMD restore.
+ *
+ * As an optimization, we use the thread_info flag TIF_FOREIGN_FPSTATE to
+ * indicate whether or not the userland FPSIMD state of the current task is
+ * present in the registers. The flag is set unless the FPSIMD registers of this
+ * CPU currently contain the most recent userland FPSIMD state of the current
+ * task.
+ *
+ * For a certain task, the sequence may look something like this:
+ * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
+ *   contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
+ *   variable points to the task's fpsimd_state, the TIF_FOREIGN_FPSTATE flag is
+ *   cleared, otherwise it is set;
+ *
+ * - the task returns to userland; if TIF_FOREIGN_FPSTATE is set, the task's
+ *   userland FPSIMD state is copied from memory to the registers, the task's
+ *   fpsimd_state.cpu field is set to the id of the current CPU, the current
+ *   CPU's fpsimd_last_state pointer is set to this task's fpsimd_state and the
+ *   TIF_FOREIGN_FPSTATE flag is cleared;
+ *
+ * - the task executes an ordinary syscall; upon return to userland, the
+ *   TIF_FOREIGN_FPSTATE flag will still be cleared, so no FPSIMD state is
+ *   restored;
+ *
+ * - the task executes a syscall which executes some NEON instructions; this is
+ *   preceded by a call to kernel_neon_begin(), which copies the task's FPSIMD
+ *   register contents to memory, clears the fpsimd_last_state per-cpu variable
+ *   and sets the TIF_FOREIGN_FPSTATE flag;
+ *
+ * - the task gets preempted after kernel_neon_end() is called; as we have not
+ *   returned from the 2nd syscall yet, TIF_FOREIGN_FPSTATE is still set so
+ *   whatever is in the FPSIMD registers is not saved to memory, but discarded.
+ */
+static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
+
+/*
  * Trapped FP/ASIMD access.
  */
 void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
@@ -72,44 +126,85 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
-	/* check if not kernel threads */
-	if (current->mm)
+	/*
+	 * Save the current FPSIMD state to memory, but only if whatever is in
+	 * the registers is in fact the most recent userland FPSIMD state of
+	 * 'current'.
+	 */
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
-	if (next->mm)
-		fpsimd_load_state(&next->thread.fpsimd_state);
+
+	if (next->mm) {
+		/*
+		 * If we are switching to a task whose most recent userland
+		 * FPSIMD state is already in the registers of *this* cpu,
+		 * we can skip loading the state from memory. Otherwise, set
+		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
+		 * upon the next return to userland.
+		 */
+		struct fpsimd_state *st = &next->thread.fpsimd_state;
+
+		if (__this_cpu_read(fpsimd_last_state) == st
+		    && st->cpu == smp_processor_id())
+			clear_ti_thread_flag(task_thread_info(next),
+					     TIF_FOREIGN_FPSTATE);
+		else
+			set_ti_thread_flag(task_thread_info(next),
+					   TIF_FOREIGN_FPSTATE);
+	}
 }
 
 void fpsimd_flush_thread(void)
 {
-	preempt_disable();
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
-	fpsimd_load_state(&current->thread.fpsimd_state);
-	preempt_enable();
+	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
 /*
- * Save the userland FPSIMD state of 'current' to memory
+ * Save the userland FPSIMD state of 'current' to memory, but only if the state
+ * currently held in the registers does in fact belong to 'current'
  */
 void fpsimd_preserve_current_state(void)
 {
-	fpsimd_save_state(&current->thread.fpsimd_state);
+	preempt_disable();
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+		fpsimd_save_state(&current->thread.fpsimd_state);
+	preempt_enable();
 }
 
 /*
- * Load the userland FPSIMD state of 'current' from memory
+ * Load the userland FPSIMD state of 'current' from memory, but only if the
+ * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
+ * state of 'current'
  */
 void fpsimd_restore_current_state(void)
 {
-	fpsimd_load_state(&current->thread.fpsimd_state);
+	preempt_disable();
+	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		struct fpsimd_state *st = &current->thread.fpsimd_state;
+
+		fpsimd_load_state(st);
+		this_cpu_write(fpsimd_last_state, st);
+		st->cpu = smp_processor_id();
+	}
+	preempt_enable();
 }
 
 /*
- * Load an updated userland FPSIMD state for 'current' from memory
+ * Load an updated userland FPSIMD state for 'current' from memory and set the
+ * flag that indicates that the FPSIMD register contents are the most recent
+ * FPSIMD state of 'current'
  */
 void fpsimd_update_current_state(struct fpsimd_state *state)
 {
 	preempt_disable();
 	fpsimd_load_state(state);
+	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		struct fpsimd_state *st = &current->thread.fpsimd_state;
+
+		this_cpu_write(fpsimd_last_state, st);
+		st->cpu = smp_processor_id();
+	}
 	preempt_enable();
 }
 
@@ -118,6 +213,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
  */
 void fpsimd_flush_task_state(struct task_struct *t)
 {
+	t->thread.fpsimd_state.cpu = NR_CPUS;
 }
 
 #ifdef CONFIG_KERNEL_MODE_NEON
@@ -131,16 +227,19 @@ void kernel_neon_begin(void)
 	BUG_ON(in_interrupt());
 	preempt_disable();
 
-	if (current->mm)
+	/*
+	 * Save the userland FPSIMD state if we have one and if we haven't done
+	 * so already. Clear fpsimd_last_state to indicate that there is no
+	 * longer userland FPSIMD state in the registers.
+	 */
+	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
+	this_cpu_write(fpsimd_last_state, NULL);
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
 void kernel_neon_end(void)
 {
-	if (current->mm)
-		fpsimd_load_state(&current->thread.fpsimd_state);
-
 	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
@@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
 {
 	switch (cmd) {
 	case CPU_PM_ENTER:
-		if (current->mm)
+		if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
 			fpsimd_save_state(&current->thread.fpsimd_state);
 		break;
 	case CPU_PM_EXIT:
-		if (current->mm)
-			fpsimd_load_state(&current->thread.fpsimd_state);
+		set_thread_flag(TIF_FOREIGN_FPSTATE);
 		break;
 	case CPU_PM_ENTER_FAILED:
 	default:
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 06448a77ff53..882f01774365 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 	}
+
+	if (thread_flags & _TIF_FOREIGN_FPSTATE)
+		fpsimd_restore_current_state();
+
 }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This patch modifies kernel_neon_begin() and kernel_neon_end() so that
they may be called from any context. To address the case where only a
few registers are needed, kernel_neon_begin_partial(u32) is introduced;
it takes the number of bottom 'n' NEON q-registers required as its
parameter. To mark the end of such a partial section, the regular
kernel_neon_end() should be used.
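
A hedged usage sketch (this caller and neon_xor_block() are hypothetical;
the real users are the crypto drivers added later in this series):

  #include <linux/types.h>
  #include <asm/neon.h>

  static void xor_block_neon(u8 *dst, const u8 *src)
  {
          kernel_neon_begin_partial(4);   /* only q0-q3 will be clobbered */
          neon_xor_block(dst, src);       /* assumed NEON assembler routine */
          kernel_neon_end();
  }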

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h       | 15 ++++++++++++
 arch/arm64/include/asm/fpsimdmacros.h | 35 ++++++++++++++++++++++++++++
 arch/arm64/include/asm/neon.h         |  6 ++++-
 arch/arm64/kernel/entry-fpsimd.S      | 24 +++++++++++++++++++
 arch/arm64/kernel/fpsimd.c            | 44 ++++++++++++++++++++++++-----------
 5 files changed, 109 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7a900142dbc8..05e1b24aca4c 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,17 @@ struct fpsimd_state {
 	unsigned int cpu;
 };
 
+/*
+ * Struct for stacking the bottom 'n' FP/SIMD registers.
+ */
+struct fpsimd_partial_state {
+	u32		num_regs;
+	u32		fpsr;
+	u32		fpcr;
+	__uint128_t	vregs[32] __aligned(16);
+} __aligned(16);
+
+
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /* Masks for extracting the FPSR and FPCR from the FPSCR */
 #define VFP_FPSCR_STAT_MASK	0xf800009f
@@ -66,6 +77,10 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
+extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
+				      u32 num_regs);
+extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
+
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bbec599c96bd..69e75134689d 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -62,3 +62,38 @@
 	ldr	w\tmpnr, [\state, #16 * 2 + 4]
 	msr	fpcr, x\tmpnr
 .endm
+
+.altmacro
+.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
+	mrs	x\tmpnr1, fpsr
+	str	w\numnr, [\state]
+	mrs	x\tmpnr2, fpcr
+	stp	w\tmpnr1, w\tmpnr2, [\state, #4]
+	adr	x\tmpnr1, 0f
+	add	\state, \state, x\numnr, lsl #4
+	sub	x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
+	br	x\tmpnr1
+	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+	.irp	qb, %(qa + 1)
+	stp	q\qa, q\qb, [\state, # -16 * \qa - 16]
+	.endr
+	.endr
+0:
+.endm
+
+.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
+	ldp	w\tmpnr1, w\tmpnr2, [\state, #4]
+	msr	fpsr, x\tmpnr1
+	msr	fpcr, x\tmpnr2
+	adr	x\tmpnr1, 0f
+	ldr	w\tmpnr2, [\state]
+	add	\state, \state, x\tmpnr2, lsl #4
+	sub	x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
+	br	x\tmpnr1
+	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+	.irp	qb, %(qa + 1)
+	ldp	q\qa, q\qb, [\state, # -16 * \qa - 16]
+	.endr
+	.endr
+0:
+.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index b0cc58a97780..13ce4cc18e26 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,7 +8,11 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/types.h>
+
 #define cpu_has_neon()		(1)
 
-void kernel_neon_begin(void);
+#define kernel_neon_begin()	kernel_neon_begin_partial(32)
+
+void kernel_neon_begin_partial(u32 num_regs);
 void kernel_neon_end(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 6a27cd6dbfa6..d358ccacfc00 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,3 +41,27 @@ ENTRY(fpsimd_load_state)
 	fpsimd_restore x0, 8
 	ret
 ENDPROC(fpsimd_load_state)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Save the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_save_partial_state)
+	fpsimd_save_partial x0, 1, 8, 9
+	ret
+ENDPROC(fpsimd_save_partial_state)
+
+/*
+ * Load the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_load_partial_state)
+	fpsimd_restore_partial x0, 8, 9
+	ret
+ENDPROC(fpsimd_load_partial_state)
+
+#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 6cfbb4ef27d7..82ebc259a787 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -218,29 +218,45 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
+static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
+static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+
 /*
  * Kernel-side NEON support functions
  */
-void kernel_neon_begin(void)
+void kernel_neon_begin_partial(u32 num_regs)
 {
-	/* Avoid using the NEON in interrupt context */
-	BUG_ON(in_interrupt());
-	preempt_disable();
+	if (in_interrupt()) {
+		struct fpsimd_partial_state *s = this_cpu_ptr(
+			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
 
-	/*
-	 * Save the userland FPSIMD state if we have one and if we haven't done
-	 * so already. Clear fpsimd_last_state to indicate that there is no
-	 * longer userland FPSIMD state in the registers.
-	 */
-	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-		fpsimd_save_state(&current->thread.fpsimd_state);
-	this_cpu_write(fpsimd_last_state, NULL);
+		BUG_ON(num_regs > 32);
+		fpsimd_save_partial_state(s, roundup(num_regs, 2));
+	} else {
+		/*
+		 * Save the userland FPSIMD state if we have one and if we
+		 * haven't done so already. Clear fpsimd_last_state to indicate
+		 * that there is no longer userland FPSIMD state in the
+		 * registers.
+		 */
+		preempt_disable();
+		if (current->mm &&
+		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+			fpsimd_save_state(&current->thread.fpsimd_state);
+		this_cpu_write(fpsimd_last_state, NULL);
+	}
 }
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	preempt_enable();
+	if (in_interrupt()) {
+		struct fpsimd_partial_state *s = this_cpu_ptr(
+			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+		fpsimd_load_partial_state(s);
+	} else {
+		preempt_enable();
+	}
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 05/15] arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that
have support for the SHA-1 part of the ARM v8 Crypto Extensions.
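
For illustration, a minimal, hypothetical sketch (the function and buffer names
below are made up and are not part of the patch) of driving the resulting hash
through the kernel's generic shash API. With the Crypto Extensions present, the
cra_priority of 200 should make this driver the preferred "sha1" provider; note
that the glue code below brackets every call into the NEON core with
kernel_neon_begin_partial(16)/kernel_neon_end(), so callers need no NEON
handling of their own:

/* Hypothetical usage sketch -- not part of this patch. */
#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>

static int sha1_ce_demo(const u8 *data, unsigned int len, u8 *digest)
{
        struct crypto_shash *tfm;
        struct shash_desc *desc;
        int err;

        /* "sha1" resolves to the highest-priority registered implementation */
        tfm = crypto_alloc_shash("sha1", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
        if (!desc) {
                crypto_free_shash(tfm);
                return -ENOMEM;
        }
        desc->tfm = tfm;

        /* digest must have room for SHA1_DIGEST_SIZE (20) bytes */
        err = crypto_shash_digest(desc, data, len, digest);

        kfree(desc);
        crypto_free_shash(tfm);
        return err;
}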

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig               |   3 +
 arch/arm64/Makefile              |   1 +
 arch/arm64/crypto/Kconfig        |  16 ++++
 arch/arm64/crypto/Makefile       |  12 +++
 arch/arm64/crypto/sha1-ce-core.S | 153 ++++++++++++++++++++++++++++++++++
 arch/arm64/crypto/sha1-ce-glue.c | 174 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 359 insertions(+)
 create mode 100644 arch/arm64/crypto/Kconfig
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/sha1-ce-core.S
 create mode 100644 arch/arm64/crypto/sha1-ce-glue.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e6e4d3749a6e..1cefc6fe969a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -342,5 +342,8 @@ source "arch/arm64/Kconfig.debug"
 source "security/Kconfig"
 
 source "crypto/Kconfig"
+if CRYPTO
+source "arch/arm64/crypto/Kconfig"
+endif
 
 source "lib/Kconfig"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2fceb71ac3b7..8185a913c5ed 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -45,6 +45,7 @@ export	TEXT_OFFSET GZFLAGS
 core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
 libs-y		:= arch/arm64/lib/ $(libs-y)
 libs-y		+= $(LIBGCC)
 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
new file mode 100644
index 000000000000..7956881b5986
--- /dev/null
+++ b/arch/arm64/crypto/Kconfig
@@ -0,0 +1,16 @@
+
+menuconfig ARM64_CRYPTO
+	bool "ARM64 Accelerated Cryptographic Algorithms"
+	depends on ARM64
+	help
+	  Say Y here to choose from a selection of cryptographic algorithms
+	  implemented using ARM64 specific CPU features or instructions.
+
+if ARM64_CRYPTO
+
+config CRYPTO_SHA1_ARM64_CE
+	tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
+endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 000000000000..0ed3caaec81b
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,12 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
+sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
new file mode 100644
index 000000000000..bd4af29f2722
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -0,0 +1,153 @@
+/*
+ * sha1-ce-core.S - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.arch		armv8-a+crypto
+
+	k0		.req	v0
+	k1		.req	v1
+	k2		.req	v2
+	k3		.req	v3
+
+	t0		.req	v4
+	t1		.req	v5
+
+	dga		.req	q6
+	dgav		.req	v6
+	dgb		.req	s7
+	dgbv		.req	v7
+
+	dg0q		.req	q12
+	dg0s		.req	s12
+	dg0v		.req	v12
+	dg1s		.req	s13
+	dg1v		.req	v13
+	dg2s		.req	s14
+
+	.macro		add_only, op, ev, rc, s0, dg1
+	.ifc		\ev, ev
+	add		t1.4s, v\s0\().4s, \rc\().4s
+	sha1h		dg2s, dg0s
+	.ifnb		\dg1
+	sha1\op		dg0q, \dg1, t0.4s
+	.else
+	sha1\op		dg0q, dg1s, t0.4s
+	.endif
+	.else
+	.ifnb		\s0
+	add		t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha1h		dg1s, dg0s
+	sha1\op		dg0q, dg2s, t1.4s
+	.endif
+	.endm
+
+	.macro		add_update, op, ev, rc, s0, s1, s2, s3, dg1
+	sha1su0		v\s0\().4s, v\s1\().4s, v\s2\().4s
+	sha1su1		v\s0\().4s, v\s3\().4s
+	add_only	\op, \ev, \rc, \s1, \dg1
+	.endm
+
+	/*
+	 * The SHA1 round constants
+	 */
+	.align		4
+.Lsha1_rcon:
+	.word		0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
+
+	/*
+	 * void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+	 * 			  u8 *head, long bytes)
+	 */
+ENTRY(sha1_ce_transform)
+	/* load round constants */
+	adr		x6, .Lsha1_rcon
+	ld1r		{k0.4s}, [x6], #4
+	ld1r		{k1.4s}, [x6], #4
+	ld1r		{k2.4s}, [x6], #4
+	ld1r		{k3.4s}, [x6]
+
+	/* load state */
+	ldr		dga, [x2]
+	ldr		dgb, [x2, #16]
+
+	/* load partial state (if supplied) */
+	cbz		x3, 0f
+	ld1		{v8.4s-v11.4s}, [x3]
+	b		1f
+
+	/* load input */
+0:	ld1		{v8.4s-v11.4s}, [x1], #64
+	sub		w0, w0, #1
+
+1:
+CPU_LE(	rev32		v8.16b, v8.16b		)
+CPU_LE(	rev32		v9.16b, v9.16b		)
+CPU_LE(	rev32		v10.16b, v10.16b	)
+CPU_LE(	rev32		v11.16b, v11.16b	)
+
+2:	add		t0.4s, v8.4s, k0.4s
+	mov		dg0v.16b, dgav.16b
+
+	add_update	c, ev, k0,  8,  9, 10, 11, dgb
+	add_update	c, od, k0,  9, 10, 11,  8
+	add_update	c, ev, k0, 10, 11,  8,  9
+	add_update	c, od, k0, 11,  8,  9, 10
+	add_update	c, ev, k1,  8,  9, 10, 11
+
+	add_update	p, od, k1,  9, 10, 11,  8
+	add_update	p, ev, k1, 10, 11,  8,  9
+	add_update	p, od, k1, 11,  8,  9, 10
+	add_update	p, ev, k1,  8,  9, 10, 11
+	add_update	p, od, k2,  9, 10, 11,  8
+
+	add_update	m, ev, k2, 10, 11,  8,  9
+	add_update	m, od, k2, 11,  8,  9, 10
+	add_update	m, ev, k2,  8,  9, 10, 11
+	add_update	m, od, k2,  9, 10, 11,  8
+	add_update	m, ev, k3, 10, 11,  8,  9
+
+	add_update	p, od, k3, 11,  8,  9, 10
+	add_only	p, ev, k3,  9
+	add_only	p, od, k3, 10
+	add_only	p, ev, k3, 11
+	add_only	p, od
+
+	/* update state */
+	add		dgbv.2s, dgbv.2s, dg1v.2s
+	add		dgav.4s, dgav.4s, dg0v.4s
+
+	cbnz		w0, 0b
+
+	/*
+	 * Final block: add padding and total bit count.
+	 * Skip if we have no total byte count in x4. In that case, the input
+	 * size was not a round multiple of the block size, and the padding is
+	 * handled by the C code.
+	 */
+	cbz		x4, 3f
+	movi		v9.2d, #0
+	mov		x8, #0x80000000
+	movi		v10.2d, #0
+	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	fmov		d8, x8
+	mov		x4, #0
+	mov		v11.d[0], xzr
+	mov		v11.d[1], x7
+	b		2b
+
+	/* store new state */
+3:	str		dga, [x2]
+	str		dgb, [x2, #16]
+	ret
+ENDPROC(sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
new file mode 100644
index 000000000000..6fe83f37a750
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -0,0 +1,174 @@
+/*
+ * sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+				  u8 *head, long bytes);
+
+static int sha1_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+	return 0;
+}
+
+static int sha1_update(struct shash_desc *desc, const u8 *data,
+		       unsigned int len)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if ((partial + len) >= SHA1_BLOCK_SIZE) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA1_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buffer + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		blocks = len / SHA1_BLOCK_SIZE;
+		len %= SHA1_BLOCK_SIZE;
+
+		kernel_neon_begin_partial(16);
+		sha1_ce_transform(blocks, data, sctx->state,
+				  partial ? sctx->buffer : NULL, 0);
+		kernel_neon_end();
+
+		data += blocks * SHA1_BLOCK_SIZE;
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buffer + partial, data, len);
+	return 0;
+}
+
+static int sha1_final(struct shash_desc *desc, u8 *out)
+{
+	static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
+
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be64 bits = cpu_to_be64(sctx->count << 3);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	u32 padlen = SHA1_BLOCK_SIZE
+		     - ((sctx->count + sizeof(bits)) % SHA1_BLOCK_SIZE);
+
+	sha1_update(desc, padding, padlen);
+	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
+
+static int sha1_finup(struct shash_desc *desc, const u8 *data,
+		      unsigned int len, u8 *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int blocks;
+	int i;
+
+	if (sctx->count || !len || (len % SHA1_BLOCK_SIZE)) {
+		sha1_update(desc, data, len);
+		return sha1_final(desc, out);
+	}
+
+	/*
+	 * Use a fast path if the input is a multiple of 64 bytes. In
+	 * this case, there is no need to copy data around, and we can
+	 * perform the entire digest calculation in a single invocation
+	 * of sha1_ce_transform()
+	 */
+	blocks = len / SHA1_BLOCK_SIZE;
+
+	kernel_neon_begin_partial(16);
+	sha1_ce_transform(blocks, data, sctx->state, NULL, len);
+	kernel_neon_end();
+
+	for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha1_state){};
+	return 0;
+}
+
+static int sha1_export(struct shash_desc *desc, void *out)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state *dst = out;
+
+	*dst = *sctx;
+	return 0;
+}
+
+static int sha1_import(struct shash_desc *desc, const void *in)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	struct sha1_state const *src = in;
+
+	*sctx = *src;
+	return 0;
+}
+
+static struct shash_alg alg = {
+	.init			= sha1_init,
+	.update			= sha1_update,
+	.final			= sha1_final,
+	.finup			= sha1_finup,
+	.export			= sha1_export,
+	.import			= sha1_import,
+	.descsize		= sizeof(struct sha1_state),
+	.digestsize		= SHA1_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha1_state),
+	.base			= {
+		.cra_name		= "sha1",
+		.cra_driver_name	= "sha1-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA1_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+};
+
+static int __init sha1_ce_mod_init(void)
+{
+	return crypto_register_shash(&alg);
+}
+
+static void __exit sha1_ce_mod_fini(void)
+{
+	crypto_unregister_shash(&alg);
+}
+
+module_cpu_feature_match(SHA1, sha1_ce_mod_init);
+module_exit(sha1_ce_mod_fini);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 06/15] arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This patch adds support for the SHA-224 and SHA-256 Secure Hash Algorithms
for CPUs that have support for the SHA-2 part of the ARM v8 Crypto Extensions.
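
The NEON core below is bracketed by kernel_neon_begin_partial(28): it touches
v0-v26 (v0-v15 hold the round constants, v16-v19 the current input block and
v20-v26 the state and temporaries), i.e. 27 registers, rounded up to an even
count. As a further illustration (not part of the patch), the padding
arithmetic used by the sha2_final() fallback path can be checked in isolation
with a small, hypothetical user-space sketch:

/* Hypothetical check of the padlen computation in sha2_final() below. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SHA256_BLOCK_SIZE 64

static uint32_t padlen(uint64_t count)
{
        /* pad out to 56 mod 64, leaving 8 bytes for the big-endian bit count */
        return SHA256_BLOCK_SIZE -
               ((count + sizeof(uint64_t)) % SHA256_BLOCK_SIZE);
}

int main(void)
{
        assert(padlen(0) == 56);        /* empty message: one 64 byte block   */
        assert(padlen(119) == 1);       /* 119 + 1 + 8 = 128: two full blocks */
        assert(padlen(56) == 64);       /* always at least one 0x80 pad byte  */
        printf("padding arithmetic ok\n");
        return 0;
}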

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig        |   5 +
 arch/arm64/crypto/Makefile       |   3 +
 arch/arm64/crypto/sha2-ce-core.S | 156 ++++++++++++++++++++++++
 arch/arm64/crypto/sha2-ce-glue.c | 254 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 418 insertions(+)
 create mode 100644 arch/arm64/crypto/sha2-ce-core.S
 create mode 100644 arch/arm64/crypto/sha2-ce-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 7956881b5986..eb1e99770c21 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -13,4 +13,9 @@ config CRYPTO_SHA1_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_SHA2_ARM64_CE
+	tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 0ed3caaec81b..0b3885a60d43 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -10,3 +10,6 @@
 
 obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
 sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
+
+obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
+sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
new file mode 100644
index 000000000000..53e750614169
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -0,0 +1,156 @@
+/*
+ * sha2-ce-core.S - core SHA-224/SHA-256 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.text
+	.arch		armv8-a+crypto
+
+	dga		.req	q20
+	dgav		.req	v20
+	dgb		.req	q21
+	dgbv		.req	v21
+
+	t0		.req	v22
+	t1		.req	v23
+
+	dg0q		.req	q24
+	dg0v		.req	v24
+	dg1q		.req	q25
+	dg1v		.req	v25
+	dg2q		.req	q26
+	dg2v		.req	v26
+
+	.macro		add_only, ev, rc, s0
+	mov		dg2v.16b, dg0v.16b
+	.ifeq		\ev
+	add		t1.4s, v\s0\().4s, \rc\().4s
+	sha256h		dg0q, dg1q, t0.4s
+	sha256h2	dg1q, dg2q, t0.4s
+	.else
+	.ifnb		\s0
+	add		t0.4s, v\s0\().4s, \rc\().4s
+	.endif
+	sha256h		dg0q, dg1q, t1.4s
+	sha256h2	dg1q, dg2q, t1.4s
+	.endif
+	.endm
+
+	.macro		add_update, ev, rc, s0, s1, s2, s3
+	sha256su0	v\s0\().4s, v\s1\().4s
+	sha256su1	v\s0\().4s, v\s2\().4s, v\s3\().4s
+	add_only	\ev, \rc, \s1
+	.endm
+
+	/*
+	 * The SHA-256 round constants
+	 */
+	.align		4
+.Lsha2_rcon:
+	.word		0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+	.word		0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+	.word		0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+	.word		0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+	.word		0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+	.word		0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+	.word		0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+	.word		0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+	.word		0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+	.word		0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+	.word		0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+	.word		0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+	.word		0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+	.word		0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+	.word		0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+	.word		0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+
+	/*
+	 * void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+	 *                        u8 *head, long bytes)
+	 */
+ENTRY(sha2_ce_transform)
+	/* load round constants */
+	adr		x8, .Lsha2_rcon
+	ld1		{ v0.4s- v3.4s}, [x8], #64
+	ld1		{ v4.4s- v7.4s}, [x8], #64
+	ld1		{ v8.4s-v11.4s}, [x8], #64
+	ld1		{v12.4s-v15.4s}, [x8]
+
+	/* load state */
+	ldp		dga, dgb, [x2]
+
+	/* load partial input (if supplied) */
+	cbz		x3, 0f
+	ld1		{v16.4s-v19.4s}, [x3]
+	b		1f
+
+	/* load input */
+0:	ld1		{v16.4s-v19.4s}, [x1], #64
+	sub		w0, w0, #1
+
+1:
+CPU_LE(	rev32		v16.16b, v16.16b	)
+CPU_LE(	rev32		v17.16b, v17.16b	)
+CPU_LE(	rev32		v18.16b, v18.16b	)
+CPU_LE(	rev32		v19.16b, v19.16b	)
+
+2:	add		t0.4s, v16.4s, v0.4s
+	mov		dg0v.16b, dgav.16b
+	mov		dg1v.16b, dgbv.16b
+
+	add_update	0,  v1, 16, 17, 18, 19
+	add_update	1,  v2, 17, 18, 19, 16
+	add_update	0,  v3, 18, 19, 16, 17
+	add_update	1,  v4, 19, 16, 17, 18
+
+	add_update	0,  v5, 16, 17, 18, 19
+	add_update	1,  v6, 17, 18, 19, 16
+	add_update	0,  v7, 18, 19, 16, 17
+	add_update	1,  v8, 19, 16, 17, 18
+
+	add_update	0,  v9, 16, 17, 18, 19
+	add_update	1, v10, 17, 18, 19, 16
+	add_update	0, v11, 18, 19, 16, 17
+	add_update	1, v12, 19, 16, 17, 18
+
+	add_only	0, v13, 17
+	add_only	1, v14, 18
+	add_only	0, v15, 19
+	add_only	1
+
+	/* update state */
+	add		dgav.4s, dgav.4s, dg0v.4s
+	add		dgbv.4s, dgbv.4s, dg1v.4s
+
+	/* handled all input blocks? */
+	cbnz		w0, 0b
+
+	/*
+	 * Final block: add padding and total bit count.
+	 * Skip if we have no total byte count in x4. In that case, the input
+	 * size was not a round multiple of the block size, and the padding is
+	 * handled by the C code.
+	 */
+	cbz		x4, 3f
+	movi		v17.2d, #0
+	mov		x8, #0x80000000
+	movi		v18.2d, #0
+	ror		x7, x4, #29		// ror(lsl(x4, 3), 32)
+	fmov		d16, x8
+	mov		x4, #0
+	mov		v19.d[0], xzr
+	mov		v19.d[1], x7
+	b		2b
+
+	/* store new state */
+3:	stp		dga, dgb, [x2]
+	ret
+ENDPROC(sha2_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
new file mode 100644
index 000000000000..81617262b3df
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -0,0 +1,254 @@
+/*
+ * sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+				 u8 *head, long bytes);
+
+static int sha224_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = {
+			SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
+			SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
+		}
+	};
+	return 0;
+}
+
+static int sha256_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = {
+			SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+			SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
+		}
+	};
+	return 0;
+}
+
+static int sha2_update(struct shash_desc *desc, const u8 *data,
+		       unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+	sctx->count += len;
+
+	if ((partial + len) >= SHA256_BLOCK_SIZE) {
+		int blocks;
+
+		if (partial) {
+			int p = SHA256_BLOCK_SIZE - partial;
+
+			memcpy(sctx->buf + partial, data, p);
+			data += p;
+			len -= p;
+		}
+
+		blocks = len / SHA256_BLOCK_SIZE;
+		len %= SHA256_BLOCK_SIZE;
+
+		kernel_neon_begin_partial(28);
+		sha2_ce_transform(blocks, data, sctx->state,
+				  partial ? sctx->buf : NULL, 0);
+		kernel_neon_end();
+
+		data += blocks * SHA256_BLOCK_SIZE;
+		partial = 0;
+	}
+	if (len)
+		memcpy(sctx->buf + partial, data, len);
+	return 0;
+}
+
+static void sha2_final(struct shash_desc *desc)
+{
+	static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
+
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be64 bits = cpu_to_be64(sctx->count << 3);
+	u32 padlen = SHA256_BLOCK_SIZE
+		     - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
+
+	sha2_update(desc, padding, padlen);
+	sha2_update(desc, (const u8 *)&bits, sizeof(bits));
+}
+
+static int sha224_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_final(desc);
+
+	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_final(desc);
+
+	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static void sha2_finup(struct shash_desc *desc, const u8 *data,
+		       unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	int blocks;
+
+	if (sctx->count || !len || (len % SHA256_BLOCK_SIZE)) {
+		sha2_update(desc, data, len);
+		sha2_final(desc);
+		return;
+	}
+
+	/*
+	 * Use a fast path if the input is a multiple of 64 bytes. In
+	 * this case, there is no need to copy data around, and we can
+	 * perform the entire digest calculation in a single invocation
+	 * of sha2_ce_transform()
+	 */
+	blocks = len / SHA256_BLOCK_SIZE;
+
+	kernel_neon_begin_partial(28);
+	sha2_ce_transform(blocks, data, sctx->state, NULL, len);
+	kernel_neon_end();
+}
+
+static int sha224_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_finup(desc, data, len);
+
+	for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	__be32 *dst = (__be32 *)out;
+	int i;
+
+	sha2_finup(desc, data, len);
+
+	for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+		put_unaligned_be32(sctx->state[i], dst++);
+
+	*sctx = (struct sha256_state){};
+	return 0;
+}
+
+static int sha2_export(struct shash_desc *desc, void *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state *dst = out;
+
+	*dst = *sctx;
+	return 0;
+}
+
+static int sha2_import(struct shash_desc *desc, const void *in)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	struct sha256_state const *src = in;
+
+	*sctx = *src;
+	return 0;
+}
+
+static struct shash_alg algs[] = { {
+	.init			= sha224_init,
+	.update			= sha2_update,
+	.final			= sha224_final,
+	.finup			= sha224_finup,
+	.export			= sha2_export,
+	.import			= sha2_import,
+	.descsize		= sizeof(struct sha256_state),
+	.digestsize		= SHA224_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha256_state),
+	.base			= {
+		.cra_name		= "sha224",
+		.cra_driver_name	= "sha224-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+}, {
+	.init			= sha256_init,
+	.update			= sha2_update,
+	.final			= sha256_final,
+	.finup			= sha256_finup,
+	.export			= sha2_export,
+	.import			= sha2_import,
+	.descsize		= sizeof(struct sha256_state),
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.statesize		= sizeof(struct sha256_state),
+	.base			= {
+		.cra_name		= "sha256",
+		.cra_driver_name	= "sha256-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	}
+} };
+
+static int __init sha2_ce_mod_init(void)
+{
+	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha2_ce_mod_fini(void)
+{
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_cpu_feature_match(SHA2, sha2_ce_mod_init);
+module_exit(sha2_ce_mod_fini);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH resend 07/15] arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instructions (polynomial multiply long, what Intel calls
carry-less multiplication).
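
Here, "carry-less" (polynomial) multiplication is ordinary long multiplication
in which the partial products are combined with XOR instead of addition; that
is what PMULL/PMULL2 compute on the 64-bit lanes of their operands. A tiny,
hypothetical 8-bit version, for illustration only:

/* Hypothetical 8x8 -> 16 bit carry-less (polynomial) multiply. */
#include <stdint.h>
#include <stdio.h>

static uint16_t clmul8(uint8_t a, uint8_t b)
{
        uint16_t r = 0;
        int i;

        for (i = 0; i < 8; i++)
                if (b & (1 << i))
                        r ^= (uint16_t)a << i;  /* XOR in place of add */
        return r;
}

int main(void)
{
        /* compare against pen-and-paper long multiplication without carries */
        printf("0x%04x\n", clmul8(0x53, 0xca));
        return 0;
}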

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |   6 ++
 arch/arm64/crypto/Makefile        |   3 +
 arch/arm64/crypto/ghash-ce-core.S |  95 +++++++++++++++++++++++
 arch/arm64/crypto/ghash-ce-glue.c | 155 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 259 insertions(+)
 create mode 100644 arch/arm64/crypto/ghash-ce-core.S
 create mode 100644 arch/arm64/crypto/ghash-ce-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index eb1e99770c21..0c50859ee7b9 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,4 +18,10 @@ config CRYPTO_SHA2_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+
+config CRYPTO_GHASH_ARM64_CE
+	tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 0b3885a60d43..e8c81a068868 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -13,3 +13,6 @@ sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
 
 obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
 sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
+
+obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
+ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
new file mode 100644
index 000000000000..b9e6eaf41c9b
--- /dev/null
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -0,0 +1,95 @@
+/*
+ * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ *
+ * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * Based on arch/x86/crypto/ghash-clmulni-intel_asm.S
+ *
+ * Copyright (c) 2009 Intel Corp.
+ *   Author: Huang Ying <ying.huang@intel.com>
+ *           Vinodh Gopal
+ *           Erdinc Ozturk
+ *           Deniz Karakoyunlu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	DATA	.req	v0
+	SHASH	.req	v1
+	IN1	.req	v2
+	T1	.req	v2
+	T2	.req	v3
+	T3	.req	v4
+	VZR	.req	v5
+
+	.text
+	.arch		armv8-a+crypto
+
+	/*
+	 * void pmull_ghash_update(int blocks, u64 dg[], const char *src,
+	 *			   struct ghash_key const *k, const char *head)
+	 */
+ENTRY(pmull_ghash_update)
+	ld1		{DATA.16b}, [x1]
+	ld1		{SHASH.16b}, [x3]
+	eor		VZR.16b, VZR.16b, VZR.16b
+
+	/* do the head block first, if supplied */
+	cbz		x4, 0f
+	ld1		{IN1.2d}, [x4]
+	b		1f
+
+0:	ld1		{IN1.2d}, [x2], #16
+	sub		w0, w0, #1
+1:	ext		IN1.16b, IN1.16b, IN1.16b, #8
+CPU_LE(	rev64		IN1.16b, IN1.16b	)
+	eor		DATA.16b, DATA.16b, IN1.16b
+
+	/* multiply DATA by SHASH in GF(2^128) */
+	ext		T2.16b, DATA.16b, DATA.16b, #8
+	ext		T3.16b, SHASH.16b, SHASH.16b, #8
+	eor		T2.16b, T2.16b, DATA.16b
+	eor		T3.16b, T3.16b, SHASH.16b
+
+	pmull2		T1.1q, SHASH.2d, DATA.2d	// a1 * b1
+	pmull		DATA.1q, SHASH.1d, DATA.1d	// a0 * b0
+	pmull		T2.1q, T2.1d, T3.1d		// (a1 + a0)(b1 + b0)
+	eor		T2.16b, T2.16b, T1.16b		// (a0 * b1) + (a1 * b0)
+	eor		T2.16b, T2.16b, DATA.16b
+
+	ext		T3.16b, VZR.16b, T2.16b, #8
+	ext		T2.16b, T2.16b, VZR.16b, #8
+	eor		DATA.16b, DATA.16b, T3.16b
+	eor		T1.16b, T1.16b, T2.16b	// <T1:DATA> is result of
+						// carry-less multiplication
+
+	/* first phase of the reduction */
+	shl		T3.2d, DATA.2d, #1
+	eor		T3.16b, T3.16b, DATA.16b
+	shl		T3.2d, T3.2d, #5
+	eor		T3.16b, T3.16b, DATA.16b
+	shl		T3.2d, T3.2d, #57
+	ext		T2.16b, VZR.16b, T3.16b, #8
+	ext		T3.16b, T3.16b, VZR.16b, #8
+	eor		DATA.16b, DATA.16b, T2.16b
+	eor		T1.16b, T1.16b, T3.16b
+
+	/* second phase of the reduction */
+	ushr		T2.2d, DATA.2d, #5
+	eor		T2.16b, T2.16b, DATA.16b
+	ushr		T2.2d, T2.2d, #1
+	eor		T2.16b, T2.16b, DATA.16b
+	ushr		T2.2d, T2.2d, #1
+	eor		T1.16b, T1.16b, T2.16b
+	eor		DATA.16b, DATA.16b, T1.16b
+
+	cbnz		w0, 0b
+
+	st1		{DATA.16b}, [x1]
+	ret
+ENDPROC(pmull_ghash_update)
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
new file mode 100644
index 000000000000..b92baf3f68c7
--- /dev/null
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -0,0 +1,155 @@
+/*
+ * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ *
+ * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("GHASH secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+#define GHASH_BLOCK_SIZE	16
+#define GHASH_DIGEST_SIZE	16
+
+struct ghash_key {
+	u64 a;
+	u64 b;
+};
+
+struct ghash_desc_ctx {
+	u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
+	u8 buf[GHASH_BLOCK_SIZE];
+	u32 count;
+};
+
+asmlinkage void pmull_ghash_update(int blocks, u64 dg[], const char *src,
+				   struct ghash_key const *k, const char *head);
+
+static int ghash_init(struct shash_desc *desc)
+{
+	struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	*ctx = (struct ghash_desc_ctx){};
+	return 0;
+}
+
+static int ghash_update(struct shash_desc *desc, const u8 *src,
+			unsigned int len)
+{
+	struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+	unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
+
+	ctx->count += len;
+
+	if ((partial + len) >= GHASH_BLOCK_SIZE) {
+		struct ghash_key *key = crypto_shash_ctx(desc->tfm);
+		int blocks;
+
+		if (partial) {
+			int p = GHASH_BLOCK_SIZE - partial;
+
+			memcpy(ctx->buf + partial, src, p);
+			src += p;
+			len -= p;
+		}
+
+		blocks = len / GHASH_BLOCK_SIZE;
+		len %= GHASH_BLOCK_SIZE;
+
+		kernel_neon_begin_partial(6);
+		pmull_ghash_update(blocks, ctx->digest, src, key,
+				   partial ? ctx->buf : NULL);
+		kernel_neon_end();
+		src += blocks * GHASH_BLOCK_SIZE;
+	}
+	if (len)
+		memcpy(ctx->buf + partial, src, len);
+	return 0;
+}
+
+static int ghash_final(struct shash_desc *desc, u8 *dst)
+{
+	struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+	unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
+
+	if (partial) {
+		struct ghash_key *key = crypto_shash_ctx(desc->tfm);
+
+		memset(ctx->buf + partial, 0, GHASH_BLOCK_SIZE - partial);
+
+		kernel_neon_begin_partial(6);
+		pmull_ghash_update(1, ctx->digest, ctx->buf, key, NULL);
+		kernel_neon_end();
+	}
+	put_unaligned_be64(ctx->digest[1], dst);
+	put_unaligned_be64(ctx->digest[0], dst + 8);
+
+	*ctx = (struct ghash_desc_ctx){};
+	return 0;
+}
+
+static int ghash_setkey(struct crypto_shash *tfm,
+			const u8 *inkey, unsigned int keylen)
+{
+	struct ghash_key *key = crypto_shash_ctx(tfm);
+	u64 a, b;
+
+	if (keylen != GHASH_BLOCK_SIZE) {
+		crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	/* perform multiplication by 'x' in GF(2^128) */
+	b = get_unaligned_be64(inkey);
+	a = get_unaligned_be64(inkey + 8);
+
+	key->a = (a << 1) | (b >> 63);
+	key->b = (b << 1) | (a >> 63);
+
+	if (b >> 63)
+		key->b ^= 0xc200000000000000UL;
+
+	return 0;
+}
+
+static struct shash_alg ghash_alg = {
+	.digestsize	= GHASH_DIGEST_SIZE,
+	.init		= ghash_init,
+	.update		= ghash_update,
+	.final		= ghash_final,
+	.setkey		= ghash_setkey,
+	.descsize	= sizeof(struct ghash_desc_ctx),
+	.base		= {
+		.cra_name		= "ghash",
+		.cra_driver_name	= "ghash-ce",
+		.cra_priority		= 200,
+		.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+		.cra_blocksize		= GHASH_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct ghash_key),
+		.cra_module		= THIS_MODULE,
+	},
+};
+
+static int __init ghash_ce_mod_init(void)
+{
+	return crypto_register_shash(&ghash_alg);
+}
+
+static void __exit ghash_ce_mod_exit(void)
+{
+	crypto_unregister_shash(&ghash_alg);
+}
+
+module_cpu_feature_match(PMULL, ghash_ce_mod_init);
+module_exit(ghash_ce_mod_exit);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread
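
The pmull/pmull2/eor sequence in pmull_ghash_update() relies on the Karatsuba
identity over GF(2): since addition is xor, (a1 ^ a0)(b1 ^ b0) equals
a1*b1 ^ a1*b0 ^ a0*b1 ^ a0*b0, so xoring the two outer products back out of
the third multiply leaves exactly the middle term (a0*b1) ^ (a1*b0) that the
code folds into the 256-bit product. The standalone sketch below checks that
identity with a tiny carry-less multiply on 8-bit values; it is illustrative
only, not part of the patch, and the helper names are made up:

#include <assert.h>
#include <stdint.h>

/* carry-less (GF(2)) multiplication of two 8-bit polynomials */
static uint16_t clmul8(uint8_t x, uint8_t y)
{
	uint16_t r = 0;
	int i;

	for (i = 0; i < 8; i++)
		if (y & (1 << i))
			r ^= (uint16_t)x << i;
	return r;
}

int main(void)
{
	unsigned int a, b;

	for (a = 0; a < 256; a++)
		for (b = 0; b < 256; b++) {
			uint8_t a1 = a >> 4, a0 = a & 0xf;
			uint8_t b1 = b >> 4, b0 = b & 0xf;

			/* (a1 + a0)(b1 + b0) + a1*b1 + a0*b0 ... */
			uint16_t mid = clmul8(a1 ^ a0, b1 ^ b0) ^
				       clmul8(a1, b1) ^ clmul8(a0, b0);

			/* ... equals the wanted middle term */
			assert(mid == (clmul8(a1, b0) ^ clmul8(a0, b1)));
		}
	return 0;
}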

* [PATCH resend 08/15] arm64/crypto: AES using ARMv8 Crypto Extensions
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This patch adds support for the AES symmetric encryption algorithm for CPUs
that implement the AES instructions of the ARMv8 Crypto Extensions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig         |   7 +-
 arch/arm64/crypto/Makefile        |   3 +
 arch/arm64/crypto/aes-ce-cipher.c | 155 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/crypto/aes-ce-cipher.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 0c50859ee7b9..9ba32c0da871 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,10 +18,15 @@ config CRYPTO_SHA2_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
-
 config CRYPTO_GHASH_ARM64_CE
 	tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_AES_ARM64_CE
+	tristate "AES core cipher using ARMv8 Crypto Extensions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_ALGAPI
+	select CRYPTO_AES
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index e8c81a068868..908abd9242b1 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -16,3 +16,6 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
 
 obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
 ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
new file mode 100644
index 000000000000..2075e1acae6b
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -0,0 +1,155 @@
+/*
+ * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+struct aes_block {
+	u8 b[AES_BLOCK_SIZE];
+};
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+	/*
+	 * # of rounds specified by AES:
+	 * 128 bit key		10 rounds
+	 * 192 bit key		12 rounds
+	 * 256 bit key		14 rounds
+	 * => n byte key	=> 6 + (n/4) rounds
+	 */
+	return 6 + ctx->key_length / 4;
+}
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct aes_block *out = (struct aes_block *)dst;
+	struct aes_block const *in = (struct aes_block *)src;
+	void *dummy0;
+	int dummy1;
+
+	kernel_neon_begin_partial(4);
+
+	__asm__("	ld1	{v0.16b}, %[in]			;"
+		"	ld1	{v1.2d}, [%[key]], #16		;"
+		"	cmp	%w[rounds], #10			;"
+		"	bmi	0f				;"
+		"	bne	3f				;"
+		"	mov	v3.16b, v1.16b			;"
+		"	b	2f				;"
+		"0:	mov	v2.16b, v1.16b			;"
+		"	ld1	{v3.2d}, [%[key]], #16		;"
+		"1:	aese	v0.16b, v2.16b			;"
+		"	aesmc	v0.16b, v0.16b			;"
+		"2:	ld1	{v1.2d}, [%[key]], #16		;"
+		"	aese	v0.16b, v3.16b			;"
+		"	aesmc	v0.16b, v0.16b			;"
+		"3:	ld1	{v2.2d}, [%[key]], #16		;"
+		"	subs	%w[rounds], %w[rounds], #3	;"
+		"	aese	v0.16b, v1.16b			;"
+		"	aesmc	v0.16b, v0.16b			;"
+		"	ld1	{v3.2d}, [%[key]], #16		;"
+		"	bpl	1b				;"
+		"	aese	v0.16b, v2.16b			;"
+		"	eor	v0.16b, v0.16b, v3.16b		;"
+		"	st1	{v0.16b}, %[out]		;"
+
+	:	[out]		"=Q"(*out),
+		[key]		"=r"(dummy0),
+		[rounds]	"=r"(dummy1)
+	:	[in]		"Q"(*in),
+				"1"(ctx->key_enc),
+				"2"(num_rounds(ctx) - 2)
+	:	"cc");
+
+	kernel_neon_end();
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct aes_block *out = (struct aes_block *)dst;
+	struct aes_block const *in = (struct aes_block *)src;
+	void *dummy0;
+	int dummy1;
+
+	kernel_neon_begin_partial(4);
+
+	__asm__("	ld1	{v0.16b}, %[in]			;"
+		"	ld1	{v1.2d}, [%[key]], #16		;"
+		"	cmp	%w[rounds], #10			;"
+		"	bmi	0f				;"
+		"	bne	3f				;"
+		"	mov	v3.16b, v1.16b			;"
+		"	b	2f				;"
+		"0:	mov	v2.16b, v1.16b			;"
+		"	ld1	{v3.2d}, [%[key]], #16		;"
+		"1:	aesd	v0.16b, v2.16b			;"
+		"	aesimc	v0.16b, v0.16b			;"
+		"2:	ld1	{v1.2d}, [%[key]], #16		;"
+		"	aesd	v0.16b, v3.16b			;"
+		"	aesimc	v0.16b, v0.16b			;"
+		"3:	ld1	{v2.2d}, [%[key]], #16		;"
+		"	subs	%w[rounds], %w[rounds], #3	;"
+		"	aesd	v0.16b, v1.16b			;"
+		"	aesimc	v0.16b, v0.16b			;"
+		"	ld1	{v3.2d}, [%[key]], #16		;"
+		"	bpl	1b				;"
+		"	aesd	v0.16b, v2.16b			;"
+		"	eor	v0.16b, v0.16b, v3.16b		;"
+		"	st1	{v0.16b}, %[out]		;"
+
+	:	[out]		"=Q"(*out),
+		[key]		"=r"(dummy0),
+		[rounds]	"=r"(dummy1)
+	:	[in]		"Q"(*in),
+				"1"(ctx->key_dec),
+				"2"(num_rounds(ctx) - 2)
+	:	"cc");
+
+	kernel_neon_end();
+}
+
+static struct crypto_alg aes_alg = {
+	.cra_name		= "aes",
+	.cra_driver_name	= "aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_cipher = {
+		.cia_min_keysize	= AES_MIN_KEY_SIZE,
+		.cia_max_keysize	= AES_MAX_KEY_SIZE,
+		.cia_setkey		= crypto_aes_set_key,
+		.cia_encrypt		= aes_cipher_encrypt,
+		.cia_decrypt		= aes_cipher_decrypt
+	}
+};
+
+static int __init aes_mod_init(void)
+{
+	return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+	crypto_unregister_alg(&aes_alg);
+}
+
+module_cpu_feature_match(AES, aes_mod_init);
+module_exit(aes_mod_exit);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread
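
The round-count formula documented in num_rounds() above can be sanity
checked in isolation. The sketch below is illustrative only (userspace C,
not part of the patch, helper name made up), with key_length in bytes as in
struct crypto_aes_ctx:

#include <assert.h>

/* mirrors num_rounds(): 6 + the key length in 32-bit words */
static int num_rounds_sketch(unsigned int key_length)
{
	return 6 + key_length / 4;
}

int main(void)
{
	assert(num_rounds_sketch(16) == 10);	/* AES-128 */
	assert(num_rounds_sketch(24) == 12);	/* AES-192 */
	assert(num_rounds_sketch(32) == 14);	/* AES-256 */
	return 0;
}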

* [PATCH resend 09/15] arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49   ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto
  Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel

This patch adds support for the AES-CCM authenticated encryption algorithm
for CPUs that implement the AES instructions of the ARMv8 Crypto Extensions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 arch/arm64/crypto/Kconfig           |   7 +
 arch/arm64/crypto/Makefile          |   3 +
 arch/arm64/crypto/aes-ce-ccm-core.S | 222 +++++++++++++++++++++++++++
 arch/arm64/crypto/aes-ce-ccm-glue.c | 297 ++++++++++++++++++++++++++++++++++++
 4 files changed, 529 insertions(+)
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-core.S
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 9ba32c0da871..8fffd5af65ef 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -29,4 +29,11 @@ config CRYPTO_AES_ARM64_CE
 	select CRYPTO_ALGAPI
 	select CRYPTO_AES
 
+config CRYPTO_AES_ARM64_CE_CCM
+	tristate "AES in CCM mode using ARMv8 Crypto Extensions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_ALGAPI
+	select CRYPTO_AES
+	select CRYPTO_AEAD
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 908abd9242b1..311287d68078 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -19,3 +19,6 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
+aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
new file mode 100644
index 000000000000..a2a7fbcacc14
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -0,0 +1,222 @@
+/*
+ * aes-ce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.arch	armv8-a+crypto
+
+	/*
+	 * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+	 *			     u32 *macp, u8 const rk[], u32 rounds);
+	 */
+ENTRY(ce_aes_ccm_auth_data)
+	ldr	w8, [x3]			/* leftover from prev round? */
+	ld1	{v0.2d}, [x0]			/* load mac */
+	cbz	w8, 1f
+	sub	w8, w8, #16
+	eor	v1.16b, v1.16b, v1.16b
+0:	ldrb	w7, [x1], #1			/* get 1 byte of input */
+	subs	w2, w2, #1
+	add	w8, w8, #1
+	ins	v1.b[0], w7
+	ext	v1.16b, v1.16b, v1.16b, #1	/* rotate in the input bytes */
+	beq	8f				/* out of input? */
+	cbnz	w8, 0b
+	eor	v0.16b, v0.16b, v1.16b
+1:	ld1	{v3.2d}, [x4]			/* load first round key */
+	prfm	pldl1strm, [x1]
+	cmp	w5, #12				/* which key size? */
+	add	x6, x4, #16
+	sub	w7, w5, #2			/* modified # of rounds */
+	bmi	2f
+	bne	5f
+	mov	v5.16b, v3.16b
+	b	4f
+2:	mov	v4.16b, v3.16b
+	ld1	{v5.2d}, [x6], #16		/* load 2nd round key */
+3:	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+4:	ld1	{v3.2d}, [x6], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+5:	ld1	{v4.2d}, [x6], #16		/* load next round key */
+	subs	w7, w7, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	ld1	{v5.2d}, [x6], #16		/* load next round key */
+	bpl	3b
+	aese	v0.16b, v4.16b
+	subs	w2, w2, #16			/* last data? */
+	eor	v0.16b, v0.16b, v5.16b		/* final round */
+	bmi	6f
+	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
+	bne	1b
+6:	st1	{v0.2d}, [x0]			/* store mac */
+	beq	10f
+	adds	w2, w2, #16
+	beq	10f
+	mov	w8, w2
+7:	ldrb	w7, [x1], #1
+	umov	w6, v0.b[0]
+	eor	w6, w6, w7
+	strb	w6, [x0], #1
+	subs	w2, w2, #1
+	beq	10f
+	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
+	b	7b
+8:	mov	w7, w8
+	add	w8, w8, #16
+9:	ext	v1.16b, v1.16b, v1.16b, #1
+	adds	w7, w7, #1
+	bne	9b
+	eor	v0.16b, v0.16b, v1.16b
+	st1	{v0.2d}, [x0]
+10:	str	w8, [x3]
+	ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+	/*
+	 * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[],
+	 * 			 u32 rounds);
+	 */
+ENTRY(ce_aes_ccm_final)
+	ld1	{v3.2d}, [x2], #16		/* load first round key */
+	ld1	{v0.2d}, [x0]			/* load mac */
+	cmp	w3, #12				/* which key size? */
+	sub	w3, w3, #2			/* modified # of rounds */
+	ld1	{v1.2d}, [x1]			/* load 1st ctriv */
+	bmi	0f
+	bne	3f
+	mov	v5.16b, v3.16b
+	b	2f
+0:	mov	v4.16b, v3.16b
+1:	ld1	{v5.2d}, [x2], #16		/* load next round key */
+	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
+	aesmc	v1.16b, v1.16b
+2:	ld1	{v3.2d}, [x2], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
+	aesmc	v1.16b, v1.16b
+3:	ld1	{v4.2d}, [x2], #16		/* load next round key */
+	subs	w3, w3, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
+	aesmc	v1.16b, v1.16b
+	bpl	1b
+	aese	v0.16b, v4.16b
+	aese	v1.16b, v4.16b
+	/* final round key cancels out */
+	eor	v0.16b, v0.16b, v1.16b		/* en-/decrypt the mac */
+	st1	{v0.2d}, [x0]			/* store result */
+	ret
+ENDPROC(ce_aes_ccm_final)
+
+	.macro	aes_ccm_do_crypt,enc
+	ldr	x8, [x6, #8]			/* load lower ctr */
+	ld1	{v0.2d}, [x5]			/* load mac */
+	rev	x8, x8				/* keep swabbed ctr in reg */
+0:	/* outer loop */
+	ld1	{v1.1d}, [x6]			/* load upper ctr */
+	prfm	pldl1strm, [x1]
+	add	x8, x8, #1
+	rev	x9, x8
+	cmp	w4, #12				/* which key size? */
+	sub	w7, w4, #2			/* get modified # of rounds */
+	ins	v1.d[1], x9			/* no carry in lower ctr */
+	ld1	{v3.2d}, [x3]			/* load first round key */
+	add	x10, x3, #16
+	bmi	1f
+	bne	4f
+	mov	v5.16b, v3.16b
+	b	3f
+1:	mov	v4.16b, v3.16b
+	ld1	{v5.2d}, [x10], #16		/* load 2nd round key */
+2:	/* inner loop: 3 rounds, 2x interleaved */
+	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
+	aesmc	v1.16b, v1.16b
+3:	ld1	{v3.2d}, [x10], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
+	aesmc	v1.16b, v1.16b
+4:	ld1	{v4.2d}, [x10], #16		/* load next round key */
+	subs	w7, w7, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
+	aesmc	v1.16b, v1.16b
+	ld1	{v5.2d}, [x10], #16		/* load next round key */
+	bpl	2b
+	aese	v0.16b, v4.16b
+	aese	v1.16b, v4.16b
+	subs	w2, w2, #16
+	bmi	6f				/* partial block? */
+	ld1	{v2.16b}, [x1], #16		/* load next input block */
+	.if	\enc == 1
+	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
+	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
+	.else
+	eor	v2.16b, v2.16b, v1.16b		/* xor with crypted ctr */
+	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
+	.endif
+	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
+	st1	{v1.16b}, [x0], #16		/* write output block */
+	bne	0b
+	rev	x8, x8
+	st1	{v0.2d}, [x5]			/* store mac */
+	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
+5:	ret
+
+6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
+	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
+	st1	{v0.2d}, [x5]			/* store mac */
+	add	w2, w2, #16			/* process partial tail block */
+7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	umov	w6, v1.b[0]			/* get top crypted ctr byte */
+	umov	w7, v0.b[0]			/* get top mac byte */
+	.if	\enc == 1
+	eor	w7, w7, w9
+	eor	w9, w9, w6
+	.else
+	eor	w9, w9, w6
+	eor	w7, w7, w9
+	.endif
+	strb	w9, [x0], #1			/* store out byte */
+	strb	w7, [x5], #1			/* store mac byte */
+	subs	w2, w2, #1
+	beq	5b
+	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
+	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
+	b	7b
+	.endm
+
+	/*
+	 * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+	 * 			   u8 const rk[], u32 rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+	 * 			   u8 const rk[], u32 rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 */
+ENTRY(ce_aes_ccm_encrypt)
+	aes_ccm_do_crypt	1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+	aes_ccm_do_crypt	0
+ENDPROC(ce_aes_ccm_decrypt)
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
new file mode 100644
index 000000000000..9e6cdde9b43d
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -0,0 +1,297 @@
+/*
+ * aes-ce-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+	/*
+	 * # of rounds specified by AES:
+	 * 128 bit key		10 rounds
+	 * 192 bit key		12 rounds
+	 * 256 bit key		14 rounds
+	 * => n byte key	=> 6 + (n/4) rounds
+	 */
+	return 6 + ctx->key_length / 4;
+}
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+				     u32 *macp, u32 const rk[], u32 rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+				   u32 const rk[], u32 rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+				   u32 const rk[], u32 rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+				 u32 rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+		      unsigned int key_len)
+{
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(tfm);
+	int ret;
+
+	ret = crypto_aes_expand_key(ctx, in_key, key_len);
+	if (!ret)
+		return 0;
+
+	tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+	return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+	if ((authsize & 1) || authsize < 4)
+		return -EINVAL;
+	return 0;
+}
+
+static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	__be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8];
+	u32 l = req->iv[0] + 1;
+
+	/* verify that CCM dimension 'L' is set correctly in the IV */
+	if (l < 2 || l > 8)
+		return -EINVAL;
+
+	/* verify that msglen can in fact be represented in L bytes */
+	if (l < 4 && msglen >> (8 * l))
+		return -EOVERFLOW;
+
+	/*
+	 * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi
+	 * uses a u32 type to represent msglen so the top 4 bytes are always 0.
+	 */
+	n[0] = 0;
+	n[1] = cpu_to_be32(msglen);
+
+	memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+	/*
+	 * Meaning of byte 0 according to CCM spec (RFC 3610/NIST 800-38C)
+	 * - bits 0..2	: max # of bytes required to represent msglen, minus 1
+	 *                (already set by caller)
+	 * - bits 3..5	: size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc)
+	 * - bit 6	: indicates presence of authenticate-only data
+	 */
+	maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+	if (req->assoclen)
+		maciv[0] |= 0x40;
+
+	memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+	return 0;
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct __packed { __be16 l; __be32 h; u16 len; } ltag;
+	struct scatter_walk walk;
+	u32 len = req->assoclen;
+	u32 macp = 0;
+
+	/* prepend the AAD with a length tag */
+	if (len < 0xff00) {
+		ltag.l = cpu_to_be16(len);
+		ltag.len = 2;
+	} else  {
+		ltag.l = cpu_to_be16(0xfffe);
+		put_unaligned_be32(len, &ltag.h);
+		ltag.len = 6;
+	}
+
+	ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
+			     num_rounds(ctx));
+	scatterwalk_start(&walk, req->assoc);
+
+	do {
+		u32 n = scatterwalk_clamp(&walk, len);
+		u8 *p;
+
+		if (!n) {
+			scatterwalk_start(&walk, sg_next(walk.sg));
+			n = scatterwalk_clamp(&walk, len);
+		}
+		p = scatterwalk_map(&walk);
+		ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
+				     num_rounds(ctx));
+		len -= n;
+
+		scatterwalk_unmap(p);
+		scatterwalk_advance(&walk, n);
+		scatterwalk_done(&walk, 0, len);
+	} while (len);
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct blkcipher_desc desc = { .info = req->iv };
+	struct blkcipher_walk walk;
+	u8 __aligned(8) mac[AES_BLOCK_SIZE];
+	u8 buf[AES_BLOCK_SIZE];
+	u32 len = req->cryptlen;
+	int err;
+
+	err = ccm_init_mac(req, mac, len);
+	if (err)
+		return err;
+
+	kernel_neon_begin_partial(6);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	/* preserve the original iv for the final round */
+	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+	blkcipher_walk_init(&walk, req->dst, req->src, len);
+	err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+					     AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == len)
+			tail = 0;
+
+		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc,
+				   num_rounds(ctx), mac, walk.iv);
+
+		len -= walk.nbytes - tail;
+		err = blkcipher_walk_done(&desc, &walk, tail);
+	}
+	if (!err)
+		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+	kernel_neon_end();
+
+	if (err)
+		return err;
+
+	/* copy authtag to end of dst */
+	scatterwalk_map_and_copy(mac, req->dst, req->cryptlen,
+				 crypto_aead_authsize(aead), 1);
+
+	return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	unsigned int authsize = crypto_aead_authsize(aead);
+	struct blkcipher_desc desc = { .info = req->iv };
+	struct blkcipher_walk walk;
+	u8 __aligned(8) mac[AES_BLOCK_SIZE];
+	u8 buf[AES_BLOCK_SIZE];
+	u32 len = req->cryptlen - authsize;
+	int err;
+
+	err = ccm_init_mac(req, mac, len);
+	if (err)
+		return err;
+
+	kernel_neon_begin_partial(6);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	/* preserve the original iv for the final round */
+	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+	blkcipher_walk_init(&walk, req->dst, req->src, len);
+	err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+					     AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == len)
+			tail = 0;
+
+		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc,
+				   num_rounds(ctx), mac, walk.iv);
+
+		len -= walk.nbytes - tail;
+		err = blkcipher_walk_done(&desc, &walk, tail);
+	}
+	if (!err)
+		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+	kernel_neon_end();
+
+	if (err)
+		return err;
+
+	/* compare calculated auth tag with the stored one */
+	scatterwalk_map_and_copy(buf, req->src, req->cryptlen - authsize,
+				 authsize, 0);
+
+	if (memcmp(mac, buf, authsize))
+		return -EBADMSG;
+	return 0;
+}
+
+static struct crypto_alg ccm_aes_alg = {
+	.cra_name		= "ccm(aes)",
+	.cra_driver_name	= "ccm-aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_AEAD,
+	.cra_blocksize		= 1,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_alignmask		= 7,
+	.cra_type		= &crypto_aead_type,
+	.cra_module		= THIS_MODULE,
+	.cra_aead = {
+		.ivsize		= AES_BLOCK_SIZE,
+		.maxauthsize	= AES_BLOCK_SIZE,
+		.setkey		= ccm_setkey,
+		.setauthsize	= ccm_setauthsize,
+		.encrypt	= ccm_encrypt,
+		.decrypt	= ccm_decrypt,
+	}
+};
+
+static int __init aes_mod_init(void)
+{
+	if (!(elf_hwcap & HWCAP_AES))
+		return -ENODEV;
+	return crypto_register_alg(&ccm_aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+	crypto_unregister_alg(&ccm_aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES in CCM mode using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("ccm(aes)");
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 55+ messages in thread
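
The CCM B0 block that ccm_init_mac() assembles (flags byte, nonce and
big-endian message length, per RFC 3610 / NIST SP 800-38C) can be sketched
in portable C as below. This is illustrative only and not part of the patch;
the function, macro and parameter names are made up, and the AAD length
prefix built in ccm_calculate_auth_mac() (a 2-byte big-endian length for AAD
shorter than 0xff00, otherwise the 0xfffe marker followed by a 4-byte
length) is not covered here:

#include <stdint.h>
#include <string.h>

#define CCM_BLOCK_SIZE	16

/*
 * Build the CCM B0 block from the IV (whose first byte encodes L - 1),
 * the message length and the tag size, mirroring ccm_init_mac() above.
 * Returns 0 on success, -1 on an invalid L or an unrepresentable length.
 */
static int ccm_build_b0(uint8_t b0[CCM_BLOCK_SIZE],
			const uint8_t iv[CCM_BLOCK_SIZE],
			uint32_t msglen, unsigned int authsize,
			int have_assoc)
{
	unsigned int l = iv[0] + 1;	/* width of the length field, in bytes */

	if (l < 2 || l > 8)
		return -1;
	if (l < 4 && (msglen >> (8 * l)))
		return -1;		/* msglen does not fit in L bytes */

	/* 64-bit big-endian message length in the last 8 bytes */
	memset(b0 + CCM_BLOCK_SIZE - 8, 0, 4);
	b0[CCM_BLOCK_SIZE - 4] = msglen >> 24;
	b0[CCM_BLOCK_SIZE - 3] = msglen >> 16;
	b0[CCM_BLOCK_SIZE - 2] = msglen >> 8;
	b0[CCM_BLOCK_SIZE - 1] = msglen;

	/*
	 * The flags byte and nonce occupy the first 16 - L bytes; any
	 * length bytes they overwrite are zero thanks to the checks above.
	 */
	memcpy(b0, iv, CCM_BLOCK_SIZE - l);

	/* bits 3..5: (tag size - 2) / 2, bit 6: AAD present */
	b0[0] |= (authsize - 2) << 2;
	if (have_assoc)
		b0[0] |= 0x40;

	return 0;
}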

* [PATCH resend 09/15] arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
@ 2014-05-01 15:49   ` Ard Biesheuvel
  0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for the AES-CCM encryption algorithm for CPUs that
have support for the AES part of the ARM v8 Crypto Extensions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 arch/arm64/crypto/Kconfig           |   7 +
 arch/arm64/crypto/Makefile          |   3 +
 arch/arm64/crypto/aes-ce-ccm-core.S | 222 +++++++++++++++++++++++++++
 arch/arm64/crypto/aes-ce-ccm-glue.c | 297 ++++++++++++++++++++++++++++++++++++
 4 files changed, 529 insertions(+)
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-core.S
 create mode 100644 arch/arm64/crypto/aes-ce-ccm-glue.c

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 9ba32c0da871..8fffd5af65ef 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -29,4 +29,11 @@ config CRYPTO_AES_ARM64_CE
 	select CRYPTO_ALGAPI
 	select CRYPTO_AES
 
+config CRYPTO_AES_ARM64_CE_CCM
+	tristate "AES in CCM mode using ARMv8 Crypto Extensions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_ALGAPI
+	select CRYPTO_AES
+	select CRYPTO_AEAD
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 908abd9242b1..311287d68078 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -19,3 +19,6 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
 CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
+aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
new file mode 100644
index 000000000000..a2a7fbcacc14
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -0,0 +1,222 @@
+/*
+ * aesce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.arch	armv8-a+crypto
+
+	/*
+	 * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+	 *			     u32 *macp, u8 const rk[], u32 rounds);
+	 */
+ENTRY(ce_aes_ccm_auth_data)
+	ldr	w8, [x3]			/* leftover from prev round? */
+	ld1	{v0.2d}, [x0]			/* load mac */
+	cbz	w8, 1f
+	sub	w8, w8, #16
+	eor	v1.16b, v1.16b, v1.16b
+0:	ldrb	w7, [x1], #1			/* get 1 byte of input */
+	subs	w2, w2, #1
+	add	w8, w8, #1
+	ins	v1.b[0], w7
+	ext	v1.16b, v1.16b, v1.16b, #1	/* rotate in the input bytes */
+	beq	8f				/* out of input? */
+	cbnz	w8, 0b
+	eor	v0.16b, v0.16b, v1.16b
+1:	ld1	{v3.2d}, [x4]			/* load first round key */
+	prfm	pldl1strm, [x1]
+	cmp	w5, #12				/* which key size? */
+	add	x6, x4, #16
+	sub	w7, w5, #2			/* modified # of rounds */
+	bmi	2f
+	bne	5f
+	mov	v5.16b, v3.16b
+	b	4f
+2:	mov	v4.16b, v3.16b
+	ld1	{v5.2d}, [x6], #16		/* load 2nd round key */
+3:	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+4:	ld1	{v3.2d}, [x6], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+5:	ld1	{v4.2d}, [x6], #16		/* load next round key */
+	subs	w7, w7, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	ld1	{v5.2d}, [x6], #16		/* load next round key */
+	bpl	3b
+	aese	v0.16b, v4.16b
+	subs	w2, w2, #16			/* last data? */
+	eor	v0.16b, v0.16b, v5.16b		/* final round */
+	bmi	6f
+	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
+	bne	1b
+6:	st1	{v0.2d}, [x0]			/* store mac */
+	beq	10f
+	adds	w2, w2, #16
+	beq	10f
+	mov	w8, w2
+7:	ldrb	w7, [x1], #1
+	umov	w6, v0.b[0]
+	eor	w6, w6, w7
+	strb	w6, [x0], #1
+	subs	w2, w2, #1
+	beq	10f
+	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
+	b	7b
+8:	mov	w7, w8
+	add	w8, w8, #16
+9:	ext	v1.16b, v1.16b, v1.16b, #1
+	adds	w7, w7, #1
+	bne	9b
+	eor	v0.16b, v0.16b, v1.16b
+	st1	{v0.2d}, [x0]
+10:	str	w8, [x3]
+	ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+	/*
+	 * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[],
+	 * 			 u32 rounds);
+	 */
+ENTRY(ce_aes_ccm_final)
+	ld1	{v3.2d}, [x2], #16		/* load first round key */
+	ld1	{v0.2d}, [x0]			/* load mac */
+	cmp	w3, #12				/* which key size? */
+	sub	w3, w3, #2			/* modified # of rounds */
+	ld1	{v1.2d}, [x1]			/* load 1st ctriv */
+	bmi	0f
+	bne	3f
+	mov	v5.16b, v3.16b
+	b	2f
+0:	mov	v4.16b, v3.16b
+1:	ld1	{v5.2d}, [x2], #16		/* load next round key */
+	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
+	aesmc	v1.16b, v1.16b
+2:	ld1	{v3.2d}, [x2], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
+	aesmc	v1.16b, v1.16b
+3:	ld1	{v4.2d}, [x2], #16		/* load next round key */
+	subs	w3, w3, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
+	aesmc	v1.16b, v1.16b
+	bpl	1b
+	aese	v0.16b, v4.16b
+	aese	v1.16b, v4.16b
+	/* final round key cancels out */
+	eor	v0.16b, v0.16b, v1.16b		/* en-/decrypt the mac */
+	st1	{v0.2d}, [x0]			/* store result */
+	ret
+ENDPROC(ce_aes_ccm_final)
+
+	.macro	aes_ccm_do_crypt,enc
+	ldr	x8, [x6, #8]			/* load lower ctr */
+	ld1	{v0.2d}, [x5]			/* load mac */
+	rev	x8, x8				/* keep swabbed ctr in reg */
+0:	/* outer loop */
+	ld1	{v1.1d}, [x6]			/* load upper ctr */
+	prfm	pldl1strm, [x1]
+	add	x8, x8, #1
+	rev	x9, x8
+	cmp	w4, #12				/* which key size? */
+	sub	w7, w4, #2			/* get modified # of rounds */
+	ins	v1.d[1], x9			/* no carry in lower ctr */
+	ld1	{v3.2d}, [x3]			/* load first round key */
+	add	x10, x3, #16
+	bmi	1f
+	bne	4f
+	mov	v5.16b, v3.16b
+	b	3f
+1:	mov	v4.16b, v3.16b
+	ld1	{v5.2d}, [x10], #16		/* load 2nd round key */
+2:	/* inner loop: 3 rounds, 2x interleaved */
+	aese	v0.16b, v4.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v4.16b
+	aesmc	v1.16b, v1.16b
+3:	ld1	{v3.2d}, [x10], #16		/* load next round key */
+	aese	v0.16b, v5.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v5.16b
+	aesmc	v1.16b, v1.16b
+4:	ld1	{v4.2d}, [x10], #16		/* load next round key */
+	subs	w7, w7, #3
+	aese	v0.16b, v3.16b
+	aesmc	v0.16b, v0.16b
+	aese	v1.16b, v3.16b
+	aesmc	v1.16b, v1.16b
+	ld1	{v5.2d}, [x10], #16		/* load next round key */
+	bpl	2b
+	aese	v0.16b, v4.16b
+	aese	v1.16b, v4.16b
+	subs	w2, w2, #16
+	bmi	6f				/* partial block? */
+	ld1	{v2.16b}, [x1], #16		/* load next input block */
+	.if	\enc == 1
+	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
+	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
+	.else
+	eor	v2.16b, v2.16b, v1.16b		/* xor with crypted ctr */
+	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
+	.endif
+	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
+	st1	{v1.16b}, [x0], #16		/* write output block */
+	bne	0b
+	rev	x8, x8
+	st1	{v0.2d}, [x5]			/* store mac */
+	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
+5:	ret
+
+6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
+	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
+	st1	{v0.2d}, [x5]			/* store mac */
+	add	w2, w2, #16			/* process partial tail block */
+7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	umov	w6, v1.b[0]			/* get top crypted ctr byte */
+	umov	w7, v0.b[0]			/* get top mac byte */
+	.if	\enc == 1
+	eor	w7, w7, w9
+	eor	w9, w9, w6
+	.else
+	eor	w9, w9, w6
+	eor	w7, w7, w9
+	.endif
+	strb	w9, [x0], #1			/* store out byte */
+	strb	w7, [x5], #1			/* store mac byte */
+	subs	w2, w2, #1
+	beq	5b
+	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
+	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
+	b	7b
+	.endm
+
+	/*
+	 * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+	 * 			   u8 const rk[], u32 rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+	 * 			   u8 const rk[], u32 rounds, u8 mac[],
+	 * 			   u8 ctr[]);
+	 */
+ENTRY(ce_aes_ccm_encrypt)
+	aes_ccm_do_crypt	1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+	aes_ccm_do_crypt	0
+ENDPROC(ce_aes_ccm_decrypt)
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
new file mode 100644
index 000000000000..9e6cdde9b43d
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -0,0 +1,297 @@
+/*
+ * aes-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+	/*
+	 * # of rounds specified by AES:
+	 * 128 bit key		10 rounds
+	 * 192 bit key		12 rounds
+	 * 256 bit key		14 rounds
+	 * => n byte key	=> 6 + (n/4) rounds
+	 */
+	return 6 + ctx->key_length / 4;
+}
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+				     u32 *macp, u32 const rk[], u32 rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+				   u32 const rk[], u32 rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+				   u32 const rk[], u32 rounds, u8 mac[],
+				   u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+				 u32 rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+		      unsigned int key_len)
+{
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(tfm);
+	int ret;
+
+	ret = crypto_aes_expand_key(ctx, in_key, key_len);
+	if (!ret)
+		return 0;
+
+	tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+	return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+	if ((authsize & 1) || authsize < 4)
+		return -EINVAL;
+	return 0;
+}
+
+static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	__be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8];
+	u32 l = req->iv[0] + 1;
+
+	/* verify that CCM dimension 'L' is set correctly in the IV */
+	if (l < 2 || l > 8)
+		return -EINVAL;
+
+	/* verify that msglen can in fact be represented in L bytes */
+	if (l < 4 && msglen >> (8 * l))
+		return -EOVERFLOW;
+
+	/*
+	 * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi
+	 * uses a u32 type to represent msglen so the top 4 bytes are always 0.
+	 */
+	n[0] = 0;
+	n[1] = cpu_to_be32(msglen);
+
+	memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+	/*
+	 * Meaning of byte 0 according to CCM spec (RFC 3610/NIST 800-38C)
+	 * - bits 0..2	: max # of bytes required to represent msglen, minus 1
+	 *                (already set by caller)
+	 * - bits 3..5	: size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc)
+	 * - bit 6	: indicates presence of authenticate-only data
+	 */
+	maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+	if (req->assoclen)
+		maciv[0] |= 0x40;
+
+	memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+	return 0;
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct __packed { __be16 l; __be32 h; u16 len; } ltag;
+	struct scatter_walk walk;
+	u32 len = req->assoclen;
+	u32 macp = 0;
+
+	/* prepend the AAD with a length tag */
+	if (len < 0xff00) {
+		ltag.l = cpu_to_be16(len);
+		ltag.len = 2;
+	} else  {
+		ltag.l = cpu_to_be16(0xfffe);
+		put_unaligned_be32(len, &ltag.h);
+		ltag.len = 6;
+	}
+
+	ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
+			     num_rounds(ctx));
+	scatterwalk_start(&walk, req->assoc);
+
+	do {
+		u32 n = scatterwalk_clamp(&walk, len);
+		u8 *p;
+
+		if (!n) {
+			scatterwalk_start(&walk, sg_next(walk.sg));
+			n = scatterwalk_clamp(&walk, len);
+		}
+		p = scatterwalk_map(&walk);
+		ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
+				     num_rounds(ctx));
+		len -= n;
+
+		scatterwalk_unmap(p);
+		scatterwalk_advance(&walk, n);
+		scatterwalk_done(&walk, 0, len);
+	} while (len);
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	struct blkcipher_desc desc = { .info = req->iv };
+	struct blkcipher_walk walk;
+	u8 __aligned(8) mac[AES_BLOCK_SIZE];
+	u8 buf[AES_BLOCK_SIZE];
+	u32 len = req->cryptlen;
+	int err;
+
+	err = ccm_init_mac(req, mac, len);
+	if (err)
+		return err;
+
+	kernel_neon_begin_partial(6);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	/* preserve the original iv for the final round */
+	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+	blkcipher_walk_init(&walk, req->dst, req->src, len);
+	err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+					     AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == len)
+			tail = 0;
+
+		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc,
+				   num_rounds(ctx), mac, walk.iv);
+
+		len -= walk.nbytes - tail;
+		err = blkcipher_walk_done(&desc, &walk, tail);
+	}
+	if (!err)
+		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+	kernel_neon_end();
+
+	if (err)
+		return err;
+
+	/* copy authtag to end of dst */
+	scatterwalk_map_and_copy(mac, req->dst, req->cryptlen,
+				 crypto_aead_authsize(aead), 1);
+
+	return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+	unsigned int authsize = crypto_aead_authsize(aead);
+	struct blkcipher_desc desc = { .info = req->iv };
+	struct blkcipher_walk walk;
+	u8 __aligned(8) mac[AES_BLOCK_SIZE];
+	u8 buf[AES_BLOCK_SIZE];
+	u32 len = req->cryptlen - authsize;
+	int err;
+
+	err = ccm_init_mac(req, mac, len);
+	if (err)
+		return err;
+
+	kernel_neon_begin_partial(6);
+
+	if (req->assoclen)
+		ccm_calculate_auth_mac(req, mac);
+
+	/* preserve the original iv for the final round */
+	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+	blkcipher_walk_init(&walk, req->dst, req->src, len);
+	err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+					     AES_BLOCK_SIZE);
+
+	while (walk.nbytes) {
+		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+		if (walk.nbytes == len)
+			tail = 0;
+
+		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+				   walk.nbytes - tail, ctx->key_enc,
+				   num_rounds(ctx), mac, walk.iv);
+
+		len -= walk.nbytes - tail;
+		err = blkcipher_walk_done(&desc, &walk, tail);
+	}
+	if (!err)
+		ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+	kernel_neon_end();
+
+	if (err)
+		return err;
+
+	/* compare calculated auth tag with the stored one */
+	scatterwalk_map_and_copy(buf, req->src, req->cryptlen - authsize,
+				 authsize, 0);
+
+	if (memcmp(mac, buf, authsize))
+		return -EBADMSG;
+	return 0;
+}
+
+static struct crypto_alg ccm_aes_alg = {
+	.cra_name		= "ccm(aes)",
+	.cra_driver_name	= "ccm-aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_AEAD,
+	.cra_blocksize		= 1,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_alignmask		= 7,
+	.cra_type		= &crypto_aead_type,
+	.cra_module		= THIS_MODULE,
+	.cra_aead = {
+		.ivsize		= AES_BLOCK_SIZE,
+		.maxauthsize	= AES_BLOCK_SIZE,
+		.setkey		= ccm_setkey,
+		.setauthsize	= ccm_setauthsize,
+		.encrypt	= ccm_encrypt,
+		.decrypt	= ccm_decrypt,
+	}
+};
+
+static int __init aes_mod_init(void)
+{
+	if (!(elf_hwcap & HWCAP_AES))
+		return -ENODEV;
+	return crypto_register_alg(&ccm_aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+	crypto_unregister_alg(&ccm_aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES in CCM mode using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("ccm(aes)");
-- 
1.8.3.2
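
For reference, a minimal standalone sketch of how the B0 flags byte described
in the comment near the top of ccm_init_mac() is assembled per RFC 3610; the
shift used in the patch, (authsize - 2) << 2, equals ((authsize - 2) / 2) << 3
for the even tag sizes CCM permits. Illustrative only, not part of the patch:

/*
 * CCM B0 flags byte, RFC 3610 section 2.2 / NIST SP 800-38C:
 *   bits 0..2: L' = L - 1, where L is the size in bytes of the length field
 *   bits 3..5: M' = (M - 2) / 2, where M is the auth tag size
 *   bit  6   : Adata, set when associated data is present
 */
static unsigned char ccm_b0_flags(unsigned int tag_len, unsigned int l,
				  int have_assoc)
{
	unsigned char flags = l - 1;		/* L' */

	flags |= ((tag_len - 2) / 2) << 3;	/* M' */
	if (have_assoc)
		flags |= 0x40;			/* Adata */
	return flags;
}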

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
  2014-05-01 15:49   ` Ard Biesheuvel
@ 2014-05-06 14:31     ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 14:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
	Russell King - ARM Linux

On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
> Switch the default unaligned access method to 'hardware implemented'
> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/asm-generic/unaligned.h | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)

I'm happy to take this patch via the arm64 tree. But arm is affected as
well, so it would be good to know if Russell has any objections (cc'ed).

Patch below for reference. Thanks.

Catalin

> diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
> index 03cf5936bad6..1ac097279db1 100644
> --- a/include/asm-generic/unaligned.h
> +++ b/include/asm-generic/unaligned.h
> @@ -4,22 +4,27 @@
>  /*
>   * This is the most generic implementation of unaligned accesses
>   * and should work almost anywhere.
> - *
> - * If an architecture can handle unaligned accesses in hardware,
> - * it may want to use the linux/unaligned/access_ok.h implementation
> - * instead.
>   */
>  #include <asm/byteorder.h>
>  
> +/* Set by the arch if it can handle unaligned accesses in hardware. */
> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +# include <linux/unaligned/access_ok.h>
> +#endif
> +
>  #if defined(__LITTLE_ENDIAN)
> -# include <linux/unaligned/le_struct.h>
> -# include <linux/unaligned/be_byteshift.h>
> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +#  include <linux/unaligned/le_struct.h>
> +#  include <linux/unaligned/be_byteshift.h>
> +# endif
>  # include <linux/unaligned/generic.h>
>  # define get_unaligned	__get_unaligned_le
>  # define put_unaligned	__put_unaligned_le
>  #elif defined(__BIG_ENDIAN)
> -# include <linux/unaligned/be_struct.h>
> -# include <linux/unaligned/le_byteshift.h>
> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +#  include <linux/unaligned/be_struct.h>
> +#  include <linux/unaligned/le_byteshift.h>
> +# endif
>  # include <linux/unaligned/generic.h>
>  # define get_unaligned	__get_unaligned_be
>  # define put_unaligned	__put_unaligned_be
> -- 
> 1.8.3.2
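
To illustrate what this patch selects between: the access_ok.h helpers boil
down to a plain load through a cast pointer, relying on the hardware to
handle misalignment, while the byteshift variants assemble the value one byte
at a time. A simplified sketch with made-up helper names, not the actual
kernel macros (endian conversion omitted):

#include <stdint.h>

/* byteshift flavour: safe on any CPU */
static inline uint32_t get_le32_byteshift(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* access_ok flavour: a single (possibly unaligned) load */
static inline uint32_t get_le32_direct(const void *p)
{
	return *(const uint32_t *)p;
}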

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
  2014-05-06 14:31     ` Catalin Marinas
@ 2014-05-06 14:34       ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 14:34 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
	Russell King - ARM Linux

On 6 May 2014 16:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
>> Switch the default unaligned access method to 'hardware implemented'
>> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Acked-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  include/asm-generic/unaligned.h | 21 +++++++++++++--------
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> I'm happy to take this patch via the arm64 tree. But arm is affected as
> well, so it would be good to know if Russell has any objections (cc'ed).
>
> Patch below for reference. Thanks.
>

Russell has already replied to that:
http://marc.info/?l=linux-arm-kernel&m=139696976302889&w=2

Regards,
Ard.


>> diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
>> index 03cf5936bad6..1ac097279db1 100644
>> --- a/include/asm-generic/unaligned.h
>> +++ b/include/asm-generic/unaligned.h
>> @@ -4,22 +4,27 @@
>>  /*
>>   * This is the most generic implementation of unaligned accesses
>>   * and should work almost anywhere.
>> - *
>> - * If an architecture can handle unaligned accesses in hardware,
>> - * it may want to use the linux/unaligned/access_ok.h implementation
>> - * instead.
>>   */
>>  #include <asm/byteorder.h>
>>
>> +/* Set by the arch if it can handle unaligned accesses in hardware. */
>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +# include <linux/unaligned/access_ok.h>
>> +#endif
>> +
>>  #if defined(__LITTLE_ENDIAN)
>> -# include <linux/unaligned/le_struct.h>
>> -# include <linux/unaligned/be_byteshift.h>
>> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +#  include <linux/unaligned/le_struct.h>
>> +#  include <linux/unaligned/be_byteshift.h>
>> +# endif
>>  # include <linux/unaligned/generic.h>
>>  # define get_unaligned       __get_unaligned_le
>>  # define put_unaligned       __put_unaligned_le
>>  #elif defined(__BIG_ENDIAN)
>> -# include <linux/unaligned/be_struct.h>
>> -# include <linux/unaligned/le_byteshift.h>
>> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +#  include <linux/unaligned/be_struct.h>
>> +#  include <linux/unaligned/le_byteshift.h>
>> +# endif
>>  # include <linux/unaligned/generic.h>
>>  # define get_unaligned       __get_unaligned_be
>>  # define put_unaligned       __put_unaligned_be
>> --
>> 1.8.3.2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
  2014-05-01 15:49   ` Ard Biesheuvel
@ 2014-05-06 14:43     ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 14:43 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel

On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 4aef42a04bdc..86ac6a9bc86a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
>  	preempt_enable();
>  }
>  
> +/*
> + * Save the userland FPSIMD state of 'current' to memory
> + */
> +void fpsimd_preserve_current_state(void)
> +{
> +	fpsimd_save_state(&current->thread.fpsimd_state);
> +}
> +
> +/*
> + * Load the userland FPSIMD state of 'current' from memory
> + */
> +void fpsimd_restore_current_state(void)
> +{
> +	fpsimd_load_state(&current->thread.fpsimd_state);
> +}
> +
> +/*
> + * Load an updated userland FPSIMD state for 'current' from memory
> + */
> +void fpsimd_update_current_state(struct fpsimd_state *state)
> +{
> +	preempt_disable();
> +	fpsimd_load_state(state);
> +	preempt_enable();
> +}

Minor - please update the comment above the functions to state that
preemption needs to be disabled by the caller.

> +/*
> + * Invalidate live CPU copies of task t's FPSIMD state
> + */
> +void fpsimd_flush_task_state(struct task_struct *t)
> +{
> +}

I guess this will be added in a subsequent patch. You could either move
it there or add a comment in the commit log that it is a dummy function
for now (I prefer the former).

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
  2014-05-06 14:43     ` Catalin Marinas
@ 2014-05-06 14:48       ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 14:48 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 4aef42a04bdc..86ac6a9bc86a 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
>>       preempt_enable();
>>  }
>>
>> +/*
>> + * Save the userland FPSIMD state of 'current' to memory
>> + */
>> +void fpsimd_preserve_current_state(void)
>> +{
>> +     fpsimd_save_state(&current->thread.fpsimd_state);
>> +}
>> +
>> +/*
>> + * Load the userland FPSIMD state of 'current' from memory
>> + */
>> +void fpsimd_restore_current_state(void)
>> +{
>> +     fpsimd_load_state(&current->thread.fpsimd_state);
>> +}
>> +
>> +/*
>> + * Load an updated userland FPSIMD state for 'current' from memory
>> + */
>> +void fpsimd_update_current_state(struct fpsimd_state *state)
>> +{
>> +     preempt_disable();
>> +     fpsimd_load_state(state);
>> +     preempt_enable();
>> +}
>
> Minor - please update the comment above the functions to state that
> preemption needs to be disabled by the caller.
>

Do you mean in all three cases? And, by implication, that the
preempt_disable()/enable() pair should be moved to the call site for
fpsimd_update_current_state() ?


>> +/*
>> + * Invalidate live CPU copies of task t's FPSIMD state
>> + */
>> +void fpsimd_flush_task_state(struct task_struct *t)
>> +{
>> +}
>
> I guess this will be added in a subsequent patch. You could either move
> it there or add a comment in the commit log that it is a dummy function
> for now (I prefer the former).
>

OK

-- 
Ard.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
  2014-05-06 14:48       ` Ard Biesheuvel
@ 2014-05-06 15:12         ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:12 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel

On Tue, May 06, 2014 at 03:48:08PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> >> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> >> index 4aef42a04bdc..86ac6a9bc86a 100644
> >> --- a/arch/arm64/kernel/fpsimd.c
> >> +++ b/arch/arm64/kernel/fpsimd.c
> >> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
> >>       preempt_enable();
> >>  }
> >>
> >> +/*
> >> + * Save the userland FPSIMD state of 'current' to memory
> >> + */
> >> +void fpsimd_preserve_current_state(void)
> >> +{
> >> +     fpsimd_save_state(&current->thread.fpsimd_state);
> >> +}
> >> +
> >> +/*
> >> + * Load the userland FPSIMD state of 'current' from memory
> >> + */
> >> +void fpsimd_restore_current_state(void)
> >> +{
> >> +     fpsimd_load_state(&current->thread.fpsimd_state);
> >> +}
> >> +
> >> +/*
> >> + * Load an updated userland FPSIMD state for 'current' from memory
> >> + */
> >> +void fpsimd_update_current_state(struct fpsimd_state *state)
> >> +{
> >> +     preempt_disable();
> >> +     fpsimd_load_state(state);
> >> +     preempt_enable();
> >> +}
> >
> > Minor - please update the comment above the functions to state that
> > preemption needs to be disabled by the caller.
> >
> 
> Do you mean in all three cases? And, by implication, that the
> preempt_disable()/enable() pair should be moved to the call site for
> fpsimd_update_current_state() ?

No, just the comment for the first two functions updated.

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
  2014-05-06 14:34       ` Ard Biesheuvel
@ 2014-05-06 15:14         ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
	Russell King - ARM Linux

On Tue, May 06, 2014 at 03:34:23PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 16:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
> >> Switch the default unaligned access method to 'hardware implemented'
> >> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> Acked-by: Arnd Bergmann <arnd@arndb.de>
> >> ---
> >>  include/asm-generic/unaligned.h | 21 +++++++++++++--------
> >>  1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > I'm happy to take this patch via the arm64 tree. But arm is affected as
> > well, so it would be good to know if Russell has any objections (cc'ed).
> >
> > Patch below for reference. Thanks.
> >
> 
> Russell has already replied to that:
> http://marc.info/?l=linux-arm-kernel&m=139696976302889&w=2

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
  2014-05-06 15:12         ` Catalin Marinas
@ 2014-05-06 15:42           ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:42 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On Tue, May 06, 2014 at 04:12:55PM +0100, Catalin Marinas wrote:
> On Tue, May 06, 2014 at 03:48:08PM +0100, Ard Biesheuvel wrote:
> > On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> > >> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > >> index 4aef42a04bdc..86ac6a9bc86a 100644
> > >> --- a/arch/arm64/kernel/fpsimd.c
> > >> +++ b/arch/arm64/kernel/fpsimd.c
> > >> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
> > >>       preempt_enable();
> > >>  }
> > >>
> > >> +/*
> > >> + * Save the userland FPSIMD state of 'current' to memory
> > >> + */
> > >> +void fpsimd_preserve_current_state(void)
> > >> +{
> > >> +     fpsimd_save_state(&current->thread.fpsimd_state);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Load the userland FPSIMD state of 'current' from memory
> > >> + */
> > >> +void fpsimd_restore_current_state(void)
> > >> +{
> > >> +     fpsimd_load_state(&current->thread.fpsimd_state);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Load an updated userland FPSIMD state for 'current' from memory
> > >> + */
> > >> +void fpsimd_update_current_state(struct fpsimd_state *state)
> > >> +{
> > >> +     preempt_disable();
> > >> +     fpsimd_load_state(state);
> > >> +     preempt_enable();
> > >> +}
> > >
> > > Minor - please update the comment above the functions to state that
> > > preemption needs to be disabled by the caller.
> > >
> > 
> > Do you mean in all three cases? And, by implication, that the
> > preempt_disable()/enable() pair should be moved to the call site for
> > fpsimd_update_current_state() ?
> 
> No, just the comment for the first two functions updated.

I noticed in a subsequent patch that you add preempt_disable/enable
already in the first two functions. You could do it here as well to
avoid confusion (and no need to update the comment).

-- 
Catalin
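
In other words, the shape suggested here is roughly the following, with the
helpers taking care of preemption themselves (simplified sketch, not the
final code):

void fpsimd_preserve_current_state(void)
{
	preempt_disable();
	fpsimd_save_state(&current->thread.fpsimd_state);
	preempt_enable();
}

void fpsimd_restore_current_state(void)
{
	preempt_disable();
	fpsimd_load_state(&current->thread.fpsimd_state);
	preempt_enable();
}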

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
  2014-05-01 15:49   ` Ard Biesheuvel
@ 2014-05-06 16:08     ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:08 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel

On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  {
>         switch (cmd) {
>         case CPU_PM_ENTER:
> -               if (current->mm)
> +               if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
>                         fpsimd_save_state(&current->thread.fpsimd_state);
>                 break;
>         case CPU_PM_EXIT:
> -               if (current->mm)
> -                       fpsimd_load_state(&current->thread.fpsimd_state);
> +               set_thread_flag(TIF_FOREIGN_FPSTATE);

I think we could enter a PM state on a kernel thread (idle), so we
should preserve the current->mm check as well.

>                 break;
>         case CPU_PM_ENTER_FAILED:
>         default:
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 06448a77ff53..882f01774365 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>                 clear_thread_flag(TIF_NOTIFY_RESUME);
>                 tracehook_notify_resume(regs);
>         }
> +
> +       if (thread_flags & _TIF_FOREIGN_FPSTATE)
> +               fpsimd_restore_current_state();

I think this should be safe. Even if we get preempted here, ret_to_user
would loop over TI_FLAGS with interrupts disabled until no work pending.

-- 
Catalin
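
Spelled out, the notifier with the current->mm check retained on the
CPU_PM_EXIT path would look roughly like this (sketch based on the hunk
quoted above, not the final code):

static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
				  unsigned long cmd, void *v)
{
	switch (cmd) {
	case CPU_PM_ENTER:
		if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
			fpsimd_save_state(&current->thread.fpsimd_state);
		break;
	case CPU_PM_EXIT:
		if (current->mm)	/* skip kernel threads such as idle */
			set_thread_flag(TIF_FOREIGN_FPSTATE);
		break;
	case CPU_PM_ENTER_FAILED:
	default:
		break;
	}
	return NOTIFY_OK;
}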

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
  2014-05-06 16:08     ` Catalin Marinas
@ 2014-05-06 16:25       ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 16:25 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On 6 May 2014 18:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
>> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>>  {
>>         switch (cmd) {
>>         case CPU_PM_ENTER:
>> -               if (current->mm)
>> +               if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
>>                         fpsimd_save_state(&current->thread.fpsimd_state);
>>                 break;
>>         case CPU_PM_EXIT:
>> -               if (current->mm)
>> -                       fpsimd_load_state(&current->thread.fpsimd_state);
>> +               set_thread_flag(TIF_FOREIGN_FPSTATE);
>
> I think we could enter a PM state on a kernel thread (idle), so we
> should preserve the current->mm check as well.
>

OK

>>                 break;
>>         case CPU_PM_ENTER_FAILED:
>>         default:
>> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
>> index 06448a77ff53..882f01774365 100644
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>>                 clear_thread_flag(TIF_NOTIFY_RESUME);
>>                 tracehook_notify_resume(regs);
>>         }
>> +
>> +       if (thread_flags & _TIF_FOREIGN_FPSTATE)
>> +               fpsimd_restore_current_state();
>
> I think this should be safe. Even if we get preempted here, ret_to_user
> would loop over TI_FLAGS with interrupts disabled until no work pending.
>

I don't follow. Do you think I should change something here?

Anyway, inside fpsimd_restore_current_state() the TIF_FOREIGN_FPSTATE
is checked again, but this time with preemption disabled.

-- 
Ard.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
  2014-05-06 16:25       ` Ard Biesheuvel
@ 2014-05-06 16:31         ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:31 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On Tue, May 06, 2014 at 05:25:14PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 18:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
> >> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
> >>  {
> >>         switch (cmd) {
> >>         case CPU_PM_ENTER:
> >> -               if (current->mm)
> >> +               if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
> >>                         fpsimd_save_state(&current->thread.fpsimd_state);
> >>                 break;
> >>         case CPU_PM_EXIT:
> >> -               if (current->mm)
> >> -                       fpsimd_load_state(&current->thread.fpsimd_state);
> >> +               set_thread_flag(TIF_FOREIGN_FPSTATE);
> >
> > I think we could enter a PM state on a kernel thread (idle), so we
> > should preserve the current->mm check as well.
> 
> OK
> 
> >>                 break;
> >>         case CPU_PM_ENTER_FAILED:
> >>         default:
> >> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> >> index 06448a77ff53..882f01774365 100644
> >> --- a/arch/arm64/kernel/signal.c
> >> +++ b/arch/arm64/kernel/signal.c
> >> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> >>                 clear_thread_flag(TIF_NOTIFY_RESUME);
> >>                 tracehook_notify_resume(regs);
> >>         }
> >> +
> >> +       if (thread_flags & _TIF_FOREIGN_FPSTATE)
> >> +               fpsimd_restore_current_state();
> >
> > I think this should be safe. Even if we get preempted here, ret_to_user
> > would loop over TI_FLAGS with interrupts disabled until no work pending.
> 
> I don't follow. Do you think I should change something here?

No, I think it's safe (just thinking out loud). That's assuming
TIF_FOREIGN_FPSTATE is never set in interrupt context, and a brief look
at the subsequent patch confirms that it isn't.

> Anyway, inside fpsimd_restore_current_state() the TIF_FOREIGN_FPSTATE
> is checked again, but this time with preemption disabled.

Yes. I was thinking about the scenario where we get to
do_notify_resume() because of other work but with TIF_FOREIGN_FPSTATE
cleared. In the meantime, we get preempted and TIF_FOREIGN_FPSTATE gets
set when switching back to this thread. In this case, ret_to_user loops
again over TI_FLAGS.

-- 
Catalin
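
To spell out the pattern referred to above: fpsimd_restore_current_state()
re-checking the flag with preemption disabled looks roughly like this
(simplified sketch, not the exact patch code):

void fpsimd_restore_current_state(void)
{
	preempt_disable();
	/*
	 * Re-test under preempt_disable(): if we were preempted after the
	 * check in do_notify_resume(), the flag would have been set again on
	 * the switch back, and ret_to_user re-examines TI_FLAGS with
	 * interrupts disabled in any case.
	 */
	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
		fpsimd_load_state(&current->thread.fpsimd_state);
	preempt_enable();
}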

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
  2014-05-01 15:49   ` Ard Biesheuvel
@ 2014-05-06 16:49     ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:49 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel

On Thu, May 01, 2014 at 04:49:36PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 7a900142dbc8..05e1b24aca4c 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -41,6 +41,17 @@ struct fpsimd_state {
>  	unsigned int cpu;
>  };
>  
> +/*
> + * Struct for stacking the bottom 'n' FP/SIMD registers.
> + */
> +struct fpsimd_partial_state {
> +	u32		num_regs;
> +	u32		fpsr;
> +	u32		fpcr;
> +	__uint128_t	vregs[32] __aligned(16);
> +} __aligned(16);

Do we need this explicit alignment here?

> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index bbec599c96bd..69e75134689d 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -62,3 +62,38 @@
>  	ldr	w\tmpnr, [\state, #16 * 2 + 4]
>  	msr	fpcr, x\tmpnr
>  .endm
> +
> +.altmacro
> +.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
> +	mrs	x\tmpnr1, fpsr
> +	str	w\numnr, [\state]
> +	mrs	x\tmpnr2, fpcr
> +	stp	w\tmpnr1, w\tmpnr2, [\state, #4]
> +	adr	x\tmpnr1, 0f
> +	add	\state, \state, x\numnr, lsl #4
> +	sub	x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
> +	br	x\tmpnr1
> +	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
> +	.irp	qb, %(qa + 1)
> +	stp	q\qa, q\qb, [\state, # -16 * \qa - 16]
> +	.endr
> +	.endr
> +0:
> +.endm
> +
> +.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
> +	ldp	w\tmpnr1, w\tmpnr2, [\state, #4]
> +	msr	fpsr, x\tmpnr1
> +	msr	fpcr, x\tmpnr2
> +	adr	x\tmpnr1, 0f
> +	ldr	w\tmpnr2, [\state]
> +	add	\state, \state, x\tmpnr2, lsl #4
> +	sub	x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
> +	br	x\tmpnr1
> +	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
> +	.irp	qb, %(qa + 1)
> +	ldp	q\qa, q\qb, [\state, # -16 * \qa - 16]
> +	.endr
> +	.endr
> +0:
> +.endm

BTW, it may be better if num_regs is placed at the end of the structure,
especially since you use stp to store both fpsr and fpcr (though I
haven't rewritten the above to see how they look).

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
  2014-05-06 16:49     ` Catalin Marinas
@ 2014-05-06 17:09       ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 17:09 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel

On 6 May 2014 18:49, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:36PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index 7a900142dbc8..05e1b24aca4c 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -41,6 +41,17 @@ struct fpsimd_state {
>>       unsigned int cpu;
>>  };
>>
>> +/*
>> + * Struct for stacking the bottom 'n' FP/SIMD registers.
>> + */
>> +struct fpsimd_partial_state {
>> +     u32             num_regs;
>> +     u32             fpsr;
>> +     u32             fpcr;
>> +     __uint128_t     vregs[32] __aligned(16);
>> +} __aligned(16);
>
> Do we need this explicit alignment here?
>

Without it, the implied alignment is 8 bytes, I suppose, but I haven't
checked carefully.
I will check and remove this if 8 bytes is the default.

>> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
>> index bbec599c96bd..69e75134689d 100644
>> --- a/arch/arm64/include/asm/fpsimdmacros.h
>> +++ b/arch/arm64/include/asm/fpsimdmacros.h
>> @@ -62,3 +62,38 @@
>>       ldr     w\tmpnr, [\state, #16 * 2 + 4]
>>       msr     fpcr, x\tmpnr
>>  .endm
>> +
>> +.altmacro
>> +.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
>> +     mrs     x\tmpnr1, fpsr
>> +     str     w\numnr, [\state]
>> +     mrs     x\tmpnr2, fpcr
>> +     stp     w\tmpnr1, w\tmpnr2, [\state, #4]
>> +     adr     x\tmpnr1, 0f
>> +     add     \state, \state, x\numnr, lsl #4
>> +     sub     x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
>> +     br      x\tmpnr1
>> +     .irp    qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
>> +     .irp    qb, %(qa + 1)
>> +     stp     q\qa, q\qb, [\state, # -16 * \qa - 16]
>> +     .endr
>> +     .endr
>> +0:
>> +.endm
>> +
>> +.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
>> +     ldp     w\tmpnr1, w\tmpnr2, [\state, #4]
>> +     msr     fpsr, x\tmpnr1
>> +     msr     fpcr, x\tmpnr2
>> +     adr     x\tmpnr1, 0f
>> +     ldr     w\tmpnr2, [\state]
>> +     add     \state, \state, x\tmpnr2, lsl #4
>> +     sub     x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
>> +     br      x\tmpnr1
>> +     .irp    qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
>> +     .irp    qb, %(qa + 1)
>> +     ldp     q\qa, q\qb, [\state, # -16 * \qa - 16]
>> +     .endr
>> +     .endr
>> +0:
>> +.endm
>
> BTW, it may be better if num_regs is placed at the end of the structure,
> especially since you use stp to store both fpsr and fpcr (though I
> haven't rewritten the above to see how they look).
>

I suppose you mean in the middle, i.e., after fpsr and fpcr?
Yes that makes sense, I will change that.

-- 
Ard.
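
Two quick notes on the above, with a standalone userspace snippet: a struct
is aligned at least as strictly as its most-aligned member, and vregs already
carries aligned(16), so the outer attribute is documentation rather than a
requirement; and moving num_regs next to fpsr/fpcr gives a layout like the
one below. The field order here is a sketch of the suggestion, not the final
patch:

#include <stdint.h>

struct fpsimd_partial_state_sketch {
	uint32_t	fpsr;
	uint32_t	fpcr;
	uint32_t	num_regs;	/* moved after fpsr/fpcr as discussed */
	unsigned __int128 vregs[32] __attribute__((aligned(16)));
};

/* 16-byte alignment already follows from the vregs member. */
_Static_assert(_Alignof(struct fpsimd_partial_state_sketch) == 16,
	       "outer aligned(16) attribute would be redundant");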

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-07 14:45   ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-07 14:45 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
> This is a repost of the arm64 crypto patches that I have posted to the LAKML
> over the past months. They have now been verified on actual hardware
> (Cortex-A57) so if there are no remaining issues I would like to propose them
> for 3.16.
> 
> Ard Biesheuvel (15):
>   asm-generic: allow generic unaligned access if the arch supports it
>   arm64: add abstractions for FPSIMD state manipulation
>   arm64: defer reloading a task's FPSIMD state to userland resume
>   arm64: add support for kernel mode NEON in interrupt context
>   arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>   arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>   arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>   arm64/crypto: AES using ARMv8 Crypto Extensions
>   arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>   arm64: pull in <asm/simd.h> from asm-generic
>   arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>     Extensions
>   arm64/crypto: add shared macro to test for NEED_RESCHED
>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>   arm64/crypto: add voluntary preemption to Crypto Extensions GHASH

There are about 5 patches that make sense to me ;) and apart from a few
minor comments they look fine.

There are the other 10 crypto patches that are beyond my knowledge. Do
you know anyone who could do a sanity check on them? Are there any tests
that would show the correctness of the implementation?

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-07 14:45   ` Catalin Marinas
@ 2014-05-07 19:58     ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-07 19:58 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>> over the past months. They have now been verified on actual hardware
>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>> for 3.16.
>>
>> Ard Biesheuvel (15):
>>   asm-generic: allow generic unaligned access if the arch supports it
>>   arm64: add abstractions for FPSIMD state manipulation
>>   arm64: defer reloading a task's FPSIMD state to userland resume
>>   arm64: add support for kernel mode NEON in interrupt context
>>   arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>   arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>   arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>   arm64/crypto: AES using ARMv8 Crypto Extensions
>>   arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>   arm64: pull in <asm/simd.h> from asm-generic
>>   arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>     Extensions
>>   arm64/crypto: add shared macro to test for NEED_RESCHED
>>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>   arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>
> There are about 5 patches that make sense to me ;) and apart from a few
> minor comments they look fine.
>
> There are the other 10 crypto patches that are beyond my knowledge. Do
> you know anyone who could do a sanity check on them? Are there any tests
> that would show the correctness of the implementation?
>

The core ciphers and hashes are always tested automatically at each
insmod (unless CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is set).
This applies to plain AES, SHA1, SHA-224/256 and GHASH.

For the remaining modules, there is the tcrypt.ko module which can be
used to explicitly invoke the builtin crypto tests:

modprobe tcrypt.ko mode=xx

where xx is one of:

AES-ECB/CBC/CTR/XTS: 10
AES-CCM: 37

There are also builtin benchmarks:

AES-ECB/CBC/CTR/XTS: 500
SHA1: 303
SHA256: 304

Consult crypto/tcrypt.c for an exhaustive list.

Note that the modprobe of tcrypt.ko always fails: this is intentional,
as the tests and/or benchmarks run in the module's init() function, so
there is no reason to keep the module loaded afterwards.
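
For example (illustrative invocations, using the mode numbers listed
above; the results are printed to the kernel log):

modprobe tcrypt.ko mode=10     # run the AES-ECB/CBC/CTR/XTS test vectors
modprobe tcrypt.ko mode=500    # run the AES speed benchmark
dmesg | tail                   # inspect the results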

As far as the relative performance is concerned, these implementations
are all significantly faster than their generic C counterparts, as
they mostly use dedicated instructions. The NEON implementation of AES
is also significantly faster, at least on Cortex-A57. Mileage may vary
for other implementations.

From a security point of view, note that these algorithms are all
time-invariant, i.e., there are no data-dependent lookups.

Regards,
Ard.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-07 14:45   ` Catalin Marinas
@ 2014-05-08 11:22     ` Ard Biesheuvel
  -1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-08 11:22 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>> over the past months. They have now been verified on actual hardware
>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>> for 3.16.
>>
>> Ard Biesheuvel (15):
>>   asm-generic: allow generic unaligned access if the arch supports it
>>   arm64: add abstractions for FPSIMD state manipulation
>>   arm64: defer reloading a task's FPSIMD state to userland resume
>>   arm64: add support for kernel mode NEON in interrupt context
>>   arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>   arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>   arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>   arm64/crypto: AES using ARMv8 Crypto Extensions
>>   arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>   arm64: pull in <asm/simd.h> from asm-generic
>>   arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>     Extensions
>>   arm64/crypto: add shared macro to test for NEED_RESCHED
>>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>   arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>   arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>
> There are about 5 patches that make sense to me ;) and apart from a few
> minor comments they look fine.
>
> There are the other 10 crypto patches that are beyond my knowledge. Do
> you know anyone who could do a sanity check on them? Are there any tests
> that would show the correctness of the implementation?
>

I will re-send the 3 FPSIMD patches separately with your review
comments addressed.

-- 
Ard.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-08 11:22     ` Ard Biesheuvel
@ 2014-05-08 21:50       ` Catalin Marinas
  -1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-08 21:50 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper

On 8 May 2014, at 12:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>>> over the past months. They have now been verified on actual hardware
>>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>>> for 3.16.
>>> 
>>> Ard Biesheuvel (15):
>>>  asm-generic: allow generic unaligned access if the arch supports it
>>>  arm64: add abstractions for FPSIMD state manipulation
>>>  arm64: defer reloading a task's FPSIMD state to userland resume
>>>  arm64: add support for kernel mode NEON in interrupt context
>>>  arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>>  arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>>  arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>>  arm64/crypto: AES using ARMv8 Crypto Extensions
>>>  arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>>  arm64: pull in <asm/simd.h> from asm-generic
>>>  arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>>    Extensions
>>>  arm64/crypto: add shared macro to test for NEED_RESCHED
>>>  arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>>  arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>>  arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>> 
>> There are about 5 patches that make sense to me ;) and apart from a few
>> minor comments they look fine.
>> 
>> There are the other 10 crypto patches that are beyond my knowledge. Do
>> you know anyone who could do a sanity check on them? Are there any tests
>> that would show the correctness of the implementation?
> 
> I will re-send the 3 FPSIMD patches separately with your review
> comments addressed.

Thanks. If you get another acked/reviewed-by tag on the rest of the
patches it’s even better (I plan to get the series in for 3.16, after
we do some tests internally as well).

Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-08 21:50       ` Catalin Marinas
  (?)
@ 2014-05-09  6:37       ` Ard Biesheuvel
  2014-05-14  1:29         ` Herbert Xu
  -1 siblings, 1 reply; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-09  6:37 UTC (permalink / raw)
  To: Catalin Marinas, Herbert Xu, Jussi Kivilinna; +Cc: linux-crypto, steve.capper

On 8 May 2014 23:50, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On 8 May 2014, at 12:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>>>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>>>> over the past months. They have now been verified on actual hardware
>>>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>>>> for 3.16.
>>>>
>>>> Ard Biesheuvel (15):
>>>>  asm-generic: allow generic unaligned access if the arch supports it
>>>>  arm64: add abstractions for FPSIMD state manipulation
>>>>  arm64: defer reloading a task's FPSIMD state to userland resume
>>>>  arm64: add support for kernel mode NEON in interrupt context
>>>>  arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>>>  arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>>>  arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>>>  arm64/crypto: AES using ARMv8 Crypto Extensions
>>>>  arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>>>  arm64: pull in <asm/simd.h> from asm-generic
>>>>  arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>>>    Extensions
>>>>  arm64/crypto: add shared macro to test for NEED_RESCHED
>>>>  arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>>>  arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>>>  arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>>>
>>> There are about 5 patches that make sense to me ;) and apart from a few
>>> minor comments they look fine.
>>>
>>> There are the other 10 crypto patches that are beyond my knowledge. Do
>>> you know anyone who could do a sanity check on them? Are there any tests
>>> that would show the correctness of the implementation?
>>
>> I will re-send the 3 FPSIMD patches separately with your review
>> comments addressed.
>
> Thanks. If you get another acked/reviewed-by tag on the rest of the
> patches it’s even better (I plan to get the series in for 3.16, after
> we do some tests internally as well).
>

@Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
patches above? Herbert has already acked the CCM patch, but Catalin is
asking for more reviews and/or acknowledgements before merging these
patches for arm64.

Regards,
Ard.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-09  6:37       ` Ard Biesheuvel
@ 2014-05-14  1:29         ` Herbert Xu
  2014-05-14  8:47           ` Catalin Marinas
  0 siblings, 1 reply; 55+ messages in thread
From: Herbert Xu @ 2014-05-14  1:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Jussi Kivilinna, linux-crypto, steve.capper

On Fri, May 09, 2014 at 08:37:58AM +0200, Ard Biesheuvel wrote:
>
> @Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
> patches above? Herbert has already acked the ccm patch, but Catalin is
> requesting for more review and/or acknowledgements before merging
> these patches for arm64.

They look fine to me.  You can add my Ack to them.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH resend 00/15] arm64 crypto roundup
  2014-05-14  1:29         ` Herbert Xu
@ 2014-05-14  8:47           ` Catalin Marinas
  0 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-14  8:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ard Biesheuvel, Jussi Kivilinna, linux-crypto, steve.capper

On Wed, May 14, 2014 at 02:29:05AM +0100, Herbert Xu wrote:
> On Fri, May 09, 2014 at 08:37:58AM +0200, Ard Biesheuvel wrote:
> >
> > @Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
> > patches above? Herbert has already acked the ccm patch, but Catalin is
> > requesting for more review and/or acknowledgements before merging
> > these patches for arm64.
> 
> They look fine to me.  You can add my Ack to them.

Many thanks, both to you and Jussi.

Regards.

-- 
Catalin

^ permalink raw reply	[flat|nested] 55+ messages in thread
