* [PATCH resend 00/15] arm64 crypto roundup
@ 2014-05-01 15:49 ` Ard Biesheuvel
0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This is a repost of the arm64 crypto patches that I have posted to the LAKML
over the past months. They have now been verified on actual hardware
(Cortex-A57) so if there are no remaining issues I would like to propose them
for 3.16.
Ard Biesheuvel (15):
asm-generic: allow generic unaligned access if the arch supports it
arm64: add abstractions for FPSIMD state manipulation
arm64: defer reloading a task's FPSIMD state to userland resume
arm64: add support for kernel mode NEON in interrupt context
arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
arm64/crypto: AES using ARMv8 Crypto Extensions
arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
arm64: pull in <asm/simd.h> from asm-generic
arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
Extensions
arm64/crypto: add shared macro to test for NEED_RESCHED
arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Kconfig | 53 ++++
arch/arm64/crypto/Makefile | 38 +++
arch/arm64/crypto/aes-ce-ccm-core.S | 222 ++++++++++++++
arch/arm64/crypto/aes-ce-ccm-glue.c | 297 ++++++++++++++++++
arch/arm64/crypto/aes-ce-cipher.c | 155 ++++++++++
arch/arm64/crypto/aes-ce.S | 147 +++++++++
arch/arm64/crypto/aes-glue.c | 446 +++++++++++++++++++++++++++
arch/arm64/crypto/aes-modes.S | 548 ++++++++++++++++++++++++++++++++++
arch/arm64/crypto/aes-neon.S | 382 ++++++++++++++++++++++++
arch/arm64/crypto/ghash-ce-core.S | 97 ++++++
arch/arm64/crypto/ghash-ce-glue.c | 172 +++++++++++
arch/arm64/crypto/sha1-ce-core.S | 154 ++++++++++
arch/arm64/crypto/sha1-ce-glue.c | 201 +++++++++++++
arch/arm64/crypto/sha2-ce-core.S | 159 ++++++++++
arch/arm64/crypto/sha2-ce-glue.c | 281 +++++++++++++++++
arch/arm64/include/asm/Kbuild | 1 +
arch/arm64/include/asm/assembler.h | 21 ++
arch/arm64/include/asm/fpsimd.h | 23 ++
arch/arm64/include/asm/fpsimdmacros.h | 35 +++
arch/arm64/include/asm/neon.h | 6 +-
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry-fpsimd.S | 24 ++
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 187 ++++++++++--
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/ptrace.c | 2 +
arch/arm64/kernel/signal.c | 13 +-
arch/arm64/kernel/signal32.c | 9 +-
include/asm-generic/unaligned.h | 21 +-
31 files changed, 3662 insertions(+), 44 deletions(-)
create mode 100644 arch/arm64/crypto/Kconfig
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/aes-ce-ccm-core.S
create mode 100644 arch/arm64/crypto/aes-ce-ccm-glue.c
create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
create mode 100644 arch/arm64/crypto/aes-ce.S
create mode 100644 arch/arm64/crypto/aes-glue.c
create mode 100644 arch/arm64/crypto/aes-modes.S
create mode 100644 arch/arm64/crypto/aes-neon.S
create mode 100644 arch/arm64/crypto/ghash-ce-core.S
create mode 100644 arch/arm64/crypto/ghash-ce-glue.c
create mode 100644 arch/arm64/crypto/sha1-ce-core.S
create mode 100644 arch/arm64/crypto/sha1-ce-glue.c
create mode 100644 arch/arm64/crypto/sha2-ce-core.S
create mode 100644 arch/arm64/crypto/sha2-ce-glue.c
--
1.8.3.2
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
Switch the default unaligned access method to 'hardware implemented'
if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
include/asm-generic/unaligned.h | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 03cf5936bad6..1ac097279db1 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -4,22 +4,27 @@
/*
* This is the most generic implementation of unaligned accesses
* and should work almost anywhere.
- *
- * If an architecture can handle unaligned accesses in hardware,
- * it may want to use the linux/unaligned/access_ok.h implementation
- * instead.
*/
#include <asm/byteorder.h>
+/* Set by the arch if it can handle unaligned accesses in hardware. */
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+# include <linux/unaligned/access_ok.h>
+#endif
+
#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/be_byteshift.h>
+# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+# include <linux/unaligned/le_struct.h>
+# include <linux/unaligned/be_byteshift.h>
+# endif
# include <linux/unaligned/generic.h>
# define get_unaligned __get_unaligned_le
# define put_unaligned __put_unaligned_le
#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/le_byteshift.h>
+# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+# include <linux/unaligned/be_struct.h>
+# include <linux/unaligned/le_byteshift.h>
+# endif
# include <linux/unaligned/generic.h>
# define get_unaligned __get_unaligned_be
# define put_unaligned __put_unaligned_be
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
There are two tacit assumptions in the FPSIMD handling code that will no longer
hold after the next patch that optimizes away some FPSIMD state restores:
. the FPSIMD registers of this CPU contain the userland FPSIMD state of
task 'current';
. when switching to a task, its FPSIMD state will always be restored from
memory.
This patch adds the following functions to abstract away from straight FPSIMD
register file saves and restores:
- fpsimd_preserve_current_state -> ensure current's FPSIMD state is saved
- fpsimd_restore_current_state -> ensure current's FPSIMD state is loaded
- fpsimd_update_current_state -> replace current's FPSIMD state
- fpsimd_flush_task_state -> invalidate live copies of a task's FPSIMD state
Where necessary, the ptrace, signal handling and fork code are updated to use
the above wrappers instead of poking into the FPSIMD registers directly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 6 ++++++
arch/arm64/kernel/fpsimd.c | 33 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/ptrace.c | 2 ++
arch/arm64/kernel/signal.c | 9 +++------
arch/arm64/kernel/signal32.c | 9 +++------
6 files changed, 48 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac13008..9f9b8e438546 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -58,6 +58,12 @@ extern void fpsimd_load_state(struct fpsimd_state *state);
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
+extern void fpsimd_preserve_current_state(void);
+extern void fpsimd_restore_current_state(void);
+extern void fpsimd_update_current_state(struct fpsimd_state *state);
+
+extern void fpsimd_flush_task_state(struct task_struct *target);
+
#endif
#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 4aef42a04bdc..86ac6a9bc86a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
preempt_enable();
}
+/*
+ * Save the userland FPSIMD state of 'current' to memory
+ */
+void fpsimd_preserve_current_state(void)
+{
+ fpsimd_save_state(&current->thread.fpsimd_state);
+}
+
+/*
+ * Load the userland FPSIMD state of 'current' from memory
+ */
+void fpsimd_restore_current_state(void)
+{
+ fpsimd_load_state(&current->thread.fpsimd_state);
+}
+
+/*
+ * Load an updated userland FPSIMD state for 'current' from memory
+ */
+void fpsimd_update_current_state(struct fpsimd_state *state)
+{
+ preempt_disable();
+ fpsimd_load_state(state);
+ preempt_enable();
+}
+
+/*
+ * Invalidate live CPU copies of task t's FPSIMD state
+ */
+void fpsimd_flush_task_state(struct task_struct *t)
+{
+}
+
#ifdef CONFIG_KERNEL_MODE_NEON
/*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6391485f342d..c5693163408c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -205,7 +205,7 @@ void release_thread(struct task_struct *dead_task)
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
- fpsimd_save_state(&current->thread.fpsimd_state);
+ fpsimd_preserve_current_state();
*dst = *src;
return 0;
}
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 6a8928bba03c..f8700eca24e7 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -517,6 +517,7 @@ static int fpr_set(struct task_struct *target, const struct user_regset *regset,
return ret;
target->thread.fpsimd_state.user_fpsimd = newstate;
+ fpsimd_flush_task_state(target);
return ret;
}
@@ -764,6 +765,7 @@ static int compat_vfp_set(struct task_struct *target,
uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
}
+ fpsimd_flush_task_state(target);
return ret;
}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591f75dd..06448a77ff53 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -51,7 +51,7 @@ static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
int err;
/* dump the hardware registers to the fpsimd_state structure */
- fpsimd_save_state(fpsimd);
+ fpsimd_preserve_current_state();
/* copy the FP and status/control registers */
err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
@@ -86,11 +86,8 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
/* load the hardware registers from the fpsimd_state structure */
- if (!err) {
- preempt_disable();
- fpsimd_load_state(&fpsimd);
- preempt_enable();
- }
+ if (!err)
+ fpsimd_update_current_state(&fpsimd);
return err ? -EFAULT : 0;
}
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index b3fc9f5ec6d3..ac7e237d0bda 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -219,7 +219,7 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
* Note that this also saves V16-31, which aren't visible
* in AArch32.
*/
- fpsimd_save_state(fpsimd);
+ fpsimd_preserve_current_state();
/* Place structure header on the stack */
__put_user_error(magic, &frame->magic, err);
@@ -282,11 +282,8 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
* We don't need to touch the exception register, so
* reload the hardware state.
*/
- if (!err) {
- preempt_disable();
- fpsimd_load_state(&fpsimd);
- preempt_enable();
- }
+ if (!err)
+ fpsimd_update_current_state(&fpsimd);
return err ? -EFAULT : 0;
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the meantime, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.
This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task re-enters userland. If a task does not enter
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 136 ++++++++++++++++++++++++++++++-----
arch/arm64/kernel/signal.c | 4 ++
5 files changed, 127 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9f9b8e438546..7a900142dbc8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,6 +37,8 @@ struct fpsimd_state {
u32 fpcr;
};
};
+ /* the id of the last cpu to have restored this state */
+ unsigned int cpu;
};
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b66ffd..4a1ca1cfb2f8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SIGPENDING 0
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+#define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
#define TIF_SYSCALL_TRACE 8
#define TIF_POLLING_NRFLAG 16
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
- _TIF_NOTIFY_RESUME)
+ _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
#endif /* __KERNEL__ */
#endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 39ac630d83de..80464e2fb1a5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -576,7 +576,7 @@ fast_work_pending:
str x0, [sp, #S_X0] // returned x0
work_pending:
tbnz x1, #TIF_NEED_RESCHED, work_resched
- /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+ /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
ldr x2, [sp, #S_PSTATE]
mov x0, sp // 'regs'
tst x2, #PSR_MODE_MASK // user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 86ac6a9bc86a..6cfbb4ef27d7 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -35,6 +35,60 @@
#define FPEXC_IDF (1 << 7)
/*
+ * In order to reduce the number of times the FPSIMD state is needlessly saved
+ * and restored, we need to keep track of two things:
+ * (a) for each task, we need to remember which CPU was the last one to have
+ * the task's FPSIMD state loaded into its FPSIMD registers;
+ * (b) for each CPU, we need to remember which task's userland FPSIMD state has
+ * been loaded into its FPSIMD registers most recently, or whether it has
+ * been used to perform kernel mode NEON in the meantime.
+ *
+ * For (a), we add a 'cpu' field to struct fpsimd_state, which gets updated to
+ * the id of the current CPU every time the state is loaded onto a CPU. For (b),
+ * we add the per-cpu variable 'fpsimd_last_state' (below), which contains the
+ * address of the userland FPSIMD state of the task that was loaded onto the CPU
+ * the most recently, or NULL if kernel mode NEON has been performed after that.
+ *
+ * With this in place, we no longer have to restore the next FPSIMD state right
+ * when switching between tasks. Instead, we can defer this check to userland
+ * resume, at which time we verify whether the CPU's fpsimd_last_state and the
+ * task's fpsimd_state.cpu are still mutually in sync. If this is the case, we
+ * can omit the FPSIMD restore.
+ *
+ * As an optimization, we use the thread_info flag TIF_FOREIGN_FPSTATE to
+ * indicate whether or not the userland FPSIMD state of the current task is
+ * present in the registers. The flag is set unless the FPSIMD registers of this
+ * CPU currently contain the most recent userland FPSIMD state of the current
+ * task.
+ *
+ * For a certain task, the sequence may look something like this:
+ * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
+ * contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
+ * variable points to the task's fpsimd_state, the TIF_FOREIGN_FPSTATE flag is
+ * cleared, otherwise it is set;
+ *
+ * - the task returns to userland; if TIF_FOREIGN_FPSTATE is set, the task's
+ * userland FPSIMD state is copied from memory to the registers, the task's
+ * fpsimd_state.cpu field is set to the id of the current CPU, the current
+ * CPU's fpsimd_last_state pointer is set to this task's fpsimd_state and the
+ * TIF_FOREIGN_FPSTATE flag is cleared;
+ *
+ * - the task executes an ordinary syscall; upon return to userland, the
+ * TIF_FOREIGN_FPSTATE flag will still be cleared, so no FPSIMD state is
+ * restored;
+ *
+ * - the task executes a syscall which executes some NEON instructions; this is
+ * preceded by a call to kernel_neon_begin(), which copies the task's FPSIMD
+ * register contents to memory, clears the fpsimd_last_state per-cpu variable
+ * and sets the TIF_FOREIGN_FPSTATE flag;
+ *
+ * - the task gets preempted after kernel_neon_end() is called; as we have not
+ * returned from the 2nd syscall yet, TIF_FOREIGN_FPSTATE is still set so
+ * whatever is in the FPSIMD registers is not saved to memory, but discarded.
+ */
+static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
+
+/*
* Trapped FP/ASIMD access.
*/
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
@@ -72,44 +126,85 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
void fpsimd_thread_switch(struct task_struct *next)
{
- /* check if not kernel threads */
- if (current->mm)
+ /*
+ * Save the current FPSIMD state to memory, but only if whatever is in
+ * the registers is in fact the most recent userland FPSIMD state of
+ * 'current'.
+ */
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
- if (next->mm)
- fpsimd_load_state(&next->thread.fpsimd_state);
+
+ if (next->mm) {
+ /*
+ * If we are switching to a task whose most recent userland
+ * FPSIMD state is already in the registers of *this* cpu,
+ * we can skip loading the state from memory. Otherwise, set
+ * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
+ * upon the next return to userland.
+ */
+ struct fpsimd_state *st = &next->thread.fpsimd_state;
+
+ if (__this_cpu_read(fpsimd_last_state) == st
+ && st->cpu == smp_processor_id())
+ clear_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ else
+ set_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ }
}
void fpsimd_flush_thread(void)
{
- preempt_disable();
memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
- fpsimd_load_state(¤t->thread.fpsimd_state);
- preempt_enable();
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
}
/*
- * Save the userland FPSIMD state of 'current' to memory
+ * Save the userland FPSIMD state of 'current' to memory, but only if the state
+ * currently held in the registers does in fact belong to 'current'
*/
void fpsimd_preserve_current_state(void)
{
- fpsimd_save_state(&current->thread.fpsimd_state);
+ preempt_disable();
+ if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ preempt_enable();
}
/*
- * Load the userland FPSIMD state of 'current' from memory
+ * Load the userland FPSIMD state of 'current' from memory, but only if the
+ * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
+ * state of 'current'
*/
void fpsimd_restore_current_state(void)
{
- fpsimd_load_state(¤t->thread.fpsimd_state);
+ preempt_disable();
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = ¤t->thread.fpsimd_state;
+
+ fpsimd_load_state(st);
+ this_cpu_write(fpsimd_last_state, st);
+ st->cpu = smp_processor_id();
+ }
+ preempt_enable();
}
/*
- * Load an updated userland FPSIMD state for 'current' from memory
+ * Load an updated userland FPSIMD state for 'current' from memory and set the
+ * flag that indicates that the FPSIMD register contents are the most recent
+ * FPSIMD state of 'current'
*/
void fpsimd_update_current_state(struct fpsimd_state *state)
{
preempt_disable();
fpsimd_load_state(state);
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = ¤t->thread.fpsimd_state;
+
+ this_cpu_write(fpsimd_last_state, st);
+ st->cpu = smp_processor_id();
+ }
preempt_enable();
}
@@ -118,6 +213,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
*/
void fpsimd_flush_task_state(struct task_struct *t)
{
+ t->thread.fpsimd_state.cpu = NR_CPUS;
}
#ifdef CONFIG_KERNEL_MODE_NEON
@@ -131,16 +227,19 @@ void kernel_neon_begin(void)
BUG_ON(in_interrupt());
preempt_disable();
- if (current->mm)
+ /*
+ * Save the userland FPSIMD state if we have one and if we haven't done
+ * so already. Clear fpsimd_last_state to indicate that there is no
+ * longer userland FPSIMD state in the registers.
+ */
+ if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(¤t->thread.fpsimd_state);
+ this_cpu_write(fpsimd_last_state, NULL);
}
EXPORT_SYMBOL(kernel_neon_begin);
void kernel_neon_end(void)
{
- if (current->mm)
- fpsimd_load_state(¤t->thread.fpsimd_state);
-
preempt_enable();
}
EXPORT_SYMBOL(kernel_neon_end);
@@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
{
switch (cmd) {
case CPU_PM_ENTER:
- if (current->mm)
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(¤t->thread.fpsimd_state);
break;
case CPU_PM_EXIT:
- if (current->mm)
- fpsimd_load_state(¤t->thread.fpsimd_state);
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
break;
case CPU_PM_ENTER_FAILED:
default:
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 06448a77ff53..882f01774365 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+ if (thread_flags & _TIF_FOREIGN_FPSTATE)
+ fpsimd_restore_current_state();
+
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
@ 2014-05-01 15:49 ` Ard Biesheuvel
0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel
If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the mean time, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.
This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task re-enters userland. If a task does not enter
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/thread_info.h | 4 +-
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/kernel/fpsimd.c | 136 ++++++++++++++++++++++++++++++-----
arch/arm64/kernel/signal.c | 4 ++
5 files changed, 127 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9f9b8e438546..7a900142dbc8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,6 +37,8 @@ struct fpsimd_state {
u32 fpcr;
};
};
+ /* the id of the last cpu to have restored this state */
+ unsigned int cpu;
};
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b66ffd..4a1ca1cfb2f8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SIGPENDING 0
#define TIF_NEED_RESCHED 1
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+#define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
#define TIF_SYSCALL_TRACE 8
#define TIF_POLLING_NRFLAG 16
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
@@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
#define _TIF_32BIT (1 << TIF_32BIT)
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
- _TIF_NOTIFY_RESUME)
+ _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
#endif /* __KERNEL__ */
#endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 39ac630d83de..80464e2fb1a5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -576,7 +576,7 @@ fast_work_pending:
str x0, [sp, #S_X0] // returned x0
work_pending:
tbnz x1, #TIF_NEED_RESCHED, work_resched
- /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+ /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
ldr x2, [sp, #S_PSTATE]
mov x0, sp // 'regs'
tst x2, #PSR_MODE_MASK // user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 86ac6a9bc86a..6cfbb4ef27d7 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -35,6 +35,60 @@
#define FPEXC_IDF (1 << 7)
/*
+ * In order to reduce the number of times the FPSIMD state is needlessly saved
+ * and restored, we need to keep track of two things:
+ * (a) for each task, we need to remember which CPU was the last one to have
+ * the task's FPSIMD state loaded into its FPSIMD registers;
+ * (b) for each CPU, we need to remember which task's userland FPSIMD state has
+ * been loaded into its FPSIMD registers most recently, or whether it has
+ * been used to perform kernel mode NEON in the meantime.
+ *
+ * For (a), we add a 'cpu' field to struct fpsimd_state, which gets updated to
+ * the id of the current CPU every time the state is loaded onto a CPU. For (b),
+ * we add the per-cpu variable 'fpsimd_last_state' (below), which contains the
+ * address of the userland FPSIMD state of the task that was most recently
+ * loaded onto the CPU, or NULL if kernel mode NEON has been performed since.
+ *
+ * With this in place, we no longer have to restore the next FPSIMD state right
+ * when switching between tasks. Instead, we can defer this check to userland
+ * resume, at which time we verify whether the CPU's fpsimd_last_state and the
+ * task's fpsimd_state.cpu are still mutually in sync. If this is the case, we
+ * can omit the FPSIMD restore.
+ *
+ * As an optimization, we use the thread_info flag TIF_FOREIGN_FPSTATE to
+ * indicate whether or not the userland FPSIMD state of the current task is
+ * present in the registers. The flag is set unless the FPSIMD registers of this
+ * CPU currently contain the most recent userland FPSIMD state of the current
+ * task.
+ *
+ * For a certain task, the sequence may look something like this:
+ * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
+ * contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
+ * variable points to the task's fpsimd_state, the TIF_FOREIGN_FPSTATE flag is
+ * cleared, otherwise it is set;
+ *
+ * - the task returns to userland; if TIF_FOREIGN_FPSTATE is set, the task's
+ * userland FPSIMD state is copied from memory to the registers, the task's
+ * fpsimd_state.cpu field is set to the id of the current CPU, the current
+ * CPU's fpsimd_last_state pointer is set to this task's fpsimd_state and the
+ * TIF_FOREIGN_FPSTATE flag is cleared;
+ *
+ * - the task executes an ordinary syscall; upon return to userland, the
+ * TIF_FOREIGN_FPSTATE flag will still be cleared, so no FPSIMD state is
+ * restored;
+ *
+ * - the task executes a syscall which executes some NEON instructions; this is
+ * preceded by a call to kernel_neon_begin(), which copies the task's FPSIMD
+ * register contents to memory, clears the fpsimd_last_state per-cpu variable
+ * and sets the TIF_FOREIGN_FPSTATE flag;
+ *
+ * - the task gets preempted after kernel_neon_end() is called; as we have not
+ * returned from the 2nd syscall yet, TIF_FOREIGN_FPSTATE is still set so
+ * whatever is in the FPSIMD registers is not saved to memory, but discarded.
+ */
+static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
+
+/*
* Trapped FP/ASIMD access.
*/
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
@@ -72,44 +126,85 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
void fpsimd_thread_switch(struct task_struct *next)
{
- /* check if not kernel threads */
- if (current->mm)
+ /*
+ * Save the current FPSIMD state to memory, but only if whatever is in
+ * the registers is in fact the most recent userland FPSIMD state of
+ * 'current'.
+ */
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
- if (next->mm)
- fpsimd_load_state(&next->thread.fpsimd_state);
+
+ if (next->mm) {
+ /*
+ * If we are switching to a task whose most recent userland
+ * FPSIMD state is already in the registers of *this* cpu,
+ * we can skip loading the state from memory. Otherwise, set
+ * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
+ * upon the next return to userland.
+ */
+ struct fpsimd_state *st = &next->thread.fpsimd_state;
+
+ if (__this_cpu_read(fpsimd_last_state) == st
+ && st->cpu == smp_processor_id())
+ clear_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ else
+ set_ti_thread_flag(task_thread_info(next),
+ TIF_FOREIGN_FPSTATE);
+ }
}
void fpsimd_flush_thread(void)
{
- preempt_disable();
memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
- fpsimd_load_state(&current->thread.fpsimd_state);
- preempt_enable();
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
}
/*
- * Save the userland FPSIMD state of 'current' to memory
+ * Save the userland FPSIMD state of 'current' to memory, but only if the state
+ * currently held in the registers does in fact belong to 'current'
*/
void fpsimd_preserve_current_state(void)
{
- fpsimd_save_state(&current->thread.fpsimd_state);
+ preempt_disable();
+ if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ preempt_enable();
}
/*
- * Load the userland FPSIMD state of 'current' from memory
+ * Load the userland FPSIMD state of 'current' from memory, but only if the
+ * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
+ * state of 'current'
*/
void fpsimd_restore_current_state(void)
{
- fpsimd_load_state(&current->thread.fpsimd_state);
+ preempt_disable();
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = &current->thread.fpsimd_state;
+
+ fpsimd_load_state(st);
+ this_cpu_write(fpsimd_last_state, st);
+ st->cpu = smp_processor_id();
+ }
+ preempt_enable();
}
/*
- * Load an updated userland FPSIMD state for 'current' from memory
+ * Load an updated userland FPSIMD state for 'current' from memory and set the
+ * flag that indicates that the FPSIMD register contents are the most recent
+ * FPSIMD state of 'current'
*/
void fpsimd_update_current_state(struct fpsimd_state *state)
{
preempt_disable();
fpsimd_load_state(state);
+ if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+ struct fpsimd_state *st = &current->thread.fpsimd_state;
+
+ this_cpu_write(fpsimd_last_state, st);
+ st->cpu = smp_processor_id();
+ }
preempt_enable();
}
@@ -118,6 +213,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
*/
void fpsimd_flush_task_state(struct task_struct *t)
{
+ t->thread.fpsimd_state.cpu = NR_CPUS;
}
#ifdef CONFIG_KERNEL_MODE_NEON
@@ -131,16 +227,19 @@ void kernel_neon_begin(void)
BUG_ON(in_interrupt());
preempt_disable();
- if (current->mm)
+ /*
+ * Save the userland FPSIMD state if we have one and if we haven't done
+ * so already. Clear fpsimd_last_state to indicate that there is no
+ * longer userland FPSIMD state in the registers.
+ */
+ if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
+ this_cpu_write(fpsimd_last_state, NULL);
}
EXPORT_SYMBOL(kernel_neon_begin);
void kernel_neon_end(void)
{
- if (current->mm)
- fpsimd_load_state(&current->thread.fpsimd_state);
-
preempt_enable();
}
EXPORT_SYMBOL(kernel_neon_end);
@@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
{
switch (cmd) {
case CPU_PM_ENTER:
- if (current->mm)
+ if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
fpsimd_save_state(&current->thread.fpsimd_state);
break;
case CPU_PM_EXIT:
- if (current->mm)
- fpsimd_load_state(&current->thread.fpsimd_state);
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
break;
case CPU_PM_ENTER_FAILED:
default:
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 06448a77ff53..882f01774365 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
}
+
+ if (thread_flags & _TIF_FOREIGN_FPSTATE)
+ fpsimd_restore_current_state();
+
}
--
1.8.3.2
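[Editor's note] The lazy-restore bookkeeping described in the comment block of this patch can be modelled outside the kernel. The sketch below is an illustrative stand-alone model, not kernel code: `struct state`, `last_state[]`, `needs_reload()`, `load()` and `neon_begin()` are hypothetical stand-ins for `struct fpsimd_state` with its `cpu` field, the per-cpu `fpsimd_last_state` pointer, the TIF_FOREIGN_FPSTATE test, `fpsimd_restore_current_state()` and `kernel_neon_begin()` respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Stand-in for struct fpsimd_state: remembers the last CPU it was loaded on.
 * Initializing cpu to NR_CPUS models fpsimd_flush_task_state(), which sets
 * an id no CPU can match, forcing a reload. */
struct state { unsigned int cpu; };

/* Stand-in for the per-cpu fpsimd_last_state pointers. */
static struct state *last_state[NR_CPUS];

/* True if task 'st', running on 'cpu', must reload its state from memory
 * before returning to userland (i.e. TIF_FOREIGN_FPSTATE would be set):
 * the skip is only safe when both directions of the link are intact. */
static bool needs_reload(const struct state *st, unsigned int cpu)
{
	return !(last_state[cpu] == st && st->cpu == cpu);
}

/* Models the reload performed by fpsimd_restore_current_state(): load the
 * registers, then re-establish the two-way link. */
static void load(struct state *st, unsigned int cpu)
{
	last_state[cpu] = st;
	st->cpu = cpu;
}

/* Models kernel_neon_begin(): the registers no longer hold any task's
 * userland state, so the per-cpu pointer is cleared. */
static void neon_begin(unsigned int cpu)
{
	last_state[cpu] = NULL;
}
```

Usage: after `load(&a, 0)` the state is clean on CPU 0, so a task that is scheduled out and back in on CPU 0 skips the reload; loading another task's state on CPU 0, migrating to another CPU, or running kernel mode NEON each break the link and force a reload.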
* [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This patch modifies kernel_neon_begin() and kernel_neon_end(), so
they may be called from any context. To address the case where only
a few registers are needed, kernel_neon_begin_partial(u32) is
introduced, which takes the number of bottom NEON q-registers
required as its parameter. To mark the end of such a partial section,
the regular kernel_neon_end() should be used.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/fpsimd.h | 15 ++++++++++++
arch/arm64/include/asm/fpsimdmacros.h | 35 ++++++++++++++++++++++++++++
arch/arm64/include/asm/neon.h | 6 ++++-
arch/arm64/kernel/entry-fpsimd.S | 24 +++++++++++++++++++
arch/arm64/kernel/fpsimd.c | 44 ++++++++++++++++++++++++-----------
5 files changed, 109 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7a900142dbc8..05e1b24aca4c 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,17 @@ struct fpsimd_state {
unsigned int cpu;
};
+/*
+ * Struct for stacking the bottom 'n' FP/SIMD registers.
+ */
+struct fpsimd_partial_state {
+ u32 num_regs;
+ u32 fpsr;
+ u32 fpcr;
+ __uint128_t vregs[32] __aligned(16);
+} __aligned(16);
+
+
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/* Masks for extracting the FPSR and FPCR from the FPSCR */
#define VFP_FPSCR_STAT_MASK 0xf800009f
@@ -66,6 +77,10 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
extern void fpsimd_flush_task_state(struct task_struct *target);
+extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
+ u32 num_regs);
+extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
+
#endif
#endif
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bbec599c96bd..69e75134689d 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -62,3 +62,38 @@
ldr w\tmpnr, [\state, #16 * 2 + 4]
msr fpcr, x\tmpnr
.endm
+
+.altmacro
+.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
+ mrs x\tmpnr1, fpsr
+ str w\numnr, [\state]
+ mrs x\tmpnr2, fpcr
+ stp w\tmpnr1, w\tmpnr2, [\state, #4]
+ adr x\tmpnr1, 0f
+ add \state, \state, x\numnr, lsl #4
+ sub x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ .irp qb, %(qa + 1)
+ stp q\qa, q\qb, [\state, # -16 * \qa - 16]
+ .endr
+ .endr
+0:
+.endm
+
+.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
+ ldp w\tmpnr1, w\tmpnr2, [\state, #4]
+ msr fpsr, x\tmpnr1
+ msr fpcr, x\tmpnr2
+ adr x\tmpnr1, 0f
+ ldr w\tmpnr2, [\state]
+ add \state, \state, x\tmpnr2, lsl #4
+ sub x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
+ br x\tmpnr1
+ .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+ .irp qb, %(qa + 1)
+ ldp q\qa, q\qb, [\state, # -16 * \qa - 16]
+ .endr
+ .endr
+0:
+.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index b0cc58a97780..13ce4cc18e26 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,7 +8,11 @@
* published by the Free Software Foundation.
*/
+#include <linux/types.h>
+
#define cpu_has_neon() (1)
-void kernel_neon_begin(void);
+#define kernel_neon_begin() kernel_neon_begin_partial(32)
+
+void kernel_neon_begin_partial(u32 num_regs);
void kernel_neon_end(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 6a27cd6dbfa6..d358ccacfc00 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,3 +41,27 @@ ENTRY(fpsimd_load_state)
fpsimd_restore x0, 8
ret
ENDPROC(fpsimd_load_state)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Save the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_save_partial_state)
+ fpsimd_save_partial x0, 1, 8, 9
+ ret
+ENDPROC(fpsimd_save_partial_state)
+
+/*
+ * Load the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_load_partial_state)
+ fpsimd_restore_partial x0, 8, 9
+ ret
+ENDPROC(fpsimd_load_partial_state)
+
+#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 6cfbb4ef27d7..82ebc259a787 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -218,29 +218,45 @@ void fpsimd_flush_task_state(struct task_struct *t)
#ifdef CONFIG_KERNEL_MODE_NEON
+static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
+static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+
/*
* Kernel-side NEON support functions
*/
-void kernel_neon_begin(void)
+void kernel_neon_begin_partial(u32 num_regs)
{
- /* Avoid using the NEON in interrupt context */
- BUG_ON(in_interrupt());
- preempt_disable();
+ if (in_interrupt()) {
+ struct fpsimd_partial_state *s = this_cpu_ptr(
+ in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
- /*
- * Save the userland FPSIMD state if we have one and if we haven't done
- * so already. Clear fpsimd_last_state to indicate that there is no
- * longer userland FPSIMD state in the registers.
- */
- if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
- fpsimd_save_state(&current->thread.fpsimd_state);
- this_cpu_write(fpsimd_last_state, NULL);
+ BUG_ON(num_regs > 32);
+ fpsimd_save_partial_state(s, roundup(num_regs, 2));
+ } else {
+ /*
+ * Save the userland FPSIMD state if we have one and if we
+ * haven't done so already. Clear fpsimd_last_state to indicate
+ * that there is no longer userland FPSIMD state in the
+ * registers.
+ */
+ preempt_disable();
+ if (current->mm &&
+ !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ this_cpu_write(fpsimd_last_state, NULL);
+ }
}
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(kernel_neon_begin_partial);
void kernel_neon_end(void)
{
- preempt_enable();
+ if (in_interrupt()) {
+ struct fpsimd_partial_state *s = this_cpu_ptr(
+ in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+ fpsimd_load_partial_state(s);
+ } else {
+ preempt_enable();
+ }
}
EXPORT_SYMBOL(kernel_neon_end);
--
1.8.3.2
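[Editor's note] The save/restore macros in this patch store the q-registers in pairs (`stp`/`ldp`), which is why kernel_neon_begin_partial() rounds the requested count up with roundup(num_regs, 2) before saving. The following is a minimal stand-alone model of that rounding; `regs_to_save()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>

/*
 * Model of the register-count handling in kernel_neon_begin_partial():
 * q-registers are stacked in even pairs, so the requested count is
 * rounded up to the next even number. Valid for num_regs <= 32, after
 * which the kernel's BUG_ON(num_regs > 32) would fire.
 */
static unsigned int regs_to_save(unsigned int num_regs)
{
	/* equivalent to roundup(num_regs, 2) */
	return (num_regs + 1) & ~1u;
}
```

So a caller asking for a single q-register still gets two saved and restored; asking for all 32 (what plain kernel_neon_begin() does via kernel_neon_begin_partial(32)) saves exactly 32.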
* [PATCH resend 05/15] arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This patch adds support for the SHA-1 Secure Hash Algorithm on CPUs that
implement the SHA-1 part of the ARMv8 Crypto Extensions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 1 +
arch/arm64/crypto/Kconfig | 16 ++++
arch/arm64/crypto/Makefile | 12 +++
arch/arm64/crypto/sha1-ce-core.S | 153 ++++++++++++++++++++++++++++++++++
arch/arm64/crypto/sha1-ce-glue.c | 174 +++++++++++++++++++++++++++++++++++++++
6 files changed, 359 insertions(+)
create mode 100644 arch/arm64/crypto/Kconfig
create mode 100644 arch/arm64/crypto/Makefile
create mode 100644 arch/arm64/crypto/sha1-ce-core.S
create mode 100644 arch/arm64/crypto/sha1-ce-glue.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e6e4d3749a6e..1cefc6fe969a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -342,5 +342,8 @@ source "arch/arm64/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
+if CRYPTO
+source "arch/arm64/crypto/Kconfig"
+endif
source "lib/Kconfig"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2fceb71ac3b7..8185a913c5ed 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -45,6 +45,7 @@ export TEXT_OFFSET GZFLAGS
core-y += arch/arm64/kernel/ arch/arm64/mm/
core-$(CONFIG_KVM) += arch/arm64/kvm/
core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
libs-y := arch/arm64/lib/ $(libs-y)
libs-y += $(LIBGCC)
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
new file mode 100644
index 000000000000..7956881b5986
--- /dev/null
+++ b/arch/arm64/crypto/Kconfig
@@ -0,0 +1,16 @@
+
+menuconfig ARM64_CRYPTO
+ bool "ARM64 Accelerated Cryptographic Algorithms"
+ depends on ARM64
+ help
+ Say Y here to choose from a selection of cryptographic algorithms
+ implemented using ARM64 specific CPU features or instructions.
+
+if ARM64_CRYPTO
+
+config CRYPTO_SHA1_ARM64_CE
+ tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_HASH
+
+endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 000000000000..0ed3caaec81b
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,12 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
+sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
new file mode 100644
index 000000000000..bd4af29f2722
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -0,0 +1,153 @@
+/*
+ * sha1-ce-core.S - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ .text
+ .arch armv8-a+crypto
+
+ k0 .req v0
+ k1 .req v1
+ k2 .req v2
+ k3 .req v3
+
+ t0 .req v4
+ t1 .req v5
+
+ dga .req q6
+ dgav .req v6
+ dgb .req s7
+ dgbv .req v7
+
+ dg0q .req q12
+ dg0s .req s12
+ dg0v .req v12
+ dg1s .req s13
+ dg1v .req v13
+ dg2s .req s14
+
+ .macro add_only, op, ev, rc, s0, dg1
+ .ifc \ev, ev
+ add t1.4s, v\s0\().4s, \rc\().4s
+ sha1h dg2s, dg0s
+ .ifnb \dg1
+ sha1\op dg0q, \dg1, t0.4s
+ .else
+ sha1\op dg0q, dg1s, t0.4s
+ .endif
+ .else
+ .ifnb \s0
+ add t0.4s, v\s0\().4s, \rc\().4s
+ .endif
+ sha1h dg1s, dg0s
+ sha1\op dg0q, dg2s, t1.4s
+ .endif
+ .endm
+
+ .macro add_update, op, ev, rc, s0, s1, s2, s3, dg1
+ sha1su0 v\s0\().4s, v\s1\().4s, v\s2\().4s
+ sha1su1 v\s0\().4s, v\s3\().4s
+ add_only \op, \ev, \rc, \s1, \dg1
+ .endm
+
+ /*
+ * The SHA1 round constants
+ */
+ .align 4
+.Lsha1_rcon:
+ .word 0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
+
+ /*
+ * void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+ * u8 *head, long bytes)
+ */
+ENTRY(sha1_ce_transform)
+ /* load round constants */
+ adr x6, .Lsha1_rcon
+ ld1r {k0.4s}, [x6], #4
+ ld1r {k1.4s}, [x6], #4
+ ld1r {k2.4s}, [x6], #4
+ ld1r {k3.4s}, [x6]
+
+ /* load state */
+ ldr dga, [x2]
+ ldr dgb, [x2, #16]
+
+ /* load partial state (if supplied) */
+ cbz x3, 0f
+ ld1 {v8.4s-v11.4s}, [x3]
+ b 1f
+
+ /* load input */
+0: ld1 {v8.4s-v11.4s}, [x1], #64
+ sub w0, w0, #1
+
+1:
+CPU_LE( rev32 v8.16b, v8.16b )
+CPU_LE( rev32 v9.16b, v9.16b )
+CPU_LE( rev32 v10.16b, v10.16b )
+CPU_LE( rev32 v11.16b, v11.16b )
+
+2: add t0.4s, v8.4s, k0.4s
+ mov dg0v.16b, dgav.16b
+
+ add_update c, ev, k0, 8, 9, 10, 11, dgb
+ add_update c, od, k0, 9, 10, 11, 8
+ add_update c, ev, k0, 10, 11, 8, 9
+ add_update c, od, k0, 11, 8, 9, 10
+ add_update c, ev, k1, 8, 9, 10, 11
+
+ add_update p, od, k1, 9, 10, 11, 8
+ add_update p, ev, k1, 10, 11, 8, 9
+ add_update p, od, k1, 11, 8, 9, 10
+ add_update p, ev, k1, 8, 9, 10, 11
+ add_update p, od, k2, 9, 10, 11, 8
+
+ add_update m, ev, k2, 10, 11, 8, 9
+ add_update m, od, k2, 11, 8, 9, 10
+ add_update m, ev, k2, 8, 9, 10, 11
+ add_update m, od, k2, 9, 10, 11, 8
+ add_update m, ev, k3, 10, 11, 8, 9
+
+ add_update p, od, k3, 11, 8, 9, 10
+ add_only p, ev, k3, 9
+ add_only p, od, k3, 10
+ add_only p, ev, k3, 11
+ add_only p, od
+
+ /* update state */
+ add dgbv.2s, dgbv.2s, dg1v.2s
+ add dgav.4s, dgav.4s, dg0v.4s
+
+ cbnz w0, 0b
+
+ /*
+ * Final block: add padding and total bit count.
+ * Skip if we have no total byte count in x4. In that case, the input
+ * size was not a round multiple of the block size, and the padding is
+ * handled by the C code.
+ */
+ cbz x4, 3f
+ movi v9.2d, #0
+ mov x8, #0x80000000
+ movi v10.2d, #0
+ ror x7, x4, #29 // ror(lsl(x4, 3), 32)
+ fmov d8, x8
+ mov x4, #0
+ mov v11.d[0], xzr
+ mov v11.d[1], x7
+ b 2b
+
+ /* store new state */
+3: str dga, [x2]
+ str dgb, [x2, #16]
+ ret
+ENDPROC(sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
new file mode 100644
index 000000000000..6fe83f37a750
--- /dev/null
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -0,0 +1,174 @@
+/*
+ * sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha1_ce_transform(int blocks, u8 const *src, u32 *state,
+ u8 *head, long bytes);
+
+static int sha1_init(struct shash_desc *desc)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha1_state){
+ .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+ };
+ return 0;
+}
+
+static int sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+ sctx->count += len;
+
+ if ((partial + len) >= SHA1_BLOCK_SIZE) {
+ int blocks;
+
+ if (partial) {
+ int p = SHA1_BLOCK_SIZE - partial;
+
+ memcpy(sctx->buffer + partial, data, p);
+ data += p;
+ len -= p;
+ }
+
+ blocks = len / SHA1_BLOCK_SIZE;
+ len %= SHA1_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(16);
+ sha1_ce_transform(blocks, data, sctx->state,
+ partial ? sctx->buffer : NULL, 0);
+ kernel_neon_end();
+
+ data += blocks * SHA1_BLOCK_SIZE;
+ partial = 0;
+ }
+ if (len)
+ memcpy(sctx->buffer + partial, data, len);
+ return 0;
+}
+
+static int sha1_final(struct shash_desc *desc, u8 *out)
+{
+ static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
+
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ __be64 bits = cpu_to_be64(sctx->count << 3);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ u32 padlen = SHA1_BLOCK_SIZE
+ - ((sctx->count + sizeof(bits)) % SHA1_BLOCK_SIZE);
+
+ sha1_update(desc, padding, padlen);
+ sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+ for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha1_state){};
+ return 0;
+}
+
+static int sha1_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int blocks;
+ int i;
+
+ if (sctx->count || !len || (len % SHA1_BLOCK_SIZE)) {
+ sha1_update(desc, data, len);
+ return sha1_final(desc, out);
+ }
+
+ /*
+ * Use a fast path if the input is a multiple of 64 bytes. In
+ * this case, there is no need to copy data around, and we can
+ * perform the entire digest calculation in a single invocation
+ * of sha1_ce_transform()
+ */
+ blocks = len / SHA1_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(16);
+ sha1_ce_transform(blocks, data, sctx->state, NULL, len);
+ kernel_neon_end();
+
+ for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha1_state){};
+ return 0;
+}
+
+static int sha1_export(struct shash_desc *desc, void *out)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ struct sha1_state *dst = out;
+
+ *dst = *sctx;
+ return 0;
+}
+
+static int sha1_import(struct shash_desc *desc, const void *in)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ struct sha1_state const *src = in;
+
+ *sctx = *src;
+ return 0;
+}
+
+static struct shash_alg alg = {
+ .init = sha1_init,
+ .update = sha1_update,
+ .final = sha1_final,
+ .finup = sha1_finup,
+ .export = sha1_export,
+ .import = sha1_import,
+ .descsize = sizeof(struct sha1_state),
+ .digestsize = SHA1_DIGEST_SIZE,
+ .statesize = sizeof(struct sha1_state),
+ .base = {
+ .cra_name = "sha1",
+ .cra_driver_name = "sha1-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = SHA1_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static int __init sha1_ce_mod_init(void)
+{
+ return crypto_register_shash(&alg);
+}
+
+static void __exit sha1_ce_mod_fini(void)
+{
+ crypto_unregister_shash(&alg);
+}
+
+module_cpu_feature_match(SHA1, sha1_ce_mod_init);
+module_exit(sha1_ce_mod_fini);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH resend 06/15] arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This patch adds support for the SHA-224 and SHA-256 Secure Hash Algorithms
for CPUs that have support for the SHA-2 part of the ARM v8 Crypto Extensions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Kconfig | 5 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/sha2-ce-core.S | 156 ++++++++++++++++++++++++
arch/arm64/crypto/sha2-ce-glue.c | 254 +++++++++++++++++++++++++++++++++++++++
4 files changed, 418 insertions(+)
create mode 100644 arch/arm64/crypto/sha2-ce-core.S
create mode 100644 arch/arm64/crypto/sha2-ce-glue.c
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 7956881b5986..eb1e99770c21 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -13,4 +13,9 @@ config CRYPTO_SHA1_ARM64_CE
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH
+config CRYPTO_SHA2_ARM64_CE
+ tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_HASH
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 0ed3caaec81b..0b3885a60d43 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -10,3 +10,6 @@
obj-$(CONFIG_CRYPTO_SHA1_ARM64_CE) += sha1-ce.o
sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
+
+obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
+sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
new file mode 100644
index 000000000000..53e750614169
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -0,0 +1,156 @@
+/*
+ * sha2-ce-core.S - core SHA-224/SHA-256 transform using v8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ .text
+ .arch armv8-a+crypto
+
+ dga .req q20
+ dgav .req v20
+ dgb .req q21
+ dgbv .req v21
+
+ t0 .req v22
+ t1 .req v23
+
+ dg0q .req q24
+ dg0v .req v24
+ dg1q .req q25
+ dg1v .req v25
+ dg2q .req q26
+ dg2v .req v26
+
+ .macro add_only, ev, rc, s0
+ mov dg2v.16b, dg0v.16b
+ .ifeq \ev
+ add t1.4s, v\s0\().4s, \rc\().4s
+ sha256h dg0q, dg1q, t0.4s
+ sha256h2 dg1q, dg2q, t0.4s
+ .else
+ .ifnb \s0
+ add t0.4s, v\s0\().4s, \rc\().4s
+ .endif
+ sha256h dg0q, dg1q, t1.4s
+ sha256h2 dg1q, dg2q, t1.4s
+ .endif
+ .endm
+
+ .macro add_update, ev, rc, s0, s1, s2, s3
+ sha256su0 v\s0\().4s, v\s1\().4s
+ sha256su1 v\s0\().4s, v\s2\().4s, v\s3\().4s
+ add_only \ev, \rc, \s1
+ .endm
+
+ /*
+ * The SHA-256 round constants
+ */
+ .align 4
+.Lsha2_rcon:
+ .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+ .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+ .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+ .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+ .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+ .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+ .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+ .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+ .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+ .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+ .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+ .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+ .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+ .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+ .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+ .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+
+ /*
+ * void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+ * u8 *head, long bytes)
+ */
+ENTRY(sha2_ce_transform)
+ /* load round constants */
+ adr x8, .Lsha2_rcon
+ ld1 { v0.4s- v3.4s}, [x8], #64
+ ld1 { v4.4s- v7.4s}, [x8], #64
+ ld1 { v8.4s-v11.4s}, [x8], #64
+ ld1 {v12.4s-v15.4s}, [x8]
+
+ /* load state */
+ ldp dga, dgb, [x2]
+
+ /* load partial input (if supplied) */
+ cbz x3, 0f
+ ld1 {v16.4s-v19.4s}, [x3]
+ b 1f
+
+ /* load input */
+0: ld1 {v16.4s-v19.4s}, [x1], #64
+ sub w0, w0, #1
+
+1:
+CPU_LE( rev32 v16.16b, v16.16b )
+CPU_LE( rev32 v17.16b, v17.16b )
+CPU_LE( rev32 v18.16b, v18.16b )
+CPU_LE( rev32 v19.16b, v19.16b )
+
+2: add t0.4s, v16.4s, v0.4s
+ mov dg0v.16b, dgav.16b
+ mov dg1v.16b, dgbv.16b
+
+ add_update 0, v1, 16, 17, 18, 19
+ add_update 1, v2, 17, 18, 19, 16
+ add_update 0, v3, 18, 19, 16, 17
+ add_update 1, v4, 19, 16, 17, 18
+
+ add_update 0, v5, 16, 17, 18, 19
+ add_update 1, v6, 17, 18, 19, 16
+ add_update 0, v7, 18, 19, 16, 17
+ add_update 1, v8, 19, 16, 17, 18
+
+ add_update 0, v9, 16, 17, 18, 19
+ add_update 1, v10, 17, 18, 19, 16
+ add_update 0, v11, 18, 19, 16, 17
+ add_update 1, v12, 19, 16, 17, 18
+
+ add_only 0, v13, 17
+ add_only 1, v14, 18
+ add_only 0, v15, 19
+ add_only 1
+
+ /* update state */
+ add dgav.4s, dgav.4s, dg0v.4s
+ add dgbv.4s, dgbv.4s, dg1v.4s
+
+ /* handled all input blocks? */
+ cbnz w0, 0b
+
+ /*
+ * Final block: add padding and total bit count.
+ * Skip if we have no total byte count in x4. In that case, the input
+ * size was not a round multiple of the block size, and the padding is
+ * handled by the C code.
+ */
+ cbz x4, 3f
+ movi v17.2d, #0
+ mov x8, #0x80000000
+ movi v18.2d, #0
+ ror x7, x4, #29 // ror(lsl(x4, 3), 32)
+ fmov d16, x8
+ mov x4, #0
+ mov v19.d[0], xzr
+ mov v19.d[1], x7
+ b 2b
+
+ /* store new state */
+3: stp dga, dgb, [x2]
+ ret
+ENDPROC(sha2_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
new file mode 100644
index 000000000000..81617262b3df
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -0,0 +1,254 @@
+/*
+ * sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+ u8 *head, long bytes);
+
+static int sha224_init(struct shash_desc *desc)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha256_state){
+ .state = {
+ SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
+ SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
+ }
+ };
+ return 0;
+}
+
+static int sha256_init(struct shash_desc *desc)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha256_state){
+ .state = {
+ SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+ SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
+ }
+ };
+ return 0;
+}
+
+static int sha2_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+ sctx->count += len;
+
+ if ((partial + len) >= SHA256_BLOCK_SIZE) {
+ int blocks;
+
+ if (partial) {
+ int p = SHA256_BLOCK_SIZE - partial;
+
+ memcpy(sctx->buf + partial, data, p);
+ data += p;
+ len -= p;
+ }
+
+ blocks = len / SHA256_BLOCK_SIZE;
+ len %= SHA256_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(28);
+ sha2_ce_transform(blocks, data, sctx->state,
+ partial ? sctx->buf : NULL, 0);
+ kernel_neon_end();
+
+ data += blocks * SHA256_BLOCK_SIZE;
+ partial = 0;
+ }
+ if (len)
+ memcpy(sctx->buf + partial, data, len);
+ return 0;
+}
+
+static void sha2_final(struct shash_desc *desc)
+{
+ static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
+
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be64 bits = cpu_to_be64(sctx->count << 3);
+ u32 padlen = SHA256_BLOCK_SIZE
+ - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
+
+ sha2_update(desc, padding, padlen);
+ sha2_update(desc, (const u8 *)&bits, sizeof(bits));
+}
+
+static int sha224_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_final(desc);
+
+ for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_final(desc);
+
+ for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static void sha2_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ int blocks;
+
+ if (sctx->count || !len || (len % SHA256_BLOCK_SIZE)) {
+ sha2_update(desc, data, len);
+ sha2_final(desc);
+ return;
+ }
+
+ /*
+ * Use a fast path if the input is a multiple of 64 bytes. In
+ * this case, there is no need to copy data around, and we can
+ * perform the entire digest calculation in a single invocation
+ * of sha2_ce_transform()
+ */
+ blocks = len / SHA256_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(28);
+ sha2_ce_transform(blocks, data, sctx->state, NULL, len);
+ kernel_neon_end();
+}
+
+static int sha224_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_finup(desc, data, len);
+
+ for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_finup(desc, data, len);
+
+ for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha2_export(struct shash_desc *desc, void *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ struct sha256_state *dst = out;
+
+ *dst = *sctx;
+ return 0;
+}
+
+static int sha2_import(struct shash_desc *desc, const void *in)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ struct sha256_state const *src = in;
+
+ *sctx = *src;
+ return 0;
+}
+
+static struct shash_alg algs[] = { {
+ .init = sha224_init,
+ .update = sha2_update,
+ .final = sha224_final,
+ .finup = sha224_finup,
+ .export = sha2_export,
+ .import = sha2_import,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA224_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
+ .base = {
+ .cra_name = "sha224",
+ .cra_driver_name = "sha224-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+}, {
+ .init = sha256_init,
+ .update = sha2_update,
+ .final = sha256_final,
+ .finup = sha256_finup,
+ .export = sha2_export,
+ .import = sha2_import,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA256_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
+ .base = {
+ .cra_name = "sha256",
+ .cra_driver_name = "sha256-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+} };
+
+static int __init sha2_ce_mod_init(void)
+{
+ return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha2_ce_mod_fini(void)
+{
+ crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_cpu_feature_match(SHA2, sha2_ce_mod_init);
+module_exit(sha2_ce_mod_fini);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
+CPU_LE( rev32 v18.16b, v18.16b )
+CPU_LE( rev32 v19.16b, v19.16b )
+
+2: add t0.4s, v16.4s, v0.4s
+ mov dg0v.16b, dgav.16b
+ mov dg1v.16b, dgbv.16b
+
+ add_update 0, v1, 16, 17, 18, 19
+ add_update 1, v2, 17, 18, 19, 16
+ add_update 0, v3, 18, 19, 16, 17
+ add_update 1, v4, 19, 16, 17, 18
+
+ add_update 0, v5, 16, 17, 18, 19
+ add_update 1, v6, 17, 18, 19, 16
+ add_update 0, v7, 18, 19, 16, 17
+ add_update 1, v8, 19, 16, 17, 18
+
+ add_update 0, v9, 16, 17, 18, 19
+ add_update 1, v10, 17, 18, 19, 16
+ add_update 0, v11, 18, 19, 16, 17
+ add_update 1, v12, 19, 16, 17, 18
+
+ add_only 0, v13, 17
+ add_only 1, v14, 18
+ add_only 0, v15, 19
+ add_only 1
+
+ /* update state */
+ add dgav.4s, dgav.4s, dg0v.4s
+ add dgbv.4s, dgbv.4s, dg1v.4s
+
+ /* handled all input blocks? */
+ cbnz w0, 0b
+
+ /*
+ * Final block: add padding and total bit count.
+ * Skip if we have no total byte count in x4. In that case, the input
+ * size was not a round multiple of the block size, and the padding is
+ * handled by the C code.
+ */
+ cbz x4, 3f
+ movi v17.2d, #0
+ mov x8, #0x80000000
+ movi v18.2d, #0
+ ror x7, x4, #29 // ror(lsl(x4, 3), 32)
+ fmov d16, x8
+ mov x4, #0
+ mov v19.d[0], xzr
+ mov v19.d[1], x7
+ b 2b
+
+ /* store new state */
+3: stp dga, dgb, [x2]
+ ret
+ENDPROC(sha2_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
new file mode 100644
index 000000000000..81617262b3df
--- /dev/null
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -0,0 +1,254 @@
+/*
+ * sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <crypto/sha.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha2_ce_transform(int blocks, u8 const *src, u32 *state,
+ u8 *head, long bytes);
+
+static int sha224_init(struct shash_desc *desc)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha256_state){
+ .state = {
+ SHA224_H0, SHA224_H1, SHA224_H2, SHA224_H3,
+ SHA224_H4, SHA224_H5, SHA224_H6, SHA224_H7,
+ }
+ };
+ return 0;
+}
+
+static int sha256_init(struct shash_desc *desc)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha256_state){
+ .state = {
+ SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+ SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7,
+ }
+ };
+ return 0;
+}
+
+static int sha2_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
+
+ sctx->count += len;
+
+ if ((partial + len) >= SHA256_BLOCK_SIZE) {
+ int blocks;
+
+ if (partial) {
+ int p = SHA256_BLOCK_SIZE - partial;
+
+ memcpy(sctx->buf + partial, data, p);
+ data += p;
+ len -= p;
+ }
+
+ blocks = len / SHA256_BLOCK_SIZE;
+ len %= SHA256_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(28);
+ sha2_ce_transform(blocks, data, sctx->state,
+ partial ? sctx->buf : NULL, 0);
+ kernel_neon_end();
+
+ data += blocks * SHA256_BLOCK_SIZE;
+ partial = 0;
+ }
+ if (len)
+ memcpy(sctx->buf + partial, data, len);
+ return 0;
+}
+
+static void sha2_final(struct shash_desc *desc)
+{
+ static const u8 padding[SHA256_BLOCK_SIZE] = { 0x80, };
+
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be64 bits = cpu_to_be64(sctx->count << 3);
+ u32 padlen = SHA256_BLOCK_SIZE
+ - ((sctx->count + sizeof(bits)) % SHA256_BLOCK_SIZE);
+
+ sha2_update(desc, padding, padlen);
+ sha2_update(desc, (const u8 *)&bits, sizeof(bits));
+}
+
+static int sha224_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_final(desc);
+
+ for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_final(desc);
+
+ for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static void sha2_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ int blocks;
+
+ if (sctx->count || !len || (len % SHA256_BLOCK_SIZE)) {
+ sha2_update(desc, data, len);
+ sha2_final(desc);
+ return;
+ }
+
+ /*
+ * Use a fast path if the input is a multiple of 64 bytes. In
+ * this case, there is no need to copy data around, and we can
+ * perform the entire digest calculation in a single invocation
+ * of sha2_ce_transform().
+ */
+ blocks = len / SHA256_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(28);
+ sha2_ce_transform(blocks, data, sctx->state, NULL, len);
+ kernel_neon_end();
+}
+
+static int sha224_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_finup(desc, data, len);
+
+ for (i = 0; i < SHA224_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ __be32 *dst = (__be32 *)out;
+ int i;
+
+ sha2_finup(desc, data, len);
+
+ for (i = 0; i < SHA256_DIGEST_SIZE / sizeof(__be32); i++)
+ put_unaligned_be32(sctx->state[i], dst++);
+
+ *sctx = (struct sha256_state){};
+ return 0;
+}
+
+static int sha2_export(struct shash_desc *desc, void *out)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ struct sha256_state *dst = out;
+
+ *dst = *sctx;
+ return 0;
+}
+
+static int sha2_import(struct shash_desc *desc, const void *in)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ struct sha256_state const *src = in;
+
+ *sctx = *src;
+ return 0;
+}
+
+static struct shash_alg algs[] = { {
+ .init = sha224_init,
+ .update = sha2_update,
+ .final = sha224_final,
+ .finup = sha224_finup,
+ .export = sha2_export,
+ .import = sha2_import,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA224_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
+ .base = {
+ .cra_name = "sha224",
+ .cra_driver_name = "sha224-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+}, {
+ .init = sha256_init,
+ .update = sha2_update,
+ .final = sha256_final,
+ .finup = sha256_finup,
+ .export = sha2_export,
+ .import = sha2_import,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA256_DIGEST_SIZE,
+ .statesize = sizeof(struct sha256_state),
+ .base = {
+ .cra_name = "sha256",
+ .cra_driver_name = "sha256-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+} };
+
+static int __init sha2_ce_mod_init(void)
+{
+ return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha2_ce_mod_fini(void)
+{
+ crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_cpu_feature_match(SHA2, sha2_ce_mod_init);
+module_exit(sha2_ce_mod_fini);
--
1.8.3.2
* [PATCH resend 07/15] arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instructions (polynomial multiply long, what Intel calls
carry-less multiplication).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Kconfig | 6 ++
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/ghash-ce-core.S | 95 +++++++++++++++++++++++
arch/arm64/crypto/ghash-ce-glue.c | 155 ++++++++++++++++++++++++++++++++++++++
4 files changed, 259 insertions(+)
create mode 100644 arch/arm64/crypto/ghash-ce-core.S
create mode 100644 arch/arm64/crypto/ghash-ce-glue.c
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index eb1e99770c21..0c50859ee7b9 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,4 +18,10 @@ config CRYPTO_SHA2_ARM64_CE
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH
+
+config CRYPTO_GHASH_ARM64_CE
+ tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_HASH
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 0b3885a60d43..e8c81a068868 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -13,3 +13,6 @@ sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
obj-$(CONFIG_CRYPTO_SHA2_ARM64_CE) += sha2-ce.o
sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
+
+obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
+ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
new file mode 100644
index 000000000000..b9e6eaf41c9b
--- /dev/null
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -0,0 +1,95 @@
+/*
+ * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ *
+ * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * Based on arch/x86/crypto/ghash-clmulni-intel_asm.S
+ *
+ * Copyright (c) 2009 Intel Corp.
+ * Author: Huang Ying <ying.huang@intel.com>
+ * Vinodh Gopal
+ * Erdinc Ozturk
+ * Deniz Karakoyunlu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ DATA .req v0
+ SHASH .req v1
+ IN1 .req v2
+ T1 .req v2
+ T2 .req v3
+ T3 .req v4
+ VZR .req v5
+
+ .text
+ .arch armv8-a+crypto
+
+ /*
+ * void pmull_ghash_update(int blocks, u64 dg[], const char *src,
+ * struct ghash_key const *k, const char *head)
+ */
+ENTRY(pmull_ghash_update)
+ ld1 {DATA.16b}, [x1]
+ ld1 {SHASH.16b}, [x3]
+ eor VZR.16b, VZR.16b, VZR.16b
+
+ /* do the head block first, if supplied */
+ cbz x4, 0f
+ ld1 {IN1.2d}, [x4]
+ b 1f
+
+0: ld1 {IN1.2d}, [x2], #16
+ sub w0, w0, #1
+1: ext IN1.16b, IN1.16b, IN1.16b, #8
+CPU_LE( rev64 IN1.16b, IN1.16b )
+ eor DATA.16b, DATA.16b, IN1.16b
+
+ /* multiply DATA by SHASH in GF(2^128) */
+ ext T2.16b, DATA.16b, DATA.16b, #8
+ ext T3.16b, SHASH.16b, SHASH.16b, #8
+ eor T2.16b, T2.16b, DATA.16b
+ eor T3.16b, T3.16b, SHASH.16b
+
+ pmull2 T1.1q, SHASH.2d, DATA.2d // a1 * b1
+ pmull DATA.1q, SHASH.1d, DATA.1d // a0 * b0
+ pmull T2.1q, T2.1d, T3.1d // (a1 + a0)(b1 + b0)
+ eor T2.16b, T2.16b, T1.16b // (a0 * b1) + (a1 * b0)
+ eor T2.16b, T2.16b, DATA.16b
+
+ ext T3.16b, VZR.16b, T2.16b, #8
+ ext T2.16b, T2.16b, VZR.16b, #8
+ eor DATA.16b, DATA.16b, T3.16b
+ eor T1.16b, T1.16b, T2.16b // <T1:DATA> is result of
+ // carry-less multiplication
+
+ /* first phase of the reduction */
+ shl T3.2d, DATA.2d, #1
+ eor T3.16b, T3.16b, DATA.16b
+ shl T3.2d, T3.2d, #5
+ eor T3.16b, T3.16b, DATA.16b
+ shl T3.2d, T3.2d, #57
+ ext T2.16b, VZR.16b, T3.16b, #8
+ ext T3.16b, T3.16b, VZR.16b, #8
+ eor DATA.16b, DATA.16b, T2.16b
+ eor T1.16b, T1.16b, T3.16b
+
+ /* second phase of the reduction */
+ ushr T2.2d, DATA.2d, #5
+ eor T2.16b, T2.16b, DATA.16b
+ ushr T2.2d, T2.2d, #1
+ eor T2.16b, T2.16b, DATA.16b
+ ushr T2.2d, T2.2d, #1
+ eor T1.16b, T1.16b, T2.16b
+ eor DATA.16b, DATA.16b, T1.16b
+
+ cbnz w0, 0b
+
+ st1 {DATA.16b}, [x1]
+ ret
+ENDPROC(pmull_ghash_update)
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
new file mode 100644
index 000000000000..b92baf3f68c7
--- /dev/null
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -0,0 +1,155 @@
+/*
+ * Accelerated GHASH implementation with ARMv8 PMULL instructions.
+ *
+ * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/internal/hash.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("GHASH secure hash using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+#define GHASH_BLOCK_SIZE 16
+#define GHASH_DIGEST_SIZE 16
+
+struct ghash_key {
+ u64 a;
+ u64 b;
+};
+
+struct ghash_desc_ctx {
+ u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
+ u8 buf[GHASH_BLOCK_SIZE];
+ u32 count;
+};
+
+asmlinkage void pmull_ghash_update(int blocks, u64 dg[], const char *src,
+ struct ghash_key const *k, const char *head);
+
+static int ghash_init(struct shash_desc *desc)
+{
+ struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+
+ *ctx = (struct ghash_desc_ctx){};
+ return 0;
+}
+
+static int ghash_update(struct shash_desc *desc, const u8 *src,
+ unsigned int len)
+{
+ struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+ unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
+
+ ctx->count += len;
+
+ if ((partial + len) >= GHASH_BLOCK_SIZE) {
+ struct ghash_key *key = crypto_shash_ctx(desc->tfm);
+ int blocks;
+
+ if (partial) {
+ int p = GHASH_BLOCK_SIZE - partial;
+
+ memcpy(ctx->buf + partial, src, p);
+ src += p;
+ len -= p;
+ }
+
+ blocks = len / GHASH_BLOCK_SIZE;
+ len %= GHASH_BLOCK_SIZE;
+
+ kernel_neon_begin_partial(6);
+ pmull_ghash_update(blocks, ctx->digest, src, key,
+ partial ? ctx->buf : NULL);
+ kernel_neon_end();
+ src += blocks * GHASH_BLOCK_SIZE;
+ }
+ if (len)
+ memcpy(ctx->buf + partial, src, len);
+ return 0;
+}
+
+static int ghash_final(struct shash_desc *desc, u8 *dst)
+{
+ struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
+ unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
+
+ if (partial) {
+ struct ghash_key *key = crypto_shash_ctx(desc->tfm);
+
+ memset(ctx->buf + partial, 0, GHASH_BLOCK_SIZE - partial);
+
+ kernel_neon_begin_partial(6);
+ pmull_ghash_update(1, ctx->digest, ctx->buf, key, NULL);
+ kernel_neon_end();
+ }
+ put_unaligned_be64(ctx->digest[1], dst);
+ put_unaligned_be64(ctx->digest[0], dst + 8);
+
+ *ctx = (struct ghash_desc_ctx){};
+ return 0;
+}
+
+static int ghash_setkey(struct crypto_shash *tfm,
+ const u8 *inkey, unsigned int keylen)
+{
+ struct ghash_key *key = crypto_shash_ctx(tfm);
+ u64 a, b;
+
+ if (keylen != GHASH_BLOCK_SIZE) {
+ crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+ return -EINVAL;
+ }
+
+ /* perform multiplication by 'x' in GF(2^128) */
+ b = get_unaligned_be64(inkey);
+ a = get_unaligned_be64(inkey + 8);
+
+ key->a = (a << 1) | (b >> 63);
+ key->b = (b << 1) | (a >> 63);
+
+ if (b >> 63)
+ key->b ^= 0xc200000000000000UL;
+
+ return 0;
+}
+
+static struct shash_alg ghash_alg = {
+ .digestsize = GHASH_DIGEST_SIZE,
+ .init = ghash_init,
+ .update = ghash_update,
+ .final = ghash_final,
+ .setkey = ghash_setkey,
+ .descsize = sizeof(struct ghash_desc_ctx),
+ .base = {
+ .cra_name = "ghash",
+ .cra_driver_name = "ghash-ce",
+ .cra_priority = 200,
+ .cra_flags = CRYPTO_ALG_TYPE_SHASH,
+ .cra_blocksize = GHASH_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct ghash_key),
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init ghash_ce_mod_init(void)
+{
+ return crypto_register_shash(&ghash_alg);
+}
+
+static void __exit ghash_ce_mod_exit(void)
+{
+ crypto_unregister_shash(&ghash_alg);
+}
+
+module_cpu_feature_match(PMULL, ghash_ce_mod_init);
+module_exit(ghash_ce_mod_exit);
--
1.8.3.2
* [PATCH resend 08/15] arm64/crypto: AES using ARMv8 Crypto Extensions
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This patch adds support for the AES symmetric encryption algorithm for CPUs
that have support for the AES part of the ARM v8 Crypto Extensions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Kconfig | 7 +-
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-ce-cipher.c | 155 ++++++++++++++++++++++++++++++++++++++
3 files changed, 164 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 0c50859ee7b9..9ba32c0da871 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,10 +18,15 @@ config CRYPTO_SHA2_ARM64_CE
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH
-
config CRYPTO_GHASH_ARM64_CE
tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_HASH
+config CRYPTO_AES_ARM64_CE
+ tristate "AES core cipher using ARMv8 Crypto Extensions"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_ALGAPI
+ select CRYPTO_AES
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index e8c81a068868..908abd9242b1 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -16,3 +16,6 @@ sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
new file mode 100644
index 000000000000..2075e1acae6b
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -0,0 +1,155 @@
+/*
+ * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <linux/cpufeature.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+struct aes_block {
+ u8 b[AES_BLOCK_SIZE];
+};
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+ /*
+ * # of rounds specified by AES:
+ * 128 bit key 10 rounds
+ * 192 bit key 12 rounds
+ * 256 bit key 14 rounds
+ * => n byte key => 6 + (n/4) rounds
+ */
+ return 6 + ctx->key_length / 4;
+}
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ struct aes_block *out = (struct aes_block *)dst;
+ struct aes_block const *in = (struct aes_block *)src;
+ void *dummy0;
+ int dummy1;
+
+ kernel_neon_begin_partial(4);
+
+ __asm__(" ld1 {v0.16b}, %[in] ;"
+ " ld1 {v1.2d}, [%[key]], #16 ;"
+ " cmp %w[rounds], #10 ;"
+ " bmi 0f ;"
+ " bne 3f ;"
+ " mov v3.16b, v1.16b ;"
+ " b 2f ;"
+ "0: mov v2.16b, v1.16b ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ "1: aese v0.16b, v2.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ "2: ld1 {v1.2d}, [%[key]], #16 ;"
+ " aese v0.16b, v3.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ "3: ld1 {v2.2d}, [%[key]], #16 ;"
+ " subs %w[rounds], %w[rounds], #3 ;"
+ " aese v0.16b, v1.16b ;"
+ " aesmc v0.16b, v0.16b ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ " bpl 1b ;"
+ " aese v0.16b, v2.16b ;"
+ " eor v0.16b, v0.16b, v3.16b ;"
+ " st1 {v0.16b}, %[out] ;"
+
+ : [out] "=Q"(*out),
+ [key] "=r"(dummy0),
+ [rounds] "=r"(dummy1)
+ : [in] "Q"(*in),
+ "1"(ctx->key_enc),
+ "2"(num_rounds(ctx) - 2)
+ : "cc");
+
+ kernel_neon_end();
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+ struct aes_block *out = (struct aes_block *)dst;
+ struct aes_block const *in = (struct aes_block *)src;
+ void *dummy0;
+ int dummy1;
+
+ kernel_neon_begin_partial(4);
+
+ __asm__(" ld1 {v0.16b}, %[in] ;"
+ " ld1 {v1.2d}, [%[key]], #16 ;"
+ " cmp %w[rounds], #10 ;"
+ " bmi 0f ;"
+ " bne 3f ;"
+ " mov v3.16b, v1.16b ;"
+ " b 2f ;"
+ "0: mov v2.16b, v1.16b ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ "1: aesd v0.16b, v2.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ "2: ld1 {v1.2d}, [%[key]], #16 ;"
+ " aesd v0.16b, v3.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ "3: ld1 {v2.2d}, [%[key]], #16 ;"
+ " subs %w[rounds], %w[rounds], #3 ;"
+ " aesd v0.16b, v1.16b ;"
+ " aesimc v0.16b, v0.16b ;"
+ " ld1 {v3.2d}, [%[key]], #16 ;"
+ " bpl 1b ;"
+ " aesd v0.16b, v2.16b ;"
+ " eor v0.16b, v0.16b, v3.16b ;"
+ " st1 {v0.16b}, %[out] ;"
+
+ : [out] "=Q"(*out),
+ [key] "=r"(dummy0),
+ [rounds] "=r"(dummy1)
+ : [in] "Q"(*in),
+ "1"(ctx->key_dec),
+ "2"(num_rounds(ctx) - 2)
+ : "cc");
+
+ kernel_neon_end();
+}
+
+static struct crypto_alg aes_alg = {
+ .cra_name = "aes",
+ .cra_driver_name = "aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = crypto_aes_set_key,
+ .cia_encrypt = aes_cipher_encrypt,
+ .cia_decrypt = aes_cipher_decrypt
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&aes_alg);
+}
+
+module_cpu_feature_match(AES, aes_mod_init);
+module_exit(aes_mod_exit);
--
1.8.3.2
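The encrypt/decrypt routines above feed num_rounds(ctx) - 2 into the asm, whose round loop consumes three round keys per iteration with separate entry points for the 10/12/14-round key sizes. The key-length-to-round-count rule the glue code relies on can be modelled in isolation; this is a hedged userspace sketch, and aes_rounds_for_keylen is a hypothetical name, not part of the patch:

```c
#include <assert.h>

/*
 * Hypothetical standalone model of the num_rounds() helper used by the
 * glue code in this series: AES-128/192/256 use 10/12/14 rounds, i.e.
 * 6 + keylen/4 rounds for a keylen-byte key.
 */
static int aes_rounds_for_keylen(int keylen_bytes)
{
	return 6 + keylen_bytes / 4;
}
```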
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH resend 09/15] arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-01 15:49 ` Ard Biesheuvel
0 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-01 15:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: catalin.marinas, will.deacon, steve.capper, Ard Biesheuvel
This patch adds support for the AES-CCM encryption algorithm for CPUs that
have support for the AES part of the ARM v8 Crypto Extensions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---
arch/arm64/crypto/Kconfig | 7 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-ce-ccm-core.S | 222 +++++++++++++++++++++++++++
arch/arm64/crypto/aes-ce-ccm-glue.c | 297 ++++++++++++++++++++++++++++++++++++
4 files changed, 529 insertions(+)
create mode 100644 arch/arm64/crypto/aes-ce-ccm-core.S
create mode 100644 arch/arm64/crypto/aes-ce-ccm-glue.c
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 9ba32c0da871..8fffd5af65ef 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -29,4 +29,11 @@ config CRYPTO_AES_ARM64_CE
select CRYPTO_ALGAPI
select CRYPTO_AES
+config CRYPTO_AES_ARM64_CE_CCM
+ tristate "AES in CCM mode using ARMv8 Crypto Extensions"
+ depends on ARM64 && KERNEL_MODE_NEON
+ select CRYPTO_ALGAPI
+ select CRYPTO_AES
+ select CRYPTO_AEAD
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 908abd9242b1..311287d68078 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -19,3 +19,6 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
+aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
new file mode 100644
index 000000000000..a2a7fbcacc14
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -0,0 +1,222 @@
+/*
+ * aesce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .arch armv8-a+crypto
+
+ /*
+ * void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+ * u32 *macp, u8 const rk[], u32 rounds);
+ */
+ENTRY(ce_aes_ccm_auth_data)
+ ldr w8, [x3] /* leftover from prev round? */
+ ld1 {v0.2d}, [x0] /* load mac */
+ cbz w8, 1f
+ sub w8, w8, #16
+ eor v1.16b, v1.16b, v1.16b
+0: ldrb w7, [x1], #1 /* get 1 byte of input */
+ subs w2, w2, #1
+ add w8, w8, #1
+ ins v1.b[0], w7
+ ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */
+ beq 8f /* out of input? */
+ cbnz w8, 0b
+ eor v0.16b, v0.16b, v1.16b
+1: ld1 {v3.2d}, [x4] /* load first round key */
+ prfm pldl1strm, [x1]
+ cmp w5, #12 /* which key size? */
+ add x6, x4, #16
+ sub w7, w5, #2 /* modified # of rounds */
+ bmi 2f
+ bne 5f
+ mov v5.16b, v3.16b
+ b 4f
+2: mov v4.16b, v3.16b
+ ld1 {v5.2d}, [x6], #16 /* load 2nd round key */
+3: aese v0.16b, v4.16b
+ aesmc v0.16b, v0.16b
+4: ld1 {v3.2d}, [x6], #16 /* load next round key */
+ aese v0.16b, v5.16b
+ aesmc v0.16b, v0.16b
+5: ld1 {v4.2d}, [x6], #16 /* load next round key */
+ subs w7, w7, #3
+ aese v0.16b, v3.16b
+ aesmc v0.16b, v0.16b
+ ld1 {v5.2d}, [x6], #16 /* load next round key */
+ bpl 3b
+ aese v0.16b, v4.16b
+ subs w2, w2, #16 /* last data? */
+ eor v0.16b, v0.16b, v5.16b /* final round */
+ bmi 6f
+ ld1 {v1.16b}, [x1], #16 /* load next input block */
+ eor v0.16b, v0.16b, v1.16b /* xor with mac */
+ bne 1b
+6: st1 {v0.2d}, [x0] /* store mac */
+ beq 10f
+ adds w2, w2, #16
+ beq 10f
+ mov w8, w2
+7: ldrb w7, [x1], #1
+ umov w6, v0.b[0]
+ eor w6, w6, w7
+ strb w6, [x0], #1
+ subs w2, w2, #1
+ beq 10f
+ ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
+ b 7b
+8: mov w7, w8
+ add w8, w8, #16
+9: ext v1.16b, v1.16b, v1.16b, #1
+ adds w7, w7, #1
+ bne 9b
+ eor v0.16b, v0.16b, v1.16b
+ st1 {v0.2d}, [x0]
+10: str w8, [x3]
+ ret
+ENDPROC(ce_aes_ccm_auth_data)
+
+ /*
+ * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[],
+ * u32 rounds);
+ */
+ENTRY(ce_aes_ccm_final)
+ ld1 {v3.2d}, [x2], #16 /* load first round key */
+ ld1 {v0.2d}, [x0] /* load mac */
+ cmp w3, #12 /* which key size? */
+ sub w3, w3, #2 /* modified # of rounds */
+ ld1 {v1.2d}, [x1] /* load 1st ctriv */
+ bmi 0f
+ bne 3f
+ mov v5.16b, v3.16b
+ b 2f
+0: mov v4.16b, v3.16b
+1: ld1 {v5.2d}, [x2], #16 /* load next round key */
+ aese v0.16b, v4.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v4.16b
+ aesmc v1.16b, v1.16b
+2: ld1 {v3.2d}, [x2], #16 /* load next round key */
+ aese v0.16b, v5.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v5.16b
+ aesmc v1.16b, v1.16b
+3: ld1 {v4.2d}, [x2], #16 /* load next round key */
+ subs w3, w3, #3
+ aese v0.16b, v3.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v3.16b
+ aesmc v1.16b, v1.16b
+ bpl 1b
+ aese v0.16b, v4.16b
+ aese v1.16b, v4.16b
+ /* final round key cancels out */
+ eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */
+ st1 {v0.2d}, [x0] /* store result */
+ ret
+ENDPROC(ce_aes_ccm_final)
+
+ .macro aes_ccm_do_crypt,enc
+ ldr x8, [x6, #8] /* load lower ctr */
+ ld1 {v0.2d}, [x5] /* load mac */
+ rev x8, x8 /* keep swabbed ctr in reg */
+0: /* outer loop */
+ ld1 {v1.1d}, [x6] /* load upper ctr */
+ prfm pldl1strm, [x1]
+ add x8, x8, #1
+ rev x9, x8
+ cmp w4, #12 /* which key size? */
+ sub w7, w4, #2 /* get modified # of rounds */
+ ins v1.d[1], x9 /* no carry in lower ctr */
+ ld1 {v3.2d}, [x3] /* load first round key */
+ add x10, x3, #16
+ bmi 1f
+ bne 4f
+ mov v5.16b, v3.16b
+ b 3f
+1: mov v4.16b, v3.16b
+ ld1 {v5.2d}, [x10], #16 /* load 2nd round key */
+2: /* inner loop: 3 rounds, 2x interleaved */
+ aese v0.16b, v4.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v4.16b
+ aesmc v1.16b, v1.16b
+3: ld1 {v3.2d}, [x10], #16 /* load next round key */
+ aese v0.16b, v5.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v5.16b
+ aesmc v1.16b, v1.16b
+4: ld1 {v4.2d}, [x10], #16 /* load next round key */
+ subs w7, w7, #3
+ aese v0.16b, v3.16b
+ aesmc v0.16b, v0.16b
+ aese v1.16b, v3.16b
+ aesmc v1.16b, v1.16b
+ ld1 {v5.2d}, [x10], #16 /* load next round key */
+ bpl 2b
+ aese v0.16b, v4.16b
+ aese v1.16b, v4.16b
+ subs w2, w2, #16
+ bmi 6f /* partial block? */
+ ld1 {v2.16b}, [x1], #16 /* load next input block */
+ .if \enc == 1
+ eor v2.16b, v2.16b, v5.16b /* final round enc+mac */
+ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
+ .else
+ eor v2.16b, v2.16b, v1.16b /* xor with crypted ctr */
+ eor v1.16b, v2.16b, v5.16b /* final round enc */
+ .endif
+ eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
+ st1 {v1.16b}, [x0], #16 /* write output block */
+ bne 0b
+ rev x8, x8
+ st1 {v0.2d}, [x5] /* store mac */
+ str x8, [x6, #8] /* store lsb end of ctr (BE) */
+5: ret
+
+6: eor v0.16b, v0.16b, v5.16b /* final round mac */
+ eor v1.16b, v1.16b, v5.16b /* final round enc */
+ st1 {v0.2d}, [x5] /* store mac */
+ add w2, w2, #16 /* process partial tail block */
+7: ldrb w9, [x1], #1 /* get 1 byte of input */
+ umov w6, v1.b[0] /* get top crypted ctr byte */
+ umov w7, v0.b[0] /* get top mac byte */
+ .if \enc == 1
+ eor w7, w7, w9
+ eor w9, w9, w6
+ .else
+ eor w9, w9, w6
+ eor w7, w7, w9
+ .endif
+ strb w9, [x0], #1 /* store out byte */
+ strb w7, [x5], #1 /* store mac byte */
+ subs w2, w2, #1
+ beq 5b
+ ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
+ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */
+ b 7b
+ .endm
+
+ /*
+ * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+ * u8 const rk[], u32 rounds, u8 mac[],
+ * u8 ctr[]);
+ * void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+ * u8 const rk[], u32 rounds, u8 mac[],
+ * u8 ctr[]);
+ */
+ENTRY(ce_aes_ccm_encrypt)
+ aes_ccm_do_crypt 1
+ENDPROC(ce_aes_ccm_encrypt)
+
+ENTRY(ce_aes_ccm_decrypt)
+ aes_ccm_do_crypt 0
+ENDPROC(ce_aes_ccm_decrypt)
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
new file mode 100644
index 000000000000..9e6cdde9b43d
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -0,0 +1,297 @@
+/*
+ * aes-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
+ *
+ * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <asm/unaligned.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+
+static int num_rounds(struct crypto_aes_ctx *ctx)
+{
+ /*
+ * # of rounds specified by AES:
+ * 128 bit key 10 rounds
+ * 192 bit key 12 rounds
+ * 256 bit key 14 rounds
+ * => n byte key => 6 + (n/4) rounds
+ */
+ return 6 + ctx->key_length / 4;
+}
+
+asmlinkage void ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes,
+ u32 *macp, u32 const rk[], u32 rounds);
+
+asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes,
+ u32 const rk[], u32 rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
+ u32 const rk[], u32 rounds, u8 mac[],
+ u8 ctr[]);
+
+asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
+ u32 rounds);
+
+static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(tfm);
+ int ret;
+
+ ret = crypto_aes_expand_key(ctx, in_key, key_len);
+ if (!ret)
+ return 0;
+
+ tfm->base.crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+ return -EINVAL;
+}
+
+static int ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
+{
+ if ((authsize & 1) || authsize < 4)
+ return -EINVAL;
+ return 0;
+}
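The setauthsize check above rejects odd tag sizes and tags shorter than 4 bytes, which matches the even 4..16-byte tag lengths CCM permits (the 16-byte upper bound is enforced separately via .maxauthsize). A hedged standalone model of that predicate, with ccm_authsize_ok as a hypothetical name:

```c
#include <assert.h>

/*
 * Hypothetical model of the ccm_setauthsize() check in the patch:
 * CCM tag sizes must be even and at least 4 bytes; the upper bound of
 * 16 bytes is enforced by the crypto layer via maxauthsize.
 */
static int ccm_authsize_ok(unsigned int authsize)
{
	return !((authsize & 1) || authsize < 4);
}
```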
+
+static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ __be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8];
+ u32 l = req->iv[0] + 1;
+
+ /* verify that CCM dimension 'L' is set correctly in the IV */
+ if (l < 2 || l > 8)
+ return -EINVAL;
+
+ /* verify that msglen can in fact be represented in L bytes */
+ if (l < 4 && msglen >> (8 * l))
+ return -EOVERFLOW;
+
+ /*
+ * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi
+ * uses a u32 type to represent msglen so the top 4 bytes are always 0.
+ */
+ n[0] = 0;
+ n[1] = cpu_to_be32(msglen);
+
+ memcpy(maciv, req->iv, AES_BLOCK_SIZE - l);
+
+ /*
+ * Meaning of byte 0 according to CCM spec (RFC 3610/NIST 800-38C)
+ * - bits 0..2 : max # of bytes required to represent msglen, minus 1
+ * (already set by caller)
+ * - bits 3..5 : size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc)
+ * - bit 6 : indicates presence of authenticate-only data
+ */
+ maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+ if (req->assoclen)
+ maciv[0] |= 0x40;
+
+ memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+ return 0;
+}
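The comment block in ccm_init_mac() spells out how the B0 flags byte is assembled; the computation can be sketched on its own for clarity. This is a hedged userspace model, assuming the caller has already placed L - 1 in the low bits as the patch notes, and ccm_b0_flags is a hypothetical name:

```c
#include <assert.h>

/*
 * Hypothetical model of the B0 flags byte built in ccm_init_mac()
 * (RFC 3610 / NIST SP 800-38C): bits 0..2 hold L - 1, bits 3..5 hold
 * (M - 2) / 2 for an M-byte auth tag, and bit 6 is set when
 * associated data is present.
 */
static unsigned char ccm_b0_flags(unsigned int l, unsigned int authsize,
				  int have_assoc)
{
	unsigned char flags = l - 1;		/* set by the caller in the patch */

	flags |= (authsize - 2) << 2;		/* maciv[0] |= (authsize - 2) << 2 */
	if (have_assoc)
		flags |= 0x40;			/* Adata bit */
	return flags;
}
```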
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct __packed { __be16 l; __be32 h; u16 len; } ltag;
+ struct scatter_walk walk;
+ u32 len = req->assoclen;
+ u32 macp = 0;
+
+ /* prepend the AAD with a length tag */
+ if (len < 0xff00) {
+ ltag.l = cpu_to_be16(len);
+ ltag.len = 2;
+ } else {
+ ltag.l = cpu_to_be16(0xfffe);
+ put_unaligned_be32(len, &ltag.h);
+ ltag.len = 6;
+ }
+
+ ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
+ num_rounds(ctx));
+ scatterwalk_start(&walk, req->assoc);
+
+ do {
+ u32 n = scatterwalk_clamp(&walk, len);
+ u8 *p;
+
+ if (!n) {
+ scatterwalk_start(&walk, sg_next(walk.sg));
+ n = scatterwalk_clamp(&walk, len);
+ }
+ p = scatterwalk_map(&walk);
+ ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
+ num_rounds(ctx));
+ len -= n;
+
+ scatterwalk_unmap(p);
+ scatterwalk_advance(&walk, n);
+ scatterwalk_done(&walk, 0, len);
+ } while (len);
+}
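ccm_calculate_auth_mac() prepends a length tag to the associated data: lengths below 0xff00 are encoded as two big-endian bytes, larger 32-bit lengths as the marker 0xfffe followed by four big-endian bytes. A hedged sketch of that encoding (ccm_encode_aad_len is a hypothetical name, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the AAD length tag prepended in
 * ccm_calculate_auth_mac() (RFC 3610): lengths below 0xff00 take two
 * big-endian bytes; larger 32-bit lengths take the marker 0xfffe plus
 * four big-endian bytes. Returns the number of tag bytes written.
 */
static size_t ccm_encode_aad_len(unsigned long len, unsigned char out[6])
{
	if (len < 0xff00) {
		out[0] = (len >> 8) & 0xff;
		out[1] = len & 0xff;
		return 2;
	}
	out[0] = 0xff;
	out[1] = 0xfe;
	out[2] = (len >> 24) & 0xff;
	out[3] = (len >> 16) & 0xff;
	out[4] = (len >> 8) & 0xff;
	out[5] = len & 0xff;
	return 6;
}
```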
+
+static int ccm_encrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct blkcipher_desc desc = { .info = req->iv };
+ struct blkcipher_walk walk;
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 buf[AES_BLOCK_SIZE];
+ u32 len = req->cryptlen;
+ int err;
+
+ err = ccm_init_mac(req, mac, len);
+ if (err)
+ return err;
+
+ kernel_neon_begin_partial(6);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, mac);
+
+ /* preserve the original iv for the final round */
+ memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+ blkcipher_walk_init(&walk, req->dst, req->src, len);
+ err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+ AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == len)
+ tail = 0;
+
+ ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);
+
+ len -= walk.nbytes - tail;
+ err = blkcipher_walk_done(&desc, &walk, tail);
+ }
+ if (!err)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+ kernel_neon_end();
+
+ if (err)
+ return err;
+
+ /* copy authtag to end of dst */
+ scatterwalk_map_and_copy(mac, req->dst, req->cryptlen,
+ crypto_aead_authsize(aead), 1);
+
+ return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ unsigned int authsize = crypto_aead_authsize(aead);
+ struct blkcipher_desc desc = { .info = req->iv };
+ struct blkcipher_walk walk;
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 buf[AES_BLOCK_SIZE];
+ u32 len = req->cryptlen - authsize;
+ int err;
+
+ err = ccm_init_mac(req, mac, len);
+ if (err)
+ return err;
+
+ kernel_neon_begin_partial(6);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, mac);
+
+ /* preserve the original iv for the final round */
+ memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+ blkcipher_walk_init(&walk, req->dst, req->src, len);
+ err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+ AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == len)
+ tail = 0;
+
+ ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);
+
+ len -= walk.nbytes - tail;
+ err = blkcipher_walk_done(&desc, &walk, tail);
+ }
+ if (!err)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+ kernel_neon_end();
+
+ if (err)
+ return err;
+
+ /* compare calculated auth tag with the stored one */
+ scatterwalk_map_and_copy(buf, req->src, req->cryptlen - authsize,
+ authsize, 0);
+
+ if (memcmp(mac, buf, authsize))
+ return -EBADMSG;
+ return 0;
+}
+
+static struct crypto_alg ccm_aes_alg = {
+ .cra_name = "ccm(aes)",
+ .cra_driver_name = "ccm-aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_AEAD,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_aead_type,
+ .cra_module = THIS_MODULE,
+ .cra_aead = {
+ .ivsize = AES_BLOCK_SIZE,
+ .maxauthsize = AES_BLOCK_SIZE,
+ .setkey = ccm_setkey,
+ .setauthsize = ccm_setauthsize,
+ .encrypt = ccm_encrypt,
+ .decrypt = ccm_decrypt,
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ if (!(elf_hwcap & HWCAP_AES))
+ return -ENODEV;
+ return crypto_register_alg(&ccm_aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&ccm_aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES in CCM mode using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("ccm(aes)");
--
1.8.3.2
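The aes_ccm_do_crypt macro above keeps the low 64 bits of the big-endian counter block byte-swapped in x8 ("rev x8, x8"), increments it once per block, and swaps it back ("rev x9, x8") before inserting it into v1, so carries never propagate into the upper half of the counter block. A hedged C sketch of that counter step, assuming a 64-bit big-endian low half (bswap64 and ccm_ctr_next are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit byte swap, standing in for the asm "rev" instruction. */
static uint64_t bswap64(uint64_t v)
{
	v = (v >> 32) | (v << 32);
	v = ((v & 0xffff0000ffff0000ULL) >> 16) | ((v & 0x0000ffff0000ffffULL) << 16);
	v = ((v & 0xff00ff00ff00ff00ULL) >> 8)  | ((v & 0x00ff00ff00ff00ffULL) << 8);
	return v;
}

/*
 * Hypothetical model of the per-block counter update in
 * aes_ccm_do_crypt: swap the big-endian low half to host order,
 * increment, and swap back; wraparound stays confined to these
 * 64 bits, matching the "no carry in lower ctr" comment.
 */
static uint64_t ccm_ctr_next(uint64_t ctr_be)
{
	return bswap64(bswap64(ctr_be) + 1);
}
```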
+ * (already set by caller)
+ * - bits 3..5 : size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc)
+ * - bit 6 : indicates presence of authenticate-only data
+ */
+ maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2;
+ if (req->assoclen)
+ maciv[0] |= 0x40;
+
+ memset(&req->iv[AES_BLOCK_SIZE - l], 0, l);
+ return 0;
+}
+
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct __packed { __be16 l; __be32 h; u16 len; } ltag;
+ struct scatter_walk walk;
+ u32 len = req->assoclen;
+ u32 macp = 0;
+
+ /* prepend the AAD with a length tag */
+ if (len < 0xff00) {
+ ltag.l = cpu_to_be16(len);
+ ltag.len = 2;
+ } else {
+ ltag.l = cpu_to_be16(0xfffe);
+ put_unaligned_be32(len, &ltag.h);
+ ltag.len = 6;
+ }
+
+ ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
+ num_rounds(ctx));
+ scatterwalk_start(&walk, req->assoc);
+
+ do {
+ u32 n = scatterwalk_clamp(&walk, len);
+ u8 *p;
+
+ if (!n) {
+ scatterwalk_start(&walk, sg_next(walk.sg));
+ n = scatterwalk_clamp(&walk, len);
+ }
+ p = scatterwalk_map(&walk);
+ ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
+ num_rounds(ctx));
+ len -= n;
+
+ scatterwalk_unmap(p);
+ scatterwalk_advance(&walk, n);
+ scatterwalk_done(&walk, 0, len);
+ } while (len);
+}
+
+static int ccm_encrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ struct blkcipher_desc desc = { .info = req->iv };
+ struct blkcipher_walk walk;
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 buf[AES_BLOCK_SIZE];
+ u32 len = req->cryptlen;
+ int err;
+
+ err = ccm_init_mac(req, mac, len);
+ if (err)
+ return err;
+
+ kernel_neon_begin_partial(6);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, mac);
+
+ /* preserve the original iv for the final round */
+ memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+ blkcipher_walk_init(&walk, req->dst, req->src, len);
+ err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+ AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == len)
+ tail = 0;
+
+ ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);
+
+ len -= walk.nbytes - tail;
+ err = blkcipher_walk_done(&desc, &walk, tail);
+ }
+ if (!err)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+ kernel_neon_end();
+
+ if (err)
+ return err;
+
+ /* copy authtag to end of dst */
+ scatterwalk_map_and_copy(mac, req->dst, req->cryptlen,
+ crypto_aead_authsize(aead), 1);
+
+ return 0;
+}
+
+static int ccm_decrypt(struct aead_request *req)
+{
+ struct crypto_aead *aead = crypto_aead_reqtfm(req);
+ struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
+ unsigned int authsize = crypto_aead_authsize(aead);
+ struct blkcipher_desc desc = { .info = req->iv };
+ struct blkcipher_walk walk;
+ u8 __aligned(8) mac[AES_BLOCK_SIZE];
+ u8 buf[AES_BLOCK_SIZE];
+ u32 len = req->cryptlen - authsize;
+ int err;
+
+ err = ccm_init_mac(req, mac, len);
+ if (err)
+ return err;
+
+ kernel_neon_begin_partial(6);
+
+ if (req->assoclen)
+ ccm_calculate_auth_mac(req, mac);
+
+ /* preserve the original iv for the final round */
+ memcpy(buf, req->iv, AES_BLOCK_SIZE);
+
+ blkcipher_walk_init(&walk, req->dst, req->src, len);
+ err = blkcipher_aead_walk_virt_block(&desc, &walk, aead,
+ AES_BLOCK_SIZE);
+
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+
+ if (walk.nbytes == len)
+ tail = 0;
+
+ ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);
+
+ len -= walk.nbytes - tail;
+ err = blkcipher_walk_done(&desc, &walk, tail);
+ }
+ if (!err)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+
+ kernel_neon_end();
+
+ if (err)
+ return err;
+
+ /* compare calculated auth tag with the stored one */
+ scatterwalk_map_and_copy(buf, req->src, req->cryptlen - authsize,
+ authsize, 0);
+
+ if (memcmp(mac, buf, authsize))
+ return -EBADMSG;
+ return 0;
+}
+
+static struct crypto_alg ccm_aes_alg = {
+ .cra_name = "ccm(aes)",
+ .cra_driver_name = "ccm-aes-ce",
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_AEAD,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_alignmask = 7,
+ .cra_type = &crypto_aead_type,
+ .cra_module = THIS_MODULE,
+ .cra_aead = {
+ .ivsize = AES_BLOCK_SIZE,
+ .maxauthsize = AES_BLOCK_SIZE,
+ .setkey = ccm_setkey,
+ .setauthsize = ccm_setauthsize,
+ .encrypt = ccm_encrypt,
+ .decrypt = ccm_decrypt,
+ }
+};
+
+static int __init aes_mod_init(void)
+{
+ if (!(elf_hwcap & HWCAP_AES))
+ return -ENODEV;
+ return crypto_register_alg(&ccm_aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+ crypto_unregister_alg(&ccm_aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
+
+MODULE_DESCRIPTION("Synchronous AES in CCM mode using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("ccm(aes)");
--
1.8.3.2
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-06 14:31 ` Catalin Marinas
1 sibling, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 14:31 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
Russell King - ARM Linux
On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
> Switch the default unaligned access method to 'hardware implemented'
> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
> include/asm-generic/unaligned.h | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
I'm happy to take this patch via the arm64 tree. But arm is affected as
well, so it would be good to know if Russell has any objections (cc'ed).
Patch below for reference. Thanks.
Catalin
> diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
> index 03cf5936bad6..1ac097279db1 100644
> --- a/include/asm-generic/unaligned.h
> +++ b/include/asm-generic/unaligned.h
> @@ -4,22 +4,27 @@
> /*
> * This is the most generic implementation of unaligned accesses
> * and should work almost anywhere.
> - *
> - * If an architecture can handle unaligned accesses in hardware,
> - * it may want to use the linux/unaligned/access_ok.h implementation
> - * instead.
> */
> #include <asm/byteorder.h>
>
> +/* Set by the arch if it can handle unaligned accesses in hardware. */
> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +# include <linux/unaligned/access_ok.h>
> +#endif
> +
> #if defined(__LITTLE_ENDIAN)
> -# include <linux/unaligned/le_struct.h>
> -# include <linux/unaligned/be_byteshift.h>
> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +# include <linux/unaligned/le_struct.h>
> +# include <linux/unaligned/be_byteshift.h>
> +# endif
> # include <linux/unaligned/generic.h>
> # define get_unaligned __get_unaligned_le
> # define put_unaligned __put_unaligned_le
> #elif defined(__BIG_ENDIAN)
> -# include <linux/unaligned/be_struct.h>
> -# include <linux/unaligned/le_byteshift.h>
> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +# include <linux/unaligned/be_struct.h>
> +# include <linux/unaligned/le_byteshift.h>
> +# endif
> # include <linux/unaligned/generic.h>
> # define get_unaligned __get_unaligned_be
> # define put_unaligned __put_unaligned_be
> --
> 1.8.3.2
* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
2014-05-06 14:31 ` Catalin Marinas
@ 2014-05-06 14:34 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 14:34 UTC (permalink / raw)
To: Catalin Marinas
Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
Russell King - ARM Linux
On 6 May 2014 16:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
>> Switch the default unaligned access method to 'hardware implemented'
>> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Acked-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>> include/asm-generic/unaligned.h | 21 +++++++++++++--------
>> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> I'm happy to take this patch via the arm64 tree. But arm is affected as
> well, so it would be good to know if Russell has any objections (cc'ed).
>
> Patch below for reference. Thanks.
>
Russell has already replied to that:
http://marc.info/?l=linux-arm-kernel&m=139696976302889&w=2
Regards,
Ard.
>> diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
>> index 03cf5936bad6..1ac097279db1 100644
>> --- a/include/asm-generic/unaligned.h
>> +++ b/include/asm-generic/unaligned.h
>> @@ -4,22 +4,27 @@
>> /*
>> * This is the most generic implementation of unaligned accesses
>> * and should work almost anywhere.
>> - *
>> - * If an architecture can handle unaligned accesses in hardware,
>> - * it may want to use the linux/unaligned/access_ok.h implementation
>> - * instead.
>> */
>> #include <asm/byteorder.h>
>>
>> +/* Set by the arch if it can handle unaligned accesses in hardware. */
>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +# include <linux/unaligned/access_ok.h>
>> +#endif
>> +
>> #if defined(__LITTLE_ENDIAN)
>> -# include <linux/unaligned/le_struct.h>
>> -# include <linux/unaligned/be_byteshift.h>
>> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +# include <linux/unaligned/le_struct.h>
>> +# include <linux/unaligned/be_byteshift.h>
>> +# endif
>> # include <linux/unaligned/generic.h>
>> # define get_unaligned __get_unaligned_le
>> # define put_unaligned __put_unaligned_le
>> #elif defined(__BIG_ENDIAN)
>> -# include <linux/unaligned/be_struct.h>
>> -# include <linux/unaligned/le_byteshift.h>
>> +# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +# include <linux/unaligned/be_struct.h>
>> +# include <linux/unaligned/le_byteshift.h>
>> +# endif
>> # include <linux/unaligned/generic.h>
>> # define get_unaligned __get_unaligned_be
>> # define put_unaligned __put_unaligned_be
>> --
>> 1.8.3.2
* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-06 14:43 ` Catalin Marinas
1 sibling, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 14:43 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel
On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 4aef42a04bdc..86ac6a9bc86a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
> preempt_enable();
> }
>
> +/*
> + * Save the userland FPSIMD state of 'current' to memory
> + */
> +void fpsimd_preserve_current_state(void)
> +{
> + fpsimd_save_state(&current->thread.fpsimd_state);
> +}
> +
> +/*
> + * Load the userland FPSIMD state of 'current' from memory
> + */
> +void fpsimd_restore_current_state(void)
> +{
> + fpsimd_load_state(&current->thread.fpsimd_state);
> +}
> +
> +/*
> + * Load an updated userland FPSIMD state for 'current' from memory
> + */
> +void fpsimd_update_current_state(struct fpsimd_state *state)
> +{
> + preempt_disable();
> + fpsimd_load_state(state);
> + preempt_enable();
> +}
Minor - please update the comment above the functions to state that
preemption needs to be disabled by the caller.
> +/*
> + * Invalidate live CPU copies of task t's FPSIMD state
> + */
> +void fpsimd_flush_task_state(struct task_struct *t)
> +{
> +}
I guess this will be added in a subsequent patch. You could either move
it there or add a comment in the commit log that it is a dummy function
for now (I prefer the former).
--
Catalin
* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
2014-05-06 14:43 ` Catalin Marinas
@ 2014-05-06 14:48 ` Ard Biesheuvel
1 sibling, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 14:48 UTC (permalink / raw)
To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 4aef42a04bdc..86ac6a9bc86a 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
>> preempt_enable();
>> }
>>
>> +/*
>> + * Save the userland FPSIMD state of 'current' to memory
>> + */
>> +void fpsimd_preserve_current_state(void)
>> +{
>> + fpsimd_save_state(&current->thread.fpsimd_state);
>> +}
>> +
>> +/*
>> + * Load the userland FPSIMD state of 'current' from memory
>> + */
>> +void fpsimd_restore_current_state(void)
>> +{
>> + fpsimd_load_state(&current->thread.fpsimd_state);
>> +}
>> +
>> +/*
>> + * Load an updated userland FPSIMD state for 'current' from memory
>> + */
>> +void fpsimd_update_current_state(struct fpsimd_state *state)
>> +{
>> + preempt_disable();
>> + fpsimd_load_state(state);
>> + preempt_enable();
>> +}
>
> Minor - please update the comment above the functions to state that
> preemption needs to be disabled by the caller.
>
Do you mean in all three cases? And, by implication, that the
preempt_disable()/enable() pair should be moved to the call site for
fpsimd_update_current_state() ?
>> +/*
>> + * Invalidate live CPU copies of task t's FPSIMD state
>> + */
>> +void fpsimd_flush_task_state(struct task_struct *t)
>> +{
>> +}
>
> I guess this will be added in a subsequent patch. You could either move
> it there or add a comment in the commit log that it is a dummy function
> for now (I prefer the former).
>
OK
--
Ard.
* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
2014-05-06 14:48 ` Ard Biesheuvel
@ 2014-05-06 15:12 ` Catalin Marinas
1 sibling, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:12 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel
On Tue, May 06, 2014 at 03:48:08PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> >> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> >> index 4aef42a04bdc..86ac6a9bc86a 100644
> >> --- a/arch/arm64/kernel/fpsimd.c
> >> +++ b/arch/arm64/kernel/fpsimd.c
> >> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
> >> preempt_enable();
> >> }
> >>
> >> +/*
> >> + * Save the userland FPSIMD state of 'current' to memory
> >> + */
> >> +void fpsimd_preserve_current_state(void)
> >> +{
> >> + fpsimd_save_state(&current->thread.fpsimd_state);
> >> +}
> >> +
> >> +/*
> >> + * Load the userland FPSIMD state of 'current' from memory
> >> + */
> >> +void fpsimd_restore_current_state(void)
> >> +{
> >> + fpsimd_load_state(&current->thread.fpsimd_state);
> >> +}
> >> +
> >> +/*
> >> + * Load an updated userland FPSIMD state for 'current' from memory
> >> + */
> >> +void fpsimd_update_current_state(struct fpsimd_state *state)
> >> +{
> >> + preempt_disable();
> >> + fpsimd_load_state(state);
> >> + preempt_enable();
> >> +}
> >
> > Minor - please update the comment above the functions to state that
> > preemption needs to be disabled by the caller.
> >
>
> Do you mean in all three cases? And, by implication, that the
> preempt_disable()/enable() pair should be moved to the call site for
> fpsimd_update_current_state() ?
No, just the comment for the first two functions updated.
--
Catalin
* Re: [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it
2014-05-06 14:34 ` Ard Biesheuvel
@ 2014-05-06 15:14 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:14 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper,
Russell King - ARM Linux
On Tue, May 06, 2014 at 03:34:23PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 16:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:33PM +0100, Ard Biesheuvel wrote:
> >> Switch the default unaligned access method to 'hardware implemented'
> >> if HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> Acked-by: Arnd Bergmann <arnd@arndb.de>
> >> ---
> >> include/asm-generic/unaligned.h | 21 +++++++++++++--------
> >> 1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > I'm happy to take this patch via the arm64 tree. But arm is affected as
> > well, so it would be good to know if Russell has any objections (cc'ed).
> >
> > Patch below for reference. Thanks.
> >
>
> Russell has already replied to that:
> http://marc.info/?l=linux-arm-kernel&m=139696976302889&w=2
Thanks.
--
Catalin
* Re: [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation
2014-05-06 15:12 ` Catalin Marinas
@ 2014-05-06 15:42 ` Catalin Marinas
1 sibling, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 15:42 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On Tue, May 06, 2014 at 04:12:55PM +0100, Catalin Marinas wrote:
> On Tue, May 06, 2014 at 03:48:08PM +0100, Ard Biesheuvel wrote:
> > On 6 May 2014 16:43, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Thu, May 01, 2014 at 04:49:34PM +0100, Ard Biesheuvel wrote:
> > >> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > >> index 4aef42a04bdc..86ac6a9bc86a 100644
> > >> --- a/arch/arm64/kernel/fpsimd.c
> > >> +++ b/arch/arm64/kernel/fpsimd.c
> > >> @@ -87,6 +87,39 @@ void fpsimd_flush_thread(void)
> > >> preempt_enable();
> > >> }
> > >>
> > >> +/*
> > >> + * Save the userland FPSIMD state of 'current' to memory
> > >> + */
> > >> +void fpsimd_preserve_current_state(void)
> > >> +{
> > >> + fpsimd_save_state(&current->thread.fpsimd_state);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Load the userland FPSIMD state of 'current' from memory
> > >> + */
> > >> +void fpsimd_restore_current_state(void)
> > >> +{
> > >> + fpsimd_load_state(&current->thread.fpsimd_state);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Load an updated userland FPSIMD state for 'current' from memory
> > >> + */
> > >> +void fpsimd_update_current_state(struct fpsimd_state *state)
> > >> +{
> > >> + preempt_disable();
> > >> + fpsimd_load_state(state);
> > >> + preempt_enable();
> > >> +}
> > >
> > > Minor - please update the comment above the functions to state that
> > > preemption needs to be disabled by the caller.
> > >
> >
> > Do you mean in all three cases? And, by implication, that the
> > preempt_disable()/enable() pair should be moved to the call site for
> > fpsimd_update_current_state() ?
>
> No, just the comment for the first two functions updated.
I noticed in a subsequent patch that you add preempt_disable/enable
already in the first two functions. You could do it here as well to
avoid confusion (and no need to update the comment).
--
Catalin
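For reference, the shape being discussed would look roughly like this (an editorial sketch of Catalin's suggestion, not the actual follow-up patch; fpsimd_save_state(), fpsimd_load_state() and struct fpsimd_state are the kernel's own symbols from the quoted hunk):

```c
/*
 * Sketch: all three helpers take the preempt_disable()/preempt_enable()
 * pair themselves, so no caller-side comment about preemption is needed.
 */
void fpsimd_preserve_current_state(void)
{
	preempt_disable();
	fpsimd_save_state(&current->thread.fpsimd_state);
	preempt_enable();
}

void fpsimd_restore_current_state(void)
{
	preempt_disable();
	fpsimd_load_state(&current->thread.fpsimd_state);
	preempt_enable();
}

void fpsimd_update_current_state(struct fpsimd_state *state)
{
	preempt_disable();
	fpsimd_load_state(state);
	preempt_enable();
}
```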
* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-06 16:08 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:08 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel
On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
> {
> switch (cmd) {
> case CPU_PM_ENTER:
> - if (current->mm)
> + if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
> fpsimd_save_state(&current->thread.fpsimd_state);
> break;
> case CPU_PM_EXIT:
> - if (current->mm)
> - fpsimd_load_state(&current->thread.fpsimd_state);
> + set_thread_flag(TIF_FOREIGN_FPSTATE);
I think we could enter a PM state on a kernel thread (idle), so we
should preserve the current->mm check as well.
> break;
> case CPU_PM_ENTER_FAILED:
> default:
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 06448a77ff53..882f01774365 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> clear_thread_flag(TIF_NOTIFY_RESUME);
> tracehook_notify_resume(regs);
> }
> +
> + if (thread_flags & _TIF_FOREIGN_FPSTATE)
> + fpsimd_restore_current_state();
I think this should be safe. Even if we get preempted here, ret_to_user
would loop over TI_FLAGS with interrupts disabled until no work pending.
--
Catalin
* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
2014-05-06 16:08 ` Catalin Marinas
@ 2014-05-06 16:25 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 16:25 UTC (permalink / raw)
To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On 6 May 2014 18:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
>> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>> {
>> switch (cmd) {
>> case CPU_PM_ENTER:
>> - if (current->mm)
>> + if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
>> fpsimd_save_state(&current->thread.fpsimd_state);
>> break;
>> case CPU_PM_EXIT:
>> - if (current->mm)
>> - fpsimd_load_state(&current->thread.fpsimd_state);
>> + set_thread_flag(TIF_FOREIGN_FPSTATE);
>
> I think we could enter a PM state on a kernel thread (idle), so we
> should preserve the current->mm check as well.
>
OK
>> break;
>> case CPU_PM_ENTER_FAILED:
>> default:
>> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
>> index 06448a77ff53..882f01774365 100644
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>> clear_thread_flag(TIF_NOTIFY_RESUME);
>> tracehook_notify_resume(regs);
>> }
>> +
>> + if (thread_flags & _TIF_FOREIGN_FPSTATE)
>> + fpsimd_restore_current_state();
>
> I think this should be safe. Even if we get preempted here, ret_to_user
> would loop over TI_FLAGS with interrupts disabled until no work pending.
>
I don't follow. Do you think I should change something here?
Anyway, inside fpsimd_restore_current_state() the TIF_FOREIGN_FPSTATE
is checked again, but this time with preemption disabled.
--
Ard.
* Re: [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume
2014-05-06 16:25 ` Ard Biesheuvel
@ 2014-05-06 16:31 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:31 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On Tue, May 06, 2014 at 05:25:14PM +0100, Ard Biesheuvel wrote:
> On 6 May 2014 18:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, May 01, 2014 at 04:49:35PM +0100, Ard Biesheuvel wrote:
> >> @@ -153,12 +252,11 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
> >> {
> >> switch (cmd) {
> >> case CPU_PM_ENTER:
> >> - if (current->mm)
> >> + if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
> >> fpsimd_save_state(&current->thread.fpsimd_state);
> >> break;
> >> case CPU_PM_EXIT:
> >> - if (current->mm)
> >> - fpsimd_load_state(&current->thread.fpsimd_state);
> >> + set_thread_flag(TIF_FOREIGN_FPSTATE);
> >
> > I think we could enter a PM state on a kernel thread (idle), so we
> > should preserve the current->mm check as well.
>
> OK
>
> >> break;
> >> case CPU_PM_ENTER_FAILED:
> >> default:
> >> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> >> index 06448a77ff53..882f01774365 100644
> >> --- a/arch/arm64/kernel/signal.c
> >> +++ b/arch/arm64/kernel/signal.c
> >> @@ -413,4 +413,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> >> clear_thread_flag(TIF_NOTIFY_RESUME);
> >> tracehook_notify_resume(regs);
> >> }
> >> +
> >> + if (thread_flags & _TIF_FOREIGN_FPSTATE)
> >> + fpsimd_restore_current_state();
> >
> > I think this should be safe. Even if we get preempted here, ret_to_user
> > would loop over TI_FLAGS with interrupts disabled until no work pending.
>
> I don't follow. Do you think I should change something here?
No, I think it's safe (just thinking out loud). That's assuming
TIF_FOREIGN_FPSTATE is never set in interrupt context, and a brief look
at the subsequent patch shows that it isn't.
> Anyway, inside fpsimd_restore_current_state() the TIF_FOREIGN_FPSTATE
> is checked again, but this time with preemption disabled.
Yes. I was thinking about the scenario where we get to
do_notify_resume() because of other work but with TIF_FOREIGN_FPSTATE
cleared. In the meantime, we get preempted and TIF_FOREIGN_FPSTATE gets
set when switching back to this thread. In this case, ret_to_user loops
again over TI_FLAGS.
--
Catalin
* Re: [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-06 16:49 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-06 16:49 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel
On Thu, May 01, 2014 at 04:49:36PM +0100, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 7a900142dbc8..05e1b24aca4c 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -41,6 +41,17 @@ struct fpsimd_state {
> unsigned int cpu;
> };
>
> +/*
> + * Struct for stacking the bottom 'n' FP/SIMD registers.
> + */
> +struct fpsimd_partial_state {
> + u32 num_regs;
> + u32 fpsr;
> + u32 fpcr;
> + __uint128_t vregs[32] __aligned(16);
> +} __aligned(16);
Do we need this explicit alignment here?
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index bbec599c96bd..69e75134689d 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -62,3 +62,38 @@
> ldr w\tmpnr, [\state, #16 * 2 + 4]
> msr fpcr, x\tmpnr
> .endm
> +
> +.altmacro
> +.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
> + mrs x\tmpnr1, fpsr
> + str w\numnr, [\state]
> + mrs x\tmpnr2, fpcr
> + stp w\tmpnr1, w\tmpnr2, [\state, #4]
> + adr x\tmpnr1, 0f
> + add \state, \state, x\numnr, lsl #4
> + sub x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
> + br x\tmpnr1
> + .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
> + .irp qb, %(qa + 1)
> + stp q\qa, q\qb, [\state, # -16 * \qa - 16]
> + .endr
> + .endr
> +0:
> +.endm
> +
> +.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
> + ldp w\tmpnr1, w\tmpnr2, [\state, #4]
> + msr fpsr, x\tmpnr1
> + msr fpcr, x\tmpnr2
> + adr x\tmpnr1, 0f
> + ldr w\tmpnr2, [\state]
> + add \state, \state, x\tmpnr2, lsl #4
> + sub x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
> + br x\tmpnr1
> + .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
> + .irp qb, %(qa + 1)
> + ldp q\qa, q\qb, [\state, # -16 * \qa - 16]
> + .endr
> + .endr
> +0:
> +.endm
BTW, it may be better if num_regs is placed at the end of the structure,
especially since you use stp to store both fpsr and fpcr (though I
haven't rewritten the above to see how they look).
--
Catalin
* Re: [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context
2014-05-06 16:49 ` Catalin Marinas
@ 2014-05-06 17:09 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-06 17:09 UTC (permalink / raw)
To: Catalin Marinas; +Cc: steve.capper, Will Deacon, linux-crypto, linux-arm-kernel
On 6 May 2014 18:49, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:36PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index 7a900142dbc8..05e1b24aca4c 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -41,6 +41,17 @@ struct fpsimd_state {
>> unsigned int cpu;
>> };
>>
>> +/*
>> + * Struct for stacking the bottom 'n' FP/SIMD registers.
>> + */
>> +struct fpsimd_partial_state {
>> + u32 num_regs;
>> + u32 fpsr;
>> + u32 fpcr;
>> + __uint128_t vregs[32] __aligned(16);
>> +} __aligned(16);
>
> Do we need this explicit alignment here?
>
Without it, the implied alignment is 8 bytes, I suppose, but I haven't
checked carefully.
I will check and remove this if 8 bytes is the default.
>> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
>> index bbec599c96bd..69e75134689d 100644
>> --- a/arch/arm64/include/asm/fpsimdmacros.h
>> +++ b/arch/arm64/include/asm/fpsimdmacros.h
>> @@ -62,3 +62,38 @@
>> ldr w\tmpnr, [\state, #16 * 2 + 4]
>> msr fpcr, x\tmpnr
>> .endm
>> +
>> +.altmacro
>> +.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
>> + mrs x\tmpnr1, fpsr
>> + str w\numnr, [\state]
>> + mrs x\tmpnr2, fpcr
>> + stp w\tmpnr1, w\tmpnr2, [\state, #4]
>> + adr x\tmpnr1, 0f
>> + add \state, \state, x\numnr, lsl #4
>> + sub x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
>> + br x\tmpnr1
>> + .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
>> + .irp qb, %(qa + 1)
>> + stp q\qa, q\qb, [\state, # -16 * \qa - 16]
>> + .endr
>> + .endr
>> +0:
>> +.endm
>> +
>> +.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
>> + ldp w\tmpnr1, w\tmpnr2, [\state, #4]
>> + msr fpsr, x\tmpnr1
>> + msr fpcr, x\tmpnr2
>> + adr x\tmpnr1, 0f
>> + ldr w\tmpnr2, [\state]
>> + add \state, \state, x\tmpnr2, lsl #4
>> + sub x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
>> + br x\tmpnr1
>> + .irp qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
>> + .irp qb, %(qa + 1)
>> + ldp q\qa, q\qb, [\state, # -16 * \qa - 16]
>> + .endr
>> + .endr
>> +0:
>> +.endm
>
> BTW, it may be better if num_regs is placed at the end of the structure,
> especially since you use stp to store both fpsr and fpcr (though I
> haven't rewritten the above to see how they look).
>
I suppose you mean in the middle, i.e., after fpsr and fpcr?
Yes that makes sense, I will change that.
--
Ard.
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-01 15:49 ` Ard Biesheuvel
@ 2014-05-07 14:45 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-07 14:45 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
> This is a repost of the arm64 crypto patches that I have posted to the LAKML
> over the past months. They have now been verified on actual hardware
> (Cortex-A57) so if there are no remaining issues I would like to propose them
> for 3.16.
>
> Ard Biesheuvel (15):
> asm-generic: allow generic unaligned access if the arch supports it
> arm64: add abstractions for FPSIMD state manipulation
> arm64: defer reloading a task's FPSIMD state to userland resume
> arm64: add support for kernel mode NEON in interrupt context
> arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
> arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
> arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
> arm64/crypto: AES using ARMv8 Crypto Extensions
> arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
> arm64: pull in <asm/simd.h> from asm-generic
> arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
> Extensions
> arm64/crypto: add shared macro to test for NEED_RESCHED
> arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
> arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
> arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
There are about 5 patches that make sense to me ;) and apart from a few
minor comments they look fine.
There are the other 10 crypto patches that are beyond my knowledge. Do
you know anyone who could do a sanity check on them? Are there any tests
that would show the correctness of the implementation?
Thanks.
--
Catalin
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-07 14:45 ` Catalin Marinas
@ 2014-05-07 19:58 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-07 19:58 UTC (permalink / raw)
To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>> over the past months. They have now been verified on actual hardware
>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>> for 3.16.
>>
>> Ard Biesheuvel (15):
>> asm-generic: allow generic unaligned access if the arch supports it
>> arm64: add abstractions for FPSIMD state manipulation
>> arm64: defer reloading a task's FPSIMD state to userland resume
>> arm64: add support for kernel mode NEON in interrupt context
>> arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>> arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>> arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>> arm64/crypto: AES using ARMv8 Crypto Extensions
>> arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>> arm64: pull in <asm/simd.h> from asm-generic
>> arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>> Extensions
>> arm64/crypto: add shared macro to test for NEED_RESCHED
>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>> arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>
> There are about 5 patches that make sense to me ;) and apart from a few
> minor comments they look fine.
>
> There are the other 10 crypto patches that are beyond my knowledge. Do
> you know anyone who could do a sanity check on them? Are there any tests
> that would show the correctness of the implementation?
>
The core ciphers and hashes are always tested automatically at each
insmod (unless CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is set).
This applies to plain AES, SHA-1, SHA-224/256 and GHASH.
For the remaining modules, there is the tcrypt.ko module which can be
used to explicitly invoke the builtin crypto tests:
modprobe tcrypt.ko mode=xx
with xx in
AES-ECB/CBC/CTR/XTS: 10
AES-CCM: 37
There are also builtin benchmarks:
AES-ECB/CBC/CTR/XTS: 500
SHA1: 303
SHA256: 304
Consult crypto/tcrypt.c for an exhaustive list.
Note that the modprobe of tcrypt.ko always fails: that is intentional
as the test and/or benchmark runs in the init() function so there is
no reason to keep the module loaded afterwards.
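[Not part of the original mail: a minimal shell sketch of the tcrypt workflow described above. The helper name run_tcrypt is hypothetical; the real results of the tests land in the kernel log, and the "failed" modprobe is, as noted, the expected outcome.]

```shell
#!/bin/sh
# Hypothetical wrapper around the tcrypt invocation described above.
# tcrypt's init() runs the selected test/benchmark and then deliberately
# refuses to stay loaded, so a nonzero modprobe exit is the normal sign
# that the test ran; the actual results appear in dmesg, not on stdout.
run_tcrypt() {
    mode="$1"
    if ! command -v modprobe >/dev/null 2>&1; then
        # Sketch only: needs a Linux kernel with tcrypt built as a module.
        echo "tcrypt mode=$mode: skipped (no modprobe here)"
        return 0
    fi
    if modprobe tcrypt mode="$mode" 2>/dev/null; then
        echo "tcrypt mode=$mode: module unexpectedly stayed loaded"
        rmmod tcrypt 2>/dev/null || true
    else
        echo "tcrypt mode=$mode: modprobe exited nonzero as expected; see dmesg"
    fi
}

run_tcrypt 10    # AES-ECB/CBC/CTR/XTS correctness tests
run_tcrypt 37    # AES-CCM correctness tests
run_tcrypt 500   # AES speed benchmark
```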
As far as relative performance is concerned, these implementations
are all significantly faster than their generic C counterparts, as
they mostly use dedicated instructions. The NEON implementation of AES
is also significantly faster, at least on Cortex-A57; mileage may vary
on other implementations.
From a security point of view, note that these algorithms are all time
invariant, i.e., there are no data-dependent lookups.
Regards,
Ard.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-07 14:45 ` Catalin Marinas
@ 2014-05-08 11:22 ` Ard Biesheuvel
-1 siblings, 0 replies; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-08 11:22 UTC (permalink / raw)
To: Catalin Marinas; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>> over the past months. They have now been verified on actual hardware
>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>> for 3.16.
>>
>> Ard Biesheuvel (15):
>> asm-generic: allow generic unaligned access if the arch supports it
>> arm64: add abstractions for FPSIMD state manipulation
>> arm64: defer reloading a task's FPSIMD state to userland resume
>> arm64: add support for kernel mode NEON in interrupt context
>> arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>> arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>> arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>> arm64/crypto: AES using ARMv8 Crypto Extensions
>> arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>> arm64: pull in <asm/simd.h> from asm-generic
>> arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>> Extensions
>> arm64/crypto: add shared macro to test for NEED_RESCHED
>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>> arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>
> There are about 5 patches that make sense to me ;) and apart from a few
> minor comments they look fine.
>
> There are the other 10 crypto patches that are beyond my knowledge. Do
> you know anyone who could do a sanity check on them? Are there any tests
> that would show the correctness of the implementation?
>
I will re-send the 3 FPSIMD patches separately with your review
comments addressed.
--
Ard.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-08 11:22 ` Ard Biesheuvel
@ 2014-05-08 21:50 ` Catalin Marinas
-1 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-08 21:50 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: linux-arm-kernel, linux-crypto, Will Deacon, steve.capper
On 8 May 2014, at 12:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>>> over the past months. They have now been verified on actual hardware
>>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>>> for 3.16.
>>>
>>> Ard Biesheuvel (15):
>>> asm-generic: allow generic unaligned access if the arch supports it
>>> arm64: add abstractions for FPSIMD state manipulation
>>> arm64: defer reloading a task's FPSIMD state to userland resume
>>> arm64: add support for kernel mode NEON in interrupt context
>>> arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>> arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>> arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>> arm64/crypto: AES using ARMv8 Crypto Extensions
>>> arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>> arm64: pull in <asm/simd.h> from asm-generic
>>> arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>> Extensions
>>> arm64/crypto: add shared macro to test for NEED_RESCHED
>>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>> arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>>
>> There are about 5 patches that make sense to me ;) and apart from a few
>> minor comments they look fine.
>>
>> There are the other 10 crypto patches that are beyond my knowledge. Do
>> you know anyone who could do a sanity check on them? Are there any tests
>> that would show the correctness of the implementation?
>
> I will re-send the 3 FPSIMD patches separately with your review
> comments addressed.
Thanks. If you get another acked/reviewed-by tag on the rest of the
patches it’s even better (I plan to get the series in for 3.16, after
we do some tests internally as well).
Catalin
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-08 21:50 ` Catalin Marinas
@ 2014-05-09 6:37 ` Ard Biesheuvel
2014-05-14 1:29 ` Herbert Xu
-1 siblings, 1 reply; 55+ messages in thread
From: Ard Biesheuvel @ 2014-05-09 6:37 UTC (permalink / raw)
To: Catalin Marinas, Herbert Xu, Jussi Kivilinna; +Cc: linux-crypto, steve.capper
On 8 May 2014 23:50, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On 8 May 2014, at 12:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 7 May 2014 16:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Thu, May 01, 2014 at 04:49:32PM +0100, Ard Biesheuvel wrote:
>>>> This is a repost of the arm64 crypto patches that I have posted to the LAKML
>>>> over the past months. They have now been verified on actual hardware
>>>> (Cortex-A57) so if there are no remaining issues I would like to propose them
>>>> for 3.16.
>>>>
>>>> Ard Biesheuvel (15):
>>>> asm-generic: allow generic unaligned access if the arch supports it
>>>> arm64: add abstractions for FPSIMD state manipulation
>>>> arm64: defer reloading a task's FPSIMD state to userland resume
>>>> arm64: add support for kernel mode NEON in interrupt context
>>>> arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
>>>> arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
>>>> arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
>>>> arm64/crypto: AES using ARMv8 Crypto Extensions
>>>> arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
>>>> arm64: pull in <asm/simd.h> from asm-generic
>>>> arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto
>>>> Extensions
>>>> arm64/crypto: add shared macro to test for NEED_RESCHED
>>>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA1
>>>> arm64/crypto: add voluntary preemption to Crypto Extensions SHA2
>>>> arm64/crypto: add voluntary preemption to Crypto Extensions GHASH
>>>
>>> There are about 5 patches that make sense to me ;) and apart from a few
>>> minor comments they look fine.
>>>
>>> There are the other 10 crypto patches that are beyond my knowledge. Do
>>> you know anyone who could do a sanity check on them? Are there any tests
>>> that would show the correctness of the implementation?
>>
>> I will re-send the 3 FPSIMD patches separately with your review
>> comments addressed.
>
> Thanks. If you get another acked/reviewed-by tag on the rest of the
> patches it’s even better (I plan to get the series in for 3.16, after
> we do some tests internally as well).
>
@Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
patches above? Herbert has already acked the ccm patch, but Catalin is
requesting for more review and/or acknowledgements before merging
these patches for arm64.
Regards,
Ard.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-09 6:37 ` Ard Biesheuvel
@ 2014-05-14 1:29 ` Herbert Xu
2014-05-14 8:47 ` Catalin Marinas
0 siblings, 1 reply; 55+ messages in thread
From: Herbert Xu @ 2014-05-14 1:29 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Catalin Marinas, Jussi Kivilinna, linux-crypto, steve.capper
On Fri, May 09, 2014 at 08:37:58AM +0200, Ard Biesheuvel wrote:
>
> @Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
> patches above? Herbert has already acked the ccm patch, but Catalin is
> requesting for more review and/or acknowledgements before merging
> these patches for arm64.
They look fine to me. You can add my Ack to them.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH resend 00/15] arm64 crypto roundup
2014-05-14 1:29 ` Herbert Xu
@ 2014-05-14 8:47 ` Catalin Marinas
0 siblings, 0 replies; 55+ messages in thread
From: Catalin Marinas @ 2014-05-14 8:47 UTC (permalink / raw)
To: Herbert Xu; +Cc: Ard Biesheuvel, Jussi Kivilinna, linux-crypto, steve.capper
On Wed, May 14, 2014 at 02:29:05AM +0100, Herbert Xu wrote:
> On Fri, May 09, 2014 at 08:37:58AM +0200, Ard Biesheuvel wrote:
> >
> > @Herbert, Jussi: care to share your opinion on the SHAx, GHASH and AES
> > patches above? Herbert has already acked the ccm patch, but Catalin is
> > requesting for more review and/or acknowledgements before merging
> > these patches for arm64.
>
> They look fine to me. You can add my Ack to them.
Many thanks, both to you and Jussi.
Regards.
--
Catalin
^ permalink raw reply [flat|nested] 55+ messages in thread
end of thread, other threads:[~2014-05-14 8:47 UTC | newest]
Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-01 15:49 [PATCH resend 00/15] arm64 crypto roundup Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 01/15] asm-generic: allow generic unaligned access if the arch supports it Ard Biesheuvel
2014-05-06 14:31 ` Catalin Marinas
2014-05-06 14:34 ` Ard Biesheuvel
2014-05-06 15:14 ` Catalin Marinas
2014-05-01 15:49 ` [PATCH resend 02/15] arm64: add abstractions for FPSIMD state manipulation Ard Biesheuvel
2014-05-06 14:43 ` Catalin Marinas
2014-05-06 14:48 ` Ard Biesheuvel
2014-05-06 15:12 ` Catalin Marinas
2014-05-06 15:42 ` Catalin Marinas
2014-05-01 15:49 ` [PATCH resend 03/15] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
2014-05-06 16:08 ` Catalin Marinas
2014-05-06 16:25 ` Ard Biesheuvel
2014-05-06 16:31 ` Catalin Marinas
2014-05-01 15:49 ` [PATCH resend 04/15] arm64: add support for kernel mode NEON in interrupt context Ard Biesheuvel
2014-05-06 16:49 ` Catalin Marinas
2014-05-06 17:09 ` Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 05/15] arm64/crypto: SHA-1 using ARMv8 Crypto Extensions Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 06/15] arm64/crypto: SHA-224/SHA-256 " Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 07/15] arm64/crypto: GHASH secure hash " Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 08/15] arm64/crypto: AES " Ard Biesheuvel
2014-05-01 15:49 ` [PATCH resend 09/15] arm64/crypto: AES in CCM mode " Ard Biesheuvel
2014-05-07 14:45 ` [PATCH resend 00/15] arm64 crypto roundup Catalin Marinas
2014-05-07 19:58 ` Ard Biesheuvel
2014-05-08 11:22 ` Ard Biesheuvel
2014-05-08 21:50 ` Catalin Marinas
2014-05-09 6:37 ` Ard Biesheuvel
2014-05-14 1:29 ` Herbert Xu
2014-05-14 8:47 ` Catalin Marinas