linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES
@ 2014-01-06  8:22 Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 1/8] arm64: add kernel emulation for AES instructions Ard Biesheuvel
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This series is mostly a repost of patches that I have proposed (and that have
been discussed) on the list previously. This time, I am resending it as a
coherent series to try to make a stronger case for the features I am proposing.

Patch #8 implements the core AES cipher. It relies on patches 2, 3 and 5 to
support udev autoloading of the module if the AES extension is supported by the
CPU it is running on. Patches 6 and 7 are required because the core AES cipher
may be called from interrupt context.

Patch #1 is included as a bonus/reference *only*. It is not intended to be
merged, but it allows those who would like to try this code to do so on a system
or emulator that does not have the Crypto Extensions implemented.

Patch #2 moves arch_cpu_uevent() from x86 to generic code, as it is generic in
nature and can be reused by other archs.

Patch #3 introduces a generic 'cpu' modalias type which can be used to autoload
modules based on whether the CPU supports a certain optional feature. How IDs
map to features is not specified; that is left up to the arch.
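
As a rough illustration (not part of the series), the sketch below is a
userspace model of the sprintf in the new do_cpu_entry() handler from patch #3,
showing the alias string a module declaring feature bit 3 would end up with:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of do_cpu_entry() from scripts/mod/file2alias.c:
 * a module declaring cpu_feature { .feature = N } gets an alias of the
 * form cpu:type:*:feature:*NNNN*, so it matches any CPU type that
 * advertises feature ID N.
 */
static void cpu_feature_alias(unsigned int feature, char *alias)
{
	sprintf(alias, "cpu:type:*:feature:*%04X*", feature);
}
```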

Patch #4 changes the x86-specific 'x86cpu' modalias so it adheres to the syntax
introduced in patch #3. This is not strictly necessary, but it follows a
discussion between H. Peter Anvin, Andi Kleen and myself, in which it was
suggested that a generic solution should also cover the x86 use case (which
already has 320 feature bits)
[http://marc.info/?l=linux-kernel&m=138384467604799&w=2]

@Peter, Andi: could I please have your ack(s) on patch #2, and possibly on
patch #4 if you still feel that all archs should use the same modalias syntax
(and you are happy with the way I implemented it)?

Patch #5 enables the generic 'cpu' feature matching introduced in patch 3 for
arm64.

Patch #6 is an optimization to the arm64 kernel mode NEON code that tries to
avoid pointless saves/restores of the NEON register file.

Patch #7 adds support for calling the kernel mode NEON code from interrupt
context. It also adds support for partial saves/restores.

Patch #8 implements the AES core cipher.

The series depends on 4bff28ccda2b ("arm64: Add hwcaps for crypto and CRC32
extensions.") which is already in Catalin's tree and in linux-next.

Ard Biesheuvel (8):
  arm64: add kernel emulation for AES instructions
  x86: move arch_cpu_uevent() to generic code
  cpu: advertise CPU features over udev in a generic way
  x86: align with generic cpu modalias
  arm64: advertise CPU features for modalias matching
  arm64: defer reloading a task's FPSIMD state to userland resume
  arm64: add support for kernel mode NEON in atomic context
  arm64: add Crypto Extensions based synchronous core AES cipher

 arch/arm64/Kconfig                    |   3 +
 arch/arm64/Makefile                   |   2 +
 arch/arm64/crypto/Makefile            |  13 ++
 arch/arm64/crypto/aes-ce-cipher.c     | 112 ++++++++++++
 arch/arm64/emu/Makefile               |  11 ++
 arch/arm64/emu/ce-aes.c               | 331 ++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/fpsimd.h       |  20 ++
 arch/arm64/include/asm/fpsimdmacros.h |  37 ++++
 arch/arm64/include/asm/neon.h         |   6 +-
 arch/arm64/include/asm/thread_info.h  |   4 +-
 arch/arm64/include/asm/traps.h        |  10 +
 arch/arm64/kernel/entry-fpsimd.S      |  24 +++
 arch/arm64/kernel/entry.S             |   6 +-
 arch/arm64/kernel/fpsimd.c            | 108 +++++++++--
 arch/arm64/kernel/process.c           |   3 +-
 arch/arm64/kernel/setup.c             |  20 ++
 arch/arm64/kernel/signal.c            |   3 +
 arch/arm64/kernel/traps.c             |  49 +++++
 arch/x86/kernel/cpu/match.c           |  14 +-
 crypto/Kconfig                        |   6 +
 drivers/base/cpu.c                    |  15 +-
 include/linux/cpu.h                   |   1 -
 include/linux/mod_devicetable.h       |  15 ++
 scripts/mod/devicetable-offsets.c     |   3 +
 scripts/mod/file2alias.c              |  20 +-
 25 files changed, 793 insertions(+), 43 deletions(-)
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/aes-ce-cipher.c
 create mode 100644 arch/arm64/emu/Makefile
 create mode 100644 arch/arm64/emu/ce-aes.c

-- 
1.8.3.2


* [PATCH 1/8] arm64: add kernel emulation for AES instructions
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 2/8] x86: move arch_cpu_uevent() to generic code Ard Biesheuvel
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

NOTE: this patch is not intended for merging upstream, but is only
included as a bonus so mere mortals (i.e., those whose ARMv8 system
does not support the AES crypto instructions) can test the series
if they like.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Makefile            |   1 +
 arch/arm64/emu/Makefile        |  11 ++
 arch/arm64/emu/ce-aes.c        | 331 +++++++++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/traps.h |  10 ++
 arch/arm64/kernel/entry.S      |   4 +-
 arch/arm64/kernel/traps.c      |  49 ++++++
 6 files changed, 405 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/emu/Makefile
 create mode 100644 arch/arm64/emu/ce-aes.c

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2fceb71ac3b7..e0b75464b7f1 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -42,6 +42,7 @@ TEXT_OFFSET := 0x00080000
 
 export	TEXT_OFFSET GZFLAGS
 
+core-y		+= arch/arm64/emu/
 core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
diff --git a/arch/arm64/emu/Makefile b/arch/arm64/emu/Makefile
new file mode 100644
index 000000000000..224b4e19ff6f
--- /dev/null
+++ b/arch/arm64/emu/Makefile
@@ -0,0 +1,11 @@
+#
+# linux/arch/arm64/emu/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-y += ce-aes.o
diff --git a/arch/arm64/emu/ce-aes.c b/arch/arm64/emu/ce-aes.c
new file mode 100644
index 000000000000..d50fadbd7336
--- /dev/null
+++ b/arch/arm64/emu/ce-aes.c
@@ -0,0 +1,331 @@
+/*
+ * ce-aes.c - emulate aese/aesd/aesmc/aesimc instructions
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <linux/ptrace.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <asm/traps.h>
+#include <asm/hwcap.h>
+
+union AES_STATE {
+	u8	bytes[16];
+	u32	cols[4];
+	u64	l[2];
+} __aligned(8);
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv);
+static void mix_columns(union AES_STATE *out, union AES_STATE *in, int inv);
+
+#define REG_ACCESS(op, r, mem) \
+	do { case r: asm(#op " {v" #r ".16b}, [%0]" : : "r"(mem)); goto out; \
+	} while (0)
+
+#define REG_SWITCH(reg, op, m) do { switch (reg) { \
+	REG_ACCESS(op,  0, m);	REG_ACCESS(op,  1, m);	REG_ACCESS(op,  2, m); \
+	REG_ACCESS(op,  3, m);	REG_ACCESS(op,  4, m);	REG_ACCESS(op,  5, m); \
+	REG_ACCESS(op,  6, m);	REG_ACCESS(op,  7, m);	REG_ACCESS(op,  8, m); \
+	REG_ACCESS(op,  9, m);	REG_ACCESS(op, 10, m);	REG_ACCESS(op, 11, m); \
+	REG_ACCESS(op, 12, m);	REG_ACCESS(op, 13, m);	REG_ACCESS(op, 14, m); \
+	REG_ACCESS(op, 15, m);	REG_ACCESS(op, 16, m);	REG_ACCESS(op, 17, m); \
+	REG_ACCESS(op, 18, m);	REG_ACCESS(op, 19, m);	REG_ACCESS(op, 20, m); \
+	REG_ACCESS(op, 21, m);	REG_ACCESS(op, 22, m);	REG_ACCESS(op, 23, m); \
+	REG_ACCESS(op, 24, m);	REG_ACCESS(op, 25, m);	REG_ACCESS(op, 26, m); \
+	REG_ACCESS(op, 27, m);	REG_ACCESS(op, 28, m);	REG_ACCESS(op, 29, m); \
+	REG_ACCESS(op, 30, m);	REG_ACCESS(op, 31, m); \
+	} out:; } while (0)
+
+static void load_neon_reg(union AES_STATE *st, int reg)
+{
+	REG_SWITCH(reg, st1, st->bytes);
+}
+
+static void save_neon_reg(union AES_STATE *st, int reg)
+{
+	REG_SWITCH(reg, ld1, st->bytes);
+}
+
+static void aesce_do_emulate(unsigned int instr)
+{
+	enum { AESE, AESD, AESMC, AESIMC } kind = (instr >> 12) & 3;
+	int rn = (instr >> 5) & 0x1f;
+	int rd = instr & 0x1f;
+	union AES_STATE in, out;
+
+	load_neon_reg(&in, rn);
+
+	switch (kind) {
+	case AESE:
+	case AESD:
+		load_neon_reg(&out, rd);
+		add_sub_shift(&out, &in, kind & 1);
+		break;
+	case AESMC:
+	case AESIMC:
+		mix_columns(&out, &in, kind & 1);
+		break;
+	}
+	save_neon_reg(&out, rd);
+}
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr);
+
+static struct undef_hook aesce_emu_uh = {
+	.instr_val	= 0x4e284800,
+	.instr_mask	= 0xffffcc00,
+	.fn		= aesce_emu_instr,
+};
+
+static int aesce_emu_instr(struct pt_regs *regs, unsigned int instr)
+{
+	do {
+		aesce_do_emulate(instr);
+		regs->pc += 4;
+		get_user(instr, (u32 __user *)regs->pc);
+	} while ((instr & aesce_emu_uh.instr_mask) == aesce_emu_uh.instr_val);
+
+	return 0;
+}
+
+static int aesce_emu_init(void)
+{
+	register_undef_hook(&aesce_emu_uh);
+	elf_hwcap |= HWCAP_AES;
+	return 0;
+}
+
+arch_initcall(aesce_emu_init);
+
+static void add_sub_shift(union AES_STATE *st, union AES_STATE *rk, int inv)
+{
+	static u8 const sbox[][256] = { {
+		0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+		0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+		0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+		0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+		0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+		0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+		0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+		0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+		0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+		0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+		0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+		0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+		0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+		0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+		0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+		0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+		0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+		0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+		0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+		0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+		0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+		0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+		0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+		0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+		0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+		0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+		0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+		0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+		0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+		0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+		0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+		0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16
+	}, {
+		0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38,
+		0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb,
+		0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87,
+		0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb,
+		0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d,
+		0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e,
+		0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2,
+		0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25,
+		0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16,
+		0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92,
+		0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda,
+		0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84,
+		0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a,
+		0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06,
+		0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02,
+		0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b,
+		0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea,
+		0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73,
+		0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85,
+		0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e,
+		0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89,
+		0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b,
+		0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20,
+		0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4,
+		0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31,
+		0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f,
+		0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d,
+		0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef,
+		0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0,
+		0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61,
+		0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26,
+		0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
+	} };
+	static u8 const permute[][16] = {
+		{ 0,  5, 10, 15, 4, 9, 14,  3, 8, 13, 2,  7, 12, 1, 6, 11 },
+		{ 0, 13, 10,  7, 4, 1, 14, 11, 8,  5, 2, 15, 12, 9, 6,  3 },
+	};
+	int i;
+
+	rk->l[0] ^= st->l[0];
+	rk->l[1] ^= st->l[1];
+
+	for (i = 0; i < 16; i++)
+		st->bytes[i] = sbox[inv][rk->bytes[permute[inv][i]]];
+}
+
+static void mix_columns(union AES_STATE *out, union AES_STATE *in, int inv)
+{
+	static u32 const mc[][256] = { {
+		0x00000000, 0x03010102, 0x06020204, 0x05030306,
+		0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
+		0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
+		0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
+		0x30101020, 0x33111122, 0x36121224, 0x35131326,
+		0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
+		0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
+		0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
+		0x60202040, 0x63212142, 0x66222244, 0x65232346,
+		0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
+		0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
+		0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
+		0x50303060, 0x53313162, 0x56323264, 0x55333366,
+		0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
+		0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
+		0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
+		0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
+		0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
+		0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
+		0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
+		0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
+		0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
+		0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
+		0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
+		0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
+		0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
+		0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
+		0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
+		0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
+		0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
+		0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
+		0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
+		0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
+		0x97848413, 0x94858511, 0x91868617, 0x92878715,
+		0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
+		0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
+		0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
+		0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
+		0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
+		0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
+		0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
+		0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
+		0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
+		0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
+		0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
+		0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
+		0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
+		0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
+		0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
+		0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
+		0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
+		0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
+		0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
+		0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
+		0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
+		0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
+		0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
+		0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
+		0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
+		0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
+		0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
+		0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
+		0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
+		0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
+	}, {
+		0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
+		0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
+		0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
+		0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
+		0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
+		0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
+		0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
+		0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
+		0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
+		0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
+		0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
+		0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
+		0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
+		0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
+		0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
+		0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
+		0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
+		0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
+		0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
+		0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
+		0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
+		0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
+		0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
+		0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
+		0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
+		0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
+		0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
+		0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
+		0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
+		0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
+		0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
+		0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
+		0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
+		0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
+		0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
+		0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
+		0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
+		0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
+		0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
+		0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
+		0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
+		0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
+		0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
+		0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
+		0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
+		0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
+		0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
+		0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
+		0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
+		0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
+		0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
+		0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
+		0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
+		0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
+		0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
+		0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
+		0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
+		0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
+		0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
+		0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
+		0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
+		0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
+		0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
+		0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
+	} };
+
+	int i;
+
+	for (i = 0; i < 16; i += 4)
+		out->cols[i >> 2] = cpu_to_le32(
+			mc[inv][in->bytes[i]] ^
+			rol32(mc[inv][in->bytes[i + 1]], 8) ^
+			rol32(mc[inv][in->bytes[i + 2]], 16) ^
+			rol32(mc[inv][in->bytes[i + 3]], 24));
+}
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index 10ca8ff93cc2..781e50cb2f03 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -27,4 +27,14 @@ static inline int in_exception_text(unsigned long ptr)
 	       ptr < (unsigned long)&__exception_text_end;
 }
 
+struct undef_hook {
+	struct list_head node;
+	u32 instr_mask;
+	u32 instr_val;
+	int (*fn)(struct pt_regs *regs, unsigned int instr);
+};
+
+void register_undef_hook(struct undef_hook *hook);
+void unregister_undef_hook(struct undef_hook *hook);
+
 #endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 32d7fe6c3d6a..7cfbb65f6aa9 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -283,7 +283,9 @@ el1_undef:
 	 * Undefined instruction
 	 */
 	mov	x0, sp
-	b	do_undefinstr
+	bl	do_undefinstr
+
+	kernel_exit 1
 el1_dbg:
 	/*
 	 * Debug exception handling
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 7ffadddb645d..3cc4c915b73f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -257,11 +257,60 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
 		die(str, regs, err);
 }
 
+static LIST_HEAD(undef_hook);
+static DEFINE_RAW_SPINLOCK(undef_lock);
+
+void register_undef_hook(struct undef_hook *hook)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_add(&hook->node, &undef_hook);
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+void unregister_undef_hook(struct undef_hook *hook)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_del(&hook->node);
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+}
+
+static int call_undef_hook(struct pt_regs *regs, void __user *pc)
+{
+	struct undef_hook *hook;
+	unsigned long flags;
+	int (*fn)(struct pt_regs *regs, unsigned int instr) = NULL;
+	unsigned int instr;
+	mm_segment_t fs;
+	int ret;
+
+	fs = get_fs();
+	set_fs(KERNEL_DS);
+
+	get_user(instr, (u32 __user *)pc);
+
+	raw_spin_lock_irqsave(&undef_lock, flags);
+	list_for_each_entry(hook, &undef_hook, node)
+		if ((instr & hook->instr_mask) == hook->instr_val)
+			fn = hook->fn;
+	raw_spin_unlock_irqrestore(&undef_lock, flags);
+
+	ret = fn ? fn(regs, instr) : 1;
+	set_fs(fs);
+	return ret;
+}
+
 asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 {
 	siginfo_t info;
 	void __user *pc = (void __user *)instruction_pointer(regs);
 
+	if (call_undef_hook(regs, pc) == 0)
+		return;
+
 	/* check for AArch32 breakpoint instructions */
 	if (!aarch32_break_handler(regs))
 		return;
-- 
1.8.3.2
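
The undef hook above fires on any instruction word matching val 0x4e284800 /
mask 0xffffcc00, and aesce_do_emulate() pulls the operation and register
numbers out of the free bits. A quick userspace model of that decode
(illustration only, mirroring the shifts in the patch):

```c
#include <assert.h>

#define AESCE_VAL  0x4e284800u
#define AESCE_MASK 0xffffcc00u

enum aes_op { AESE, AESD, AESMC, AESIMC };

struct aes_insn {
	enum aes_op kind;
	int rn;		/* source register */
	int rd;		/* destination register */
};

/* Mirror of the field extraction in aesce_do_emulate(): bits [13:12]
 * select the operation, [9:5] the source register, [4:0] the target.
 */
static struct aes_insn decode(unsigned int instr)
{
	struct aes_insn d = {
		.kind = (instr >> 12) & 3,
		.rn   = (instr >> 5) & 0x1f,
		.rd   = instr & 0x1f,
	};
	return d;
}

static int is_aesce(unsigned int instr)
{
	return (instr & AESCE_MASK) == AESCE_VAL;
}
```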


* [PATCH 2/8] x86: move arch_cpu_uevent() to generic code
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 1/8] arm64: add kernel emulation for AES instructions Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 3/8] cpu: advertise CPU features over udev in a generic way Ard Biesheuvel
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

Only x86 implements arch_cpu_uevent(), and there is nothing arch-specific
about it, so move it to drivers/base/cpu.c.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/kernel/cpu/match.c | 11 -----------
 drivers/base/cpu.c          | 15 ++++++++++++++-
 include/linux/cpu.h         |  1 -
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c
index 36565373af87..ab6082a9020e 100644
--- a/arch/x86/kernel/cpu/match.c
+++ b/arch/x86/kernel/cpu/match.c
@@ -78,14 +78,3 @@ ssize_t arch_print_cpu_modalias(struct device *dev,
 	*buf++ = '\n';
 	return buf - bufptr;
 }
-
-int arch_cpu_uevent(struct device *dev, struct kobj_uevent_env *env)
-{
-	char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
-	if (buf) {
-		arch_print_cpu_modalias(NULL, NULL, buf);
-		add_uevent_var(env, "MODALIAS=%s", buf);
-		kfree(buf);
-	}
-	return 0;
-}
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index f48370dfc908..270649012e64 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -286,6 +286,19 @@ static void cpu_device_release(struct device *dev)
 	 */
 }
 
+#ifdef CONFIG_ARCH_HAS_CPU_AUTOPROBE
+static int cpu_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+	char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (buf) {
+		arch_print_cpu_modalias(NULL, NULL, buf);
+		add_uevent_var(env, "MODALIAS=%s", buf);
+		kfree(buf);
+	}
+	return 0;
+}
+#endif
+
 /*
  * register_cpu - Setup a sysfs device for a CPU.
  * @cpu - cpu->hotpluggable field set to 1 will generate a control file in
@@ -307,7 +320,7 @@ int register_cpu(struct cpu *cpu, int num)
 	cpu->dev.offline = !cpu_online(num);
 	cpu->dev.of_node = of_get_cpu_node(num, NULL);
 #ifdef CONFIG_ARCH_HAS_CPU_AUTOPROBE
-	cpu->dev.bus->uevent = arch_cpu_uevent;
+	cpu->dev.bus->uevent = cpu_uevent;
 #endif
 	cpu->dev.groups = common_cpu_attr_groups;
 	if (cpu->hotpluggable)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 03e235ad1bba..dcc4a0d9c45f 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -47,7 +47,6 @@ extern ssize_t arch_cpu_release(const char *, size_t);
 struct notifier_block;
 
 #ifdef CONFIG_ARCH_HAS_CPU_AUTOPROBE
-extern int arch_cpu_uevent(struct device *dev, struct kobj_uevent_env *env);
 extern ssize_t arch_print_cpu_modalias(struct device *dev,
 				       struct device_attribute *attr,
 				       char *bufptr);
-- 
1.8.3.2


* [PATCH 3/8] cpu: advertise CPU features over udev in a generic way
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 1/8] arm64: add kernel emulation for AES instructions Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 2/8] x86: move arch_cpu_uevent() to generic code Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 4/8] x86: align with generic cpu modalias Ard Biesheuvel
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This patch implements a generic 'cpu:type:...:feature:...' modalias which
enables CPU feature flag based module loading in a generic way. All the
arch needs to do is enable CONFIG_ARCH_HAS_CPU_AUTOPROBE and implement
arch_print_cpu_modalias(). Modules then declare the CPU feature they
depend on with MODULE_DEVICE_TABLE(cpu, ...).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 include/linux/mod_devicetable.h   | 15 +++++++++++++++
 scripts/mod/devicetable-offsets.c |  3 +++
 scripts/mod/file2alias.c          | 10 ++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 45e921401b06..3aad3c0f765c 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -564,6 +564,21 @@ struct x86_cpu_id {
 #define X86_MODEL_ANY  0
 #define X86_FEATURE_ANY 0	/* Same as FPU, you can't test for that */
 
+/*
+ * Generic table type for matching CPU features.
+ * @feature:	the bit number of the feature (0 - 65535)
+ *
+ * How the bit numbers map to actual CPU features is entirely up to the arch,
+ * although using the same ID space as hwcaps seems obvious.
+ */
+
+struct cpu_feature {
+	__u16	feature;
+};
+
+/* hwcap const are bit masks, so take the log when using them as feature IDs */
+#define CPU_FEATURE_HWCAP(x)	ilog2(x)
+
 #define IPACK_ANY_FORMAT 0xff
 #define IPACK_ANY_ID (~0)
 struct ipack_device_id {
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index bb5d115ca671..f282516acc7b 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -174,6 +174,9 @@ int main(void)
 	DEVID_FIELD(x86_cpu_id, model);
 	DEVID_FIELD(x86_cpu_id, vendor);
 
+	DEVID(cpu_feature);
+	DEVID_FIELD(cpu_feature, feature);
+
 	DEVID(mei_cl_device_id);
 	DEVID_FIELD(mei_cl_device_id, name);
 
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 23708636b05c..8a69005228d8 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1135,6 +1135,16 @@ static int do_x86cpu_entry(const char *filename, void *symval,
 }
 ADD_TO_DEVTABLE("x86cpu", x86_cpu_id, do_x86cpu_entry);
 
+/* LOOKS like cpu:type:*:feature:*FEAT* */
+static int do_cpu_entry(const char *filename, void *symval, char *alias)
+{
+	DEF_FIELD(symval, cpu_feature, feature);
+
+	sprintf(alias, "cpu:type:*:feature:*%04X*", feature);
+	return 1;
+}
+ADD_TO_DEVTABLE("cpu", cpu_feature, do_cpu_entry);
+
 /* Looks like: mei:S */
 static int do_mei_entry(const char *filename, void *symval,
 			char *alias)
-- 
1.8.3.2
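
The CPU_FEATURE_HWCAP() helper in this patch maps a single-bit hwcap mask to a
feature ID by taking the base-2 log (the kernel uses ilog2()). A userspace
model of that mapping; treating an AES hwcap at bit 3 as an example value is an
assumption, not something quoted from this patch:

```c
#include <assert.h>

/* Model of CPU_FEATURE_HWCAP(x): hwcap constants are single-bit masks,
 * and the modalias feature IDs are the bit *numbers*, so the mapping is
 * an integer log2. The kernel's ilog2() constant-folds; a simple shift
 * loop is enough for this illustration.
 */
static unsigned int hwcap_to_feature(unsigned long mask)
{
	unsigned int n = 0;

	while (mask >>= 1)
		n++;
	return n;
}
```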


* [PATCH 4/8] x86: align with generic cpu modalias
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2014-01-06  8:22 ` [PATCH 3/8] cpu: advertise CPU features over udev in a generic way Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 5/8] arm64: advertise CPU features for modalias matching Ard Biesheuvel
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

Align with the new generic 'cpu:type:...:feature:...' modalias
by moving the 'x86' prefix and the vendor/family/model IDs into
the 'type' field.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/kernel/cpu/match.c |  3 +--
 scripts/mod/file2alias.c    | 10 +++++-----
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c
index ab6082a9020e..82e92b2185f6 100644
--- a/arch/x86/kernel/cpu/match.c
+++ b/arch/x86/kernel/cpu/match.c
@@ -56,8 +56,7 @@ ssize_t arch_print_cpu_modalias(struct device *dev,
 	int i, n;
 	char *buf = bufptr;
 
-	n = snprintf(buf, size, "x86cpu:vendor:%04X:family:%04X:"
-		     "model:%04X:feature:",
+	n = snprintf(buf, size, "cpu:type:x86,ven%04Xfam%04Xmod%04X:feature:",
 		boot_cpu_data.x86_vendor,
 		boot_cpu_data.x86,
 		boot_cpu_data.x86_model);
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 8a69005228d8..5fdad833f951 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1110,7 +1110,7 @@ static int do_amba_entry(const char *filename,
 }
 ADD_TO_DEVTABLE("amba", amba_id, do_amba_entry);
 
-/* LOOKS like x86cpu:vendor:VVVV:family:FFFF:model:MMMM:feature:*,FEAT,*
+/* LOOKS like cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:*,FEAT,*
  * All fields are numbers. It would be nicer to use strings for vendor
  * and feature, but getting those out of the build system here is too
  * complicated.
@@ -1124,10 +1124,10 @@ static int do_x86cpu_entry(const char *filename, void *symval,
 	DEF_FIELD(symval, x86_cpu_id, model);
 	DEF_FIELD(symval, x86_cpu_id, vendor);
 
-	strcpy(alias, "x86cpu:");
-	ADD(alias, "vendor:",  vendor != X86_VENDOR_ANY, vendor);
-	ADD(alias, ":family:", family != X86_FAMILY_ANY, family);
-	ADD(alias, ":model:",  model  != X86_MODEL_ANY,  model);
+	strcpy(alias, "cpu:type:x86,");
+	ADD(alias, "ven", vendor != X86_VENDOR_ANY, vendor);
+	ADD(alias, "fam", family != X86_FAMILY_ANY, family);
+	ADD(alias, "mod", model  != X86_MODEL_ANY,  model);
 	strcat(alias, ":feature:*");
 	if (feature != X86_FEATURE_ANY)
 		sprintf(alias + strlen(alias), "%04X*", feature);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/8] arm64: advertise CPU features for modalias matching
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2014-01-06  8:22 ` [PATCH 4/8] x86: align with generic cpu modalias Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 6/8] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This enables the generic implementation in drivers/base/cpu.c
that allows modules to be loaded automatically based on the
optional features supported (and advertised over udev) by the
CPU.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig        |  3 +++
 arch/arm64/kernel/setup.c | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cb9421b540c8..74650b68d575 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -214,6 +214,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	def_bool y
 
+config ARCH_HAS_CPU_AUTOPROBE
+	def_bool y
+
 source "mm/Kconfig"
 
 config XEN_DOM0
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index bb33fff09ba2..5ed082bbd61c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -379,3 +379,23 @@ const struct seq_operations cpuinfo_op = {
 	.stop	= c_stop,
 	.show	= c_show
 };
+
+ssize_t arch_print_cpu_modalias(struct device *dev,
+				struct device_attribute *attr,
+				char *buf)
+{
+	unsigned long caps;
+	ssize_t n;
+	int i;
+
+	/*
+	 * With 64 features maximum (taking 5 bytes each to print), we don't
+	 * need to worry about overrunning the PAGE_SIZE sized buffer.
+	 */
+	n = sprintf(buf, "cpu:type:%s:feature:", ELF_PLATFORM);
+	for (caps = elf_hwcap, i = 0; caps; caps >>= 1, i++)
+		if (caps & 1)
+			n += sprintf(&buf[n], ",%04X", i);
+	buf[n++] = '\n';
+	return n;
+}
-- 
1.8.3.2

* [PATCH 6/8] arm64: defer reloading a task's FPSIMD state to userland resume
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2014-01-06  8:22 ` [PATCH 5/8] arm64: advertise CPU features for modalias matching Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 7/8] arm64: add support for kernel mode NEON in atomic context Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 8/8] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the meantime, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.

This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task re-enters userland. If a task does not enter
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h      |  3 ++
 arch/arm64/include/asm/thread_info.h |  4 +-
 arch/arm64/kernel/entry.S            |  2 +-
 arch/arm64/kernel/fpsimd.c           | 79 +++++++++++++++++++++++++++++++-----
 arch/arm64/kernel/process.c          |  3 +-
 arch/arm64/kernel/signal.c           |  3 ++
 6 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac13008..609bc44ceb8d 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -37,6 +37,8 @@ struct fpsimd_state {
 			u32 fpcr;
 		};
 	};
+	/* the id of the last cpu to have restored this state */
+	unsigned int last_cpu;
 };
 
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
@@ -57,6 +59,7 @@ extern void fpsimd_load_state(struct fpsimd_state *state);
 
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
+extern void fpsimd_reload_fpstate(void);
 
 #endif
 
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b66ffd..4a1ca1cfb2f8 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
+#define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_SYSCALL_TRACE	8
 #define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
@@ -112,10 +113,11 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
+#define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-				 _TIF_NOTIFY_RESUME)
+				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 7cfbb65f6aa9..24e91097013b 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -578,7 +578,7 @@ fast_work_pending:
 	str	x0, [sp, #S_X0]			// returned x0
 work_pending:
 	tbnz	x1, #TIF_NEED_RESCHED, work_resched
-	/* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+	/* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
 	ldr	x2, [sp, #S_PSTATE]
 	mov	x0, sp				// 'regs'
 	tst	x2, #PSR_MODE_MASK		// user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bb785d23dbde..5b13c17e799f 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -34,6 +34,23 @@
 #define FPEXC_IDF	(1 << 7)
 
 /*
+ * In order to reduce the number of times the fpsimd state is frivolously saved
+ * and restored, keep track here of which task's userland owns the current state
+ * of the FPSIMD register file.
+ *
+ * This percpu variable points to the fpsimd_state.last_cpu field of the task
+ * whose FPSIMD state was most recently loaded onto this cpu. The last_cpu field
+ * itself contains the id of the cpu onto which the task's FPSIMD state was
+ * loaded most recently. So, to decide whether we can skip reloading the FPSIMD
+ * state, we need to check
+ * (a) whether this task was the last one to have its FPSIMD state loaded onto
+ *     this cpu
+ * (b) whether this task may have manipulated its FPSIMD state on another cpu in
+ *     the meantime
+ */
+static DEFINE_PER_CPU(unsigned int *, fpsimd_last_cpu);
+
+/*
  * Trapped FP/ASIMD access.
  */
 void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
@@ -71,18 +88,56 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
-	/* check if not kernel threads */
-	if (current->mm)
+	/*
+	 * The thread flag TIF_FOREIGN_FPSTATE conveys that the userland FPSIMD
+	 * state belonging to the current task is not present in the registers
+	 * but has (already) been saved to memory in order for the kernel to be
+	 * able to go off and use the registers for something else. Therefore,
+	 * we must not (re)save the register contents if this flag is set.
+	 */
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
-	if (next->mm)
-		fpsimd_load_state(&next->thread.fpsimd_state);
+
+	if (next->mm) {
+		/*
+		 * If we are switching to a task whose most recent userland NEON
+		 * contents are already in the registers of *this* cpu, we can
+		 * skip loading the state from memory. Otherwise, set the
+		 * TIF_FOREIGN_FPSTATE flag so the state will be loaded upon the
+		 * next entry of userland.
+		 */
+		struct fpsimd_state *st = &next->thread.fpsimd_state;
+
+		if (__get_cpu_var(fpsimd_last_cpu) == &st->last_cpu
+		    && st->last_cpu == smp_processor_id())
+			clear_ti_thread_flag(task_thread_info(next),
+					     TIF_FOREIGN_FPSTATE);
+		else
+			set_ti_thread_flag(task_thread_info(next),
+					   TIF_FOREIGN_FPSTATE);
+	}
 }
 
 void fpsimd_flush_thread(void)
 {
-	preempt_disable();
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
-	fpsimd_load_state(&current->thread.fpsimd_state);
+	set_thread_flag(TIF_FOREIGN_FPSTATE);
+}
+
+void fpsimd_reload_fpstate(void)
+{
+	preempt_disable();
+	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/*
+		 * We are entering userland and the userland context is not yet
+		 * present in the registers.
+		 */
+		struct fpsimd_state *st = &current->thread.fpsimd_state;
+
+		fpsimd_load_state(st);
+		__get_cpu_var(fpsimd_last_cpu) = &st->last_cpu;
+		st->last_cpu = smp_processor_id();
+	}
 	preempt_enable();
 }
 
@@ -97,16 +152,20 @@ void kernel_neon_begin(void)
 	BUG_ON(in_interrupt());
 	preempt_disable();
 
-	if (current->mm)
+	/*
+	 * Save the userland FPSIMD state if we have one and if we haven't done
+	 * so already. Clear fpsimd_last_cpu to indicate that there is no
+	 * longer userland context in the registers.
+	 */
+	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
+	__get_cpu_var(fpsimd_last_cpu) = NULL;
+
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
 void kernel_neon_end(void)
 {
-	if (current->mm)
-		fpsimd_load_state(&current->thread.fpsimd_state);
-
 	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0adb8f0f4549..c45ee7038f5e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -202,7 +202,8 @@ void release_thread(struct task_struct *dead_task)
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	fpsimd_save_state(&current->thread.fpsimd_state);
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+		fpsimd_save_state(&current->thread.fpsimd_state);
 	*dst = *src;
 	return 0;
 }
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591f75dd..0a9eccf4fc0f 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -416,4 +416,7 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 	}
+
+	if (thread_flags & _TIF_FOREIGN_FPSTATE)
+		fpsimd_reload_fpstate();
 }
-- 
1.8.3.2

* [PATCH 7/8] arm64: add support for kernel mode NEON in atomic context
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2014-01-06  8:22 ` [PATCH 6/8] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  2014-01-06  8:22 ` [PATCH 8/8] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This patch modifies kernel_neon_begin() and kernel_neon_end() so that
they may be called from any context. To address the case where only
a couple of registers are needed, kernel_neon_begin_partial(u32) is
introduced, which takes the number of bottom NEON q-registers required
as a parameter. To mark the end of such a partial section, the regular
kernel_neon_end() should be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h       | 17 ++++++++++++++
 arch/arm64/include/asm/fpsimdmacros.h | 37 ++++++++++++++++++++++++++++++
 arch/arm64/include/asm/neon.h         |  6 ++++-
 arch/arm64/kernel/entry-fpsimd.S      | 24 +++++++++++++++++++
 arch/arm64/kernel/fpsimd.c            | 43 +++++++++++++++++++++++------------
 5 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 609bc44ceb8d..dc9ef741c648 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,19 @@ struct fpsimd_state {
 	unsigned int last_cpu;
 };
 
+/*
+ * Struct for stacking the bottom 'n' FP/SIMD registers.
+ * Mainly intended for kernel use of v8 Crypto Extensions which only
+ * needs a few registers and may need to execute in atomic context.
+ */
+struct fpsimd_partial_state {
+	u32		num_regs;
+	u32		fpsr;
+	u32		fpcr;
+	__uint128_t	vregs[32] __aligned(16);
+} __aligned(16);
+
+
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /* Masks for extracting the FPSR and FPCR from the FPSCR */
 #define VFP_FPSCR_STAT_MASK	0xf800009f
@@ -57,6 +70,10 @@ struct task_struct;
 extern void fpsimd_save_state(struct fpsimd_state *state);
 extern void fpsimd_load_state(struct fpsimd_state *state);
 
+extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
+				      u32 num_regs);
+extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
+
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 extern void fpsimd_reload_fpstate(void);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bbec599c96bd..42990a82c671 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -62,3 +62,40 @@
 	ldr	w\tmpnr, [\state, #16 * 2 + 4]
 	msr	fpcr, x\tmpnr
 .endm
+
+.altmacro
+.macro	q2op, op, q1, q2, state
+	\op	q\q1, q\q2, [\state, # -16 * \q1 - 16]
+.endm
+
+.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
+	mrs	x\tmpnr1, fpsr
+	str	w\numnr, [\state]
+	mrs	x\tmpnr2, fpcr
+	stp	w\tmpnr1, w\tmpnr2, [\state, #4]
+	adr	x\tmpnr1, 0f
+	add	\state, \state, x\numnr, lsl #4
+	sub	x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
+	br	x\tmpnr1
+	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+		qb = \qa + 1
+	q2op	stp, \qa, %qb, \state
+	.endr
+0:
+.endm
+
+.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
+	ldp	w\tmpnr1, w\tmpnr2, [\state, #4]
+	msr	fpsr, x\tmpnr1
+	msr	fpcr, x\tmpnr2
+	adr	x\tmpnr1, 0f
+	ldr	w\tmpnr2, [\state]
+	add	\state, \state, x\tmpnr2, lsl #4
+	sub	x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
+	br	x\tmpnr1
+	.irp	qa, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0
+		qb = \qa + 1
+	q2op	ldp, \qa, %qb, \state
+	.endr
+0:
+.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index b0cc58a97780..21a9e35655b7 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,7 +8,11 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/types.h>
+
 #define cpu_has_neon()		(1)
 
-void kernel_neon_begin(void);
+#define kernel_neon_begin() 	kernel_neon_begin_partial(32)
+
+void kernel_neon_begin_partial(u32 num_regs);
 void kernel_neon_end(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 6a27cd6dbfa6..d358ccacfc00 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,3 +41,27 @@ ENTRY(fpsimd_load_state)
 	fpsimd_restore x0, 8
 	ret
 ENDPROC(fpsimd_load_state)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Save the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_save_partial_state)
+	fpsimd_save_partial x0, 1, 8, 9
+	ret
+ENDPROC(fpsimd_save_partial_state)
+
+/*
+ * Load the bottom n FP registers.
+ *
+ * x0 - pointer to struct fpsimd_partial_state
+ */
+ENTRY(fpsimd_load_partial_state)
+	fpsimd_restore_partial x0, 8, 9
+	ret
+ENDPROC(fpsimd_load_partial_state)
+
+#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 5b13c17e799f..ac105da56c8c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -143,30 +143,43 @@ void fpsimd_reload_fpstate(void)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
+static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
+static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+
 /*
  * Kernel-side NEON support functions
  */
-void kernel_neon_begin(void)
+void kernel_neon_begin_partial(u32 num_regs)
 {
-	/* Avoid using the NEON in interrupt context */
-	BUG_ON(in_interrupt());
-	preempt_disable();
-
-	/*
-	 * Save the userland FPSIMD state if we have one and if we haven't done
-	 * so already. Clear fpsimd_last_cpu to indicate that there is no
-	 * longer userland context in the registers.
-	 */
-	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-		fpsimd_save_state(&current->thread.fpsimd_state);
-	__get_cpu_var(fpsimd_last_cpu) = NULL;
+	if (in_interrupt()) {
+		struct fpsimd_partial_state *s = this_cpu_ptr(
+			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
 
+		BUG_ON(num_regs > 32);
+		fpsimd_save_partial_state(s, roundup(num_regs, 2));
+	} else {
+		/*
+		 * Save the userland FPSIMD state if we have one and if we
+		 * haven't done so already. Clear fpsimd_last_cpu to indicate
+		 * that there is no longer userland context in the registers.
+		 */
+		preempt_disable();
+		if (current->mm &&
+		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+			fpsimd_save_state(&current->thread.fpsimd_state);
+		__get_cpu_var(fpsimd_last_cpu) = NULL;
+	}
 }
-EXPORT_SYMBOL(kernel_neon_begin);
+EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	preempt_enable();
+	if (in_interrupt()) {
+		struct fpsimd_partial_state *s = this_cpu_ptr(
+			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+		fpsimd_load_partial_state(s);
+	} else
+		preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
1.8.3.2

* [PATCH 8/8] arm64: add Crypto Extensions based synchronous core AES cipher
  2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2014-01-06  8:22 ` [PATCH 7/8] arm64: add support for kernel mode NEON in atomic context Ard Biesheuvel
@ 2014-01-06  8:22 ` Ard Biesheuvel
  7 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2014-01-06  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This implements the core AES cipher using the Crypto Extensions,
using only NEON registers q0 and q1.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Makefile               |   1 +
 arch/arm64/crypto/Makefile        |  13 +++++
 arch/arm64/crypto/aes-ce-cipher.c | 112 ++++++++++++++++++++++++++++++++++++++
 crypto/Kconfig                    |   6 ++
 4 files changed, 132 insertions(+)
 create mode 100644 arch/arm64/crypto/Makefile
 create mode 100644 arch/arm64/crypto/aes-ce-cipher.c

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index e0b75464b7f1..a4b3e253557d 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -46,6 +46,7 @@ core-y		+= arch/arm64/emu/
 core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
 libs-y		:= arch/arm64/lib/ $(libs-y)
 libs-y		+= $(LIBGCC)
 
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
new file mode 100644
index 000000000000..ac58945c50b3
--- /dev/null
+++ b/arch/arm64/crypto/Makefile
@@ -0,0 +1,13 @@
+#
+# linux/arch/arm64/crypto/Makefile
+#
+# Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
+
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c
new file mode 100644
index 000000000000..dd132c2d69ab
--- /dev/null
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -0,0 +1,112 @@
+/*
+ * linux/arch/arm64/crypto/aes-ce-cipher.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+
+MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL");
+
+static struct cpu_feature const feat_hwcap_aes[] = {
+	{ CPU_FEATURE_HWCAP(HWCAP_AES) },
+	{ }
+};
+MODULE_DEVICE_TABLE(cpu, feat_hwcap_aes);
+
+static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 rounds = 6 + ctx->key_length / 4;
+
+	kernel_neon_begin_partial(2);
+
+	__asm__("	ld1		{v0.16b}, [%[in]]		;"
+		"	ld1		{v1.16b}, [%[key]], #16		;"
+		"0:	aese		v0.16b, v1.16b			;"
+		"	subs		%[rounds], %[rounds], #1	;"
+		"	ld1		{v1.16b}, [%[key]], #16		;"
+		"	beq		1f				;"
+		"	aesmc		v0.16b, v0.16b			;"
+		"	b		0b				;"
+		"1:	eor		v0.16b, v0.16b, v1.16b		;"
+		"	st1		{v0.16b}, [%[out]]		;"
+	: :
+		[out]		"r"(dst),
+		[in]		"r"(src),
+		[rounds]	"r"(rounds),
+		[key]		"r"(ctx->key_enc)
+	:			"cc");
+
+	kernel_neon_end();
+}
+
+static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
+{
+	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 rounds = 6 + ctx->key_length / 4;
+
+	kernel_neon_begin_partial(2);
+
+	__asm__("	ld1		{v0.16b}, [%[in]]		;"
+		"	ld1		{v1.16b}, [%[key]], #16		;"
+		"0:	aesd		v0.16b, v1.16b			;"
+		"	ld1		{v1.16b}, [%[key]], #16		;"
+		"	subs		%[rounds], %[rounds], #1	;"
+		"	beq		1f				;"
+		"	aesimc		v0.16b, v0.16b			;"
+		"	b		0b				;"
+		"1:	eor		v0.16b, v0.16b, v1.16b		;"
+		"	st1		{v0.16b}, [%[out]]		;"
+	: :
+		[out]		"r"(dst),
+		[in]		"r"(src),
+		[rounds]	"r"(rounds),
+		[key]		"r"(ctx->key_dec)
+	:			"cc");
+
+	kernel_neon_end();
+}
+
+static struct crypto_alg aes_alg = {
+	.cra_name		= "aes",
+	.cra_driver_name	= "aes-ce",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
+	.cra_module		= THIS_MODULE,
+	.cra_cipher = {
+		.cia_min_keysize	= AES_MIN_KEY_SIZE,
+		.cia_max_keysize	= AES_MAX_KEY_SIZE,
+		.cia_setkey		= crypto_aes_set_key,
+		.cia_encrypt		= aes_cipher_encrypt,
+		.cia_decrypt		= aes_cipher_decrypt
+	}
+};
+
+static int __init aes_mod_init(void)
+{
+	if (!(elf_hwcap & HWCAP_AES))
+		return -ENODEV;
+	return crypto_register_alg(&aes_alg);
+}
+
+static void __exit aes_mod_exit(void)
+{
+	crypto_unregister_alg(&aes_alg);
+}
+
+module_init(aes_mod_init);
+module_exit(aes_mod_exit);
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7bcb70d216e1..f1d98bc346b6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -791,6 +791,12 @@ config CRYPTO_AES_ARM_BS
 	  This implementation does not rely on any lookup tables so it is
 	  believed to be invulnerable to cache timing attacks.
 
+config CRYPTO_AES_ARM64_CE
+	tristate "Synchronous AES cipher using ARMv8 Crypto Extensions"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_ALGAPI
+	select CRYPTO_AES
+
 config CRYPTO_ANUBIS
 	tristate "Anubis cipher algorithm"
 	select CRYPTO_ALGAPI
-- 
1.8.3.2

end of thread, other threads:[~2014-01-06  8:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-06  8:22 [PATCH 0/8] arm64: udev autoloaded module for Crypto Extensions sync AES Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 1/8] arm64: add kernel emulation for AES instructions Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 2/8] x86: move arch_cpu_uevent() to generic code Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 3/8] cpu: advertise CPU features over udev in a generic way Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 4/8] x86: align with generic cpu modalias Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 5/8] arm64: advertise CPU features for modalias matching Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 6/8] arm64: defer reloading a task's FPSIMD state to userland resume Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 7/8] arm64: add support for kernel mode NEON in atomic context Ard Biesheuvel
2014-01-06  8:22 ` [PATCH 8/8] arm64: add Crypto Extensions based synchronous core AES cipher Ard Biesheuvel
