* [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines
@ 2018-08-27 11:02 Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden Ard Biesheuvel
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 11:02 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: will.deacon, catalin.marinas, herbert, ebiggers, suzuki.poulose,
linux-kernel, Ard Biesheuvel
There are many crc32 users in the kernel that call the library routine
rather than the crypto API wrapper, and so none of these callers use the
accelerated arm64 instructions when available.
While this is not known to cause performance issues, calling a table-based,
time-variant implementation with a non-negligible D-cache footprint (8 KB)
is wasteful in any case, and now that the crc32 instructions have been made
mandatory in the architecture, let's wire them up into the core crc routines.
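For illustration, here is a user-space sketch of a slice-by-8, table-driven CRC32 in the style of the generic lib/crc32.c code. It makes three assumptions:

- the reflected CRC-32 polynomial is 0xedb88320;
- it follows the kernel's crc32_le() convention, which applies no implicit pre- or post-inversion;
- all helper names are hypothetical.

The static_assert shows where the 8 KB figure comes from: eight 256-entry tables of 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xedb88320u	/* reflected CRC-32 polynomial */

/* tab[t][i]: CRC contribution of byte value i followed by t zero bytes. */
static uint32_t tab[8][256];

/* 8 tables x 256 entries x 4 bytes: the 8 KB D-cache footprint. */
static_assert(sizeof(tab) == 8 * 1024, "slice-by-8 tables occupy 8 KB");

static void crc32_init_tables(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ ((c & 1) ? CRC32_POLY_LE : 0);
		tab[0][i] = c;
	}
	for (int t = 1; t < 8; t++)
		for (uint32_t i = 0; i < 256; i++)
			tab[t][i] = (tab[t - 1][i] >> 8) ^ tab[0][tab[t - 1][i] & 0xff];
}

static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Same convention as crc32_le(): the caller handles any inversion. */
static uint32_t crc32_le_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len >= 8) {
		uint32_t a = crc ^ load_le32(p);
		uint32_t b = load_le32(p + 4);

		/* Byte k of the block is followed by 7 - k bytes: use tab[7 - k]. */
		crc = tab[7][a & 0xff] ^ tab[6][(a >> 8) & 0xff] ^
		      tab[5][(a >> 16) & 0xff] ^ tab[4][a >> 24] ^
		      tab[3][b & 0xff] ^ tab[2][(b >> 8) & 0xff] ^
		      tab[1][(b >> 16) & 0xff] ^ tab[0][b >> 24];
		p += 8;
		len -= 8;
	}
	while (len--)
		crc = (crc >> 8) ^ tab[0][(crc ^ *p++) & 0xff];
	return crc;
}
```

The table indices depend on the data being checksummed. Which cache lines get touched, and therefore the timing, varies with the input.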
This also means that they will be exposed to the crypto API via the generic
CRC32 driver, and so we can remove the scalar routines from the crypto API
driver. This leaves the PMULL code, which will only be useful on systems
that implement 64x64 PMULL but not the CRC32 instructions. Given that no
such systems are known to exist, this driver is removed entirely in patch #4.
Ard Biesheuvel (4):
lib/crc32: make core crc32() routines weak so they can be overridden
arm64: cpufeature: add feature for CRC32 instructions
arm64/lib: add accelerated crc32 routines
crypto: arm64/crc32 - remove PMULL based CRC32 driver
arch/arm64/Kconfig | 1 +
arch/arm64/configs/defconfig | 1 -
arch/arm64/crypto/Kconfig | 5 -
arch/arm64/crypto/Makefile | 3 -
arch/arm64/crypto/crc32-ce-core.S | 287 --------------------
arch/arm64/crypto/crc32-ce-glue.c | 244 -----------------
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/kernel/cpufeature.c | 9 +
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/crc32.S | 60 ++++
lib/crc32.c | 11 +-
11 files changed, 81 insertions(+), 545 deletions(-)
delete mode 100644 arch/arm64/crypto/crc32-ce-core.S
delete mode 100644 arch/arm64/crypto/crc32-ce-glue.c
create mode 100644 arch/arm64/lib/crc32.S
--
2.18.0
* [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
@ 2018-08-27 11:02 ` Ard Biesheuvel
2018-09-04 9:44 ` Herbert Xu
2018-08-27 11:02 ` [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions Ard Biesheuvel
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 11:02 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: will.deacon, catalin.marinas, herbert, ebiggers, suzuki.poulose,
linux-kernel, Ard Biesheuvel
Allow architectures to drop in accelerated CRC32 routines by making
the crc32_le/__crc32c_le entry points weak, and exposing non-weak
aliases for them that may be used by the accelerated versions as
fallbacks in case the instructions they rely upon are not available.
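Here is a minimal user-space sketch of the weak-plus-alias pattern. It assumes GCC or Clang on an ELF target; the kernel's __weak and __alias() macros expand to these attributes.

- The generic body is bit-at-a-time, as with CRC_LE_BITS == 1.
- crc32c_le and crc32c_le_base are hypothetical stand-ins for the kernel's __crc32c_le and __crc32c_le_base.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE  0xedb88320u	/* CRC32_POLY_LE in the kernel */
#define CRC32C_POLY_LE 0x82f63b78u	/* CRC32C_POLY_LE in the kernel */

/* Bit-at-a-time reflected CRC, comparable to crc32_le_generic() with CRC_LE_BITS == 1. */
static uint32_t crc_le_bitwise(uint32_t crc, const unsigned char *p, size_t len,
			       uint32_t poly)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
	}
	return crc;
}

/* Weak: a strong definition elsewhere (e.g. arch assembly) takes precedence. */
__attribute__((weak))
uint32_t crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	return crc_le_bitwise(crc, p, len, CRC32_POLY_LE);
}

__attribute__((weak))
uint32_t crc32c_le(uint32_t crc, const unsigned char *p, size_t len)
{
	return crc_le_bitwise(crc, p, len, CRC32C_POLY_LE);
}

/* Non-weak aliases that always name the generic bodies defined above. */
uint32_t crc32_le_base(uint32_t, const unsigned char *, size_t)
	__attribute__((alias("crc32_le")));
uint32_t crc32c_le_base(uint32_t, const unsigned char *, size_t)
	__attribute__((alias("crc32c_le")));
```

Suppose another object file linked into the same program provides a strong crc32_le(). Calls to crc32_le() then resolve to it, while crc32_le_base() still reaches the generic body, because the alias is bound to the weak definition's address in this file. This is what lets an arch routine branch to crc32_le_base when its instructions are unavailable.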
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
lib/crc32.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/lib/crc32.c b/lib/crc32.c
index a6c9afafc8c8..45b1d67a1767 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -183,21 +183,21 @@ static inline u32 __pure crc32_le_generic(u32 crc, unsigned char const *p,
}
#if CRC_LE_BITS == 1
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
+u32 __pure __weak crc32_le(u32 crc, unsigned char const *p, size_t len)
{
return crc32_le_generic(crc, p, len, NULL, CRC32_POLY_LE);
}
-u32 __pure __crc32c_le(u32 crc, unsigned char const *p, size_t len)
+u32 __pure __weak __crc32c_le(u32 crc, unsigned char const *p, size_t len)
{
return crc32_le_generic(crc, p, len, NULL, CRC32C_POLY_LE);
}
#else
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
+u32 __pure __weak crc32_le(u32 crc, unsigned char const *p, size_t len)
{
return crc32_le_generic(crc, p, len,
(const u32 (*)[256])crc32table_le, CRC32_POLY_LE);
}
-u32 __pure __crc32c_le(u32 crc, unsigned char const *p, size_t len)
+u32 __pure __weak __crc32c_le(u32 crc, unsigned char const *p, size_t len)
{
return crc32_le_generic(crc, p, len,
(const u32 (*)[256])crc32ctable_le, CRC32C_POLY_LE);
@@ -206,6 +206,9 @@ u32 __pure __crc32c_le(u32 crc, unsigned char const *p, size_t len)
EXPORT_SYMBOL(crc32_le);
EXPORT_SYMBOL(__crc32c_le);
+u32 crc32_le_base(u32, unsigned char const *, size_t) __alias(crc32_le);
+u32 __crc32c_le_base(u32, unsigned char const *, size_t) __alias(__crc32c_le);
+
/*
* This multiplies the polynomials x and y modulo the given modulus.
* This follows the "little-endian" CRC convention that the lsbit
--
2.18.0
* [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden Ard Biesheuvel
@ 2018-08-27 11:02 ` Ard Biesheuvel
2018-08-28 17:01 ` Will Deacon
2018-08-27 11:02 ` [PATCH 3/4] arm64/lib: add accelerated crc32 routines Ard Biesheuvel
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 11:02 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: will.deacon, catalin.marinas, herbert, ebiggers, suzuki.poulose,
linux-kernel, Ard Biesheuvel
Add a CRC32 feature bit and wire it up to the CPU id register so we
will be able to use alternatives patching for CRC32 operations.
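Below is a rough C model of what the new capability entry checks. It rests on two assumptions:

- The CRC32 field of ID_AA64ISAR0_EL1 is the 4-bit field at bits [19:16] (ID_AA64ISAR0_CRC32_SHIFT == 16).
- As a simplification of the kernel's sanitised register handling, a system-wide feature counts as present only if the lowest value of that field across all CPUs is at least min_field_value.

The helper names and the register values used with it are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ISAR0_CRC32_SHIFT 16	/* bits [19:16] of ID_AA64ISAR0_EL1 */
#define ISAR0_FIELD_WIDTH 4

static unsigned int isar0_crc32_field(uint64_t isar0)
{
	return (isar0 >> ISAR0_CRC32_SHIFT) & ((1u << ISAR0_FIELD_WIDTH) - 1);
}

/*
 * Simplified model of a system-wide feature: take the lowest value of the
 * field seen on any CPU and compare it against min_field_value.
 */
static bool system_has_crc32(const uint64_t *isar0_per_cpu, size_t ncpus,
			     unsigned int min_field_value)
{
	unsigned int lowest = 0xf;

	if (ncpus == 0)
		return false;
	for (size_t i = 0; i < ncpus; i++) {
		unsigned int f = isar0_crc32_field(isar0_per_cpu[i]);

		if (f < lowest)
			lowest = f;
	}
	return lowest >= min_field_value;
}
```

In this model, a single CPU reporting 0 in the field is enough to leave the capability clear. Examples are an ARMv8.0 core that omits the then-optional CRC32 instructions, or the APM X-Gene mentioned later in this thread. In that case, the alternatives would never be patched.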
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/kernel/cpufeature.c | 9 +++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index ae1f70450fb2..9932aca9704b 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -51,7 +51,8 @@
#define ARM64_SSBD 30
#define ARM64_MISMATCHED_CACHE_TYPE 31
#define ARM64_HAS_STAGE2_FWB 32
+#define ARM64_HAS_CRC32 33
-#define ARM64_NCAPS 33
+#define ARM64_NCAPS 34
#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e238b7932096..7626b80128f5 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1222,6 +1222,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.cpu_enable = cpu_enable_hw_dbm,
},
#endif
+ {
+ .desc = "CRC32 instructions",
+ .capability = ARM64_HAS_CRC32,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64ISAR0_EL1,
+ .field_pos = ID_AA64ISAR0_CRC32_SHIFT,
+ .min_field_value = 1,
+ },
{},
};
--
2.18.0
* [PATCH 3/4] arm64/lib: add accelerated crc32 routines
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions Ard Biesheuvel
@ 2018-08-27 11:02 ` Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver Ard Biesheuvel
2018-08-27 14:53 ` [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Theodore Y. Ts'o
4 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 11:02 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: will.deacon, catalin.marinas, herbert, ebiggers, suzuki.poulose,
linux-kernel, Ard Biesheuvel
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8
table-based C implementation. Even though few of the call sites may be
bottlenecks, calling a time-variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and later mandate
support for the CRC32 instructions. These were optional in ARMv8.0, but are
already widely available, even on the Cortex-A53-based Raspberry Pi.
So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.
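The control flow of the __crc32 macro can be mirrored in C. This is a sketch under three assumptions:

- The crc32{b,h,w,x} instructions are emulated in software. On real arm64 hardware, the ACLE intrinsics __crc32b/h/w/d from <arm_acle.h> would be the natural replacement.
- Loads are little-endian, as the CPU_BE(rev ...) lines ensure.
- The boolean have_crc32_insns is a hypothetical stand-in for the branch that alternatives patching removes at boot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_LE 0xedb88320u

/* Bit-at-a-time generic routine, standing in for crc32_le_base(). */
static uint32_t crc32_le_base_sw(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32_POLY_LE : 0);
	}
	return crc;
}

/* Software model of crc32{b,h,w,x}: fold in 'bytes' bytes of 'data', LSB first. */
static uint32_t crc32_insn(uint32_t crc, uint64_t data, unsigned int bytes)
{
	for (unsigned int i = 0; i < bytes; i++) {
		unsigned char b = (unsigned char)(data >> (8 * i));

		crc = crc32_le_base_sw(crc, &b, 1);
	}
	return crc;
}

/* Little-endian load of 'bytes' bytes (what CPU_BE(rev ...) guarantees in the asm). */
static uint64_t load_le(const unsigned char *p, unsigned int bytes)
{
	uint64_t v = 0;

	for (unsigned int i = 0; i < bytes; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hypothetical flag; the real code patches the branch at boot instead. */
static bool have_crc32_insns;

static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	if (!have_crc32_insns)
		return crc32_le_base_sw(crc, p, len);	/* b crc32_le_base */

	/* Label 0: 16 bytes per iteration (ldp x3, x4 + two crc32x). */
	while (len >= 16) {
		crc = crc32_insn(crc, load_le(p, 8), 8);
		crc = crc32_insn(crc, load_le(p + 8, 8), 8);
		p += 16;
		len -= 16;
	}

	/* Labels 8/4/2/1: bits 3..0 of the remainder select each tail step. */
	for (unsigned int bytes = 8; bytes >= 1; bytes >>= 1) {
		if (len & bytes) {
			crc = crc32_insn(crc, load_le(p, bytes), bytes);
			p += bytes;
		}
	}
	return crc;
}
```

The main loop consumes 16 bytes at a time, so the remainder is below 16. Each of its bits 3..0 then selects at most one 8-, 4-, 2- or 1-byte step, which is what the chain of tbz instructions tests. The asm reads those bits from the negative value left by the final subs; this works because subtracting 16 does not change bits 3..0.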
Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/Kconfig | 1 +
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/crc32.S | 60 ++++++++++++++++++++
3 files changed, 63 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 29e75b47becd..0625355f12fa 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -75,6 +75,7 @@ config ARM64
select CLONE_BACKWARDS
select COMMON_CLK
select CPU_PM if (SUSPEND || CPU_IDLE)
+ select CRC32
select DCACHE_WORD_ACCESS
select DMA_DIRECT_OPS
select EDAC_SUPPORT
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 68755fd70dcf..f28f91fd96a2 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -25,3 +25,5 @@ KCOV_INSTRUMENT_atomic_ll_sc.o := n
UBSAN_SANITIZE_atomic_ll_sc.o := n
lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
+
+obj-$(CONFIG_CRC32) += crc32.o
diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S
new file mode 100644
index 000000000000..5bc1e85b4e1c
--- /dev/null
+++ b/arch/arm64/lib/crc32.S
@@ -0,0 +1,60 @@
+/*
+ * Accelerated CRC32(C) using AArch64 CRC instructions
+ *
+ * Copyright (C) 2016 - 2018 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative.h>
+#include <asm/assembler.h>
+
+ .cpu generic+crc
+
+ .macro __crc32, c
+0: subs x2, x2, #16
+ b.mi 8f
+ ldp x3, x4, [x1], #16
+CPU_BE( rev x3, x3 )
+CPU_BE( rev x4, x4 )
+ crc32\c\()x w0, w0, x3
+ crc32\c\()x w0, w0, x4
+ b.ne 0b
+ ret
+
+8: tbz x2, #3, 4f
+ ldr x3, [x1], #8
+CPU_BE( rev x3, x3 )
+ crc32\c\()x w0, w0, x3
+4: tbz x2, #2, 2f
+ ldr w3, [x1], #4
+CPU_BE( rev w3, w3 )
+ crc32\c\()w w0, w0, w3
+2: tbz x2, #1, 1f
+ ldrh w3, [x1], #2
+CPU_BE( rev16 w3, w3 )
+ crc32\c\()h w0, w0, w3
+1: tbz x2, #0, 0f
+ ldrb w3, [x1]
+ crc32\c\()b w0, w0, w3
+0: ret
+ .endm
+
+ .align 5
+ENTRY(crc32_le)
+alternative_if_not ARM64_HAS_CRC32
+ b crc32_le_base
+alternative_else_nop_endif
+ __crc32
+ENDPROC(crc32_le)
+
+ .align 5
+ENTRY(__crc32c_le)
+alternative_if_not ARM64_HAS_CRC32
+ b __crc32c_le_base
+alternative_else_nop_endif
+ __crc32 c
+ENDPROC(__crc32c_le)
--
2.18.0
* [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
` (2 preceding siblings ...)
2018-08-27 11:02 ` [PATCH 3/4] arm64/lib: add accelerated crc32 routines Ard Biesheuvel
@ 2018-08-27 11:02 ` Ard Biesheuvel
2018-09-04 5:21 ` Herbert Xu
2018-08-27 14:53 ` [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Theodore Y. Ts'o
4 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 11:02 UTC (permalink / raw)
To: linux-arm-kernel, linux-crypto
Cc: will.deacon, catalin.marinas, herbert, ebiggers, suzuki.poulose,
linux-kernel, Ard Biesheuvel
Now that the scalar fallbacks have been moved out of this driver into
the core crc32()/crc32c() routines, we are left with a CRC32 crypto API
driver for arm64 that is based only on 64x64 polynomial multiplication,
which is an optional instruction in the ARMv8 architecture, and is less
and less likely to be available on cores that do not also implement the
CRC32 instructions, given that those are mandatory in the architecture
as of ARMv8.1.
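For readers unfamiliar with PMULL: a 64x64 polynomial multiplication is a carry-less multiply over GF(2). Partial products are combined with XOR instead of addition, giving a 128-bit result. The bit-serial C version below is only a reference for what the .1q form of PMULL/PMULL2 computes in one instruction. The name clmul64 is hypothetical, and this is not how the removed driver was written; that driver used such products to fold 64-byte blocks.

```c
#include <stdint.h>

/* Carry-less product of a and b as a 128-bit value split into hi:lo. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t h = 0, l = 0;

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			l ^= a << i;			/* low 64 bits of a * x^i */
			h ^= i ? a >> (64 - i) : 0;	/* bits shifted past bit 63 */
		}
	}
	*hi = h;
	*lo = l;
}
```

For example, (x + 1)^2 = x^2 + 1 over GF(2), so clmul64(3, 3, ...) yields 5 rather than 9.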
Since the scalar instructions do not require the special handling that
SIMD instructions do, and since they turn out to be considerably faster
on some cores (Cortex-A53) as well, there is really no point in keeping
this code around, so let's just remove it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/configs/defconfig | 1 -
arch/arm64/crypto/Kconfig | 5 -
arch/arm64/crypto/Makefile | 3 -
arch/arm64/crypto/crc32-ce-core.S | 287 --------------------
arch/arm64/crypto/crc32-ce-glue.c | 244 -----------------
5 files changed, 540 deletions(-)
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index f67e8d5e93ad..323da306e9f4 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -703,7 +703,6 @@ CONFIG_CRYPTO_SHA3_ARM64=m
CONFIG_CRYPTO_SM3_ARM64_CE=m
CONFIG_CRYPTO_GHASH_ARM64_CE=y
CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
-CONFIG_CRYPTO_CRC32_ARM64_CE=m
CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
CONFIG_CRYPTO_CHACHA20_NEON=m
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index e3fdb0fd6f70..63dc00423ca0 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -66,11 +66,6 @@ config CRYPTO_CRCT10DIF_ARM64_CE
depends on KERNEL_MODE_NEON && CRC_T10DIF
select CRYPTO_HASH
-config CRYPTO_CRC32_ARM64_CE
- tristate "CRC32 and CRC32C digest algorithms using ARMv8 extensions"
- depends on CRC32
- select CRYPTO_HASH
-
config CRYPTO_AES_ARM64
tristate "AES core cipher using scalar instructions"
select CRYPTO_AES
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index bcafd016618e..776357a3be35 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -32,9 +32,6 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
-obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
-crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
-
obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
aes-ce-cipher-y := aes-ce-core.o aes-ce-glue.o
diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S
deleted file mode 100644
index 8061bf0f9c66..000000000000
--- a/arch/arm64/crypto/crc32-ce-core.S
+++ /dev/null
@@ -1,287 +0,0 @@
-/*
- * Accelerated CRC32(C) using arm64 CRC, NEON and Crypto Extensions instructions
- *
- * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-/* GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see http://www.gnu.org/licenses
- *
- * Please visit http://www.xyratex.com/contact if you need additional
- * information or have any questions.
- *
- * GPL HEADER END
- */
-
-/*
- * Copyright 2012 Xyratex Technology Limited
- *
- * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
- * calculation.
- * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
- * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
- * at:
- * http://www.intel.com/products/processor/manuals/
- * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
- * Volume 2B: Instruction Set Reference, N-Z
- *
- * Authors: Gregory Prestas <Gregory_Prestas@us.xyratex.com>
- * Alexander Boyko <Alexander_Boyko@xyratex.com>
- */
-
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-
- .section ".rodata", "a"
- .align 6
- .cpu generic+crypto+crc
-
-.Lcrc32_constants:
- /*
- * [x4*128+32 mod P(x) << 32)]' << 1 = 0x154442bd4
- * #define CONSTANT_R1 0x154442bd4LL
- *
- * [(x4*128-32 mod P(x) << 32)]' << 1 = 0x1c6e41596
- * #define CONSTANT_R2 0x1c6e41596LL
- */
- .octa 0x00000001c6e415960000000154442bd4
-
- /*
- * [(x128+32 mod P(x) << 32)]' << 1 = 0x1751997d0
- * #define CONSTANT_R3 0x1751997d0LL
- *
- * [(x128-32 mod P(x) << 32)]' << 1 = 0x0ccaa009e
- * #define CONSTANT_R4 0x0ccaa009eLL
- */
- .octa 0x00000000ccaa009e00000001751997d0
-
- /*
- * [(x64 mod P(x) << 32)]' << 1 = 0x163cd6124
- * #define CONSTANT_R5 0x163cd6124LL
- */
- .quad 0x0000000163cd6124
- .quad 0x00000000FFFFFFFF
-
- /*
- * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
- *
- * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
- * = 0x1F7011641LL
- * #define CONSTANT_RU 0x1F7011641LL
- */
- .octa 0x00000001F701164100000001DB710641
-
-.Lcrc32c_constants:
- .octa 0x000000009e4addf800000000740eef02
- .octa 0x000000014cd00bd600000000f20c0dfe
- .quad 0x00000000dd45aab8
- .quad 0x00000000FFFFFFFF
- .octa 0x00000000dea713f10000000105ec76f0
-
- vCONSTANT .req v0
- dCONSTANT .req d0
- qCONSTANT .req q0
-
- BUF .req x19
- LEN .req x20
- CRC .req x21
- CONST .req x22
-
- vzr .req v9
-
- /**
- * Calculate crc32
- * BUF - buffer
- * LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
- * CRC - initial crc32
- * return %eax crc32
- * uint crc32_pmull_le(unsigned char const *buffer,
- * size_t len, uint crc32)
- */
- .text
-ENTRY(crc32_pmull_le)
- adr_l x3, .Lcrc32_constants
- b 0f
-
-ENTRY(crc32c_pmull_le)
- adr_l x3, .Lcrc32c_constants
-
-0: frame_push 4, 64
-
- mov BUF, x0
- mov LEN, x1
- mov CRC, x2
- mov CONST, x3
-
- bic LEN, LEN, #15
- ld1 {v1.16b-v4.16b}, [BUF], #0x40
- movi vzr.16b, #0
- fmov dCONSTANT, CRC
- eor v1.16b, v1.16b, vCONSTANT.16b
- sub LEN, LEN, #0x40
- cmp LEN, #0x40
- b.lt less_64
-
- ldr qCONSTANT, [CONST]
-
-loop_64: /* 64 bytes Full cache line folding */
- sub LEN, LEN, #0x40
-
- pmull2 v5.1q, v1.2d, vCONSTANT.2d
- pmull2 v6.1q, v2.2d, vCONSTANT.2d
- pmull2 v7.1q, v3.2d, vCONSTANT.2d
- pmull2 v8.1q, v4.2d, vCONSTANT.2d
-
- pmull v1.1q, v1.1d, vCONSTANT.1d
- pmull v2.1q, v2.1d, vCONSTANT.1d
- pmull v3.1q, v3.1d, vCONSTANT.1d
- pmull v4.1q, v4.1d, vCONSTANT.1d
-
- eor v1.16b, v1.16b, v5.16b
- ld1 {v5.16b}, [BUF], #0x10
- eor v2.16b, v2.16b, v6.16b
- ld1 {v6.16b}, [BUF], #0x10
- eor v3.16b, v3.16b, v7.16b
- ld1 {v7.16b}, [BUF], #0x10
- eor v4.16b, v4.16b, v8.16b
- ld1 {v8.16b}, [BUF], #0x10
-
- eor v1.16b, v1.16b, v5.16b
- eor v2.16b, v2.16b, v6.16b
- eor v3.16b, v3.16b, v7.16b
- eor v4.16b, v4.16b, v8.16b
-
- cmp LEN, #0x40
- b.lt less_64
-
- if_will_cond_yield_neon
- stp q1, q2, [sp, #.Lframe_local_offset]
- stp q3, q4, [sp, #.Lframe_local_offset + 32]
- do_cond_yield_neon
- ldp q1, q2, [sp, #.Lframe_local_offset]
- ldp q3, q4, [sp, #.Lframe_local_offset + 32]
- ldr qCONSTANT, [CONST]
- movi vzr.16b, #0
- endif_yield_neon
- b loop_64
-
-less_64: /* Folding cache line into 128bit */
- ldr qCONSTANT, [CONST, #16]
-
- pmull2 v5.1q, v1.2d, vCONSTANT.2d
- pmull v1.1q, v1.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v5.16b
- eor v1.16b, v1.16b, v2.16b
-
- pmull2 v5.1q, v1.2d, vCONSTANT.2d
- pmull v1.1q, v1.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v5.16b
- eor v1.16b, v1.16b, v3.16b
-
- pmull2 v5.1q, v1.2d, vCONSTANT.2d
- pmull v1.1q, v1.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v5.16b
- eor v1.16b, v1.16b, v4.16b
-
- cbz LEN, fold_64
-
-loop_16: /* Folding rest buffer into 128bit */
- subs LEN, LEN, #0x10
-
- ld1 {v2.16b}, [BUF], #0x10
- pmull2 v5.1q, v1.2d, vCONSTANT.2d
- pmull v1.1q, v1.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v5.16b
- eor v1.16b, v1.16b, v2.16b
-
- b.ne loop_16
-
-fold_64:
- /* perform the last 64 bit fold, also adds 32 zeroes
- * to the input stream */
- ext v2.16b, v1.16b, v1.16b, #8
- pmull2 v2.1q, v2.2d, vCONSTANT.2d
- ext v1.16b, v1.16b, vzr.16b, #8
- eor v1.16b, v1.16b, v2.16b
-
- /* final 32-bit fold */
- ldr dCONSTANT, [CONST, #32]
- ldr d3, [CONST, #40]
-
- ext v2.16b, v1.16b, vzr.16b, #4
- and v1.16b, v1.16b, v3.16b
- pmull v1.1q, v1.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v2.16b
-
- /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
- ldr qCONSTANT, [CONST, #48]
-
- and v2.16b, v1.16b, v3.16b
- ext v2.16b, vzr.16b, v2.16b, #8
- pmull2 v2.1q, v2.2d, vCONSTANT.2d
- and v2.16b, v2.16b, v3.16b
- pmull v2.1q, v2.1d, vCONSTANT.1d
- eor v1.16b, v1.16b, v2.16b
- mov w0, v1.s[1]
-
- frame_pop
- ret
-ENDPROC(crc32_pmull_le)
-ENDPROC(crc32c_pmull_le)
-
- .macro __crc32, c
-0: subs x2, x2, #16
- b.mi 8f
- ldp x3, x4, [x1], #16
-CPU_BE( rev x3, x3 )
-CPU_BE( rev x4, x4 )
- crc32\c\()x w0, w0, x3
- crc32\c\()x w0, w0, x4
- b.ne 0b
- ret
-
-8: tbz x2, #3, 4f
- ldr x3, [x1], #8
-CPU_BE( rev x3, x3 )
- crc32\c\()x w0, w0, x3
-4: tbz x2, #2, 2f
- ldr w3, [x1], #4
-CPU_BE( rev w3, w3 )
- crc32\c\()w w0, w0, w3
-2: tbz x2, #1, 1f
- ldrh w3, [x1], #2
-CPU_BE( rev16 w3, w3 )
- crc32\c\()h w0, w0, w3
-1: tbz x2, #0, 0f
- ldrb w3, [x1]
- crc32\c\()b w0, w0, w3
-0: ret
- .endm
-
- .align 5
-ENTRY(crc32_armv8_le)
- __crc32
-ENDPROC(crc32_armv8_le)
-
- .align 5
-ENTRY(crc32c_armv8_le)
- __crc32 c
-ENDPROC(crc32c_armv8_le)
diff --git a/arch/arm64/crypto/crc32-ce-glue.c b/arch/arm64/crypto/crc32-ce-glue.c
deleted file mode 100644
index 34b4e3d46aab..000000000000
--- a/arch/arm64/crypto/crc32-ce-glue.c
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
- *
- * Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/cpufeature.h>
-#include <linux/crc32.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/string.h>
-
-#include <crypto/internal/hash.h>
-
-#include <asm/hwcap.h>
-#include <asm/neon.h>
-#include <asm/simd.h>
-#include <asm/unaligned.h>
-
-#define PMULL_MIN_LEN 64L /* minimum size of buffer
- * for crc32_pmull_le_16 */
-#define SCALE_F 16L /* size of NEON register */
-
-asmlinkage u32 crc32_pmull_le(const u8 buf[], u64 len, u32 init_crc);
-asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], size_t len);
-
-asmlinkage u32 crc32c_pmull_le(const u8 buf[], u64 len, u32 init_crc);
-asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], size_t len);
-
-static u32 (*fallback_crc32)(u32 init_crc, const u8 buf[], size_t len);
-static u32 (*fallback_crc32c)(u32 init_crc, const u8 buf[], size_t len);
-
-static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
-{
- u32 *key = crypto_tfm_ctx(tfm);
-
- *key = 0;
- return 0;
-}
-
-static int crc32c_pmull_cra_init(struct crypto_tfm *tfm)
-{
- u32 *key = crypto_tfm_ctx(tfm);
-
- *key = ~0;
- return 0;
-}
-
-static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
- unsigned int keylen)
-{
- u32 *mctx = crypto_shash_ctx(hash);
-
- if (keylen != sizeof(u32)) {
- crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
- return -EINVAL;
- }
- *mctx = le32_to_cpup((__le32 *)key);
- return 0;
-}
-
-static int crc32_pmull_init(struct shash_desc *desc)
-{
- u32 *mctx = crypto_shash_ctx(desc->tfm);
- u32 *crc = shash_desc_ctx(desc);
-
- *crc = *mctx;
- return 0;
-}
-
-static int crc32_update(struct shash_desc *desc, const u8 *data,
- unsigned int length)
-{
- u32 *crc = shash_desc_ctx(desc);
-
- *crc = crc32_armv8_le(*crc, data, length);
- return 0;
-}
-
-static int crc32c_update(struct shash_desc *desc, const u8 *data,
- unsigned int length)
-{
- u32 *crc = shash_desc_ctx(desc);
-
- *crc = crc32c_armv8_le(*crc, data, length);
- return 0;
-}
-
-static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
- unsigned int length)
-{
- u32 *crc = shash_desc_ctx(desc);
- unsigned int l;
-
- if ((u64)data % SCALE_F) {
- l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
-
- *crc = fallback_crc32(*crc, data, l);
-
- data += l;
- length -= l;
- }
-
- if (length >= PMULL_MIN_LEN && may_use_simd()) {
- l = round_down(length, SCALE_F);
-
- kernel_neon_begin();
- *crc = crc32_pmull_le(data, l, *crc);
- kernel_neon_end();
-
- data += l;
- length -= l;
- }
-
- if (length > 0)
- *crc = fallback_crc32(*crc, data, length);
-
- return 0;
-}
-
-static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
- unsigned int length)
-{
- u32 *crc = shash_desc_ctx(desc);
- unsigned int l;
-
- if ((u64)data % SCALE_F) {
- l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
-
- *crc = fallback_crc32c(*crc, data, l);
-
- data += l;
- length -= l;
- }
-
- if (length >= PMULL_MIN_LEN && may_use_simd()) {
- l = round_down(length, SCALE_F);
-
- kernel_neon_begin();
- *crc = crc32c_pmull_le(data, l, *crc);
- kernel_neon_end();
-
- data += l;
- length -= l;
- }
-
- if (length > 0) {
- *crc = fallback_crc32c(*crc, data, length);
- }
-
- return 0;
-}
-
-static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
-{
- u32 *crc = shash_desc_ctx(desc);
-
- put_unaligned_le32(*crc, out);
- return 0;
-}
-
-static int crc32c_pmull_final(struct shash_desc *desc, u8 *out)
-{
- u32 *crc = shash_desc_ctx(desc);
-
- put_unaligned_le32(~*crc, out);
- return 0;
-}
-
-static struct shash_alg crc32_pmull_algs[] = { {
- .setkey = crc32_pmull_setkey,
- .init = crc32_pmull_init,
- .update = crc32_update,
- .final = crc32_pmull_final,
- .descsize = sizeof(u32),
- .digestsize = sizeof(u32),
-
- .base.cra_ctxsize = sizeof(u32),
- .base.cra_init = crc32_pmull_cra_init,
- .base.cra_name = "crc32",
- .base.cra_driver_name = "crc32-arm64-ce",
- .base.cra_priority = 200,
- .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY,
- .base.cra_blocksize = 1,
- .base.cra_module = THIS_MODULE,
-}, {
- .setkey = crc32_pmull_setkey,
- .init = crc32_pmull_init,
- .update = crc32c_update,
- .final = crc32c_pmull_final,
- .descsize = sizeof(u32),
- .digestsize = sizeof(u32),
-
- .base.cra_ctxsize = sizeof(u32),
- .base.cra_init = crc32c_pmull_cra_init,
- .base.cra_name = "crc32c",
- .base.cra_driver_name = "crc32c-arm64-ce",
- .base.cra_priority = 200,
- .base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY,
- .base.cra_blocksize = 1,
- .base.cra_module = THIS_MODULE,
-} };
-
-static int __init crc32_pmull_mod_init(void)
-{
- if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && (elf_hwcap & HWCAP_PMULL)) {
- crc32_pmull_algs[0].update = crc32_pmull_update;
- crc32_pmull_algs[1].update = crc32c_pmull_update;
-
- if (elf_hwcap & HWCAP_CRC32) {
- fallback_crc32 = crc32_armv8_le;
- fallback_crc32c = crc32c_armv8_le;
- } else {
- fallback_crc32 = crc32_le;
- fallback_crc32c = __crc32c_le;
- }
- } else if (!(elf_hwcap & HWCAP_CRC32)) {
- return -ENODEV;
- }
- return crypto_register_shashes(crc32_pmull_algs,
- ARRAY_SIZE(crc32_pmull_algs));
-}
-
-static void __exit crc32_pmull_mod_exit(void)
-{
- crypto_unregister_shashes(crc32_pmull_algs,
- ARRAY_SIZE(crc32_pmull_algs));
-}
-
-static const struct cpu_feature crc32_cpu_feature[] = {
- { cpu_feature(CRC32) }, { cpu_feature(PMULL) }, { }
-};
-MODULE_DEVICE_TABLE(cpu, crc32_cpu_feature);
-
-module_init(crc32_pmull_mod_init);
-module_exit(crc32_pmull_mod_exit);
-
-MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
-MODULE_LICENSE("GPL v2");
--
2.18.0
* Re: [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
` (3 preceding siblings ...)
2018-08-27 11:02 ` [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver Ard Biesheuvel
@ 2018-08-27 14:53 ` Theodore Y. Ts'o
2018-08-27 15:18 ` Ard Biesheuvel
4 siblings, 1 reply; 15+ messages in thread
From: Theodore Y. Ts'o @ 2018-08-27 14:53 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, will.deacon, catalin.marinas,
herbert, ebiggers, suzuki.poulose, linux-kernel
On Mon, Aug 27, 2018 at 01:02:41PM +0200, Ard Biesheuvel wrote:
> While this is not known to cause performance issues, calling a table based
> time variant implementation with a non-negligible D-cache footprint (8 KB)
> is wasteful in any case, and now that the crc32 instructions have been made
> mandatory in the architecture, let's wire them up into the core crc routines.
Stupid question --- are there any arm64 SoCs out there which do *not*
have the crc32 instructions? Presumably there won't be in the future,
because it's now mandatory --- but were there any in the past?
- Ted
* Re: [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines
2018-08-27 14:53 ` [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Theodore Y. Ts'o
@ 2018-08-27 15:18 ` Ard Biesheuvel
0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-27 15:18 UTC (permalink / raw)
To: Theodore Y. Ts'o, Ard Biesheuvel, linux-arm-kernel,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Will Deacon,
Catalin Marinas, Herbert Xu, Eric Biggers, Suzuki K. Poulose,
Linux Kernel Mailing List
On 27 August 2018 at 16:53, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Mon, Aug 27, 2018 at 01:02:41PM +0200, Ard Biesheuvel wrote:
>> While this is not known to cause performance issues, calling a table based
>> time variant implementation with a non-negligible D-cache footprint (8 KB)
>> is wasteful in any case, and now that the crc32 instructions have been made
>> mandatory in the architecture, let's wire them up into the core crc routines.
>
> Stupid question --- are there any arm64 SoCs out there which do *not*
> have the crc32 instructions? Presumably there won't be in the future,
> because it's now mandatory --- but were there any in the past?
>
Yes, the APM Xgene for instance.
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-08-27 11:02 ` [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions Ard Biesheuvel
@ 2018-08-28 17:01 ` Will Deacon
2018-08-28 18:43 ` Ard Biesheuvel
0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-08-28 17:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, catalin.marinas, herbert,
ebiggers, suzuki.poulose, linux-kernel
On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> Add a CRC32 feature bit and wire it up to the CPU id register so we
> will be able to use alternatives patching for CRC32 operations.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> arch/arm64/include/asm/cpucaps.h | 3 ++-
> arch/arm64/kernel/cpufeature.c | 9 +++++++++
> 2 files changed, 11 insertions(+), 1 deletion(-)
Acked-by: Will Deacon <will.deacon@arm.com>
With the minor caveat below...
> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> index ae1f70450fb2..9932aca9704b 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -51,7 +51,8 @@
> #define ARM64_SSBD 30
> #define ARM64_MISMATCHED_CACHE_TYPE 31
> #define ARM64_HAS_STAGE2_FWB 32
> +#define ARM64_HAS_CRC32 33
>
> -#define ARM64_NCAPS 33
> +#define ARM64_NCAPS 34
... if this goes via crypto, you'll almost certainly get a (trivial)
conflict with arm64, since these numbers get bumped all the time.
Will
> #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index e238b7932096..7626b80128f5 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1222,6 +1222,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .cpu_enable = cpu_enable_hw_dbm,
> },
> #endif
> + {
> + .desc = "CRC32 instructions",
> + .capability = ARM64_HAS_CRC32,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_cpuid_feature,
> + .sys_reg = SYS_ID_AA64ISAR0_EL1,
> + .field_pos = ID_AA64ISAR0_CRC32_SHIFT,
> + .min_field_value = 1,
> + },
> {},
> };
>
> --
> 2.18.0
>
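Will's caveat comes from how capabilities are numbered. Each new one takes the next free number, and ARM64_NCAPS (one past the highest number) sizes the capability bitmap, which is cpu_hwcaps in arch/arm64/kernel/cpufeature.c of this era. Any two series that add a capability therefore touch the same lines. The following is a rough user-space model that reuses the values from the hunk above; the sketch_* names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Values taken from the cpucaps.h hunk above. */
#define ARM64_HAS_STAGE2_FWB	32
#define ARM64_HAS_CRC32		33
#define ARM64_NCAPS		34	/* one past the highest capability */

_Static_assert(ARM64_HAS_CRC32 < ARM64_NCAPS, "NCAPS must exceed every cap");

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define CAP_WORDS	((ARM64_NCAPS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the kernel's capability bitmap. */
static unsigned long sketch_hwcaps[CAP_WORDS];

void sketch_set_cap(unsigned int cap)
{
	if (cap < ARM64_NCAPS)
		sketch_hwcaps[cap / BITS_PER_WORD] |= 1UL << (cap % BITS_PER_WORD);
}

bool sketch_have_cap(unsigned int cap)
{
	if (cap >= ARM64_NCAPS)		/* out-of-range caps are never set */
		return false;
	return sketch_hwcaps[cap / BITS_PER_WORD] & (1UL << (cap % BITS_PER_WORD));
}
```

Suppose another tree also claims number 33 and sets ARM64_NCAPS to 34. Merging the two then produces exactly the trivial conflict described: one capability has to become 34, and ARM64_NCAPS has to become 35.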
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-08-28 17:01 ` Will Deacon
@ 2018-08-28 18:43 ` Ard Biesheuvel
2018-09-04 3:18 ` Herbert Xu
0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2018-08-28 18:43 UTC
To: Will Deacon
Cc: linux-arm-kernel,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Catalin Marinas,
Herbert Xu, Eric Biggers, Suzuki K. Poulose,
Linux Kernel Mailing List
On 28 August 2018 at 19:01, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
>> Add a CRC32 feature bit and wire it up to the CPU id register so we
>> will be able to use alternatives patching for CRC32 operations.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>> arch/arm64/include/asm/cpucaps.h | 3 ++-
>> arch/arm64/kernel/cpufeature.c | 9 +++++++++
>> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> Acked-by: Will Deacon <will.deacon@arm.com>
>
> With the minor caveat below...
>
>> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
>> index ae1f70450fb2..9932aca9704b 100644
>> --- a/arch/arm64/include/asm/cpucaps.h
>> +++ b/arch/arm64/include/asm/cpucaps.h
>> @@ -51,7 +51,8 @@
>> #define ARM64_SSBD 30
>> #define ARM64_MISMATCHED_CACHE_TYPE 31
>> #define ARM64_HAS_STAGE2_FWB 32
>> +#define ARM64_HAS_CRC32 33
>>
>> -#define ARM64_NCAPS 33
>> +#define ARM64_NCAPS 34
>
>
> ... if this goes via crypto, you'll almost certainly get a (trivial)
> conflict with arm64, since these numbers get bumped all the time.
>
I think the first three patches should go through the arm64 tree. The
last one just removes the now redundant crc32 SIMD driver, and Herbert
could pick that up separately, i.e., it should be totally independent.
>> #endif /* __ASM_CPUCAPS_H */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index e238b7932096..7626b80128f5 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1222,6 +1222,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>> .cpu_enable = cpu_enable_hw_dbm,
>> },
>> #endif
>> + {
>> + .desc = "CRC32 instructions",
>> + .capability = ARM64_HAS_CRC32,
>> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
>> + .matches = has_cpuid_feature,
>> + .sys_reg = SYS_ID_AA64ISAR0_EL1,
>> + .field_pos = ID_AA64ISAR0_CRC32_SHIFT,
>> + .min_field_value = 1,
>> + },
>> {},
>> };
>>
>> --
>> 2.18.0
>>
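The quoted cpufeature entry asks has_cpuid_feature() to read the CRC32 field of ID_AA64ISAR0_EL1 and to require a value of at least 1 (.min_field_value). Its type is ARM64_CPUCAP_SYSTEM_FEATURE, so the decision uses the system-wide (sanitised) register value rather than a single CPU's value. The following is a minimal model of that check. It assumes the field occupies bits [19:16] and that its system-wide value is the lowest value seen on any CPU (a "lower safe" policy); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: CRC32 field in bits [19:16] of ID_AA64ISAR0_EL1. */
#define ISAR0_CRC32_SHIFT	16
#define ISAR0_FIELD_MASK	0xfULL

/* Extract the unsigned 4-bit CRC32 field from one CPU's register value. */
static unsigned int isar0_crc32_field(uint64_t isar0)
{
	return (unsigned int)((isar0 >> ISAR0_CRC32_SHIFT) & ISAR0_FIELD_MASK);
}

/*
 * Hypothetical system-wide check: take the minimum field value across
 * all CPUs, then compare it against the entry's min_field_value of 1.
 */
bool system_has_crc32(const uint64_t *isar0_per_cpu, size_t ncpus)
{
	unsigned int min = (unsigned int)ISAR0_FIELD_MASK;

	if (ncpus == 0)
		return false;
	for (size_t i = 0; i < ncpus; i++) {
		unsigned int v = isar0_crc32_field(isar0_per_cpu[i]);
		if (v < min)
			min = v;
	}
	return min >= 1;
}
```

In this model, a single CPU without the instructions is enough to keep the capability off for the whole system. That is the behaviour code relying on alternatives patching needs, because the patched instructions run on every CPU.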
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-08-28 18:43 ` Ard Biesheuvel
@ 2018-09-04 3:18 ` Herbert Xu
2018-09-04 9:38 ` Will Deacon
2018-09-10 15:45 ` Catalin Marinas
0 siblings, 2 replies; 15+ messages in thread
From: Herbert Xu @ 2018-09-04 3:18 UTC
To: Ard Biesheuvel
Cc: Will Deacon, linux-arm-kernel,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Catalin Marinas,
Eric Biggers, Suzuki K. Poulose, Linux Kernel Mailing List
On Tue, Aug 28, 2018 at 08:43:35PM +0200, Ard Biesheuvel wrote:
> On 28 August 2018 at 19:01, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> >> Add a CRC32 feature bit and wire it up to the CPU id register so we
> >> will be able to use alternatives patching for CRC32 operations.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >> arch/arm64/include/asm/cpucaps.h | 3 ++-
> >> arch/arm64/kernel/cpufeature.c | 9 +++++++++
> >> 2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > Acked-by: Will Deacon <will.deacon@arm.com>
> >
> > With the minor caveat below...
> >
> >> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> >> index ae1f70450fb2..9932aca9704b 100644
> >> --- a/arch/arm64/include/asm/cpucaps.h
> >> +++ b/arch/arm64/include/asm/cpucaps.h
> >> @@ -51,7 +51,8 @@
> >> #define ARM64_SSBD 30
> >> #define ARM64_MISMATCHED_CACHE_TYPE 31
> >> #define ARM64_HAS_STAGE2_FWB 32
> >> +#define ARM64_HAS_CRC32 33
> >>
> >> -#define ARM64_NCAPS 33
> >> +#define ARM64_NCAPS 34
> >
> >
> > ... if this goes via crypto, you'll almost certainly get a (trivial)
> > conflict with arm64, since these numbers get bumped all the time.
> >
>
> I think the first three patches should go through the arm64 tree. The
> last one just removes the now redundant crc32 SIMD driver, and Herbert
> could pick that up separately, i.e., it should be totally independent.
Yes let's do that.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver
2018-08-27 11:02 ` [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver Ard Biesheuvel
@ 2018-09-04 5:21 ` Herbert Xu
0 siblings, 0 replies; 15+ messages in thread
From: Herbert Xu @ 2018-09-04 5:21 UTC
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, will.deacon, catalin.marinas,
ebiggers, suzuki.poulose, linux-kernel
On Mon, Aug 27, 2018 at 01:02:45PM +0200, Ard Biesheuvel wrote:
> Now that the scalar fallbacks have been moved out of this driver into
> the core crc32()/crc32c() routines, we are left with a CRC32 crypto API
> driver for arm64 that is based only on 64x64 polynomial multiplication,
> which is an optional instruction in the ARMv8 architecture, and is less
> and less likely to be available on cores that do not also implement the
> CRC32 instructions, given that those are mandatory in the architecture
> as of ARMv8.1.
>
> Since the scalar instructions do not require the special handling that
> SIMD instructions do, and since they turn out to be considerably faster
> on some cores (Cortex-A53) as well, there is really no point in keeping
> this code around so let's just remove it.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
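Patch 4 notes that the scalar CRC32 instructions do not need the special handling SIMD code does; in the kernel, NEON code must be bracketed by kernel_neon_begin()/kernel_neon_end(). The general shape of an instruction-based update is easy to see in C using the ACLE intrinsics __crc32d() and __crc32b() from <arm_acle.h>. The following is a user-space sketch only, not the arch/arm64/lib/crc32.S code from patch 3. It assumes a little-endian build with the CRC extension enabled at compile time (e.g. -march=armv8-a+crc) and otherwise falls back to a bitwise loop. The name crc32_le_sketch is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>			/* __crc32d(), __crc32b() */
#endif

/* Bitwise reflected CRC32 (poly 0xEDB88320), no pre/post inversion. */
static uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical CRC32 update using the scalar instructions when available. */
uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
#if defined(__ARM_FEATURE_CRC32) && defined(__ORDER_LITTLE_ENDIAN__) && \
	__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	while (len >= 8) {		/* 8 bytes per CRC32X instruction */
		uint64_t v;

		memcpy(&v, p, sizeof(v));	/* unaligned-safe LE load */
		crc = __crc32d(crc, v);
		p += 8;
		len -= 8;
	}
	while (len--)			/* 0-7 tail bytes, one CRC32B each */
		crc = __crc32b(crc, *p++);
	return crc;
#else
	return crc32_le_bitwise(crc, p, len);
#endif
}
```

A single kernel image has to run on CPUs both with and without the instructions, so it cannot rely on a compile-time check like the one above. That is why patch 2 adds a capability for alternatives patching.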
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-09-04 3:18 ` Herbert Xu
@ 2018-09-04 9:38 ` Will Deacon
2018-09-04 9:44 ` Herbert Xu
2018-09-10 15:45 ` Catalin Marinas
1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-09-04 9:38 UTC
To: Herbert Xu
Cc: Ard Biesheuvel, linux-arm-kernel,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Catalin Marinas,
Eric Biggers, Suzuki K. Poulose, Linux Kernel Mailing List
On Tue, Sep 04, 2018 at 11:18:55AM +0800, Herbert Xu wrote:
> On Tue, Aug 28, 2018 at 08:43:35PM +0200, Ard Biesheuvel wrote:
> > On 28 August 2018 at 19:01, Will Deacon <will.deacon@arm.com> wrote:
> > > On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> > >> Add a CRC32 feature bit and wire it up to the CPU id register so we
> > >> will be able to use alternatives patching for CRC32 operations.
> > >>
> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >> ---
> > >> arch/arm64/include/asm/cpucaps.h | 3 ++-
> > >> arch/arm64/kernel/cpufeature.c | 9 +++++++++
> > >> 2 files changed, 11 insertions(+), 1 deletion(-)
> > >
> > > Acked-by: Will Deacon <will.deacon@arm.com>
> > >
> > > With the minor caveat below...
> > >
> > >> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> > >> index ae1f70450fb2..9932aca9704b 100644
> > >> --- a/arch/arm64/include/asm/cpucaps.h
> > >> +++ b/arch/arm64/include/asm/cpucaps.h
> > >> @@ -51,7 +51,8 @@
> > >> #define ARM64_SSBD 30
> > >> #define ARM64_MISMATCHED_CACHE_TYPE 31
> > >> #define ARM64_HAS_STAGE2_FWB 32
> > >> +#define ARM64_HAS_CRC32 33
> > >>
> > >> -#define ARM64_NCAPS 33
> > >> +#define ARM64_NCAPS 34
> > >
> > >
> > > ... if this goes via crypto, you'll almost certainly get a (trivial)
> > > conflict with arm64, since these numbers get bumped all the time.
> > >
> >
> > I think the first three patches should go through the arm64 tree. The
> > last one just removes the now redundant crc32 SIMD driver, and Herbert
> > could pick that up separately, i.e., it should be totally independent.
>
> Yes let's do that.
Okey doke! In which case, please can we have your Ack on the first patch?
Cheers,
Will
* Re: [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden
2018-08-27 11:02 ` [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden Ard Biesheuvel
@ 2018-09-04 9:44 ` Herbert Xu
0 siblings, 0 replies; 15+ messages in thread
From: Herbert Xu @ 2018-09-04 9:44 UTC
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-crypto, will.deacon, catalin.marinas,
ebiggers, suzuki.poulose, linux-kernel
On Mon, Aug 27, 2018 at 01:02:42PM +0200, Ard Biesheuvel wrote:
> Allow architectures to drop in accelerated CRC32 routines by making
> the crc32_le/__crc32c_le entry points weak, and exposing non-weak
> aliases for them that may be used by the accelerated versions as
> fallbacks in case the instructions they rely upon are not available.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
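Patch 1's mechanism, a weak entry point plus a non-weak alias to the same code, can be sketched in user space. This assumes GCC or Clang on an ELF target, since the alias attribute is not supported on every object format. The bitwise body stands in for lib/crc32.c's table-driven code and does not reproduce it, and the alias name crc32_le_base is used here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic bitwise CRC32 (reflected polynomial 0xEDB88320). Marked weak:
 * a strong crc32_le defined in another object file (for example an
 * accelerated version) takes precedence at link time.
 */
__attribute__((weak))
uint32_t crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/*
 * Non-weak alias for the same generic code, so that an override can
 * still reach it when the instructions it relies on are unavailable.
 */
uint32_t crc32_le_base(uint32_t crc, const unsigned char *p, size_t len)
	__attribute__((alias("crc32_le")));
```

If another object file provides a strong crc32_le, callers get that version, and that version can fall back to crc32_le_base() on CPUs without the instructions. If no override is linked in, both names resolve to the generic routine.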
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-09-04 9:38 ` Will Deacon
@ 2018-09-04 9:44 ` Herbert Xu
0 siblings, 0 replies; 15+ messages in thread
From: Herbert Xu @ 2018-09-04 9:44 UTC
To: Will Deacon
Cc: Ard Biesheuvel, linux-arm-kernel,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Catalin Marinas,
Eric Biggers, Suzuki K. Poulose, Linux Kernel Mailing List
On Tue, Sep 04, 2018 at 10:38:45AM +0100, Will Deacon wrote:
> On Tue, Sep 04, 2018 at 11:18:55AM +0800, Herbert Xu wrote:
> > On Tue, Aug 28, 2018 at 08:43:35PM +0200, Ard Biesheuvel wrote:
> > > On 28 August 2018 at 19:01, Will Deacon <will.deacon@arm.com> wrote:
> > > > On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> > > >> Add a CRC32 feature bit and wire it up to the CPU id register so we
> > > >> will be able to use alternatives patching for CRC32 operations.
> > > >>
> > > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > >> ---
> > > >> arch/arm64/include/asm/cpucaps.h | 3 ++-
> > > >> arch/arm64/kernel/cpufeature.c | 9 +++++++++
> > > >> 2 files changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > Acked-by: Will Deacon <will.deacon@arm.com>
> > > >
> > > > With the minor caveat below...
> > > >
> > > >> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> > > >> index ae1f70450fb2..9932aca9704b 100644
> > > >> --- a/arch/arm64/include/asm/cpucaps.h
> > > >> +++ b/arch/arm64/include/asm/cpucaps.h
> > > >> @@ -51,7 +51,8 @@
> > > >> #define ARM64_SSBD 30
> > > >> #define ARM64_MISMATCHED_CACHE_TYPE 31
> > > >> #define ARM64_HAS_STAGE2_FWB 32
> > > >> +#define ARM64_HAS_CRC32 33
> > > >>
> > > >> -#define ARM64_NCAPS 33
> > > >> +#define ARM64_NCAPS 34
> > > >
> > > >
> > > > ... if this goes via crypto, you'll almost certainly get a (trivial)
> > > > conflict with arm64, since these numbers get bumped all the time.
> > > >
> > >
> > > I think the first three patches should go through the arm64 tree. The
> > > last one just removes the now redundant crc32 SIMD driver, and Herbert
> > > could pick that up separately, i.e., it should be totally independent.
> >
> > Yes let's do that.
>
> Okey doke! In which case, please can we have your Ack on the first patch?
Sure, I have just sent an ack for that patch.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions
2018-09-04 3:18 ` Herbert Xu
2018-09-04 9:38 ` Will Deacon
@ 2018-09-10 15:45 ` Catalin Marinas
1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2018-09-10 15:45 UTC
To: Herbert Xu
Cc: Ard Biesheuvel, Suzuki K. Poulose, Eric Biggers, Will Deacon,
Linux Kernel Mailing List,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
linux-arm-kernel
On Tue, Sep 04, 2018 at 11:18:55AM +0800, Herbert Xu wrote:
> On Tue, Aug 28, 2018 at 08:43:35PM +0200, Ard Biesheuvel wrote:
> > On 28 August 2018 at 19:01, Will Deacon <will.deacon@arm.com> wrote:
> > > On Mon, Aug 27, 2018 at 01:02:43PM +0200, Ard Biesheuvel wrote:
> > >> Add a CRC32 feature bit and wire it up to the CPU id register so we
> > >> will be able to use alternatives patching for CRC32 operations.
> > >>
> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >> ---
> > >> arch/arm64/include/asm/cpucaps.h | 3 ++-
> > >> arch/arm64/kernel/cpufeature.c | 9 +++++++++
> > >> 2 files changed, 11 insertions(+), 1 deletion(-)
> > >
> > > Acked-by: Will Deacon <will.deacon@arm.com>
> > >
> > > With the minor caveat below...
> > >
> > >> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> > >> index ae1f70450fb2..9932aca9704b 100644
> > >> --- a/arch/arm64/include/asm/cpucaps.h
> > >> +++ b/arch/arm64/include/asm/cpucaps.h
> > >> @@ -51,7 +51,8 @@
> > >> #define ARM64_SSBD 30
> > >> #define ARM64_MISMATCHED_CACHE_TYPE 31
> > >> #define ARM64_HAS_STAGE2_FWB 32
> > >> +#define ARM64_HAS_CRC32 33
> > >>
> > >> -#define ARM64_NCAPS 33
> > >> +#define ARM64_NCAPS 34
> > >
> > >
> > > ... if this goes via crypto, you'll almost certainly get a (trivial)
> > > conflict with arm64, since these numbers get bumped all the time.
> > >
> >
> > I think the first three patches should go through the arm64 tree. The
> > last one just removes the now redundant crc32 SIMD driver, and Herbert
> > could pick that up separately, i.e., it should be totally independent.
>
> Yes let's do that.
I queued the first 3 patches for 4.20. Thanks.
--
Catalin
Thread overview: 15+ messages
2018-08-27 11:02 [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 1/4] lib/crc32: make core crc32() routines weak so they can be overridden Ard Biesheuvel
2018-09-04 9:44 ` Herbert Xu
2018-08-27 11:02 ` [PATCH 2/4] arm64: cpufeature: add feature for CRC32 instructions Ard Biesheuvel
2018-08-28 17:01 ` Will Deacon
2018-08-28 18:43 ` Ard Biesheuvel
2018-09-04 3:18 ` Herbert Xu
2018-09-04 9:38 ` Will Deacon
2018-09-04 9:44 ` Herbert Xu
2018-09-10 15:45 ` Catalin Marinas
2018-08-27 11:02 ` [PATCH 3/4] arm64/lib: add accelerated crc32 routines Ard Biesheuvel
2018-08-27 11:02 ` [PATCH 4/4] crypto: arm64/crc32 - remove PMULL based CRC32 driver Ard Biesheuvel
2018-09-04 5:21 ` Herbert Xu
2018-08-27 14:53 ` [PATCH 0/4] arm64: wire CRC32 instructions into core crc32 routines Theodore Y. Ts'o
2018-08-27 15:18 ` Ard Biesheuvel