All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
@ 2021-01-11 16:52 Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 1/7] crypto: crc-t10dif - turn library wrapper for shash into generic library Ard Biesheuvel
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

CRC-T10DIF is a very poor match for the crypto API:
- every user in the kernel calls it via a library wrapper around the
  shash API, so all callers share a single instance of the transform
- each architecture provides at most a single optimized implementation,
  based on SIMD instructions for carryless multiplication

In other words, none of the flexibility it provides is put to good use,
and so it is better to get rid of this complexity, and expose the optimized
implementations via static calls instead. This removes a substantial chunk
of code, and also gets rid of any indirect calls on architectures that
obsess about those (x86)

If this approach is deemed suitable, there are other places where we might
consider adopting it: CRC32 and CRC32(C).

Patch #1 does some preparatory refactoring and removes the library wrapper
around the shash transform.

Patch #2 introduces the actual static calls, along with the registration
routines to update the crc-t10dif implementation at runtime.

Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
between the optimized library code and the generic library code.

Patches #4 to #7 update the various arch implementations to switch over to
the new method.

Special request to Peter to take a look at patch #2, and in particular,
whether synchronize_rcu_tasks() is sufficient to ensure that a module
providing the target of a static call can be unloaded safely.
 
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>

Ard Biesheuvel (7):
  crypto: crc-t10dif - turn library wrapper for shash into generic
    library
  crypto: lib/crc-t10dif - add static call support for optimized
    versions
  crypto: generic/crc-t10dif - expose both arch and generic shashes
  crypto: x86/crc-t10dif - convert to static call library API
  crypto: arm/crc-t10dif - convert to static call library API
  crypto: arm64/crc-t10dif - convert to static call API
  crypto: powerpc/crc-t10dif - convert to static call API

 arch/arm/crypto/Kconfig                     |   2 +-
 arch/arm/crypto/crct10dif-ce-glue.c         |  58 ++------
 arch/arm64/crypto/Kconfig                   |   3 +-
 arch/arm64/crypto/crct10dif-ce-glue.c       |  85 ++---------
 arch/powerpc/crypto/crct10dif-vpmsum_glue.c |  51 +------
 arch/x86/crypto/crct10dif-pclmul_glue.c     |  90 ++----------
 crypto/Kconfig                              |   7 +-
 crypto/Makefile                             |   2 +-
 crypto/crct10dif_common.c                   |  82 -----------
 crypto/crct10dif_generic.c                  | 100 +++++++++----
 include/linux/crc-t10dif.h                  |  21 ++-
 lib/Kconfig                                 |   2 -
 lib/crc-t10dif.c                            | 152 +++++++++-----------
 13 files changed, 204 insertions(+), 451 deletions(-)
 delete mode 100644 crypto/crct10dif_common.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] crypto: crc-t10dif - turn library wrapper for shash into generic library
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 2/7] crypto: lib/crc-t10dif - add static call support for optimized versions Ard Biesheuvel
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

As a first step towards moving CRC-T10DIF out of the crypto API, drop
the shash wrapping code from the library implementation, and move the
generic core CRC-T10DIF into it from the generic crypto SHASH driver.

In a future patch, the library interface will be augmented with the
facilities to register and unregister optimized implementations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/crypto/Kconfig   |   2 +-
 arch/arm64/crypto/Kconfig |   3 +-
 crypto/Kconfig            |   7 +-
 crypto/Makefile           |   2 +-
 crypto/crct10dif_common.c |  82 ------------
 lib/Kconfig               |   2 -
 lib/crc-t10dif.c          | 136 +++++++-------------
 7 files changed, 57 insertions(+), 177 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 2b575792363e..939de9ceed0f 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -138,8 +138,8 @@ config CRYPTO_GHASH_ARM_CE
 config CRYPTO_CRCT10DIF_ARM_CE
 	tristate "CRCT10DIF digest algorithm using PMULL instructions"
 	depends on KERNEL_MODE_NEON
-	depends on CRC_T10DIF
 	select CRYPTO_HASH
+	select CRC_T10DIF
 
 config CRYPTO_CRC32_ARM_CE
 	tristate "CRC32(C) digest algorithm using CRC and/or PMULL instructions"
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index b8eb0453123d..e5d2f989521f 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -62,8 +62,9 @@ config CRYPTO_GHASH_ARM64_CE
 
 config CRYPTO_CRCT10DIF_ARM64_CE
 	tristate "CRCT10DIF digest algorithm using PMULL instructions"
-	depends on KERNEL_MODE_NEON && CRC_T10DIF
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
+	select CRC_T10DIF
 
 config CRYPTO_AES_ARM64
 	tristate "AES core cipher using scalar instructions"
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 94f0fde06b94..3e8cf6c2215a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -700,6 +700,7 @@ config CRYPTO_BLAKE2S_X86
 config CRYPTO_CRCT10DIF
 	tristate "CRCT10DIF algorithm"
 	select CRYPTO_HASH
+	select CRC_T10DIF
 	help
 	  CRC T10 Data Integrity Field computation is being cast as
 	  a crypto transform.  This allows for faster crc t10 diff
@@ -707,8 +708,9 @@ config CRYPTO_CRCT10DIF
 
 config CRYPTO_CRCT10DIF_PCLMUL
 	tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
-	depends on X86 && 64BIT && CRC_T10DIF
+	depends on X86 && 64BIT
 	select CRYPTO_HASH
+	select CRC_T10DIF
 	help
 	  For x86_64 processors with SSE4.2 and PCLMULQDQ supported,
 	  CRC T10 DIF PCLMULQDQ computation can be hardware
@@ -718,8 +720,9 @@ config CRYPTO_CRCT10DIF_PCLMUL
 
 config CRYPTO_CRCT10DIF_VPMSUM
 	tristate "CRC32T10DIF powerpc64 hardware acceleration"
-	depends on PPC64 && ALTIVEC && CRC_T10DIF
+	depends on PPC64 && ALTIVEC
 	select CRYPTO_HASH
+	select CRC_T10DIF
 	help
 	  CRC10T10DIF algorithm implemented using vector polynomial
 	  multiply-sum (vpmsum) instructions, introduced in POWER8. Enable on
diff --git a/crypto/Makefile b/crypto/Makefile
index b279483fba50..69d06a28c8e6 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -148,7 +148,7 @@ obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
 obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c_generic.o
 obj-$(CONFIG_CRYPTO_CRC32) += crc32_generic.o
-obj-$(CONFIG_CRYPTO_CRCT10DIF) += crct10dif_common.o crct10dif_generic.o
+obj-$(CONFIG_CRYPTO_CRCT10DIF) += crct10dif_generic.o
 obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
 obj-$(CONFIG_CRYPTO_LZO) += lzo.o lzo-rle.o
 obj-$(CONFIG_CRYPTO_LZ4) += lz4.o
diff --git a/crypto/crct10dif_common.c b/crypto/crct10dif_common.c
deleted file mode 100644
index b2fab366f518..000000000000
--- a/crypto/crct10dif_common.c
+++ /dev/null
@@ -1,82 +0,0 @@
-/*
- * Cryptographic API.
- *
- * T10 Data Integrity Field CRC16 Crypto Transform
- *
- * Copyright (c) 2007 Oracle Corporation.  All rights reserved.
- * Written by Martin K. Petersen <martin.petersen@oracle.com>
- * Copyright (C) 2013 Intel Corporation
- * Author: Tim Chen <tim.c.chen@linux.intel.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-#include <linux/crc-t10dif.h>
-#include <linux/module.h>
-#include <linux/kernel.h>
-
-/* Table generated using the following polynomium:
- * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
- * gt: 0x8bb7
- */
-static const __u16 t10_dif_crc_table[256] = {
-	0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
-	0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
-	0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
-	0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
-	0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
-	0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
-	0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
-	0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
-	0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
-	0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
-	0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
-	0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
-	0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
-	0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
-	0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
-	0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
-	0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
-	0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
-	0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
-	0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
-	0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
-	0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
-	0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
-	0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
-	0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
-	0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
-	0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
-	0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
-	0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
-	0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
-	0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
-	0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
-};
-
-__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
-{
-	unsigned int i;
-
-	for (i = 0 ; i < len ; i++)
-		crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
-
-	return crc;
-}
-EXPORT_SYMBOL(crc_t10dif_generic);
-
-MODULE_DESCRIPTION("T10 DIF CRC calculation common code");
-MODULE_LICENSE("GPL");
diff --git a/lib/Kconfig b/lib/Kconfig
index 46806332a8cc..203a50674602 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -120,8 +120,6 @@ config CRC16
 
 config CRC_T10DIF
 	tristate "CRC calculation for the T10 Data Integrity Field"
-	select CRYPTO
-	select CRYPTO_CRCT10DIF
 	help
 	  This option is only needed if a module that's not in the
 	  kernel tree needs to calculate CRC checks for use with the
diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index 1ed2ed487097..cbb739f0a0d7 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -4,6 +4,8 @@
  *
  * Copyright (c) 2007 Oracle Corporation.  All rights reserved.
  * Written by Martin K. Petersen <martin.petersen@oracle.com>
+ * Copyright (C) 2013 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
  */
 
 #include <linux/types.h>
@@ -16,70 +18,59 @@
 #include <linux/static_key.h>
 #include <linux/notifier.h>
 
-static struct crypto_shash __rcu *crct10dif_tfm;
-static DEFINE_STATIC_KEY_TRUE(crct10dif_fallback);
-static DEFINE_MUTEX(crc_t10dif_mutex);
-static struct work_struct crct10dif_rehash_work;
-
-static int crc_t10dif_notify(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct crypto_alg *alg = data;
-
-	if (val != CRYPTO_MSG_ALG_LOADED ||
-	    strcmp(alg->cra_name, CRC_T10DIF_STRING))
-		return NOTIFY_DONE;
-
-	schedule_work(&crct10dif_rehash_work);
-	return NOTIFY_OK;
-}
+/* Table generated using the following polynomium:
+ * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
+ * gt: 0x8bb7
+ */
+static const __u16 __cacheline_aligned t10_dif_crc_table[256] = {
+	0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+	0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+	0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+	0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+	0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+	0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+	0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+	0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+	0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+	0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+	0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+	0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+	0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+	0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+	0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+	0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+	0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+	0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+	0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+	0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+	0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+	0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+	0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+	0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+	0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+	0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+	0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+	0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+	0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+	0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+	0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+	0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
 
-static void crc_t10dif_rehash(struct work_struct *work)
+__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
 {
-	struct crypto_shash *new, *old;
+	unsigned int i;
 
-	mutex_lock(&crc_t10dif_mutex);
-	old = rcu_dereference_protected(crct10dif_tfm,
-					lockdep_is_held(&crc_t10dif_mutex));
-	new = crypto_alloc_shash(CRC_T10DIF_STRING, 0, 0);
-	if (IS_ERR(new)) {
-		mutex_unlock(&crc_t10dif_mutex);
-		return;
-	}
-	rcu_assign_pointer(crct10dif_tfm, new);
-	mutex_unlock(&crc_t10dif_mutex);
+	for (i = 0 ; i < len ; i++)
+		crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
 
-	if (old) {
-		synchronize_rcu();
-		crypto_free_shash(old);
-	} else {
-		static_branch_disable(&crct10dif_fallback);
-	}
+	return crc;
 }
-
-static struct notifier_block crc_t10dif_nb = {
-	.notifier_call = crc_t10dif_notify,
-};
+EXPORT_SYMBOL(crc_t10dif_generic);
 
 __u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
 {
-	struct {
-		struct shash_desc shash;
-		__u16 crc;
-	} desc;
-	int err;
-
-	if (static_branch_unlikely(&crct10dif_fallback))
-		return crc_t10dif_generic(crc, buffer, len);
-
-	rcu_read_lock();
-	desc.shash.tfm = rcu_dereference(crct10dif_tfm);
-	desc.crc = crc;
-	err = crypto_shash_update(&desc.shash, buffer, len);
-	rcu_read_unlock();
-
-	BUG_ON(err);
-
-	return desc.crc;
+	return crc_t10dif_generic(crc, buffer, len);
 }
 EXPORT_SYMBOL(crc_t10dif_update);
 
@@ -89,43 +80,12 @@ __u16 crc_t10dif(const unsigned char *buffer, size_t len)
 }
 EXPORT_SYMBOL(crc_t10dif);
 
-static int __init crc_t10dif_mod_init(void)
-{
-	INIT_WORK(&crct10dif_rehash_work, crc_t10dif_rehash);
-	crypto_register_notifier(&crc_t10dif_nb);
-	crc_t10dif_rehash(&crct10dif_rehash_work);
-	return 0;
-}
-
-static void __exit crc_t10dif_mod_fini(void)
-{
-	crypto_unregister_notifier(&crc_t10dif_nb);
-	cancel_work_sync(&crct10dif_rehash_work);
-	crypto_free_shash(rcu_dereference_protected(crct10dif_tfm, 1));
-}
-
-module_init(crc_t10dif_mod_init);
-module_exit(crc_t10dif_mod_fini);
-
 static int crc_t10dif_transform_show(char *buffer, const struct kernel_param *kp)
 {
-	struct crypto_shash *tfm;
-	int len;
-
-	if (static_branch_unlikely(&crct10dif_fallback))
-		return sprintf(buffer, "fallback\n");
-
-	rcu_read_lock();
-	tfm = rcu_dereference(crct10dif_tfm);
-	len = snprintf(buffer, PAGE_SIZE, "%s\n",
-		       crypto_shash_driver_name(tfm));
-	rcu_read_unlock();
-
-	return len;
+	return sprintf(buffer, "fallback\n");
 }
 
 module_param_call(transform, NULL, crc_t10dif_transform_show, NULL, 0444);
 
 MODULE_DESCRIPTION("T10 DIF CRC calculation (library API)");
 MODULE_LICENSE("GPL");
-MODULE_SOFTDEP("pre: crct10dif");
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/7] crypto: lib/crc-t10dif - add static call support for optimized versions
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 1/7] crypto: crc-t10dif - turn library wrapper for shash into generic library Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 3/7] crypto: generic/crc-t10dif - expose both arch and generic shashes Ard Biesheuvel
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

Wire up the new static call facility to the CRC-T10DIF library code, so
that optimized implementations can be swapped in easily, without having
to rely on the complexity of the crypto API shash infrastructure.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/crc-t10dif.h | 21 ++++++++++++--
 lib/crc-t10dif.c           | 30 +++++++++++++++-----
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/include/linux/crc-t10dif.h b/include/linux/crc-t10dif.h
index 6bb0c0bf357b..ad8b0a358680 100644
--- a/include/linux/crc-t10dif.h
+++ b/include/linux/crc-t10dif.h
@@ -3,6 +3,7 @@
 #define _LINUX_CRC_T10DIF_H
 
 #include <linux/types.h>
+#include <linux/static_call.h>
 
 #define CRC_T10DIF_DIGEST_SIZE 2
 #define CRC_T10DIF_BLOCK_SIZE 1
@@ -10,7 +11,23 @@
 
 extern __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
 				size_t len);
-extern __u16 crc_t10dif(unsigned char const *, size_t);
-extern __u16 crc_t10dif_update(__u16 crc, unsigned char const *, size_t);
+
+DECLARE_STATIC_CALL(crc_t10dif_arch, crc_t10dif_generic);
+
+static inline __u16 crc_t10dif_update(__u16 crc, unsigned char const *buffer,
+				      size_t len)
+{
+	return static_call(crc_t10dif_arch)(crc, buffer, len);
+}
+
+static inline __u16 crc_t10dif(unsigned char const *buffer, size_t len)
+{
+	return crc_t10dif_update(0, buffer, len);
+}
+
+int crc_t10dif_register(__u16 (*)(__u16, const unsigned char *, size_t),
+			const char *name);
+
+int crc_t10dif_unregister(void);
 
 #endif
diff --git a/lib/crc-t10dif.c b/lib/crc-t10dif.c
index cbb739f0a0d7..b657cdfb118a 100644
--- a/lib/crc-t10dif.c
+++ b/lib/crc-t10dif.c
@@ -18,6 +18,11 @@
 #include <linux/static_key.h>
 #include <linux/notifier.h>
 
+DEFINE_STATIC_CALL(crc_t10dif_arch, crc_t10dif_generic);
+EXPORT_STATIC_CALL(crc_t10dif_arch);
+
+static char const *crc_t10dif_arch_name;
+
 /* Table generated using the following polynomium:
  * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
  * gt: 0x8bb7
@@ -68,21 +73,32 @@ __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len)
 }
 EXPORT_SYMBOL(crc_t10dif_generic);
 
-__u16 crc_t10dif_update(__u16 crc, const unsigned char *buffer, size_t len)
+int crc_t10dif_register(__u16 (*func)(__u16, const unsigned char *, size_t),
+			const char *name)
 {
-	return crc_t10dif_generic(crc, buffer, len);
+	if (!name || cmpxchg(&crc_t10dif_arch_name, NULL, name) != NULL)
+		return -EBUSY;
+
+	static_call_update(crc_t10dif_arch, func);
+	return 0;
 }
-EXPORT_SYMBOL(crc_t10dif_update);
+EXPORT_SYMBOL_NS(crc_t10dif_register, CRYPTO_INTERNAL);
 
-__u16 crc_t10dif(const unsigned char *buffer, size_t len)
+int crc_t10dif_unregister(void)
 {
-	return crc_t10dif_update(0, buffer, len);
+	// revert to generic implementation
+	static_call_update(crc_t10dif_arch, crc_t10dif_generic);
+	crc_t10dif_arch_name = NULL;
+
+	// wait for all potential callers of the unregistered routine to finish
+	synchronize_rcu_tasks();
+	return 0;
 }
-EXPORT_SYMBOL(crc_t10dif);
+EXPORT_SYMBOL_NS(crc_t10dif_unregister, CRYPTO_INTERNAL);
 
 static int crc_t10dif_transform_show(char *buffer, const struct kernel_param *kp)
 {
-	return sprintf(buffer, "fallback\n");
+	return sprintf(buffer, "%s\n", crc_t10dif_arch_name ?: "fallback");
 }
 
 module_param_call(transform, NULL, crc_t10dif_transform_show, NULL, 0444);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/7] crypto: generic/crc-t10dif - expose both arch and generic shashes
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 1/7] crypto: crc-t10dif - turn library wrapper for shash into generic library Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 2/7] crypto: lib/crc-t10dif - add static call support for optimized versions Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API Ard Biesheuvel
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

In order to retain the ability to check accelerated implementations of
CRC-T10DIF against the generic C implementation, expose both via the
generic crypto API shash driver. This relies on the arch specific version
to be registered beforehand - this will generally be the case, given that
the modules in question are autoloaded via CPU feature matching, but
let's use a module soft-dependency as well to trigger a load of such
modules if they exist on the system.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 crypto/crct10dif_generic.c | 100 ++++++++++++++------
 1 file changed, 72 insertions(+), 28 deletions(-)

diff --git a/crypto/crct10dif_generic.c b/crypto/crct10dif_generic.c
index e843982073bb..7878eda7513e 100644
--- a/crypto/crct10dif_generic.c
+++ b/crypto/crct10dif_generic.c
@@ -48,8 +48,8 @@ static int chksum_init(struct shash_desc *desc)
 	return 0;
 }
 
-static int chksum_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int length)
+static int chksum_generic_update(struct shash_desc *desc, const u8 *data,
+				 unsigned int length)
 {
 	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
 
@@ -65,54 +65,97 @@ static int chksum_final(struct shash_desc *desc, u8 *out)
 	return 0;
 }
 
-static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
+static int __chksum_generic_finup(__u16 crc, const u8 *data, unsigned int len,
+				  u8 *out)
 {
 	*(__u16 *)out = crc_t10dif_generic(crc, data, len);
 	return 0;
 }
 
-static int chksum_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
+static int chksum_generic_finup(struct shash_desc *desc, const u8 *data,
+				unsigned int len, u8 *out)
 {
 	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
 
-	return __chksum_finup(ctx->crc, data, len, out);
+	return __chksum_generic_finup(ctx->crc, data, len, out);
 }
 
-static int chksum_digest(struct shash_desc *desc, const u8 *data,
-			 unsigned int length, u8 *out)
+static int chksum_generic_digest(struct shash_desc *desc, const u8 *data,
+				 unsigned int length, u8 *out)
 {
-	return __chksum_finup(0, data, length, out);
+	return __chksum_generic_finup(0, data, length, out);
 }
 
-static struct shash_alg alg = {
-	.digestsize		=	CRC_T10DIF_DIGEST_SIZE,
-	.init		=	chksum_init,
-	.update		=	chksum_update,
-	.final		=	chksum_final,
-	.finup		=	chksum_finup,
-	.digest		=	chksum_digest,
-	.descsize		=	sizeof(struct chksum_desc_ctx),
-	.base			=	{
-		.cra_name		=	"crct10dif",
-		.cra_driver_name	=	"crct10dif-generic",
-		.cra_priority		=	100,
-		.cra_blocksize		=	CRC_T10DIF_BLOCK_SIZE,
-		.cra_module		=	THIS_MODULE,
-	}
-};
+static int chksum_arch_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int length)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	ctx->crc = crc_t10dif_update(ctx->crc, data, length);
+	return 0;
+}
+
+static int __chksum_arch_finup(__u16 crc, const u8 *data, unsigned int len,
+			       u8 *out)
+{
+	*(__u16 *)out = crc_t10dif_update(crc, data, len);
+	return 0;
+}
+
+static int chksum_arch_finup(struct shash_desc *desc, const u8 *data,
+			     unsigned int len, u8 *out)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	return __chksum_arch_finup(ctx->crc, data, len, out);
+}
+
+static int chksum_arch_digest(struct shash_desc *desc, const u8 *data,
+			      unsigned int length, u8 *out)
+{
+	return __chksum_arch_finup(0, data, length, out);
+}
+
+static struct shash_alg algs[] = {{
+	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
+	.init			= chksum_init,
+	.update			= chksum_generic_update,
+	.final			= chksum_final,
+	.finup			= chksum_generic_finup,
+	.digest			= chksum_generic_digest,
+	.descsize		= sizeof(struct chksum_desc_ctx),
+
+	.base.cra_name		= "crct10dif",
+	.base.cra_driver_name	= "crct10dif-generic",
+	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
+	.init			= chksum_init,
+	.update			= chksum_arch_update,
+	.final			= chksum_final,
+	.finup			= chksum_arch_finup,
+	.digest			= chksum_arch_digest,
+	.descsize		= sizeof(struct chksum_desc_ctx),
+
+	.base.cra_name		= "crct10dif",
+	.base.cra_driver_name	= "crct10dif-arch",
+	.base.cra_priority	= 100,
+	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}};
 
 static int __init crct10dif_mod_init(void)
 {
-	return crypto_register_shash(&alg);
+	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
 }
 
 static void __exit crct10dif_mod_fini(void)
 {
-	crypto_unregister_shash(&alg);
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
 }
 
-subsys_initcall(crct10dif_mod_init);
+module_init(crct10dif_mod_init);
 module_exit(crct10dif_mod_fini);
 
 MODULE_AUTHOR("Tim Chen <tim.c.chen@linux.intel.com>");
@@ -120,3 +163,4 @@ MODULE_DESCRIPTION("T10 DIF CRC calculation.");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CRYPTO("crct10dif");
 MODULE_ALIAS_CRYPTO("crct10dif-generic");
+MODULE_SOFTDEP("pre: crct10dif-arch");
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2021-01-11 16:52 ` [PATCH 3/7] crypto: generic/crc-t10dif - expose both arch and generic shashes Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-12  0:01     ` kernel test robot
  2021-01-11 16:52 ` [PATCH 5/7] crypto: arm/crc-t10dif " Ard Biesheuvel
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

Get rid of the shash boilerplate, and register the accelerated x86
version of the CRC-T10DIF algorithm with the library interface instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/crypto/crct10dif-pclmul_glue.c | 90 +++-----------------
 1 file changed, 10 insertions(+), 80 deletions(-)

diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..6f06499d96a3 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -25,7 +25,6 @@
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/crc-t10dif.h>
-#include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
 #include <linux/init.h>
 #include <linux/string.h>
@@ -36,82 +35,17 @@
 
 asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);
 
-struct chksum_desc_ctx {
-	__u16 crc;
-};
-
-static int chksum_init(struct shash_desc *desc)
-{
-	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	ctx->crc = 0;
-
-	return 0;
-}
-
-static int chksum_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int length)
-{
-	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	if (length >= 16 && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
-		kernel_fpu_end();
-	} else
-		ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
-	return 0;
-}
-
-static int chksum_final(struct shash_desc *desc, u8 *out)
-{
-	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	*(__u16 *)out = ctx->crc;
-	return 0;
-}
-
-static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
+static u16 crc_t10dif_x86(u16 crc, const u8 *data, size_t len)
 {
 	if (len >= 16 && crypto_simd_usable()) {
 		kernel_fpu_begin();
-		*(__u16 *)out = crc_t10dif_pcl(crc, data, len);
+		crc = crc_t10dif_pcl(crc, data, len);
 		kernel_fpu_end();
-	} else
-		*(__u16 *)out = crc_t10dif_generic(crc, data, len);
-	return 0;
-}
-
-static int chksum_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
-{
-	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
-
-	return __chksum_finup(ctx->crc, data, len, out);
-}
-
-static int chksum_digest(struct shash_desc *desc, const u8 *data,
-			 unsigned int length, u8 *out)
-{
-	return __chksum_finup(0, data, length, out);
-}
-
-static struct shash_alg alg = {
-	.digestsize		=	CRC_T10DIF_DIGEST_SIZE,
-	.init		=	chksum_init,
-	.update		=	chksum_update,
-	.final		=	chksum_final,
-	.finup		=	chksum_finup,
-	.digest		=	chksum_digest,
-	.descsize		=	sizeof(struct chksum_desc_ctx),
-	.base			=	{
-		.cra_name		=	"crct10dif",
-		.cra_driver_name	=	"crct10dif-pclmul",
-		.cra_priority		=	200,
-		.cra_blocksize		=	CRC_T10DIF_BLOCK_SIZE,
-		.cra_module		=	THIS_MODULE,
+	} else {
+		crc = crc_t10dif_generic(crc, data, len);
 	}
-};
+	return crc;
+}
 
 static const struct x86_cpu_id crct10dif_cpu_id[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
@@ -121,15 +55,12 @@ MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
 
 static int __init crct10dif_intel_mod_init(void)
 {
-	if (!x86_match_cpu(crct10dif_cpu_id))
-		return -ENODEV;
-
-	return crypto_register_shash(&alg);
+	return crc_t10dif_register(crc_t10dif_x86, "crct10dif-pclmul");
 }
 
 static void __exit crct10dif_intel_mod_fini(void)
 {
-	crypto_unregister_shash(&alg);
+	crc_t10dif_unregister();
 }
 
 module_init(crct10dif_intel_mod_init);
@@ -138,6 +69,5 @@ module_exit(crct10dif_intel_mod_fini);
 MODULE_AUTHOR("Tim Chen <tim.c.chen@linux.intel.com>");
 MODULE_DESCRIPTION("T10 DIF CRC calculation accelerated with PCLMULQDQ.");
 MODULE_LICENSE("GPL");
-
-MODULE_ALIAS_CRYPTO("crct10dif");
-MODULE_ALIAS_CRYPTO("crct10dif-pclmul");
+MODULE_ALIAS("crct10dif-arch");
+MODULE_IMPORT_NS(CRYPTO_INTERNAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] crypto: arm/crc-t10dif - convert to static call library API
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2021-01-11 16:52 ` [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 6/7] crypto: arm64/crc-t10dif - convert to static call API Ard Biesheuvel
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

Get rid of the shash boilerplate, and register the accelerated ARM
version of the CRC-T10DIF algorithm with the library interface instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/crypto/crct10dif-ce-glue.c | 58 ++++----------------
 1 file changed, 11 insertions(+), 47 deletions(-)

diff --git a/arch/arm/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
index e9191a8c87b9..ce21f958fd49 100644
--- a/arch/arm/crypto/crct10dif-ce-glue.c
+++ b/arch/arm/crypto/crct10dif-ce-glue.c
@@ -5,13 +5,13 @@
  * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
  */
 
+#include <linux/cpufeature.h>
 #include <linux/crc-t10dif.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/string.h>
 
-#include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
 
 #include <asm/neon.h>
@@ -21,68 +21,32 @@
 
 asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 *buf, size_t len);
 
-static int crct10dif_init(struct shash_desc *desc)
+static u16 crc_t10dif_arm(u16 crc, const u8 *data, size_t len)
 {
-	u16 *crc = shash_desc_ctx(desc);
-
-	*crc = 0;
-	return 0;
-}
-
-static int crct10dif_update(struct shash_desc *desc, const u8 *data,
-			    unsigned int length)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
-	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
+	if (len >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
 		kernel_neon_begin();
-		*crc = crc_t10dif_pmull(*crc, data, length);
+		crc = crc_t10dif_pmull(crc, data, len);
 		kernel_neon_end();
 	} else {
-		*crc = crc_t10dif_generic(*crc, data, length);
+		crc = crc_t10dif_generic(crc, data, len);
 	}
-
-	return 0;
+	return crc;
 }
 
-static int crct10dif_final(struct shash_desc *desc, u8 *out)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
-	*(u16 *)out = *crc;
-	return 0;
-}
-
-static struct shash_alg crc_t10dif_alg = {
-	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
-	.init			= crct10dif_init,
-	.update			= crct10dif_update,
-	.final			= crct10dif_final,
-	.descsize		= CRC_T10DIF_DIGEST_SIZE,
-
-	.base.cra_name		= "crct10dif",
-	.base.cra_driver_name	= "crct10dif-arm-ce",
-	.base.cra_priority	= 200,
-	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-};
-
 static int __init crc_t10dif_mod_init(void)
 {
-	if (!(elf_hwcap2 & HWCAP2_PMULL))
-		return -ENODEV;
-
-	return crypto_register_shash(&crc_t10dif_alg);
+	return crc_t10dif_register(crc_t10dif_arm, "crct10dif-arm-ce");
 }
 
 static void __exit crc_t10dif_mod_exit(void)
 {
-	crypto_unregister_shash(&crc_t10dif_alg);
+	crc_t10dif_unregister();
 }
 
-module_init(crc_t10dif_mod_init);
+module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
 module_exit(crc_t10dif_mod_exit);
 
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("crct10dif");
+MODULE_ALIAS("crct10dif-arch");
+MODULE_IMPORT_NS(CRYPTO_INTERNAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] crypto: arm64/crc-t10dif - convert to static call API
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2021-01-11 16:52 ` [PATCH 5/7] crypto: arm/crc-t10dif " Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 16:52 ` [PATCH 7/7] crypto: powerpc/crc-t10dif " Ard Biesheuvel
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

Get rid of the shash boilerplate, and register the accelerated arm64
version of the CRC-T10DIF algorithm with the library interface instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/crct10dif-ce-glue.c | 85 ++++----------------
 1 file changed, 15 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
index ccc3f6067742..70c730866d4a 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -12,7 +12,6 @@
 #include <linux/module.h>
 #include <linux/string.h>
 
-#include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
 
 #include <asm/neon.h>
@@ -23,97 +22,43 @@
 asmlinkage u16 crc_t10dif_pmull_p8(u16 init_crc, const u8 *buf, size_t len);
 asmlinkage u16 crc_t10dif_pmull_p64(u16 init_crc, const u8 *buf, size_t len);
 
-static int crct10dif_init(struct shash_desc *desc)
+static u16 crct10dif_update_pmull_p8(u16 crc, const u8 *data, size_t length)
 {
-	u16 *crc = shash_desc_ctx(desc);
-
-	*crc = 0;
-	return 0;
-}
-
-static int crct10dif_update_pmull_p8(struct shash_desc *desc, const u8 *data,
-			    unsigned int length)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
 	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
 		kernel_neon_begin();
-		*crc = crc_t10dif_pmull_p8(*crc, data, length);
+		crc = crc_t10dif_pmull_p8(crc, data, length);
 		kernel_neon_end();
 	} else {
-		*crc = crc_t10dif_generic(*crc, data, length);
+		crc = crc_t10dif_generic(crc, data, length);
 	}
-
-	return 0;
+	return crc;
 }
 
-static int crct10dif_update_pmull_p64(struct shash_desc *desc, const u8 *data,
-			    unsigned int length)
+static u16 crct10dif_update_pmull_p64(u16 crc, const u8 *data, size_t length)
 {
-	u16 *crc = shash_desc_ctx(desc);
-
 	if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
 		kernel_neon_begin();
-		*crc = crc_t10dif_pmull_p64(*crc, data, length);
+		crc = crc_t10dif_pmull_p64(crc, data, length);
 		kernel_neon_end();
 	} else {
-		*crc = crc_t10dif_generic(*crc, data, length);
+		crc = crc_t10dif_generic(crc, data, length);
 	}
-
-	return 0;
+	return crc;
 }
 
-static int crct10dif_final(struct shash_desc *desc, u8 *out)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
-	*(u16 *)out = *crc;
-	return 0;
-}
-
-static struct shash_alg crc_t10dif_alg[] = {{
-	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
-	.init			= crct10dif_init,
-	.update			= crct10dif_update_pmull_p8,
-	.final			= crct10dif_final,
-	.descsize		= CRC_T10DIF_DIGEST_SIZE,
-
-	.base.cra_name		= "crct10dif",
-	.base.cra_driver_name	= "crct10dif-arm64-neon",
-	.base.cra_priority	= 100,
-	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-}, {
-	.digestsize		= CRC_T10DIF_DIGEST_SIZE,
-	.init			= crct10dif_init,
-	.update			= crct10dif_update_pmull_p64,
-	.final			= crct10dif_final,
-	.descsize		= CRC_T10DIF_DIGEST_SIZE,
-
-	.base.cra_name		= "crct10dif",
-	.base.cra_driver_name	= "crct10dif-arm64-ce",
-	.base.cra_priority	= 200,
-	.base.cra_blocksize	= CRC_T10DIF_BLOCK_SIZE,
-	.base.cra_module	= THIS_MODULE,
-}};
-
 static int __init crc_t10dif_mod_init(void)
 {
 	if (cpu_have_named_feature(PMULL))
-		return crypto_register_shashes(crc_t10dif_alg,
-					       ARRAY_SIZE(crc_t10dif_alg));
+		return crc_t10dif_register(crct10dif_update_pmull_p64,
+					   "crct10dif-arm64-ce");
 	else
-		/* only register the first array element */
-		return crypto_register_shash(crc_t10dif_alg);
+		return crc_t10dif_register(crct10dif_update_pmull_p8,
+					   "crct10dif-arm64-neon");
 }
 
 static void __exit crc_t10dif_mod_exit(void)
 {
-	if (cpu_have_named_feature(PMULL))
-		crypto_unregister_shashes(crc_t10dif_alg,
-					  ARRAY_SIZE(crc_t10dif_alg));
-	else
-		crypto_unregister_shash(crc_t10dif_alg);
+	crc_t10dif_unregister();
 }
 
 module_cpu_feature_match(ASIMD, crc_t10dif_mod_init);
@@ -121,5 +66,5 @@ module_exit(crc_t10dif_mod_exit);
 
 MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
 MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("crct10dif");
-MODULE_ALIAS_CRYPTO("crct10dif-arm64-ce");
+MODULE_ALIAS("crct10dif-arch");
+MODULE_IMPORT_NS(CRYPTO_INTERNAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] crypto: powerpc/crc-t10dif - convert to static call API
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2021-01-11 16:52 ` [PATCH 6/7] crypto: arm64/crc-t10dif - convert to static call API Ard Biesheuvel
@ 2021-01-11 16:52 ` Ard Biesheuvel
  2021-01-11 17:27 ` [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
  2021-01-11 21:05 ` Eric Biggers
  8 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 16:52 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, Ard Biesheuvel, Martin K. Petersen, Eric Biggers,
	Peter Zijlstra

Get rid of the shash boilerplate, and register the accelerated PowerPC
version of the CRC-T10DIF algorithm with the library interface instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/powerpc/crypto/crct10dif-vpmsum_glue.c | 51 ++------------------
 1 file changed, 4 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/crypto/crct10dif-vpmsum_glue.c b/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
index 1dc8b6915178..26c7da181018 100644
--- a/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crct10dif-vpmsum_glue.c
@@ -7,7 +7,6 @@
  */
 
 #include <linux/crc-t10dif.h>
-#include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -61,59 +60,17 @@ static u16 crct10dif_vpmsum(u16 crci, unsigned char const *p, size_t len)
 	return crc & 0xffff;
 }
 
-static int crct10dif_vpmsum_init(struct shash_desc *desc)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
-	*crc = 0;
-	return 0;
-}
-
-static int crct10dif_vpmsum_update(struct shash_desc *desc, const u8 *data,
-			    unsigned int length)
-{
-	u16 *crc = shash_desc_ctx(desc);
-
-	*crc = crct10dif_vpmsum(*crc, data, length);
-
-	return 0;
-}
-
-
-static int crct10dif_vpmsum_final(struct shash_desc *desc, u8 *out)
-{
-	u16 *crcp = shash_desc_ctx(desc);
-
-	*(u16 *)out = *crcp;
-	return 0;
-}
-
-static struct shash_alg alg = {
-	.init		= crct10dif_vpmsum_init,
-	.update		= crct10dif_vpmsum_update,
-	.final		= crct10dif_vpmsum_final,
-	.descsize	= CRC_T10DIF_DIGEST_SIZE,
-	.digestsize	= CRC_T10DIF_DIGEST_SIZE,
-	.base		= {
-		.cra_name		= "crct10dif",
-		.cra_driver_name	= "crct10dif-vpmsum",
-		.cra_priority		= 200,
-		.cra_blocksize		= CRC_T10DIF_BLOCK_SIZE,
-		.cra_module		= THIS_MODULE,
-	}
-};
-
 static int __init crct10dif_vpmsum_mod_init(void)
 {
 	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
 		return -ENODEV;
 
-	return crypto_register_shash(&alg);
+	return crc_t10dif_register(crct10dif_vpmsum, "crct10dif-vpmsum");
 }
 
 static void __exit crct10dif_vpmsum_mod_fini(void)
 {
-	crypto_unregister_shash(&alg);
+	crc_t10dif_unregister();
 }
 
 module_cpu_feature_match(PPC_MODULE_FEATURE_VEC_CRYPTO, crct10dif_vpmsum_mod_init);
@@ -122,5 +79,5 @@ module_exit(crct10dif_vpmsum_mod_fini);
 MODULE_AUTHOR("Daniel Axtens <dja@axtens.net>");
 MODULE_DESCRIPTION("CRCT10DIF using vector polynomial multiply-sum instructions");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS_CRYPTO("crct10dif");
-MODULE_ALIAS_CRYPTO("crct10dif-vpmsum");
+MODULE_ALIAS("crct10dif-arch");
+MODULE_IMPORT_NS(CRYPTO_INTERNAL);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2021-01-11 16:52 ` [PATCH 7/7] crypto: powerpc/crc-t10dif " Ard Biesheuvel
@ 2021-01-11 17:27 ` Ard Biesheuvel
  2021-01-11 18:36   ` Ard Biesheuvel
  2021-01-11 21:05 ` Eric Biggers
  8 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 17:27 UTC (permalink / raw)
  To: Linux Crypto Mailing List
  Cc: Herbert Xu, Martin K. Petersen, Eric Biggers, Peter Zijlstra

On Mon, 11 Jan 2021 at 17:52, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> CRC-T10DIF is a very poor match for the crypto API:
> - every user in the kernel calls it via a library wrapper around the
>   shash API, so all callers share a single instance of the transform
> - each architecture provides at most a single optimized implementation,
>   based on SIMD instructions for carryless multiplication
>
> In other words, none of the flexibility it provides is put to good use,
> and so it is better to get rid of this complexity, and expose the optimized
> implementations via static calls instead. This removes a substantial chunk
> of code, and also gets rid of any indirect calls on architectures that
> obsess about those (x86)
>
> If this approach is deemed suitable, there are other places where we might
> consider adopting it: CRC32 and CRC32(C).
>
> Patch #1 does some preparatory refactoring and removes the library wrapper
> around the shash transform.
>
> Patch #2 introduces the actual static calls, along with the registration
> routines to update the crc-t10dif implementation at runtime.
>
> Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
> between the optimized library code and the generic library code.
>
> Patches #4 to #7 update the various arch implementations to switch over to
> the new method.
>
> Special request to Peter to take a look at patch #2, and in particular,
> whether synchronize_rcu_tasks() is sufficient to ensure that a module
> providing the target of a static call can be unloaded safely.
>

It seems I may have managed to confuse myself slightly here: without
an upper bound on the size of the input of the crc_t10dif() routine, I
suppose we can never assume that all its callers have finished.

Insights welcome ...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 17:27 ` [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
@ 2021-01-11 18:36   ` Ard Biesheuvel
  2021-01-11 20:56     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 18:36 UTC (permalink / raw)
  To: Linux Crypto Mailing List
  Cc: Herbert Xu, Martin K. Petersen, Eric Biggers, Peter Zijlstra

On Mon, 11 Jan 2021 at 18:27, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 11 Jan 2021 at 17:52, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > CRC-T10DIF is a very poor match for the crypto API:
> > - every user in the kernel calls it via a library wrapper around the
> >   shash API, so all callers share a single instance of the transform
> > - each architecture provides at most a single optimized implementation,
> >   based on SIMD instructions for carryless multiplication
> >
> > In other words, none of the flexibility it provides is put to good use,
> > and so it is better to get rid of this complexity, and expose the optimized
> > implementations via static calls instead. This removes a substantial chunk
> > of code, and also gets rid of any indirect calls on architectures that
> > obsess about those (x86)
> >
> > If this approach is deemed suitable, there are other places where we might
> > consider adopting it: CRC32 and CRC32(C).
> >
> > Patch #1 does some preparatory refactoring and removes the library wrapper
> > around the shash transform.
> >
> > Patch #2 introduces the actual static calls, along with the registration
> > routines to update the crc-t10dif implementation at runtime.
> >
> > Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
> > between the optimized library code and the generic library code.
> >
> > Patches #4 to #7 update the various arch implementations to switch over to
> > the new method.
> >
> > Special request to Peter to take a look at patch #2, and in particular,
> > whether synchronize_rcu_tasks() is sufficient to ensure that a module
> > providing the target of a static call can be unloaded safely.
> >
>
> It seems I may have managed to confuse myself slightly here: without
> an upper bound on the size of the input of the crc_t10dif() routine, I
> suppose we can never assume that all its callers have finished.
>

Replying to self again - apologies.

I think this is actually correct after all: synchronize_rcu_tasks()
guarantees that all tasks have passed through a 'safe state', i.e.,
voluntary schedule(), return to userland, etc, which guarantees that
no task could be executing the old static call target after
synchronize_rcu_tasks() returns.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 18:36   ` Ard Biesheuvel
@ 2021-01-11 20:56     ` Peter Zijlstra
  2021-01-11 21:01       ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2021-01-11 20:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux Crypto Mailing List, Herbert Xu, Martin K. Petersen, Eric Biggers

On Mon, Jan 11, 2021 at 07:36:20PM +0100, Ard Biesheuvel wrote:
> On Mon, 11 Jan 2021 at 18:27, Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Mon, 11 Jan 2021 at 17:52, Ard Biesheuvel <ardb@kernel.org> wrote:

> > > Special request to Peter to take a look at patch #2, and in particular,
> > > whether synchronize_rcu_tasks() is sufficient to ensure that a module
> > > providing the target of a static call can be unloaded safely.
> >
> > It seems I may have managed to confuse myself slightly here: without
> > an upper bound on the size of the input of the crc_t10dif() routine, I
> > suppose we can never assume that all its callers have finished.
> >
> 
> Replying to self again - apologies.
> 
> I think this is actually correct after all: synchronize_rcu_tasks()
> guarantees that all tasks have passed through a 'safe state', i.e.,
> voluntary schedule(), return to userland, etc, which guarantees that
> no task could be executing the old static call target after
> synchronize_rcu_tasks() returns.

Right, I think it should work.

My initial question was why you'd want to support the unreg at all.
AFAICT these implementations are tiny, why bother having them as a
module, or if you insist having them as a module, why allowing removal?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 20:56     ` Peter Zijlstra
@ 2021-01-11 21:01       ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Crypto Mailing List, Herbert Xu, Martin K. Petersen, Eric Biggers

On Mon, 11 Jan 2021 at 21:56, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Jan 11, 2021 at 07:36:20PM +0100, Ard Biesheuvel wrote:
> > On Mon, 11 Jan 2021 at 18:27, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Mon, 11 Jan 2021 at 17:52, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > > Special request to Peter to take a look at patch #2, and in particular,
> > > > whether synchronize_rcu_tasks() is sufficient to ensure that a module
> > > > providing the target of a static call can be unloaded safely.
> > >
> > > It seems I may have managed to confuse myself slightly here: without
> > > an upper bound on the size of the input of the crc_t10dif() routine, I
> > > suppose we can never assume that all its callers have finished.
> > >
> >
> > Replying to self again - apologies.
> >
> > I think this is actually correct after all: synchronize_rcu_tasks()
> > guarantees that all tasks have passed through a 'safe state', i.e.,
> > voluntary schedule(), return to userland, etc, which guarantees that
> > no task could be executing the old static call target after
> > synchronize_rcu_tasks() returns.
>
> Right, I think it should work.
>
> My initial question was why you'd want to support the unreg at all.
> AFAICT these implementations are tiny, why bother having them as a
> module, or if you insist having them as a module, why allowing removal?

Yeah, good question.

Having the accelerated version as a module makes sense imo: it will
only be loaded if it is supported on the system, and it can be
blacklisted by the user if it does not want it for any reason. Unload
is just for completeness/symmetry, but it is unlikely to be crucial to
anyone in practice.

In any case, thanks for confirming - if this looks sane to you then we
may be able to use this pattern in other places as well.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2021-01-11 17:27 ` [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
@ 2021-01-11 21:05 ` Eric Biggers
  2021-01-11 21:31   ` Ard Biesheuvel
  8 siblings, 1 reply; 17+ messages in thread
From: Eric Biggers @ 2021-01-11 21:05 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, herbert, Martin K. Petersen, Peter Zijlstra

On Mon, Jan 11, 2021 at 05:52:30PM +0100, Ard Biesheuvel wrote:
> CRC-T10DIF is a very poor match for the crypto API:
> - every user in the kernel calls it via a library wrapper around the
>   shash API, so all callers share a single instance of the transform
> - each architecture provides at most a single optimized implementation,
>   based on SIMD instructions for carryless multiplication
> 
> In other words, none of the flexibility it provides is put to good use,
> and so it is better to get rid of this complexity, and expose the optimized
> implementations via static calls instead. This removes a substantial chunk
> of code, and also gets rid of any indirect calls on architectures that
> obsess about those (x86)
> 
> If this approach is deemed suitable, there are other places where we might
> consider adopting it: CRC32 and CRC32(C).
> 
> Patch #1 does some preparatory refactoring and removes the library wrapper
> around the shash transform.
> 
> Patch #2 introduces the actual static calls, along with the registration
> routines to update the crc-t10dif implementation at runtime.
> 
> Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
> between the optimized library code and the generic library code.
> 
> Patches #4 to #7 update the various arch implementations to switch over to
> the new method.
> 
> Special request to Peter to take a look at patch #2, and in particular,
> whether synchronize_rcu_tasks() is sufficient to ensure that a module
> providing the target of a static call can be unloaded safely.
>  
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: Eric Biggers <ebiggers@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> 
> Ard Biesheuvel (7):
>   crypto: crc-t10dif - turn library wrapper for shash into generic
>     library
>   crypto: lib/crc-t10dif - add static call support for optimized
>     versions
>   crypto: generic/crc-t10dif - expose both arch and generic shashes
>   crypto: x86/crc-t10dif - convert to static call library API
>   crypto: arm/crc-t10dif - convert to static call library API
>   crypto: arm64/crc-t10dif - convert to static call API
>   crypto: powerpc/crc-t10dif - convert to static call API
> 
>  arch/arm/crypto/Kconfig                     |   2 +-
>  arch/arm/crypto/crct10dif-ce-glue.c         |  58 ++------
>  arch/arm64/crypto/Kconfig                   |   3 +-
>  arch/arm64/crypto/crct10dif-ce-glue.c       |  85 ++---------
>  arch/powerpc/crypto/crct10dif-vpmsum_glue.c |  51 +------
>  arch/x86/crypto/crct10dif-pclmul_glue.c     |  90 ++----------
>  crypto/Kconfig                              |   7 +-
>  crypto/Makefile                             |   2 +-
>  crypto/crct10dif_common.c                   |  82 -----------
>  crypto/crct10dif_generic.c                  | 100 +++++++++----
>  include/linux/crc-t10dif.h                  |  21 ++-
>  lib/Kconfig                                 |   2 -
>  lib/crc-t10dif.c                            | 152 +++++++++-----------
>  13 files changed, 204 insertions(+), 451 deletions(-)
>  delete mode 100644 crypto/crct10dif_common.c

There is already a library API for two other hash functions, BLAKE2s and
Poly1305, which takes advantage of architecture-specific implementations without
using static calls.  Also, those algorithms are likewise also exposed through
the shash API, but in a different way from what this patchset proposes.

Is there a reason not to do things in the same way?  What are the advantages of
the new approach that you're proposing?

- Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 21:05 ` Eric Biggers
@ 2021-01-11 21:31   ` Ard Biesheuvel
  2021-01-28  8:19     ` Ard Biesheuvel
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-11 21:31 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, Herbert Xu, Martin K. Petersen,
	Peter Zijlstra

On Mon, 11 Jan 2021 at 22:05, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Mon, Jan 11, 2021 at 05:52:30PM +0100, Ard Biesheuvel wrote:
> > CRC-T10DIF is a very poor match for the crypto API:
> > - every user in the kernel calls it via a library wrapper around the
> >   shash API, so all callers share a single instance of the transform
> > - each architecture provides at most a single optimized implementation,
> >   based on SIMD instructions for carryless multiplication
> >
> > In other words, none of the flexibility it provides is put to good use,
> > and so it is better to get rid of this complexity, and expose the optimized
> > implementations via static calls instead. This removes a substantial chunk
> > of code, and also gets rid of any indirect calls on architectures that
> > obsess about those (x86)
> >
> > If this approach is deemed suitable, there are other places where we might
> > consider adopting it: CRC32 and CRC32(C).
> >
> > Patch #1 does some preparatory refactoring and removes the library wrapper
> > around the shash transform.
> >
> > Patch #2 introduces the actual static calls, along with the registration
> > routines to update the crc-t10dif implementation at runtime.
> >
> > Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
> > between the optimized library code and the generic library code.
> >
> > Patches #4 to #7 update the various arch implementations to switch over to
> > the new method.
> >
> > Special request to Peter to take a look at patch #2, and in particular,
> > whether synchronize_rcu_tasks() is sufficient to ensure that a module
> > providing the target of a static call can be unloaded safely.
> >
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: Eric Biggers <ebiggers@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> >
> > Ard Biesheuvel (7):
> >   crypto: crc-t10dif - turn library wrapper for shash into generic
> >     library
> >   crypto: lib/crc-t10dif - add static call support for optimized
> >     versions
> >   crypto: generic/crc-t10dif - expose both arch and generic shashes
> >   crypto: x86/crc-t10dif - convert to static call library API
> >   crypto: arm/crc-t10dif - convert to static call library API
> >   crypto: arm64/crc-t10dif - convert to static call API
> >   crypto: powerpc/crc-t10dif - convert to static call API
> >
> >  arch/arm/crypto/Kconfig                     |   2 +-
> >  arch/arm/crypto/crct10dif-ce-glue.c         |  58 ++------
> >  arch/arm64/crypto/Kconfig                   |   3 +-
> >  arch/arm64/crypto/crct10dif-ce-glue.c       |  85 ++---------
> >  arch/powerpc/crypto/crct10dif-vpmsum_glue.c |  51 +------
> >  arch/x86/crypto/crct10dif-pclmul_glue.c     |  90 ++----------
> >  crypto/Kconfig                              |   7 +-
> >  crypto/Makefile                             |   2 +-
> >  crypto/crct10dif_common.c                   |  82 -----------
> >  crypto/crct10dif_generic.c                  | 100 +++++++++----
> >  include/linux/crc-t10dif.h                  |  21 ++-
> >  lib/Kconfig                                 |   2 -
> >  lib/crc-t10dif.c                            | 152 +++++++++-----------
> >  13 files changed, 204 insertions(+), 451 deletions(-)
> >  delete mode 100644 crypto/crct10dif_common.c
>
> There is already a library API for two other hash functions, BLAKE2s and
> Poly1305, which takes advantage of architecture-specific implementations without
> using static calls.  Also, those algorithms are likewise also exposed through
> the shash API, but in a different way from what this patchset proposes.
>
> Is there a reason not to do things in the same way?  What are the advantages of
> the new approach that you're proposing?
>

The current approach uses build time dependencies - i.e., if you
decide to build the accelerated implementation, you always have to
load it, even if the accelerated implementation cannot be used on the
system in question, and it is up to that code to use fallbacks for
everything, This was kind of a compromise on my part when we were
having the crypto library vs Wireguard discussion - I mentioned at the
time that [in my opinion] it was a temporary approach because static
call support was taking so long to arrive. (lwn article here [0] but
the links in it seem to be dead). It also results in some nasty
Kconfig dependencies because building the generic code into the kernel
and building the accelerated code as a module gives problems.

I agree that having different approaches for doing the same thing is
suboptimal, but I think the situation may be slightly different for
plain SIMD code such as blake2 and poly1305 versus crc implementations
based on carryless multiplication instructions which may be rare but
10-20x faster if they are supported.

So in summary, I think this approach is better. The generic code can
be built in and superseded by module code, and there are no
dependencies going both ways.

There are some complications as well, though:
- module softdeps don't work {afaik) if the depender is builtin
- our poly1305 implementations use a different layout for the state
that is passed between the init/update/final calls, so it may be
tricky to use a similar approach there.




[0] https://lwn.net/Articles/802376/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API
  2021-01-11 16:52 ` [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API Ard Biesheuvel
@ 2021-01-12  0:01     ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-01-12  0:01 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-crypto
  Cc: kbuild-all, clang-built-linux, herbert, Ard Biesheuvel,
	Martin K. Petersen, Eric Biggers, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2596 bytes --]

Hi Ard,

I love your patch! Perhaps something to improve:

[auto build test WARNING on cryptodev/master]
[also build test WARNING on crypto/master linus/master v5.11-rc3 next-20210111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ard-Biesheuvel/crypto-switch-to-static-calls-for-CRC-T10DIF/20210112-005632
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: x86_64-randconfig-a012-20210111 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 7be3285248bf54d0784a76174cf44cf7c1e780a5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/df3e7f87b3c1edc4daace5f5df09d9a1896b97dc
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ard-Biesheuvel/crypto-switch-to-static-calls-for-CRC-T10DIF/20210112-005632
        git checkout df3e7f87b3c1edc4daace5f5df09d9a1896b97dc
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/crypto/crct10dif-pclmul_glue.c:50:32: warning: unused variable 'crct10dif_cpu_id' [-Wunused-const-variable]
   static const struct x86_cpu_id crct10dif_cpu_id[] = {
                                  ^
   1 warning generated.


vim +/crct10dif_cpu_id +50 arch/x86/crypto/crct10dif-pclmul_glue.c

68411521cc6055ed Herbert Xu      2013-09-07  49  
68411521cc6055ed Herbert Xu      2013-09-07 @50  static const struct x86_cpu_id crct10dif_cpu_id[] = {
f30cfacad1ee948c Thomas Gleixner 2020-03-20  51  	X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
68411521cc6055ed Herbert Xu      2013-09-07  52  	{}
68411521cc6055ed Herbert Xu      2013-09-07  53  };
68411521cc6055ed Herbert Xu      2013-09-07  54  MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
68411521cc6055ed Herbert Xu      2013-09-07  55  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35642 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API
@ 2021-01-12  0:01     ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-01-12  0:01 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

Hi Ard,

I love your patch! Perhaps something to improve:

[auto build test WARNING on cryptodev/master]
[also build test WARNING on crypto/master linus/master v5.11-rc3 next-20210111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ard-Biesheuvel/crypto-switch-to-static-calls-for-CRC-T10DIF/20210112-005632
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: x86_64-randconfig-a012-20210111 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 7be3285248bf54d0784a76174cf44cf7c1e780a5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/df3e7f87b3c1edc4daace5f5df09d9a1896b97dc
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ard-Biesheuvel/crypto-switch-to-static-calls-for-CRC-T10DIF/20210112-005632
        git checkout df3e7f87b3c1edc4daace5f5df09d9a1896b97dc
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/crypto/crct10dif-pclmul_glue.c:50:32: warning: unused variable 'crct10dif_cpu_id' [-Wunused-const-variable]
   static const struct x86_cpu_id crct10dif_cpu_id[] = {
                                  ^
   1 warning generated.


vim +/crct10dif_cpu_id +50 arch/x86/crypto/crct10dif-pclmul_glue.c

68411521cc6055ed Herbert Xu      2013-09-07  49  
68411521cc6055ed Herbert Xu      2013-09-07 @50  static const struct x86_cpu_id crct10dif_cpu_id[] = {
f30cfacad1ee948c Thomas Gleixner 2020-03-20  51  	X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
68411521cc6055ed Herbert Xu      2013-09-07  52  	{}
68411521cc6055ed Herbert Xu      2013-09-07  53  };
68411521cc6055ed Herbert Xu      2013-09-07  54  MODULE_DEVICE_TABLE(x86cpu, crct10dif_cpu_id);
68411521cc6055ed Herbert Xu      2013-09-07  55  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35642 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF
  2021-01-11 21:31   ` Ard Biesheuvel
@ 2021-01-28  8:19     ` Ard Biesheuvel
  0 siblings, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-28  8:19 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, Herbert Xu, Martin K. Petersen,
	Peter Zijlstra

On Mon, 11 Jan 2021 at 22:31, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 11 Jan 2021 at 22:05, Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Mon, Jan 11, 2021 at 05:52:30PM +0100, Ard Biesheuvel wrote:
> > > CRC-T10DIF is a very poor match for the crypto API:
> > > - every user in the kernel calls it via a library wrapper around the
> > >   shash API, so all callers share a single instance of the transform
> > > - each architecture provides at most a single optimized implementation,
> > >   based on SIMD instructions for carryless multiplication
> > >
> > > In other words, none of the flexibility it provides is put to good use,
> > > and so it is better to get rid of this complexity, and expose the optimized
> > > implementations via static calls instead. This removes a substantial chunk
> > > of code, and also gets rid of any indirect calls on architectures that
> > > obsess about those (x86)
> > >
> > > If this approach is deemed suitable, there are other places where we might
> > > consider adopting it: CRC32 and CRC32(C).
> > >
> > > Patch #1 does some preparatory refactoring and removes the library wrapper
> > > around the shash transform.
> > >
> > > Patch #2 introduces the actual static calls, along with the registration
> > > routines to update the crc-t10dif implementation at runtime.
> > >
> > > Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes
> > > between the optimized library code and the generic library code.
> > >
> > > Patches #4 to #7 update the various arch implementations to switch over to
> > > the new method.
> > >
> > > Special request to Peter to take a look at patch #2, and in particular,
> > > whether synchronize_rcu_tasks() is sufficient to ensure that a module
> > > providing the target of a static call can be unloaded safely.
> > >
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > > Cc: Eric Biggers <ebiggers@google.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > >
> > > Ard Biesheuvel (7):
> > >   crypto: crc-t10dif - turn library wrapper for shash into generic
> > >     library
> > >   crypto: lib/crc-t10dif - add static call support for optimized
> > >     versions
> > >   crypto: generic/crc-t10dif - expose both arch and generic shashes
> > >   crypto: x86/crc-t10dif - convert to static call library API
> > >   crypto: arm/crc-t10dif - convert to static call library API
> > >   crypto: arm64/crc-t10dif - convert to static call API
> > >   crypto: powerpc/crc-t10dif - convert to static call API
> > >
> > >  arch/arm/crypto/Kconfig                     |   2 +-
> > >  arch/arm/crypto/crct10dif-ce-glue.c         |  58 ++------
> > >  arch/arm64/crypto/Kconfig                   |   3 +-
> > >  arch/arm64/crypto/crct10dif-ce-glue.c       |  85 ++---------
> > >  arch/powerpc/crypto/crct10dif-vpmsum_glue.c |  51 +------
> > >  arch/x86/crypto/crct10dif-pclmul_glue.c     |  90 ++----------
> > >  crypto/Kconfig                              |   7 +-
> > >  crypto/Makefile                             |   2 +-
> > >  crypto/crct10dif_common.c                   |  82 -----------
> > >  crypto/crct10dif_generic.c                  | 100 +++++++++----
> > >  include/linux/crc-t10dif.h                  |  21 ++-
> > >  lib/Kconfig                                 |   2 -
> > >  lib/crc-t10dif.c                            | 152 +++++++++-----------
> > >  13 files changed, 204 insertions(+), 451 deletions(-)
> > >  delete mode 100644 crypto/crct10dif_common.c
> >
> > There is already a library API for two other hash functions, BLAKE2s and
> > Poly1305, which takes advantage of architecture-specific implementations without
> > using static calls.  Also, those algorithms are likewise also exposed through
> > the shash API, but in a different way from what this patchset proposes.
> >
> > Is there a reason not to do things in the same way?  What are the advantages of
> > the new approach that you're proposing?
> >
>
> The current approach uses build time dependencies - i.e., if you
> decide to build the accelerated implementation, you always have to
> load it, even if the accelerated implementation cannot be used on the
> system in question, and it is up to that code to use fallbacks for
> everything, This was kind of a compromise on my part when we were
> having the crypto library vs Wireguard discussion - I mentioned at the
> time that [in my opinion] it was a temporary approach because static
> call support was taking so long to arrive. (lwn article here [0] but
> the links in it seem to be dead). It also results in some nasty
> Kconfig dependencies because building the generic code into the kernel
> and building the accelerated code as a module gives problems.
>
> I agree that having different approaches for doing the same thing is
> suboptimal, but I think the situation may be slightly different for
> plain SIMD code such as blake2 and poly1305 versus crc implementations
> based on carryless multiplication instructions which may be rare but
> 10-20x faster if they are supported.
>
> So in summary, I think this approach is better. The generic code can
> be built in and superseded by module code, and there are no
> dependencies going both ways.
>
> There are some complications as well, though:
> - module softdeps don't work {afaik) if the depender is builtin
> - our poly1305 implementations use a different layout for the state
> that is passed between the init/update/final calls, so it may be
> tricky to use a similar approach there.
>
>

This does not actually work. The kernel_fpu_end() occurring in the
CRC-T10 code amounts to a voluntary schedule, which means
synchronize_rcu_tasks() does not guarantee that the accelerated code
is no longer live after the static call is updated to refer to the
generic code again, and so the remove is risky.

There are some other issues I want to focus on first, so I am going to
disregard this series for now.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-01-28  8:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 16:52 [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 1/7] crypto: crc-t10dif - turn library wrapper for shash into generic library Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 2/7] crypto: lib/crc-t10dif - add static call support for optimized versions Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 3/7] crypto: generic/crc-t10dif - expose both arch and generic shashes Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 4/7] crypto: x86/crc-t10dif - convert to static call library API Ard Biesheuvel
2021-01-12  0:01   ` kernel test robot
2021-01-12  0:01     ` kernel test robot
2021-01-11 16:52 ` [PATCH 5/7] crypto: arm/crc-t10dif " Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 6/7] crypto: arm64/crc-t10dif - convert to static call API Ard Biesheuvel
2021-01-11 16:52 ` [PATCH 7/7] crypto: powerpc/crc-t10dif " Ard Biesheuvel
2021-01-11 17:27 ` [PATCH 0/7] crypto: switch to static calls for CRC-T10DIF Ard Biesheuvel
2021-01-11 18:36   ` Ard Biesheuvel
2021-01-11 20:56     ` Peter Zijlstra
2021-01-11 21:01       ` Ard Biesheuvel
2021-01-11 21:05 ` Eric Biggers
2021-01-11 21:31   ` Ard Biesheuvel
2021-01-28  8:19     ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.