All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] crypto: Speck support
@ 2018-02-12 23:52 ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Hello,

This series adds Speck support to the crypto API, including the Speck128
and Speck64 variants.  Speck is a lightweight block cipher that can be
much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

Thus, patch 1 adds a generic implementation of Speck, and the following
patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
NEON-accelerated implementation is much faster than the generic
implementation and therefore is the implementation that would primarily
be used in practice on the devices we are targeting.

There is no AArch64 implementation added, since such CPUs are likely to
have the Cryptography Extensions, allowing the use of AES.

Changed since v1:

  - Use the word order recommended by the Speck authors.  All test
    vectors were updated.

Eric Biggers (5):
  crypto: add support for the Speck block cipher
  crypto: speck - export common helpers
  crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  crypto: speck - add test vectors for Speck128-XTS
  crypto: speck - add test vectors for Speck64-XTS

 arch/arm/crypto/Kconfig           |    6 +
 arch/arm/crypto/Makefile          |    2 +
 arch/arm/crypto/speck-neon-core.S |  432 +++++++++++
 arch/arm/crypto/speck-neon-glue.c |  290 ++++++++
 crypto/Kconfig                    |   14 +
 crypto/Makefile                   |    1 +
 crypto/speck.c                    |  307 ++++++++
 crypto/testmgr.c                  |   36 +
 crypto/testmgr.h                  | 1486 +++++++++++++++++++++++++++++++++++++
 include/crypto/speck.h            |   62 ++
 10 files changed, 2636 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c
 create mode 100644 crypto/speck.c
 create mode 100644 include/crypto/speck.h

-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-02-12 23:52 ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

This series adds Speck support to the crypto API, including the Speck128
and Speck64 variants.  Speck is a lightweight block cipher that can be
much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

Thus, patch 1 adds a generic implementation of Speck, and the following
patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
NEON-accelerated implementation is much faster than the generic
implementation and therefore is the implementation that would primarily
be used in practice on the devices we are targeting.

There is no AArch64 implementation added, since such CPUs are likely to
have the Cryptography Extensions, allowing the use of AES.

Changed since v1:

  - Use the word order recommended by the Speck authors.  All test
    vectors were updated.

Eric Biggers (5):
  crypto: add support for the Speck block cipher
  crypto: speck - export common helpers
  crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  crypto: speck - add test vectors for Speck128-XTS
  crypto: speck - add test vectors for Speck64-XTS

 arch/arm/crypto/Kconfig           |    6 +
 arch/arm/crypto/Makefile          |    2 +
 arch/arm/crypto/speck-neon-core.S |  432 +++++++++++
 arch/arm/crypto/speck-neon-glue.c |  290 ++++++++
 crypto/Kconfig                    |   14 +
 crypto/Makefile                   |    1 +
 crypto/speck.c                    |  307 ++++++++
 crypto/testmgr.c                  |   36 +
 crypto/testmgr.h                  | 1486 +++++++++++++++++++++++++++++++++++++
 include/crypto/speck.h            |   62 ++
 10 files changed, 2636 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c
 create mode 100644 crypto/speck.c
 create mode 100644 include/crypto/speck.h

-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 1/5] crypto: add support for the Speck block cipher
  2018-02-12 23:52 ` Eric Biggers
@ 2018-02-12 23:52   ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Add a generic implementation of Speck, including the Speck128 and
Speck64 variants.  Speck is a lightweight block cipher that can be much
faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the
original paper doesn't specify them.  But we have implemented it using
the orders the authors recommended in a correspondence with them.  The
test vectors are taken from the original paper but were mapped to byte
arrays using the recommended byte and word orders.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/Kconfig   |  14 +++
 crypto/Makefile  |   1 +
 crypto/speck.c   | 299 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 crypto/testmgr.c |  18 ++++
 crypto/testmgr.h | 128 ++++++++++++++++++++++++
 5 files changed, 460 insertions(+)
 create mode 100644 crypto/speck.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index b75264b09a46..558eff07b799 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1508,6 +1508,20 @@ config CRYPTO_SERPENT_AVX2_X86_64
 	  See also:
 	  <http://www.cl.cam.ac.uk/~rja14/serpent.html>
 
+config CRYPTO_SPECK
+	tristate "Speck cipher algorithm"
+	select CRYPTO_ALGAPI
+	help
+	  Speck is a lightweight block cipher that is tuned for optimal
+	  performance in software (rather than hardware).
+
+	  Speck may not be as secure as AES, and should only be used on systems
+	  where AES is not fast enough.
+
+	  See also: <https://eprint.iacr.org/2013/404.pdf>
+
+	  If unsure, say N.
+
 config CRYPTO_TEA
 	tristate "TEA, XTEA and XETA cipher algorithms"
 	select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index cdbc03b35510..ba6019471447 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
+obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
diff --git a/crypto/speck.c b/crypto/speck.c
new file mode 100644
index 000000000000..4e80ad76bcd7
--- /dev/null
+++ b/crypto/speck.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Speck: a lightweight block cipher
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Speck has 10 variants, including 5 block sizes.  For now we only implement
+ * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and
+ * Speck64/128.   Speck${B}/${K} denotes the variant with a block size of B bits
+ * and a key size of K bits.  The Speck128 variants are believed to be the most
+ * secure variants, and they use the same block size and key sizes as AES.  The
+ * Speck64 variants are less secure, but on 32-bit processors are usually
+ * faster.  The remaining variants (Speck32, Speck48, and Speck96) are even less
+ * secure and/or not as well suited for implementation on either 32-bit or
+ * 64-bit processors, so are omitted.
+ *
+ * Reference: "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * In a correspondence, the Speck designers have also clarified that the words
+ * should be interpreted in little-endian format, and the words should be
+ * ordered such that the first word of each block is 'y' rather than 'x', and
+ * the first key word (rather than the last) becomes the first round key.
+ */
+
+#include <asm/unaligned.h>
+#include <linux/bitops.h>
+#include <linux/crypto.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+/* Speck128 */
+
+#define SPECK128_BLOCK_SIZE	16
+
+#define SPECK128_128_KEY_SIZE	16
+#define SPECK128_128_NROUNDS	32
+
+#define SPECK128_192_KEY_SIZE	24
+#define SPECK128_192_NROUNDS	33
+
+#define SPECK128_256_KEY_SIZE	32
+#define SPECK128_256_NROUNDS	34
+
+struct speck128_tfm_ctx {
+	u64 round_keys[SPECK128_256_NROUNDS];
+	int nrounds;
+};
+
+static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
+{
+	*x = ror64(*x, 8);
+	*x += *y;
+	*x ^= k;
+	*y = rol64(*y, 3);
+	*y ^= *x;
+}
+
+static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
+{
+	*y ^= *x;
+	*y = ror64(*y, 3);
+	*x ^= k;
+	*x -= *y;
+	*x = rol64(*x, 8);
+}
+
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 y = get_unaligned_le64(in);
+	u64 x = get_unaligned_le64(in + 8);
+	int i;
+
+	for (i = 0; i < ctx->nrounds; i++)
+		speck128_round(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le64(y, out);
+	put_unaligned_le64(x, out + 8);
+}
+
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 y = get_unaligned_le64(in);
+	u64 x = get_unaligned_le64(in + 8);
+	int i;
+
+	for (i = ctx->nrounds - 1; i >= 0; i--)
+		speck128_unround(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le64(y, out);
+	put_unaligned_le64(x, out + 8);
+}
+
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 l[3];
+	u64 k;
+	int i;
+
+	switch (keylen) {
+	case SPECK128_128_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		ctx->nrounds = SPECK128_128_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[0], &k, i);
+		}
+		break;
+	case SPECK128_192_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		l[1] = get_unaligned_le64(key + 16);
+		ctx->nrounds = SPECK128_192_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[i % 2], &k, i);
+		}
+		break;
+	case SPECK128_256_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		l[1] = get_unaligned_le64(key + 16);
+		l[2] = get_unaligned_le64(key + 24);
+		ctx->nrounds = SPECK128_256_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[i % 3], &k, i);
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Speck64 */
+
+#define SPECK64_BLOCK_SIZE	8
+
+#define SPECK64_96_KEY_SIZE	12
+#define SPECK64_96_NROUNDS	26
+
+#define SPECK64_128_KEY_SIZE	16
+#define SPECK64_128_NROUNDS	27
+
+struct speck64_tfm_ctx {
+	u32 round_keys[SPECK64_128_NROUNDS];
+	int nrounds;
+};
+
+static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
+{
+	*x = ror32(*x, 8);
+	*x += *y;
+	*x ^= k;
+	*y = rol32(*y, 3);
+	*y ^= *x;
+}
+
+static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
+{
+	*y ^= *x;
+	*y = ror32(*y, 3);
+	*x ^= k;
+	*x -= *y;
+	*x = rol32(*x, 8);
+}
+
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 y = get_unaligned_le32(in);
+	u32 x = get_unaligned_le32(in + 4);
+	int i;
+
+	for (i = 0; i < ctx->nrounds; i++)
+		speck64_round(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le32(y, out);
+	put_unaligned_le32(x, out + 4);
+}
+
+static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 y = get_unaligned_le32(in);
+	u32 x = get_unaligned_le32(in + 4);
+	int i;
+
+	for (i = ctx->nrounds - 1; i >= 0; i--)
+		speck64_unround(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le32(y, out);
+	put_unaligned_le32(x, out + 4);
+}
+
+static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+			  unsigned int keylen)
+{
+	struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 l[3];
+	u32 k;
+	int i;
+
+	switch (keylen) {
+	case SPECK64_96_KEY_SIZE:
+		k = get_unaligned_le32(key);
+		l[0] = get_unaligned_le32(key + 4);
+		l[1] = get_unaligned_le32(key + 8);
+		ctx->nrounds = SPECK64_96_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck64_round(&l[i % 2], &k, i);
+		}
+		break;
+	case SPECK64_128_KEY_SIZE:
+		k = get_unaligned_le32(key);
+		l[0] = get_unaligned_le32(key + 4);
+		l[1] = get_unaligned_le32(key + 8);
+		l[2] = get_unaligned_le32(key + 12);
+		ctx->nrounds = SPECK64_128_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck64_round(&l[i % 3], &k, i);
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Algorithm definitions */
+
+static struct crypto_alg speck_algs[] = {
+	{
+		.cra_name		= "speck128",
+		.cra_driver_name	= "speck128-generic",
+		.cra_priority		= 100,
+		.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+		.cra_blocksize		= SPECK128_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct speck128_tfm_ctx),
+		.cra_module		= THIS_MODULE,
+		.cra_u			= {
+			.cipher = {
+				.cia_min_keysize	= SPECK128_128_KEY_SIZE,
+				.cia_max_keysize	= SPECK128_256_KEY_SIZE,
+				.cia_setkey		= speck128_setkey,
+				.cia_encrypt		= speck128_encrypt,
+				.cia_decrypt		= speck128_decrypt
+			}
+		}
+	}, {
+		.cra_name		= "speck64",
+		.cra_driver_name	= "speck64-generic",
+		.cra_priority		= 100,
+		.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+		.cra_blocksize		= SPECK64_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct speck64_tfm_ctx),
+		.cra_module		= THIS_MODULE,
+		.cra_u			= {
+			.cipher = {
+				.cia_min_keysize	= SPECK64_96_KEY_SIZE,
+				.cia_max_keysize	= SPECK64_128_KEY_SIZE,
+				.cia_setkey		= speck64_setkey,
+				.cia_encrypt		= speck64_encrypt,
+				.cia_decrypt		= speck64_decrypt
+			}
+		}
+	}
+};
+
+static int __init speck_module_init(void)
+{
+	return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_module_exit(void)
+{
+	crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_module_init);
+module_exit(speck_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (generic)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("speck128");
+MODULE_ALIAS_CRYPTO("speck128-generic");
+MODULE_ALIAS_CRYPTO("speck64");
+MODULE_ALIAS_CRYPTO("speck64-generic");
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index d5e23a142a04..058ed5eb6620 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3000,6 +3000,24 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(serpent_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "ecb(speck128)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck128_enc_tv_template),
+				.dec = __VECS(speck128_dec_tv_template)
+			}
+		}
+	}, {
+		.alg = "ecb(speck64)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck64_enc_tv_template),
+				.dec = __VECS(speck64_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "ecb(tea)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 6044f6906bd6..3818210f77cf 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14323,6 +14323,134 @@ static const struct cipher_testvec serpent_xts_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck test vectors taken from the original paper:
+ * "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * Note that the paper does not make byte and word order clear.  But it was
+ * confirmed with the authors that the intended orders are little endian byte
+ * order and (y, x) word order.  Equivalently, the printed test vectors, when
+ * looking at only the bytes (ignoring the whitespace that divides them into
+ * words), are backwards: the left-most byte is actually the one with the
+ * highest memory address, while the right-most byte is actually the one with
+ * the lowest memory address.
+ */
+
+static const struct cipher_testvec speck128_enc_tv_template[] = {
+	{ /* Speck128/128 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+		.klen	= 16,
+		.input	= "\x20\x6d\x61\x64\x65\x20\x69\x74"
+			  "\x20\x65\x71\x75\x69\x76\x61\x6c",
+		.ilen	= 16,
+		.result	= "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+			  "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+		.rlen	= 16,
+	}, { /* Speck128/192 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17",
+		.klen	= 24,
+		.input	= "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+			  "\x68\x69\x65\x66\x20\x48\x61\x72",
+		.ilen	= 16,
+		.result	= "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+			  "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+		.rlen	= 16,
+	}, { /* Speck128/256 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+		.klen	= 32,
+		.input	= "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+			  "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+		.ilen	= 16,
+		.result	= "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+			  "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+		.rlen	= 16,
+	},
+};
+
+static const struct cipher_testvec speck128_dec_tv_template[] = {
+	{ /* Speck128/128 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+		.klen	= 16,
+		.input	= "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+			  "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+		.ilen	= 16,
+		.result	= "\x20\x6d\x61\x64\x65\x20\x69\x74"
+			  "\x20\x65\x71\x75\x69\x76\x61\x6c",
+		.rlen	= 16,
+	}, { /* Speck128/192 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17",
+		.klen	= 24,
+		.input	= "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+			  "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+		.ilen	= 16,
+		.result	= "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+			  "\x68\x69\x65\x66\x20\x48\x61\x72",
+		.rlen	= 16,
+	}, { /* Speck128/256 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+		.klen	= 32,
+		.input	= "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+			  "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+		.ilen	= 16,
+		.result	= "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+			  "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+		.rlen	= 16,
+	},
+};
+
+static const struct cipher_testvec speck64_enc_tv_template[] = {
+	{ /* Speck64/96 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13",
+		.klen	= 12,
+		.input	= "\x65\x61\x6e\x73\x20\x46\x61\x74",
+		.ilen	= 8,
+		.result	= "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+		.rlen	= 8,
+	}, { /* Speck64/128 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+		.klen	= 16,
+		.input	= "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+		.ilen	= 8,
+		.result	= "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+		.rlen	= 8,
+	},
+};
+
+static const struct cipher_testvec speck64_dec_tv_template[] = {
+	{ /* Speck64/96 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13",
+		.klen	= 12,
+		.input	= "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+		.ilen	= 8,
+		.result	= "\x65\x61\x6e\x73\x20\x46\x61\x74",
+		.rlen	= 8,
+	}, { /* Speck64/128 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+		.klen	= 16,
+		.input	= "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+		.ilen	= 8,
+		.result	= "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+		.rlen	= 8,
+	},
+};
+
 /* Cast6 test vectors from RFC 2612 */
 static const struct cipher_testvec cast6_enc_tv_template[] = {
 	{
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 1/5] crypto: add support for the Speck block cipher
@ 2018-02-12 23:52   ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Add a generic implementation of Speck, including the Speck128 and
Speck64 variants.  Speck is a lightweight block cipher that can be much
faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the
original paper doesn't specify them.  But we have implemented it using
the orders the authors recommended in a correspondence with them.  The
test vectors are taken from the original paper but were mapped to byte
arrays using the recommended byte and word orders.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/Kconfig   |  14 +++
 crypto/Makefile  |   1 +
 crypto/speck.c   | 299 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 crypto/testmgr.c |  18 ++++
 crypto/testmgr.h | 128 ++++++++++++++++++++++++
 5 files changed, 460 insertions(+)
 create mode 100644 crypto/speck.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index b75264b09a46..558eff07b799 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1508,6 +1508,20 @@ config CRYPTO_SERPENT_AVX2_X86_64
 	  See also:
 	  <http://www.cl.cam.ac.uk/~rja14/serpent.html>
 
+config CRYPTO_SPECK
+	tristate "Speck cipher algorithm"
+	select CRYPTO_ALGAPI
+	help
+	  Speck is a lightweight block cipher that is tuned for optimal
+	  performance in software (rather than hardware).
+
+	  Speck may not be as secure as AES, and should only be used on systems
+	  where AES is not fast enough.
+
+	  See also: <https://eprint.iacr.org/2013/404.pdf>
+
+	  If unsure, say N.
+
 config CRYPTO_TEA
 	tristate "TEA, XTEA and XETA cipher algorithms"
 	select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index cdbc03b35510..ba6019471447 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
+obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
diff --git a/crypto/speck.c b/crypto/speck.c
new file mode 100644
index 000000000000..4e80ad76bcd7
--- /dev/null
+++ b/crypto/speck.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Speck: a lightweight block cipher
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Speck has 10 variants, including 5 block sizes.  For now we only implement
+ * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and
+ * Speck64/128.   Speck${B}/${K} denotes the variant with a block size of B bits
+ * and a key size of K bits.  The Speck128 variants are believed to be the most
+ * secure variants, and they use the same block size and key sizes as AES.  The
+ * Speck64 variants are less secure, but on 32-bit processors are usually
+ * faster.  The remaining variants (Speck32, Speck48, and Speck96) are even less
+ * secure and/or not as well suited for implementation on either 32-bit or
+ * 64-bit processors, so are omitted.
+ *
+ * Reference: "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * In a correspondence, the Speck designers have also clarified that the words
+ * should be interpreted in little-endian format, and the words should be
+ * ordered such that the first word of each block is 'y' rather than 'x', and
+ * the first key word (rather than the last) becomes the first round key.
+ */
+
+#include <asm/unaligned.h>
+#include <linux/bitops.h>
+#include <linux/crypto.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+/* Speck128 */
+
+#define SPECK128_BLOCK_SIZE	16
+
+#define SPECK128_128_KEY_SIZE	16
+#define SPECK128_128_NROUNDS	32
+
+#define SPECK128_192_KEY_SIZE	24
+#define SPECK128_192_NROUNDS	33
+
+#define SPECK128_256_KEY_SIZE	32
+#define SPECK128_256_NROUNDS	34
+
+struct speck128_tfm_ctx {
+	u64 round_keys[SPECK128_256_NROUNDS];
+	int nrounds;
+};
+
+static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
+{
+	*x = ror64(*x, 8);
+	*x += *y;
+	*x ^= k;
+	*y = rol64(*y, 3);
+	*y ^= *x;
+}
+
+static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
+{
+	*y ^= *x;
+	*y = ror64(*y, 3);
+	*x ^= k;
+	*x -= *y;
+	*x = rol64(*x, 8);
+}
+
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 y = get_unaligned_le64(in);
+	u64 x = get_unaligned_le64(in + 8);
+	int i;
+
+	for (i = 0; i < ctx->nrounds; i++)
+		speck128_round(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le64(y, out);
+	put_unaligned_le64(x, out + 8);
+}
+
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 y = get_unaligned_le64(in);
+	u64 x = get_unaligned_le64(in + 8);
+	int i;
+
+	for (i = ctx->nrounds - 1; i >= 0; i--)
+		speck128_unround(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le64(y, out);
+	put_unaligned_le64(x, out + 8);
+}
+
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u64 l[3];
+	u64 k;
+	int i;
+
+	switch (keylen) {
+	case SPECK128_128_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		ctx->nrounds = SPECK128_128_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[0], &k, i);
+		}
+		break;
+	case SPECK128_192_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		l[1] = get_unaligned_le64(key + 16);
+		ctx->nrounds = SPECK128_192_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[i % 2], &k, i);
+		}
+		break;
+	case SPECK128_256_KEY_SIZE:
+		k = get_unaligned_le64(key);
+		l[0] = get_unaligned_le64(key + 8);
+		l[1] = get_unaligned_le64(key + 16);
+		l[2] = get_unaligned_le64(key + 24);
+		ctx->nrounds = SPECK128_256_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck128_round(&l[i % 3], &k, i);
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Speck64 */
+
+#define SPECK64_BLOCK_SIZE	8
+
+#define SPECK64_96_KEY_SIZE	12
+#define SPECK64_96_NROUNDS	26
+
+#define SPECK64_128_KEY_SIZE	16
+#define SPECK64_128_NROUNDS	27
+
+struct speck64_tfm_ctx {
+	u32 round_keys[SPECK64_128_NROUNDS];
+	int nrounds;
+};
+
+static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
+{
+	*x = ror32(*x, 8);
+	*x += *y;
+	*x ^= k;
+	*y = rol32(*y, 3);
+	*y ^= *x;
+}
+
+static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
+{
+	*y ^= *x;
+	*y = ror32(*y, 3);
+	*x ^= k;
+	*x -= *y;
+	*x = rol32(*x, 8);
+}
+
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 y = get_unaligned_le32(in);
+	u32 x = get_unaligned_le32(in + 4);
+	int i;
+
+	for (i = 0; i < ctx->nrounds; i++)
+		speck64_round(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le32(y, out);
+	put_unaligned_le32(x, out + 4);
+}
+
+static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 y = get_unaligned_le32(in);
+	u32 x = get_unaligned_le32(in + 4);
+	int i;
+
+	for (i = ctx->nrounds - 1; i >= 0; i--)
+		speck64_unround(&x, &y, ctx->round_keys[i]);
+
+	put_unaligned_le32(y, out);
+	put_unaligned_le32(x, out + 4);
+}
+
+static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+			  unsigned int keylen)
+{
+	struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
+	u32 l[3];
+	u32 k;
+	int i;
+
+	switch (keylen) {
+	case SPECK64_96_KEY_SIZE:
+		k = get_unaligned_le32(key);
+		l[0] = get_unaligned_le32(key + 4);
+		l[1] = get_unaligned_le32(key + 8);
+		ctx->nrounds = SPECK64_96_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck64_round(&l[i % 2], &k, i);
+		}
+		break;
+	case SPECK64_128_KEY_SIZE:
+		k = get_unaligned_le32(key);
+		l[0] = get_unaligned_le32(key + 4);
+		l[1] = get_unaligned_le32(key + 8);
+		l[2] = get_unaligned_le32(key + 12);
+		ctx->nrounds = SPECK64_128_NROUNDS;
+		for (i = 0; i < ctx->nrounds; i++) {
+			ctx->round_keys[i] = k;
+			speck64_round(&l[i % 3], &k, i);
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Algorithm definitions */
+
+static struct crypto_alg speck_algs[] = {
+	{
+		.cra_name		= "speck128",
+		.cra_driver_name	= "speck128-generic",
+		.cra_priority		= 100,
+		.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+		.cra_blocksize		= SPECK128_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct speck128_tfm_ctx),
+		.cra_module		= THIS_MODULE,
+		.cra_u			= {
+			.cipher = {
+				.cia_min_keysize	= SPECK128_128_KEY_SIZE,
+				.cia_max_keysize	= SPECK128_256_KEY_SIZE,
+				.cia_setkey		= speck128_setkey,
+				.cia_encrypt		= speck128_encrypt,
+				.cia_decrypt		= speck128_decrypt
+			}
+		}
+	}, {
+		.cra_name		= "speck64",
+		.cra_driver_name	= "speck64-generic",
+		.cra_priority		= 100,
+		.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+		.cra_blocksize		= SPECK64_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct speck64_tfm_ctx),
+		.cra_module		= THIS_MODULE,
+		.cra_u			= {
+			.cipher = {
+				.cia_min_keysize	= SPECK64_96_KEY_SIZE,
+				.cia_max_keysize	= SPECK64_128_KEY_SIZE,
+				.cia_setkey		= speck64_setkey,
+				.cia_encrypt		= speck64_encrypt,
+				.cia_decrypt		= speck64_decrypt
+			}
+		}
+	}
+};
+
+static int __init speck_module_init(void)
+{
+	return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_module_exit(void)
+{
+	crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_module_init);
+module_exit(speck_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (generic)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("speck128");
+MODULE_ALIAS_CRYPTO("speck128-generic");
+MODULE_ALIAS_CRYPTO("speck64");
+MODULE_ALIAS_CRYPTO("speck64-generic");
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index d5e23a142a04..058ed5eb6620 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3000,6 +3000,24 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(serpent_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "ecb(speck128)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck128_enc_tv_template),
+				.dec = __VECS(speck128_dec_tv_template)
+			}
+		}
+	}, {
+		.alg = "ecb(speck64)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck64_enc_tv_template),
+				.dec = __VECS(speck64_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "ecb(tea)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 6044f6906bd6..3818210f77cf 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14323,6 +14323,134 @@ static const struct cipher_testvec serpent_xts_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck test vectors taken from the original paper:
+ * "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * Note that the paper does not make byte and word order clear.  But it was
+ * confirmed with the authors that the intended orders are little endian byte
+ * order and (y, x) word order.  Equivalently, the printed test vectors, when
+ * looking at only the bytes (ignoring the whitespace that divides them into
+ * words), are backwards: the left-most byte is actually the one with the
+ * highest memory address, while the right-most byte is actually the one with
+ * the lowest memory address.
+ */
+
+static const struct cipher_testvec speck128_enc_tv_template[] = {
+	{ /* Speck128/128 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+		.klen	= 16,
+		.input	= "\x20\x6d\x61\x64\x65\x20\x69\x74"
+			  "\x20\x65\x71\x75\x69\x76\x61\x6c",
+		.ilen	= 16,
+		.result	= "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+			  "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+		.rlen	= 16,
+	}, { /* Speck128/192 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17",
+		.klen	= 24,
+		.input	= "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+			  "\x68\x69\x65\x66\x20\x48\x61\x72",
+		.ilen	= 16,
+		.result	= "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+			  "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+		.rlen	= 16,
+	}, { /* Speck128/256 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+		.klen	= 32,
+		.input	= "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+			  "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+		.ilen	= 16,
+		.result	= "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+			  "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+		.rlen	= 16,
+	},
+};
+
+static const struct cipher_testvec speck128_dec_tv_template[] = {
+	{ /* Speck128/128 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+		.klen	= 16,
+		.input	= "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+			  "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+		.ilen	= 16,
+		.result	= "\x20\x6d\x61\x64\x65\x20\x69\x74"
+			  "\x20\x65\x71\x75\x69\x76\x61\x6c",
+		.rlen	= 16,
+	}, { /* Speck128/192 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17",
+		.klen	= 24,
+		.input	= "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+			  "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+		.ilen	= 16,
+		.result	= "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+			  "\x68\x69\x65\x66\x20\x48\x61\x72",
+		.rlen	= 16,
+	}, { /* Speck128/256 */
+		.key	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+		.klen	= 32,
+		.input	= "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+			  "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+		.ilen	= 16,
+		.result	= "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+			  "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+		.rlen	= 16,
+	},
+};
+
+static const struct cipher_testvec speck64_enc_tv_template[] = {
+	{ /* Speck64/96 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13",
+		.klen	= 12,
+		.input	= "\x65\x61\x6e\x73\x20\x46\x61\x74",
+		.ilen	= 8,
+		.result	= "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+		.rlen	= 8,
+	}, { /* Speck64/128 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+		.klen	= 16,
+		.input	= "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+		.ilen	= 8,
+		.result	= "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+		.rlen	= 8,
+	},
+};
+
+static const struct cipher_testvec speck64_dec_tv_template[] = {
+	{ /* Speck64/96 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13",
+		.klen	= 12,
+		.input	= "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+		.ilen	= 8,
+		.result	= "\x65\x61\x6e\x73\x20\x46\x61\x74",
+		.rlen	= 8,
+	}, { /* Speck64/128 */
+		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+			  "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+		.klen	= 16,
+		.input	= "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+		.ilen	= 8,
+		.result	= "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+		.rlen	= 8,
+	},
+};
+
 /* Cast6 test vectors from RFC 2612 */
 static const struct cipher_testvec cast6_enc_tv_template[] = {
 	{
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 2/5] crypto: speck - export common helpers
  2018-02-12 23:52 ` Eric Biggers
@ 2018-02-12 23:52   ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Export the Speck constants and transform context and the ->setkey(),
->encrypt(), and ->decrypt() functions so that they can be reused by the
ARM NEON implementation of Speck-XTS.  The generic key expansion code
will be reused because it is not performance-critical and is not
vectorizable, while the generic encryption and decryption functions are
needed as fallbacks and for the XTS tweak encryption.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/speck.c         | 90 +++++++++++++++++++++++++++-----------------------
 include/crypto/speck.h | 62 ++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 41 deletions(-)
 create mode 100644 include/crypto/speck.h

diff --git a/crypto/speck.c b/crypto/speck.c
index 4e80ad76bcd7..58aa9f7f91f7 100644
--- a/crypto/speck.c
+++ b/crypto/speck.c
@@ -24,6 +24,7 @@
  */
 
 #include <asm/unaligned.h>
+#include <crypto/speck.h>
 #include <linux/bitops.h>
 #include <linux/crypto.h>
 #include <linux/init.h>
@@ -31,22 +32,6 @@
 
 /* Speck128 */
 
-#define SPECK128_BLOCK_SIZE	16
-
-#define SPECK128_128_KEY_SIZE	16
-#define SPECK128_128_NROUNDS	32
-
-#define SPECK128_192_KEY_SIZE	24
-#define SPECK128_192_NROUNDS	33
-
-#define SPECK128_256_KEY_SIZE	32
-#define SPECK128_256_NROUNDS	34
-
-struct speck128_tfm_ctx {
-	u64 round_keys[SPECK128_256_NROUNDS];
-	int nrounds;
-};
-
 static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
 {
 	*x = ror64(*x, 8);
@@ -65,9 +50,9 @@ static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
 	*x = rol64(*x, 8);
 }
 
-static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in)
 {
-	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 y = get_unaligned_le64(in);
 	u64 x = get_unaligned_le64(in + 8);
 	int i;
@@ -78,10 +63,16 @@ static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le64(y, out);
 	put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_encrypt);
 
-static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in)
 {
-	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 y = get_unaligned_le64(in);
 	u64 x = get_unaligned_le64(in + 8);
 	int i;
@@ -92,11 +83,16 @@ static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le64(y, out);
 	put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_decrypt);
 
-static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
 			   unsigned int keylen)
 {
-	struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 l[3];
 	u64 k;
 	int i;
@@ -138,21 +134,15 @@ static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_setkey);
 
-/* Speck64 */
-
-#define SPECK64_BLOCK_SIZE	8
-
-#define SPECK64_96_KEY_SIZE	12
-#define SPECK64_96_NROUNDS	26
-
-#define SPECK64_128_KEY_SIZE	16
-#define SPECK64_128_NROUNDS	27
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
-struct speck64_tfm_ctx {
-	u32 round_keys[SPECK64_128_NROUNDS];
-	int nrounds;
-};
+/* Speck64 */
 
 static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
 {
@@ -172,9 +162,9 @@ static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
 	*x = rol32(*x, 8);
 }
 
-static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in)
 {
-	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 y = get_unaligned_le32(in);
 	u32 x = get_unaligned_le32(in + 4);
 	int i;
@@ -185,10 +175,16 @@ static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le32(y, out);
 	put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_encrypt);
 
-static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in)
 {
-	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 y = get_unaligned_le32(in);
 	u32 x = get_unaligned_le32(in + 4);
 	int i;
@@ -199,11 +195,16 @@ static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le32(y, out);
 	put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_decrypt);
 
-static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck64_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
 			  unsigned int keylen)
 {
-	struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 l[3];
 	u32 k;
 	int i;
@@ -236,6 +237,13 @@ static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_setkey);
+
+static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+			  unsigned int keylen)
+{
+	return crypto_speck64_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
 /* Algorithm definitions */
 
diff --git a/include/crypto/speck.h b/include/crypto/speck.h
new file mode 100644
index 000000000000..73cfc952d405
--- /dev/null
+++ b/include/crypto/speck.h
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common values for the Speck algorithm
+ */
+
+#ifndef _CRYPTO_SPECK_H
+#define _CRYPTO_SPECK_H
+
+#include <linux/types.h>
+
+/* Speck128 */
+
+#define SPECK128_BLOCK_SIZE	16
+
+#define SPECK128_128_KEY_SIZE	16
+#define SPECK128_128_NROUNDS	32
+
+#define SPECK128_192_KEY_SIZE	24
+#define SPECK128_192_NROUNDS	33
+
+#define SPECK128_256_KEY_SIZE	32
+#define SPECK128_256_NROUNDS	34
+
+struct speck128_tfm_ctx {
+	u64 round_keys[SPECK128_256_NROUNDS];
+	int nrounds;
+};
+
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in);
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in);
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
+			   unsigned int keysize);
+
+/* Speck64 */
+
+#define SPECK64_BLOCK_SIZE	8
+
+#define SPECK64_96_KEY_SIZE	12
+#define SPECK64_96_NROUNDS	26
+
+#define SPECK64_128_KEY_SIZE	16
+#define SPECK64_128_NROUNDS	27
+
+struct speck64_tfm_ctx {
+	u32 round_keys[SPECK64_128_NROUNDS];
+	int nrounds;
+};
+
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in);
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in);
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
+			  unsigned int keysize);
+
+#endif /* _CRYPTO_SPECK_H */
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 2/5] crypto: speck - export common helpers
@ 2018-02-12 23:52   ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Export the Speck constants and transform context and the ->setkey(),
->encrypt(), and ->decrypt() functions so that they can be reused by the
ARM NEON implementation of Speck-XTS.  The generic key expansion code
will be reused because it is not performance-critical and is not
vectorizable, while the generic encryption and decryption functions are
needed as fallbacks and for the XTS tweak encryption.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/speck.c         | 90 +++++++++++++++++++++++++++-----------------------
 include/crypto/speck.h | 62 ++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 41 deletions(-)
 create mode 100644 include/crypto/speck.h

diff --git a/crypto/speck.c b/crypto/speck.c
index 4e80ad76bcd7..58aa9f7f91f7 100644
--- a/crypto/speck.c
+++ b/crypto/speck.c
@@ -24,6 +24,7 @@
  */
 
 #include <asm/unaligned.h>
+#include <crypto/speck.h>
 #include <linux/bitops.h>
 #include <linux/crypto.h>
 #include <linux/init.h>
@@ -31,22 +32,6 @@
 
 /* Speck128 */
 
-#define SPECK128_BLOCK_SIZE	16
-
-#define SPECK128_128_KEY_SIZE	16
-#define SPECK128_128_NROUNDS	32
-
-#define SPECK128_192_KEY_SIZE	24
-#define SPECK128_192_NROUNDS	33
-
-#define SPECK128_256_KEY_SIZE	32
-#define SPECK128_256_NROUNDS	34
-
-struct speck128_tfm_ctx {
-	u64 round_keys[SPECK128_256_NROUNDS];
-	int nrounds;
-};
-
 static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
 {
 	*x = ror64(*x, 8);
@@ -65,9 +50,9 @@ static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
 	*x = rol64(*x, 8);
 }
 
-static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in)
 {
-	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 y = get_unaligned_le64(in);
 	u64 x = get_unaligned_le64(in + 8);
 	int i;
@@ -78,10 +63,16 @@ static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le64(y, out);
 	put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_encrypt);
 
-static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in)
 {
-	const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 y = get_unaligned_le64(in);
 	u64 x = get_unaligned_le64(in + 8);
 	int i;
@@ -92,11 +83,16 @@ static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le64(y, out);
 	put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_decrypt);
 
-static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
 			   unsigned int keylen)
 {
-	struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u64 l[3];
 	u64 k;
 	int i;
@@ -138,21 +134,15 @@ static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_setkey);
 
-/* Speck64 */
-
-#define SPECK64_BLOCK_SIZE	8
-
-#define SPECK64_96_KEY_SIZE	12
-#define SPECK64_96_NROUNDS	26
-
-#define SPECK64_128_KEY_SIZE	16
-#define SPECK64_128_NROUNDS	27
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+			   unsigned int keylen)
+{
+	return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
-struct speck64_tfm_ctx {
-	u32 round_keys[SPECK64_128_NROUNDS];
-	int nrounds;
-};
+/* Speck64 */
 
 static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
 {
@@ -172,9 +162,9 @@ static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
 	*x = rol32(*x, 8);
 }
 
-static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in)
 {
-	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 y = get_unaligned_le32(in);
 	u32 x = get_unaligned_le32(in + 4);
 	int i;
@@ -185,10 +175,16 @@ static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le32(y, out);
 	put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_encrypt);
 
-static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in)
 {
-	const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 y = get_unaligned_le32(in);
 	u32 x = get_unaligned_le32(in + 4);
 	int i;
@@ -199,11 +195,16 @@ static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 	put_unaligned_le32(y, out);
 	put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_decrypt);
 
-static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	crypto_speck64_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
 			  unsigned int keylen)
 {
-	struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 l[3];
 	u32 k;
 	int i;
@@ -236,6 +237,13 @@ static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_setkey);
+
+static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+			  unsigned int keylen)
+{
+	return crypto_speck64_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
 /* Algorithm definitions */
 
diff --git a/include/crypto/speck.h b/include/crypto/speck.h
new file mode 100644
index 000000000000..73cfc952d405
--- /dev/null
+++ b/include/crypto/speck.h
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common values for the Speck algorithm
+ */
+
+#ifndef _CRYPTO_SPECK_H
+#define _CRYPTO_SPECK_H
+
+#include <linux/types.h>
+
+/* Speck128 */
+
+#define SPECK128_BLOCK_SIZE	16
+
+#define SPECK128_128_KEY_SIZE	16
+#define SPECK128_128_NROUNDS	32
+
+#define SPECK128_192_KEY_SIZE	24
+#define SPECK128_192_NROUNDS	33
+
+#define SPECK128_256_KEY_SIZE	32
+#define SPECK128_256_NROUNDS	34
+
+struct speck128_tfm_ctx {
+	u64 round_keys[SPECK128_256_NROUNDS];
+	int nrounds;
+};
+
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in);
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+			     u8 *out, const u8 *in);
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
+			   unsigned int keysize);
+
+/* Speck64 */
+
+#define SPECK64_BLOCK_SIZE	8
+
+#define SPECK64_96_KEY_SIZE	12
+#define SPECK64_96_NROUNDS	26
+
+#define SPECK64_128_KEY_SIZE	16
+#define SPECK64_128_NROUNDS	27
+
+struct speck64_tfm_ctx {
+	u32 round_keys[SPECK64_128_NROUNDS];
+	int nrounds;
+};
+
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in);
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+			    u8 *out, const u8 *in);
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
+			  unsigned int keysize);
+
+#endif /* _CRYPTO_SPECK_H */
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  2018-02-12 23:52 ` Eric Biggers
@ 2018-02-12 23:52   ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster
than the generic code.  For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:

    xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
    xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

    xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
    xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
    xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s

Speck64/128-XTS is even faster:

    xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64
variants are supported.  Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases.  The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback.  Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper.  Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/arm/crypto/Kconfig           |   6 +
 arch/arm/crypto/Makefile          |   2 +
 arch/arm/crypto/speck-neon-core.S | 432 ++++++++++++++++++++++++++++++++++++++
 arch/arm/crypto/speck-neon-glue.c | 290 +++++++++++++++++++++++++
 4 files changed, 730 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b8e69fe282b8..925d1364727a 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
 	select CRYPTO_BLKCIPHER
 	select CRYPTO_CHACHA20
 
+config CRYPTO_SPECK_NEON
+	tristate "NEON accelerated Speck cipher algorithms"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_SPECK
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 30ef8e291271..a758107c5525 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -53,6 +54,7 @@ ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
new file mode 100644
index 000000000000..3c1e203e53b9
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers <ebiggers@google.com>
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.fpu		neon
+
+	// arguments
+	ROUND_KEYS	.req	r0	// const {u64,u32} *round_keys
+	NROUNDS		.req	r1	// int nrounds
+	DST		.req	r2	// void *dst
+	SRC		.req	r3	// const void *src
+	NBYTES		.req	r4	// unsigned int nbytes
+	TWEAK		.req	r5	// void *tweak
+
+	// registers which hold the data being encrypted/decrypted
+	X0		.req	q0
+	X0_L		.req	d0
+	X0_H		.req	d1
+	Y0		.req	q1
+	Y0_H		.req	d3
+	X1		.req	q2
+	X1_L		.req	d4
+	X1_H		.req	d5
+	Y1		.req	q3
+	Y1_H		.req	d7
+	X2		.req	q4
+	X2_L		.req	d8
+	X2_H		.req	d9
+	Y2		.req	q5
+	Y2_H		.req	d11
+	X3		.req	q6
+	X3_L		.req	d12
+	X3_H		.req	d13
+	Y3		.req	q7
+	Y3_H		.req	d15
+
+	// the round key, duplicated in all lanes
+	ROUND_KEY	.req	q8
+	ROUND_KEY_L	.req	d16
+	ROUND_KEY_H	.req	d17
+
+	// index vector for vtbl-based 8-bit rotates
+	ROTATE_TABLE	.req	d18
+
+	// multiplication table for updating XTS tweaks
+	GF128MUL_TABLE	.req	d19
+	GF64MUL_TABLE	.req	d19
+
+	// current XTS tweak value(s)
+	TWEAKV		.req	q10
+	TWEAKV_L	.req	d20
+	TWEAKV_H	.req	d21
+
+	TMP0		.req	q12
+	TMP0_L		.req	d24
+	TMP0_H		.req	d25
+	TMP1		.req	q13
+	TMP2		.req	q14
+	TMP3		.req	q15
+
+	.align		4
+.Lror64_8_table:
+	.byte		1, 2, 3, 4, 5, 6, 7, 0
+.Lror32_8_table:
+	.byte		1, 2, 3, 0, 5, 6, 7, 4
+.Lrol64_8_table:
+	.byte		7, 0, 1, 2, 3, 4, 5, 6
+.Lrol32_8_table:
+	.byte		3, 0, 1, 2, 7, 4, 5, 6
+.Lgf128mul_table:
+	.byte		0, 0x87
+	.fill		14
+.Lgf64mul_table:
+	.byte		0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
+	.fill		12
+
+/*
+ * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
+ *
+ * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
+ * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
+ * of ROUND_KEY.  'n' is the lane size: 64 for Speck128, or 32 for Speck64.
+ *
+ * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
+ * the vtbl approach is faster on some processors and the same speed on others.
+ */
+.macro _speck_round_128bytes	n
+
+	// x = ror(x, 8)
+	vtbl.8		X0_L, {X0_L}, ROTATE_TABLE
+	vtbl.8		X0_H, {X0_H}, ROTATE_TABLE
+	vtbl.8		X1_L, {X1_L}, ROTATE_TABLE
+	vtbl.8		X1_H, {X1_H}, ROTATE_TABLE
+	vtbl.8		X2_L, {X2_L}, ROTATE_TABLE
+	vtbl.8		X2_H, {X2_H}, ROTATE_TABLE
+	vtbl.8		X3_L, {X3_L}, ROTATE_TABLE
+	vtbl.8		X3_H, {X3_H}, ROTATE_TABLE
+
+	// x += y
+	vadd.u\n	X0, Y0
+	vadd.u\n	X1, Y1
+	vadd.u\n	X2, Y2
+	vadd.u\n	X3, Y3
+
+	// x ^= k
+	veor		X0, ROUND_KEY
+	veor		X1, ROUND_KEY
+	veor		X2, ROUND_KEY
+	veor		X3, ROUND_KEY
+
+	// y = rol(y, 3)
+	vshl.u\n	TMP0, Y0, #3
+	vshl.u\n	TMP1, Y1, #3
+	vshl.u\n	TMP2, Y2, #3
+	vshl.u\n	TMP3, Y3, #3
+	vsri.u\n	TMP0, Y0, #(\n - 3)
+	vsri.u\n	TMP1, Y1, #(\n - 3)
+	vsri.u\n	TMP2, Y2, #(\n - 3)
+	vsri.u\n	TMP3, Y3, #(\n - 3)
+
+	// y ^= x
+	veor		Y0, TMP0, X0
+	veor		Y1, TMP1, X1
+	veor		Y2, TMP2, X2
+	veor		Y3, TMP3, X3
+.endm
+
+/*
+ * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
+ *
+ * This is the inverse of _speck_round_128bytes().
+ */
+.macro _speck_unround_128bytes	n
+
+	// y ^= x
+	veor		TMP0, Y0, X0
+	veor		TMP1, Y1, X1
+	veor		TMP2, Y2, X2
+	veor		TMP3, Y3, X3
+
+	// y = ror(y, 3)
+	vshr.u\n	Y0, TMP0, #3
+	vshr.u\n	Y1, TMP1, #3
+	vshr.u\n	Y2, TMP2, #3
+	vshr.u\n	Y3, TMP3, #3
+	vsli.u\n	Y0, TMP0, #(\n - 3)
+	vsli.u\n	Y1, TMP1, #(\n - 3)
+	vsli.u\n	Y2, TMP2, #(\n - 3)
+	vsli.u\n	Y3, TMP3, #(\n - 3)
+
+	// x ^= k
+	veor		X0, ROUND_KEY
+	veor		X1, ROUND_KEY
+	veor		X2, ROUND_KEY
+	veor		X3, ROUND_KEY
+
+	// x -= y
+	vsub.u\n	X0, Y0
+	vsub.u\n	X1, Y1
+	vsub.u\n	X2, Y2
+	vsub.u\n	X3, Y3
+
+	// x = rol(x, 8);
+	vtbl.8		X0_L, {X0_L}, ROTATE_TABLE
+	vtbl.8		X0_H, {X0_H}, ROTATE_TABLE
+	vtbl.8		X1_L, {X1_L}, ROTATE_TABLE
+	vtbl.8		X1_H, {X1_H}, ROTATE_TABLE
+	vtbl.8		X2_L, {X2_L}, ROTATE_TABLE
+	vtbl.8		X2_H, {X2_H}, ROTATE_TABLE
+	vtbl.8		X3_L, {X3_L}, ROTATE_TABLE
+	vtbl.8		X3_H, {X3_H}, ROTATE_TABLE
+.endm
+
+.macro _xts128_precrypt_one	dst_reg, tweak_buf, tmp
+
+	// Load the next source block
+	vld1.8		{\dst_reg}, [SRC]!
+
+	// Save the current tweak in the tweak buffer
+	vst1.8		{TWEAKV}, [\tweak_buf:128]!
+
+	// XOR the next source block with the current tweak
+	veor		\dst_reg, TWEAKV
+
+	/*
+	 * Calculate the next tweak by multiplying the current one by x,
+	 * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
+	 */
+	vshr.u64	\tmp, TWEAKV, #63
+	vshl.u64	TWEAKV, #1
+	veor		TWEAKV_H, \tmp\()_L
+	vtbl.8		\tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
+	veor		TWEAKV_L, \tmp\()_H
+.endm
+
+.macro _xts64_precrypt_two	dst_reg, tweak_buf, tmp
+
+	// Load the next two source blocks
+	vld1.8		{\dst_reg}, [SRC]!
+
+	// Save the current two tweaks in the tweak buffer
+	vst1.8		{TWEAKV}, [\tweak_buf:128]!
+
+	// XOR the next two source blocks with the current two tweaks
+	veor		\dst_reg, TWEAKV
+
+	/*
+	 * Calculate the next two tweaks by multiplying the current ones by x^2,
+	 * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
+	 */
+	vshr.u64	\tmp, TWEAKV, #62
+	vshl.u64	TWEAKV, #2
+	vtbl.8		\tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
+	vtbl.8		\tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
+	veor		TWEAKV, \tmp
+.endm
+
+/*
+ * _speck_xts_crypt() - Speck-XTS encryption/decryption
+ *
+ * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
+ * using Speck-XTS, specifically the variant with a block size of '2n' and round
+ * count given by NROUNDS.  The expanded round keys are given in ROUND_KEYS, and
+ * the current XTS tweak value is given in TWEAK.  It's assumed that NBYTES is a
+ * nonzero multiple of 128.
+ */
+.macro _speck_xts_crypt	n, decrypting
+	push		{r4-r7}
+	mov		r7, sp
+
+	/*
+	 * The first four parameters were passed in registers r0-r3.  Load the
+	 * additional parameters, which were passed on the stack.
+	 */
+	ldr		NBYTES, [sp, #16]
+	ldr		TWEAK, [sp, #20]
+
+	/*
+	 * If decrypting, modify the ROUND_KEYS parameter to point to the last
+	 * round key rather than the first, since for decryption the round keys
+	 * are used in reverse order.
+	 */
+.if \decrypting
+.if \n == 64
+	add		ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
+	sub		ROUND_KEYS, #8
+.else
+	add		ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
+	sub		ROUND_KEYS, #4
+.endif
+.endif
+
+	// Load the index vector for vtbl-based 8-bit rotates
+.if \decrypting
+	ldr		r12, =.Lrol\n\()_8_table
+.else
+	ldr		r12, =.Lror\n\()_8_table
+.endif
+	vld1.8		{ROTATE_TABLE}, [r12:64]
+
+	// One-time XTS preparation
+
+	/*
+	 * Allocate stack space to store 128 bytes worth of tweaks.  For
+	 * performance, this space is aligned to a 16-byte boundary so that we
+	 * can use the load/store instructions that declare 16-byte alignment.
+	 */
+	sub		sp, #128
+	bic		sp, #0xf
+
+.if \n == 64
+	// Load first tweak
+	vld1.8		{TWEAKV}, [TWEAK]
+
+	// Load GF(2^128) multiplication table
+	ldr		r12, =.Lgf128mul_table
+	vld1.8		{GF128MUL_TABLE}, [r12:64]
+.else
+	// Load first tweak
+	vld1.8		{TWEAKV_L}, [TWEAK]
+
+	// Load GF(2^64) multiplication table
+	ldr		r12, =.Lgf64mul_table
+	vld1.8		{GF64MUL_TABLE}, [r12:64]
+
+	// Calculate second tweak, packing it together with the first
+	vshr.u64	TMP0_L, TWEAKV_L, #63
+	vtbl.u8		TMP0_L, {GF64MUL_TABLE}, TMP0_L
+	vshl.u64	TWEAKV_H, TWEAKV_L, #1
+	veor		TWEAKV_H, TMP0_L
+.endif
+
+.Lnext_128bytes_\@:
+
+	/*
+	 * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
+	 * values, and save the tweaks on the stack for later.  Then
+	 * de-interleave the 'x' and 'y' elements of each block, i.e. make it so
+	 * that the X[0-3] registers contain only the second halves of blocks,
+	 * and the Y[0-3] registers contain only the first halves of blocks.
+	 * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
+	 */
+	mov		r12, sp
+.if \n == 64
+	_xts128_precrypt_one	X0, r12, TMP0
+	_xts128_precrypt_one	Y0, r12, TMP0
+	_xts128_precrypt_one	X1, r12, TMP0
+	_xts128_precrypt_one	Y1, r12, TMP0
+	_xts128_precrypt_one	X2, r12, TMP0
+	_xts128_precrypt_one	Y2, r12, TMP0
+	_xts128_precrypt_one	X3, r12, TMP0
+	_xts128_precrypt_one	Y3, r12, TMP0
+	vswp		X0_L, Y0_H
+	vswp		X1_L, Y1_H
+	vswp		X2_L, Y2_H
+	vswp		X3_L, Y3_H
+.else
+	_xts64_precrypt_two	X0, r12, TMP0
+	_xts64_precrypt_two	Y0, r12, TMP0
+	_xts64_precrypt_two	X1, r12, TMP0
+	_xts64_precrypt_two	Y1, r12, TMP0
+	_xts64_precrypt_two	X2, r12, TMP0
+	_xts64_precrypt_two	Y2, r12, TMP0
+	_xts64_precrypt_two	X3, r12, TMP0
+	_xts64_precrypt_two	Y3, r12, TMP0
+	vuzp.32		Y0, X0
+	vuzp.32		Y1, X1
+	vuzp.32		Y2, X2
+	vuzp.32		Y3, X3
+.endif
+
+	// Do the cipher rounds
+
+	mov		r12, ROUND_KEYS
+	mov		r6, NROUNDS
+
+.Lnext_round_\@:
+.if \decrypting
+.if \n == 64
+	vld1.64		ROUND_KEY_L, [r12]
+	sub		r12, #8
+	vmov		ROUND_KEY_H, ROUND_KEY_L
+.else
+	vld1.32		{ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
+	sub		r12, #4
+.endif
+	_speck_unround_128bytes	\n
+.else
+.if \n == 64
+	vld1.64		ROUND_KEY_L, [r12]!
+	vmov		ROUND_KEY_H, ROUND_KEY_L
+.else
+	vld1.32		{ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
+.endif
+	_speck_round_128bytes	\n
+.endif
+	subs		r6, r6, #1
+	bne		.Lnext_round_\@
+
+	// Re-interleave the 'x' and 'y' elements of each block
+.if \n == 64
+	vswp		X0_L, Y0_H
+	vswp		X1_L, Y1_H
+	vswp		X2_L, Y2_H
+	vswp		X3_L, Y3_H
+.else
+	vzip.32		Y0, X0
+	vzip.32		Y1, X1
+	vzip.32		Y2, X2
+	vzip.32		Y3, X3
+.endif
+
+	// XOR the encrypted/decrypted blocks with the tweaks we saved earlier
+	mov		r12, sp
+	vld1.8		{TMP0, TMP1}, [r12:128]!
+	vld1.8		{TMP2, TMP3}, [r12:128]!
+	veor		X0, TMP0
+	veor		Y0, TMP1
+	veor		X1, TMP2
+	veor		Y1, TMP3
+	vld1.8		{TMP0, TMP1}, [r12:128]!
+	vld1.8		{TMP2, TMP3}, [r12:128]!
+	veor		X2, TMP0
+	veor		Y2, TMP1
+	veor		X3, TMP2
+	veor		Y3, TMP3
+
+	// Store the ciphertext in the destination buffer
+	vst1.8		{X0, Y0}, [DST]!
+	vst1.8		{X1, Y1}, [DST]!
+	vst1.8		{X2, Y2}, [DST]!
+	vst1.8		{X3, Y3}, [DST]!
+
+	// Continue if there are more 128-byte chunks remaining, else return
+	subs		NBYTES, #128
+	bne		.Lnext_128bytes_\@
+
+	// Store the next tweak
+.if \n == 64
+	vst1.8		{TWEAKV}, [TWEAK]
+.else
+	vst1.8		{TWEAKV_L}, [TWEAK]
+.endif
+
+	mov		sp, r7
+	pop		{r4-r7}
+	bx		lr
+.endm
+
+ENTRY(speck128_xts_encrypt_neon)
+	_speck_xts_crypt	n=64, decrypting=0
+ENDPROC(speck128_xts_encrypt_neon)
+
+ENTRY(speck128_xts_decrypt_neon)
+	_speck_xts_crypt	n=64, decrypting=1
+ENDPROC(speck128_xts_decrypt_neon)
+
+ENTRY(speck64_xts_encrypt_neon)
+	_speck_xts_crypt	n=32, decrypting=0
+ENDPROC(speck64_xts_encrypt_neon)
+
+ENTRY(speck64_xts_decrypt_neon)
+	_speck_xts_crypt	n=32, decrypting=1
+ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
new file mode 100644
index 000000000000..3987dd6e063e
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Note: the NIST recommendation for XTS only specifies a 128-bit block size,
+ * but a 64-bit version (needed for Speck64) is fairly straightforward; the math
+ * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
+ * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
+ * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
+ * OCB and PMAC"), represented as 0x1B.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <crypto/algapi.h>
+#include <crypto/gf128mul.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/speck.h>
+#include <crypto/xts.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* The assembly functions only handle multiples of 128 bytes */
+#define SPECK_NEON_CHUNK_SIZE	128
+
+/* Speck128 */
+
+struct speck128_xts_tfm_ctx {
+	struct speck128_tfm_ctx main_key;
+	struct speck128_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
+					  void *dst, const void *src,
+					  unsigned int nbytes, void *tweak);
+
+asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
+					  void *dst, const void *src,
+					  unsigned int nbytes, void *tweak);
+
+typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
+				     u8 *, const u8 *);
+typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
+					  const void *, unsigned int, void *);
+
+
+static __always_inline int
+__speck128_xts_crypt(struct skcipher_request *req,
+		     speck128_crypt_one_t crypt_one,
+		     speck128_xts_crypt_many_t crypt_many)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	le128 tweak;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+		u8 *dst = walk.dst.virt.addr;
+		const u8 *src = walk.src.virt.addr;
+
+		if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+			unsigned int count;
+
+			count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+			kernel_neon_begin();
+			(*crypt_many)(ctx->main_key.round_keys,
+				      ctx->main_key.nrounds,
+				      dst, src, count, &tweak);
+			kernel_neon_end();
+			dst += count;
+			src += count;
+			nbytes -= count;
+		}
+
+		/* Handle any remainder with generic code */
+		while (nbytes >= sizeof(le128)) {
+			le128_xor((le128 *)dst, (const le128 *)src, &tweak);
+			(*crypt_one)(&ctx->main_key, dst, dst);
+			le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
+			gf128mul_x_ble(&tweak, &tweak);
+
+			dst += sizeof(le128);
+			src += sizeof(le128);
+			nbytes -= sizeof(le128);
+		}
+		err = skcipher_walk_done(&walk, nbytes);
+	}
+
+	return err;
+}
+
+static int speck128_xts_encrypt(struct skcipher_request *req)
+{
+	return __speck128_xts_crypt(req, crypto_speck128_encrypt,
+				    speck128_xts_encrypt_neon);
+
+}
+
+static int speck128_xts_decrypt(struct skcipher_request *req)
+{
+	return __speck128_xts_crypt(req, crypto_speck128_decrypt,
+				    speck128_xts_decrypt_neon);
+}
+
+static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			       unsigned int keylen)
+{
+	struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
+	if (err)
+		return err;
+
+	return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+/* Speck64 */
+
+struct speck64_xts_tfm_ctx {
+	struct speck64_tfm_ctx main_key;
+	struct speck64_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
+					 void *dst, const void *src,
+					 unsigned int nbytes, void *tweak);
+
+asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
+					 void *dst, const void *src,
+					 unsigned int nbytes, void *tweak);
+
+typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
+				    u8 *, const u8 *);
+typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
+					 const void *, unsigned int, void *);
+
+static __always_inline int
+__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
+		    speck64_xts_crypt_many_t crypt_many)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	u64 tweak;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+		u8 *dst = walk.dst.virt.addr;
+		const u8 *src = walk.src.virt.addr;
+
+		if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+			unsigned int count;
+
+			count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+			kernel_neon_begin();
+			(*crypt_many)(ctx->main_key.round_keys,
+				      ctx->main_key.nrounds,
+				      dst, src, count, &tweak);
+			kernel_neon_end();
+			dst += count;
+			src += count;
+			nbytes -= count;
+		}
+
+		/* Handle any remainder with generic code */
+		while (nbytes >= sizeof(u64)) {
+			*(u64 *)dst = *(u64 *)src ^ tweak;
+			(*crypt_one)(&ctx->main_key, dst, dst);
+			*(u64 *)dst ^= tweak;
+			tweak = (tweak << 1) ^
+				((tweak & (1ULL << 63)) ? 0x1B : 0);
+
+			dst += sizeof(u64);
+			src += sizeof(u64);
+			nbytes -= sizeof(u64);
+		}
+		err = skcipher_walk_done(&walk, nbytes);
+	}
+
+	return err;
+}
+
+static int speck64_xts_encrypt(struct skcipher_request *req)
+{
+	return __speck64_xts_crypt(req, crypto_speck64_encrypt,
+				   speck64_xts_encrypt_neon);
+}
+
+static int speck64_xts_decrypt(struct skcipher_request *req)
+{
+	return __speck64_xts_crypt(req, crypto_speck64_decrypt,
+				   speck64_xts_decrypt_neon);
+}
+
+static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			      unsigned int keylen)
+{
+	struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
+	if (err)
+		return err;
+
+	return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+static struct skcipher_alg speck_algs[] = {
+	{
+		.base.cra_name		= "xts(speck128)",
+		.base.cra_driver_name	= "xts-speck128-neon",
+		.base.cra_priority	= 300,
+		.base.cra_blocksize	= SPECK128_BLOCK_SIZE,
+		.base.cra_ctxsize	= sizeof(struct speck128_xts_tfm_ctx),
+		.base.cra_alignmask	= 7,
+		.base.cra_module	= THIS_MODULE,
+		.min_keysize		= 2 * SPECK128_128_KEY_SIZE,
+		.max_keysize		= 2 * SPECK128_256_KEY_SIZE,
+		.ivsize			= SPECK128_BLOCK_SIZE,
+		.walksize		= SPECK_NEON_CHUNK_SIZE,
+		.setkey			= speck128_xts_setkey,
+		.encrypt		= speck128_xts_encrypt,
+		.decrypt		= speck128_xts_decrypt,
+	}, {
+		.base.cra_name		= "xts(speck64)",
+		.base.cra_driver_name	= "xts-speck64-neon",
+		.base.cra_priority	= 300,
+		.base.cra_blocksize	= SPECK64_BLOCK_SIZE,
+		.base.cra_ctxsize	= sizeof(struct speck64_xts_tfm_ctx),
+		.base.cra_alignmask	= 7,
+		.base.cra_module	= THIS_MODULE,
+		.min_keysize		= 2 * SPECK64_96_KEY_SIZE,
+		.max_keysize		= 2 * SPECK64_128_KEY_SIZE,
+		.ivsize			= SPECK64_BLOCK_SIZE,
+		.walksize		= SPECK_NEON_CHUNK_SIZE,
+		.setkey			= speck64_xts_setkey,
+		.encrypt		= speck64_xts_encrypt,
+		.decrypt		= speck64_xts_decrypt,
+	}
+};
+
+static int __init speck_neon_module_init(void)
+{
+	if (!(elf_hwcap & HWCAP_NEON))
+		return -ENODEV;
+	return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_neon_module_exit(void)
+{
+	crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_neon_module_init);
+module_exit(speck_neon_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("xts(speck128)");
+MODULE_ALIAS_CRYPTO("xts-speck128-neon");
+MODULE_ALIAS_CRYPTO("xts(speck64)");
+MODULE_ALIAS_CRYPTO("xts-speck64-neon");
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
@ 2018-02-12 23:52   ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster
than the generic code.  For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:

    xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
    xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

    xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
    xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
    xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s

Speck64/128-XTS is even faster:

    xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64
variants are supported.  Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases.  The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback.  Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper.  Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/arm/crypto/Kconfig           |   6 +
 arch/arm/crypto/Makefile          |   2 +
 arch/arm/crypto/speck-neon-core.S | 432 ++++++++++++++++++++++++++++++++++++++
 arch/arm/crypto/speck-neon-glue.c | 290 +++++++++++++++++++++++++
 4 files changed, 730 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b8e69fe282b8..925d1364727a 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
 	select CRYPTO_BLKCIPHER
 	select CRYPTO_CHACHA20
 
+config CRYPTO_SPECK_NEON
+	tristate "NEON accelerated Speck cipher algorithms"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_SPECK
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 30ef8e291271..a758107c5525 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -53,6 +54,7 @@ ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
new file mode 100644
index 000000000000..3c1e203e53b9
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers <ebiggers@google.com>
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.fpu		neon
+
+	// arguments
+	ROUND_KEYS	.req	r0	// const {u64,u32} *round_keys
+	NROUNDS		.req	r1	// int nrounds
+	DST		.req	r2	// void *dst
+	SRC		.req	r3	// const void *src
+	NBYTES		.req	r4	// unsigned int nbytes
+	TWEAK		.req	r5	// void *tweak
+
+	// registers which hold the data being encrypted/decrypted
+	X0		.req	q0
+	X0_L		.req	d0
+	X0_H		.req	d1
+	Y0		.req	q1
+	Y0_H		.req	d3
+	X1		.req	q2
+	X1_L		.req	d4
+	X1_H		.req	d5
+	Y1		.req	q3
+	Y1_H		.req	d7
+	X2		.req	q4
+	X2_L		.req	d8
+	X2_H		.req	d9
+	Y2		.req	q5
+	Y2_H		.req	d11
+	X3		.req	q6
+	X3_L		.req	d12
+	X3_H		.req	d13
+	Y3		.req	q7
+	Y3_H		.req	d15
+
+	// the round key, duplicated in all lanes
+	ROUND_KEY	.req	q8
+	ROUND_KEY_L	.req	d16
+	ROUND_KEY_H	.req	d17
+
+	// index vector for vtbl-based 8-bit rotates
+	ROTATE_TABLE	.req	d18
+
+	// multiplication table for updating XTS tweaks
+	GF128MUL_TABLE	.req	d19
+	GF64MUL_TABLE	.req	d19
+
+	// current XTS tweak value(s)
+	TWEAKV		.req	q10
+	TWEAKV_L	.req	d20
+	TWEAKV_H	.req	d21
+
+	TMP0		.req	q12
+	TMP0_L		.req	d24
+	TMP0_H		.req	d25
+	TMP1		.req	q13
+	TMP2		.req	q14
+	TMP3		.req	q15
+
+	.align		4
+.Lror64_8_table:
+	.byte		1, 2, 3, 4, 5, 6, 7, 0
+.Lror32_8_table:
+	.byte		1, 2, 3, 0, 5, 6, 7, 4
+.Lrol64_8_table:
+	.byte		7, 0, 1, 2, 3, 4, 5, 6
+.Lrol32_8_table:
+	.byte		3, 0, 1, 2, 7, 4, 5, 6
+.Lgf128mul_table:
+	.byte		0, 0x87
+	.fill		14
+.Lgf64mul_table:
+	.byte		0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
+	.fill		12
+
+/*
+ * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
+ *
+ * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
+ * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
+ * of ROUND_KEY.  'n' is the lane size: 64 for Speck128, or 32 for Speck64.
+ *
+ * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
+ * the vtbl approach is faster on some processors and the same speed on others.
+ */
+.macro _speck_round_128bytes	n
+
+	// x = ror(x, 8)
+	vtbl.8		X0_L, {X0_L}, ROTATE_TABLE
+	vtbl.8		X0_H, {X0_H}, ROTATE_TABLE
+	vtbl.8		X1_L, {X1_L}, ROTATE_TABLE
+	vtbl.8		X1_H, {X1_H}, ROTATE_TABLE
+	vtbl.8		X2_L, {X2_L}, ROTATE_TABLE
+	vtbl.8		X2_H, {X2_H}, ROTATE_TABLE
+	vtbl.8		X3_L, {X3_L}, ROTATE_TABLE
+	vtbl.8		X3_H, {X3_H}, ROTATE_TABLE
+
+	// x += y
+	vadd.u\n	X0, Y0
+	vadd.u\n	X1, Y1
+	vadd.u\n	X2, Y2
+	vadd.u\n	X3, Y3
+
+	// x ^= k
+	veor		X0, ROUND_KEY
+	veor		X1, ROUND_KEY
+	veor		X2, ROUND_KEY
+	veor		X3, ROUND_KEY
+
+	// y = rol(y, 3)
+	vshl.u\n	TMP0, Y0, #3
+	vshl.u\n	TMP1, Y1, #3
+	vshl.u\n	TMP2, Y2, #3
+	vshl.u\n	TMP3, Y3, #3
+	vsri.u\n	TMP0, Y0, #(\n - 3)
+	vsri.u\n	TMP1, Y1, #(\n - 3)
+	vsri.u\n	TMP2, Y2, #(\n - 3)
+	vsri.u\n	TMP3, Y3, #(\n - 3)
+
+	// y ^= x
+	veor		Y0, TMP0, X0
+	veor		Y1, TMP1, X1
+	veor		Y2, TMP2, X2
+	veor		Y3, TMP3, X3
+.endm
+
+/*
+ * _speck_unround_128bytes() - Speck decryption round on 128 bytes@a time
+ *
+ * This is the inverse of _speck_round_128bytes().
+ */
+.macro _speck_unround_128bytes	n
+
+	// y ^= x
+	veor		TMP0, Y0, X0
+	veor		TMP1, Y1, X1
+	veor		TMP2, Y2, X2
+	veor		TMP3, Y3, X3
+
+	// y = ror(y, 3)
+	vshr.u\n	Y0, TMP0, #3
+	vshr.u\n	Y1, TMP1, #3
+	vshr.u\n	Y2, TMP2, #3
+	vshr.u\n	Y3, TMP3, #3
+	vsli.u\n	Y0, TMP0, #(\n - 3)
+	vsli.u\n	Y1, TMP1, #(\n - 3)
+	vsli.u\n	Y2, TMP2, #(\n - 3)
+	vsli.u\n	Y3, TMP3, #(\n - 3)
+
+	// x ^= k
+	veor		X0, ROUND_KEY
+	veor		X1, ROUND_KEY
+	veor		X2, ROUND_KEY
+	veor		X3, ROUND_KEY
+
+	// x -= y
+	vsub.u\n	X0, Y0
+	vsub.u\n	X1, Y1
+	vsub.u\n	X2, Y2
+	vsub.u\n	X3, Y3
+
+	// x = rol(x, 8);
+	vtbl.8		X0_L, {X0_L}, ROTATE_TABLE
+	vtbl.8		X0_H, {X0_H}, ROTATE_TABLE
+	vtbl.8		X1_L, {X1_L}, ROTATE_TABLE
+	vtbl.8		X1_H, {X1_H}, ROTATE_TABLE
+	vtbl.8		X2_L, {X2_L}, ROTATE_TABLE
+	vtbl.8		X2_H, {X2_H}, ROTATE_TABLE
+	vtbl.8		X3_L, {X3_L}, ROTATE_TABLE
+	vtbl.8		X3_H, {X3_H}, ROTATE_TABLE
+.endm
+
+.macro _xts128_precrypt_one	dst_reg, tweak_buf, tmp
+
+	// Load the next source block
+	vld1.8		{\dst_reg}, [SRC]!
+
+	// Save the current tweak in the tweak buffer
+	vst1.8		{TWEAKV}, [\tweak_buf:128]!
+
+	// XOR the next source block with the current tweak
+	veor		\dst_reg, TWEAKV
+
+	/*
+	 * Calculate the next tweak by multiplying the current one by x,
+	 * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
+	 */
+	vshr.u64	\tmp, TWEAKV, #63
+	vshl.u64	TWEAKV, #1
+	veor		TWEAKV_H, \tmp\()_L
+	vtbl.8		\tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
+	veor		TWEAKV_L, \tmp\()_H
+.endm
+
+.macro _xts64_precrypt_two	dst_reg, tweak_buf, tmp
+
+	// Load the next two source blocks
+	vld1.8		{\dst_reg}, [SRC]!
+
+	// Save the current two tweaks in the tweak buffer
+	vst1.8		{TWEAKV}, [\tweak_buf:128]!
+
+	// XOR the next two source blocks with the current two tweaks
+	veor		\dst_reg, TWEAKV
+
+	/*
+	 * Calculate the next two tweaks by multiplying the current ones by x^2,
+	 * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
+	 */
+	vshr.u64	\tmp, TWEAKV, #62
+	vshl.u64	TWEAKV, #2
+	vtbl.8		\tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
+	vtbl.8		\tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
+	veor		TWEAKV, \tmp
+.endm
+
+/*
+ * _speck_xts_crypt() - Speck-XTS encryption/decryption
+ *
+ * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
+ * using Speck-XTS, specifically the variant with a block size of '2n' and round
+ * count given by NROUNDS.  The expanded round keys are given in ROUND_KEYS, and
+ * the current XTS tweak value is given in TWEAK.  It's assumed that NBYTES is a
+ * nonzero multiple of 128.
+ */
+.macro _speck_xts_crypt	n, decrypting
+	push		{r4-r7}
+	mov		r7, sp
+
+	/*
+	 * The first four parameters were passed in registers r0-r3.  Load the
+	 * additional parameters, which were passed on the stack.
+	 */
+	ldr		NBYTES, [sp, #16]
+	ldr		TWEAK, [sp, #20]
+
+	/*
+	 * If decrypting, modify the ROUND_KEYS parameter to point to the last
+	 * round key rather than the first, since for decryption the round keys
+	 * are used in reverse order.
+	 */
+.if \decrypting
+.if \n == 64
+	add		ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
+	sub		ROUND_KEYS, #8
+.else
+	add		ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
+	sub		ROUND_KEYS, #4
+.endif
+.endif
+
+	// Load the index vector for vtbl-based 8-bit rotates
+.if \decrypting
+	ldr		r12, =.Lrol\n\()_8_table
+.else
+	ldr		r12, =.Lror\n\()_8_table
+.endif
+	vld1.8		{ROTATE_TABLE}, [r12:64]
+
+	// One-time XTS preparation
+
+	/*
+	 * Allocate stack space to store 128 bytes worth of tweaks.  For
+	 * performance, this space is aligned to a 16-byte boundary so that we
+	 * can use the load/store instructions that declare 16-byte alignment.
+	 */
+	sub		sp, #128
+	bic		sp, #0xf
+
+.if \n == 64
+	// Load first tweak
+	vld1.8		{TWEAKV}, [TWEAK]
+
+	// Load GF(2^128) multiplication table
+	ldr		r12, =.Lgf128mul_table
+	vld1.8		{GF128MUL_TABLE}, [r12:64]
+.else
+	// Load first tweak
+	vld1.8		{TWEAKV_L}, [TWEAK]
+
+	// Load GF(2^64) multiplication table
+	ldr		r12, =.Lgf64mul_table
+	vld1.8		{GF64MUL_TABLE}, [r12:64]
+
+	// Calculate second tweak, packing it together with the first
+	vshr.u64	TMP0_L, TWEAKV_L, #63
+	vtbl.u8		TMP0_L, {GF64MUL_TABLE}, TMP0_L
+	vshl.u64	TWEAKV_H, TWEAKV_L, #1
+	veor		TWEAKV_H, TMP0_L
+.endif
+
+.Lnext_128bytes_\@:
+
+	/*
+	 * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
+	 * values, and save the tweaks on the stack for later.  Then
+	 * de-interleave the 'x' and 'y' elements of each block, i.e. make it so
+	 * that the X[0-3] registers contain only the second halves of blocks,
+	 * and the Y[0-3] registers contain only the first halves of blocks.
+	 * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
+	 */
+	mov		r12, sp
+.if \n == 64
+	_xts128_precrypt_one	X0, r12, TMP0
+	_xts128_precrypt_one	Y0, r12, TMP0
+	_xts128_precrypt_one	X1, r12, TMP0
+	_xts128_precrypt_one	Y1, r12, TMP0
+	_xts128_precrypt_one	X2, r12, TMP0
+	_xts128_precrypt_one	Y2, r12, TMP0
+	_xts128_precrypt_one	X3, r12, TMP0
+	_xts128_precrypt_one	Y3, r12, TMP0
+	vswp		X0_L, Y0_H
+	vswp		X1_L, Y1_H
+	vswp		X2_L, Y2_H
+	vswp		X3_L, Y3_H
+.else
+	_xts64_precrypt_two	X0, r12, TMP0
+	_xts64_precrypt_two	Y0, r12, TMP0
+	_xts64_precrypt_two	X1, r12, TMP0
+	_xts64_precrypt_two	Y1, r12, TMP0
+	_xts64_precrypt_two	X2, r12, TMP0
+	_xts64_precrypt_two	Y2, r12, TMP0
+	_xts64_precrypt_two	X3, r12, TMP0
+	_xts64_precrypt_two	Y3, r12, TMP0
+	vuzp.32		Y0, X0
+	vuzp.32		Y1, X1
+	vuzp.32		Y2, X2
+	vuzp.32		Y3, X3
+.endif
+
+	// Do the cipher rounds
+
+	mov		r12, ROUND_KEYS
+	mov		r6, NROUNDS
+
+.Lnext_round_\@:
+.if \decrypting
+.if \n == 64
+	vld1.64		ROUND_KEY_L, [r12]
+	sub		r12, #8
+	vmov		ROUND_KEY_H, ROUND_KEY_L
+.else
+	vld1.32		{ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
+	sub		r12, #4
+.endif
+	_speck_unround_128bytes	\n
+.else
+.if \n == 64
+	vld1.64		ROUND_KEY_L, [r12]!
+	vmov		ROUND_KEY_H, ROUND_KEY_L
+.else
+	vld1.32		{ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
+.endif
+	_speck_round_128bytes	\n
+.endif
+	subs		r6, r6, #1
+	bne		.Lnext_round_\@
+
+	// Re-interleave the 'x' and 'y' elements of each block
+.if \n == 64
+	vswp		X0_L, Y0_H
+	vswp		X1_L, Y1_H
+	vswp		X2_L, Y2_H
+	vswp		X3_L, Y3_H
+.else
+	vzip.32		Y0, X0
+	vzip.32		Y1, X1
+	vzip.32		Y2, X2
+	vzip.32		Y3, X3
+.endif
+
+	// XOR the encrypted/decrypted blocks with the tweaks we saved earlier
+	mov		r12, sp
+	vld1.8		{TMP0, TMP1}, [r12:128]!
+	vld1.8		{TMP2, TMP3}, [r12:128]!
+	veor		X0, TMP0
+	veor		Y0, TMP1
+	veor		X1, TMP2
+	veor		Y1, TMP3
+	vld1.8		{TMP0, TMP1}, [r12:128]!
+	vld1.8		{TMP2, TMP3}, [r12:128]!
+	veor		X2, TMP0
+	veor		Y2, TMP1
+	veor		X3, TMP2
+	veor		Y3, TMP3
+
+	// Store the ciphertext in the destination buffer
+	vst1.8		{X0, Y0}, [DST]!
+	vst1.8		{X1, Y1}, [DST]!
+	vst1.8		{X2, Y2}, [DST]!
+	vst1.8		{X3, Y3}, [DST]!
+
+	// Continue if there are more 128-byte chunks remaining, else return
+	subs		NBYTES, #128
+	bne		.Lnext_128bytes_\@
+
+	// Store the next tweak
+.if \n == 64
+	vst1.8		{TWEAKV}, [TWEAK]
+.else
+	vst1.8		{TWEAKV_L}, [TWEAK]
+.endif
+
+	mov		sp, r7
+	pop		{r4-r7}
+	bx		lr
+.endm
+
+ENTRY(speck128_xts_encrypt_neon)
+	_speck_xts_crypt	n=64, decrypting=0
+ENDPROC(speck128_xts_encrypt_neon)
+
+ENTRY(speck128_xts_decrypt_neon)
+	_speck_xts_crypt	n=64, decrypting=1
+ENDPROC(speck128_xts_decrypt_neon)
+
+ENTRY(speck64_xts_encrypt_neon)
+	_speck_xts_crypt	n=32, decrypting=0
+ENDPROC(speck64_xts_encrypt_neon)
+
+ENTRY(speck64_xts_decrypt_neon)
+	_speck_xts_crypt	n=32, decrypting=1
+ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
new file mode 100644
index 000000000000..3987dd6e063e
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Note: the NIST recommendation for XTS only specifies a 128-bit block size,
+ * but a 64-bit version (needed for Speck64) is fairly straightforward; the math
+ * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
+ * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
+ * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
+ * OCB and PMAC"), represented as 0x1B.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <crypto/algapi.h>
+#include <crypto/gf128mul.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/speck.h>
+#include <crypto/xts.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* The assembly functions only handle multiples of 128 bytes */
+#define SPECK_NEON_CHUNK_SIZE	128
+
+/* Speck128 */
+
+struct speck128_xts_tfm_ctx {
+	struct speck128_tfm_ctx main_key;
+	struct speck128_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
+					  void *dst, const void *src,
+					  unsigned int nbytes, void *tweak);
+
+asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
+					  void *dst, const void *src,
+					  unsigned int nbytes, void *tweak);
+
+typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
+				     u8 *, const u8 *);
+typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
+					  const void *, unsigned int, void *);
+
+
+static __always_inline int
+__speck128_xts_crypt(struct skcipher_request *req,
+		     speck128_crypt_one_t crypt_one,
+		     speck128_xts_crypt_many_t crypt_many)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	le128 tweak;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+		u8 *dst = walk.dst.virt.addr;
+		const u8 *src = walk.src.virt.addr;
+
+		if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+			unsigned int count;
+
+			count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+			kernel_neon_begin();
+			(*crypt_many)(ctx->main_key.round_keys,
+				      ctx->main_key.nrounds,
+				      dst, src, count, &tweak);
+			kernel_neon_end();
+			dst += count;
+			src += count;
+			nbytes -= count;
+		}
+
+		/* Handle any remainder with generic code */
+		while (nbytes >= sizeof(le128)) {
+			le128_xor((le128 *)dst, (const le128 *)src, &tweak);
+			(*crypt_one)(&ctx->main_key, dst, dst);
+			le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
+			gf128mul_x_ble(&tweak, &tweak);
+
+			dst += sizeof(le128);
+			src += sizeof(le128);
+			nbytes -= sizeof(le128);
+		}
+		err = skcipher_walk_done(&walk, nbytes);
+	}
+
+	return err;
+}
+
+static int speck128_xts_encrypt(struct skcipher_request *req)
+{
+	return __speck128_xts_crypt(req, crypto_speck128_encrypt,
+				    speck128_xts_encrypt_neon);
+
+}
+
+static int speck128_xts_decrypt(struct skcipher_request *req)
+{
+	return __speck128_xts_crypt(req, crypto_speck128_decrypt,
+				    speck128_xts_decrypt_neon);
+}
+
+static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			       unsigned int keylen)
+{
+	struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
+	if (err)
+		return err;
+
+	return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+/* Speck64 */
+
+struct speck64_xts_tfm_ctx {
+	struct speck64_tfm_ctx main_key;
+	struct speck64_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
+					 void *dst, const void *src,
+					 unsigned int nbytes, void *tweak);
+
+asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
+					 void *dst, const void *src,
+					 unsigned int nbytes, void *tweak);
+
+typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
+				    u8 *, const u8 *);
+typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
+					 const void *, unsigned int, void *);
+
+static __always_inline int
+__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
+		    speck64_xts_crypt_many_t crypt_many)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	u64 tweak;
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+		u8 *dst = walk.dst.virt.addr;
+		const u8 *src = walk.src.virt.addr;
+
+		if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+			unsigned int count;
+
+			count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+			kernel_neon_begin();
+			(*crypt_many)(ctx->main_key.round_keys,
+				      ctx->main_key.nrounds,
+				      dst, src, count, &tweak);
+			kernel_neon_end();
+			dst += count;
+			src += count;
+			nbytes -= count;
+		}
+
+		/* Handle any remainder with generic code */
+		while (nbytes >= sizeof(u64)) {
+			*(u64 *)dst = *(u64 *)src ^ tweak;
+			(*crypt_one)(&ctx->main_key, dst, dst);
+			*(u64 *)dst ^= tweak;
+			tweak = (tweak << 1) ^
+				((tweak & (1ULL << 63)) ? 0x1B : 0);
+
+			dst += sizeof(u64);
+			src += sizeof(u64);
+			nbytes -= sizeof(u64);
+		}
+		err = skcipher_walk_done(&walk, nbytes);
+	}
+
+	return err;
+}
+
+static int speck64_xts_encrypt(struct skcipher_request *req)
+{
+	return __speck64_xts_crypt(req, crypto_speck64_encrypt,
+				   speck64_xts_encrypt_neon);
+}
+
+static int speck64_xts_decrypt(struct skcipher_request *req)
+{
+	return __speck64_xts_crypt(req, crypto_speck64_decrypt,
+				   speck64_xts_decrypt_neon);
+}
+
+static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			      unsigned int keylen)
+{
+	struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
+	if (err)
+		return err;
+
+	return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+static struct skcipher_alg speck_algs[] = {
+	{
+		.base.cra_name		= "xts(speck128)",
+		.base.cra_driver_name	= "xts-speck128-neon",
+		.base.cra_priority	= 300,
+		.base.cra_blocksize	= SPECK128_BLOCK_SIZE,
+		.base.cra_ctxsize	= sizeof(struct speck128_xts_tfm_ctx),
+		.base.cra_alignmask	= 7,
+		.base.cra_module	= THIS_MODULE,
+		.min_keysize		= 2 * SPECK128_128_KEY_SIZE,
+		.max_keysize		= 2 * SPECK128_256_KEY_SIZE,
+		.ivsize			= SPECK128_BLOCK_SIZE,
+		.walksize		= SPECK_NEON_CHUNK_SIZE,
+		.setkey			= speck128_xts_setkey,
+		.encrypt		= speck128_xts_encrypt,
+		.decrypt		= speck128_xts_decrypt,
+	}, {
+		.base.cra_name		= "xts(speck64)",
+		.base.cra_driver_name	= "xts-speck64-neon",
+		.base.cra_priority	= 300,
+		.base.cra_blocksize	= SPECK64_BLOCK_SIZE,
+		.base.cra_ctxsize	= sizeof(struct speck64_xts_tfm_ctx),
+		.base.cra_alignmask	= 7,
+		.base.cra_module	= THIS_MODULE,
+		.min_keysize		= 2 * SPECK64_96_KEY_SIZE,
+		.max_keysize		= 2 * SPECK64_128_KEY_SIZE,
+		.ivsize			= SPECK64_BLOCK_SIZE,
+		.walksize		= SPECK_NEON_CHUNK_SIZE,
+		.setkey			= speck64_xts_setkey,
+		.encrypt		= speck64_xts_encrypt,
+		.decrypt		= speck64_xts_decrypt,
+	}
+};
+
+static int __init speck_neon_module_init(void)
+{
+	if (!(elf_hwcap & HWCAP_NEON))
+		return -ENODEV;
+	return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_neon_module_exit(void)
+{
+	crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_neon_module_init);
+module_exit(speck_neon_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("xts(speck128)");
+MODULE_ALIAS_CRYPTO("xts-speck128-neon");
+MODULE_ALIAS_CRYPTO("xts(speck64)");
+MODULE_ALIAS_CRYPTO("xts-speck64-neon");
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 4/5] crypto: speck - add test vectors for Speck128-XTS
  2018-02-12 23:52 ` Eric Biggers
@ 2018-02-12 23:52   ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Add test vectors for Speck128-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors.

Both xts(speck128-generic) and xts-speck128-neon pass these tests.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 687 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 696 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 058ed5eb6620..e011a347d51b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3575,6 +3575,15 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(serpent_xts_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "xts(speck128)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck128_xts_enc_tv_template),
+				.dec = __VECS(speck128_xts_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "xts(twofish)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 3818210f77cf..0212e0ebcd0c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14411,6 +14411,693 @@ static const struct cipher_testvec speck128_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the
+ * result recomputed with Speck128 as the cipher
+ */
+
+static const struct cipher_testvec speck128_xts_enc_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.ilen	= 32,
+		.result	= "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+			  "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+			  "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+			  "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+			  "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+			  "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+			  "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+			  "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+			  "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+			  "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+			  "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+			  "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+			  "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+			  "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+			  "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+			  "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+			  "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+			  "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+			  "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+			  "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+			  "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+			  "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+			  "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+			  "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+			  "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+			  "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+			  "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+			  "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+			  "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+			  "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+			  "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+			  "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+			  "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+			  "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+			  "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+			  "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+			  "\x09\x35\x71\x50\x65\xac\x92\xe3"
+			  "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+			  "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+			  "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+			  "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+			  "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+			  "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+			  "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+			  "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+			  "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+			  "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+			  "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+			  "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+			  "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+			  "\x87\x30\xac\xd5\xea\x73\x49\x10"
+			  "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+			  "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+			  "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+			  "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+			  "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+			  "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+			  "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+			  "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+			  "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+			  "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+			  "\x51\x66\x02\x69\x04\x97\x36\xd4"
+			  "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+			  "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+			  "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+			  "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+			  "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+			  "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+			  "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+			  "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+			  "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+			  "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+			  "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95"
+			  "\x02\x88\x41\x97\x16\x93\x99\x37"
+			  "\x51\x05\x82\x09\x74\x94\x45\x92",
+		.klen	= 64,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+			  "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+			  "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+			  "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+			  "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+			  "\x79\x36\x31\x19\x62\xae\x10\x7e"
+			  "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+			  "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+			  "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+			  "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+			  "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+			  "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+			  "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+			  "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+			  "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+			  "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+			  "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+			  "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+			  "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+			  "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+			  "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+			  "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+			  "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+			  "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+			  "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+			  "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+			  "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+			  "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+			  "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+			  "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+			  "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+			  "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+			  "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+			  "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+			  "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+			  "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+			  "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+			  "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+			  "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+			  "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+			  "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+			  "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+			  "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+			  "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+			  "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+			  "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+			  "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+			  "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+			  "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+			  "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+			  "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+			  "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+			  "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+			  "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+			  "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+			  "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+			  "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+			  "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+			  "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+			  "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+			  "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+			  "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+			  "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+			  "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
+static const struct cipher_testvec speck128_xts_dec_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+			  "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+			  "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+			  "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+		.ilen	= 32,
+		.result	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+			  "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+			  "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+			  "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+			  "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+			  "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+			  "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+			  "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+			  "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+			  "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+			  "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+			  "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+			  "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+			  "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+			  "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+			  "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+			  "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+			  "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+			  "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+			  "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+			  "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+			  "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+			  "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+			  "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+			  "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+			  "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+			  "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+			  "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+			  "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+			  "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+			  "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+			  "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+			  "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+			  "\x09\x35\x71\x50\x65\xac\x92\xe3"
+			  "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+			  "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+			  "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+			  "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+			  "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+			  "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+			  "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+			  "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+			  "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+			  "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+			  "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+			  "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+			  "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+			  "\x87\x30\xac\xd5\xea\x73\x49\x10"
+			  "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+			  "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+			  "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+			  "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+			  "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+			  "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+			  "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+			  "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+			  "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+			  "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+			  "\x51\x66\x02\x69\x04\x97\x36\xd4"
+			  "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+			  "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+			  "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+			  "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+			  "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+			  "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+			  "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+			  "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+			  "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+			  "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+			  "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95"
+			  "\x02\x88\x41\x97\x16\x93\x99\x37"
+			  "\x51\x05\x82\x09\x74\x94\x45\x92",
+		.klen	= 64,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+			  "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+			  "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+			  "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+			  "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+			  "\x79\x36\x31\x19\x62\xae\x10\x7e"
+			  "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+			  "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+			  "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+			  "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+			  "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+			  "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+			  "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+			  "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+			  "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+			  "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+			  "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+			  "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+			  "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+			  "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+			  "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+			  "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+			  "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+			  "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+			  "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+			  "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+			  "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+			  "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+			  "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+			  "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+			  "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+			  "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+			  "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+			  "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+			  "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+			  "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+			  "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+			  "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+			  "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+			  "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+			  "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+			  "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+			  "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+			  "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+			  "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+			  "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+			  "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+			  "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+			  "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+			  "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+			  "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+			  "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+			  "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+			  "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+			  "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+			  "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+			  "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+			  "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+			  "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+			  "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+			  "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+			  "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+			  "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+			  "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
 static const struct cipher_testvec speck64_enc_tv_template[] = {
 	{ /* Speck64/96 */
 		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 4/5] crypto: speck - add test vectors for Speck128-XTS
@ 2018-02-12 23:52   ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Add test vectors for Speck128-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors.

Both xts(speck128-generic) and xts-speck128-neon pass these tests.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 687 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 696 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 058ed5eb6620..e011a347d51b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3575,6 +3575,15 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(serpent_xts_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "xts(speck128)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck128_xts_enc_tv_template),
+				.dec = __VECS(speck128_xts_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "xts(twofish)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 3818210f77cf..0212e0ebcd0c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14411,6 +14411,693 @@ static const struct cipher_testvec speck128_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the
+ * result recomputed with Speck128 as the cipher
+ */
+
+static const struct cipher_testvec speck128_xts_enc_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.ilen	= 32,
+		.result	= "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+			  "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+			  "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+			  "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+			  "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+			  "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+			  "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+			  "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+			  "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+			  "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+			  "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+			  "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+			  "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+			  "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+			  "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+			  "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+			  "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+			  "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+			  "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+			  "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+			  "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+			  "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+			  "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+			  "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+			  "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+			  "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+			  "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+			  "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+			  "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+			  "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+			  "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+			  "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+			  "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+			  "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+			  "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+			  "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+			  "\x09\x35\x71\x50\x65\xac\x92\xe3"
+			  "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+			  "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+			  "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+			  "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+			  "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+			  "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+			  "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+			  "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+			  "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+			  "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+			  "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+			  "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+			  "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+			  "\x87\x30\xac\xd5\xea\x73\x49\x10"
+			  "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+			  "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+			  "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+			  "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+			  "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+			  "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+			  "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+			  "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+			  "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+			  "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+			  "\x51\x66\x02\x69\x04\x97\x36\xd4"
+			  "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+			  "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+			  "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+			  "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+			  "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+			  "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+			  "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+			  "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+			  "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+			  "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+			  "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95"
+			  "\x02\x88\x41\x97\x16\x93\x99\x37"
+			  "\x51\x05\x82\x09\x74\x94\x45\x92",
+		.klen	= 64,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+			  "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+			  "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+			  "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+			  "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+			  "\x79\x36\x31\x19\x62\xae\x10\x7e"
+			  "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+			  "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+			  "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+			  "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+			  "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+			  "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+			  "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+			  "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+			  "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+			  "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+			  "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+			  "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+			  "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+			  "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+			  "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+			  "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+			  "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+			  "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+			  "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+			  "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+			  "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+			  "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+			  "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+			  "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+			  "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+			  "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+			  "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+			  "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+			  "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+			  "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+			  "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+			  "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+			  "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+			  "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+			  "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+			  "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+			  "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+			  "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+			  "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+			  "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+			  "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+			  "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+			  "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+			  "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+			  "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+			  "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+			  "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+			  "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+			  "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+			  "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+			  "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+			  "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+			  "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+			  "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+			  "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+			  "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+			  "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+			  "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
+static const struct cipher_testvec speck128_xts_dec_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+			  "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+			  "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+			  "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+		.ilen	= 32,
+		.result	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+			  "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+			  "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+			  "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 32,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+			  "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+			  "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+			  "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95",
+		.klen	= 32,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+			  "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+			  "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+			  "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+			  "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+			  "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+			  "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+			  "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+			  "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+			  "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+			  "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+			  "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+			  "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+			  "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+			  "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+			  "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+			  "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+			  "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+			  "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+			  "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+			  "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+			  "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+			  "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+			  "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+			  "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+			  "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+			  "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+			  "\x09\x35\x71\x50\x65\xac\x92\xe3"
+			  "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+			  "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+			  "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+			  "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+			  "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+			  "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+			  "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+			  "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+			  "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+			  "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+			  "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+			  "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+			  "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+			  "\x87\x30\xac\xd5\xea\x73\x49\x10"
+			  "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+			  "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+			  "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+			  "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+			  "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+			  "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+			  "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+			  "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+			  "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+			  "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+			  "\x51\x66\x02\x69\x04\x97\x36\xd4"
+			  "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+			  "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+			  "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+			  "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+			  "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+			  "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+			  "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+			  "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+			  "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+			  "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+			  "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93"
+			  "\x23\x84\x62\x64\x33\x83\x27\x95"
+			  "\x02\x88\x41\x97\x16\x93\x99\x37"
+			  "\x51\x05\x82\x09\x74\x94\x45\x92",
+		.klen	= 64,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+			  "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+			  "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+			  "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+			  "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+			  "\x79\x36\x31\x19\x62\xae\x10\x7e"
+			  "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+			  "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+			  "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+			  "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+			  "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+			  "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+			  "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+			  "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+			  "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+			  "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+			  "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+			  "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+			  "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+			  "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+			  "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+			  "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+			  "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+			  "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+			  "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+			  "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+			  "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+			  "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+			  "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+			  "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+			  "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+			  "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+			  "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+			  "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+			  "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+			  "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+			  "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+			  "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+			  "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+			  "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+			  "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+			  "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+			  "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+			  "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+			  "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+			  "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+			  "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+			  "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+			  "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+			  "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+			  "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+			  "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+			  "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+			  "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+			  "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+			  "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+			  "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+			  "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+			  "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+			  "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+			  "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+			  "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+			  "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+			  "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
 static const struct cipher_testvec speck64_enc_tv_template[] = {
 	{ /* Speck64/96 */
 		.key	= "\x00\x01\x02\x03\x08\x09\x0a\x0b"
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 5/5] crypto: speck - add test vectors for Speck64-XTS
  2018-02-12 23:52 ` Eric Biggers
@ 2018-02-12 23:52   ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, Eric Biggers

Add test vectors for Speck64-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors, with key lengths
adjusted.

xts-speck64-neon passes these tests.  However, they aren't currently
applicable for the generic XTS template, as that only supports a 128-bit
block size.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 671 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 680 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index e011a347d51b..9f82e7bc9c56 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3584,6 +3584,15 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(speck128_xts_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "xts(speck64)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck64_xts_enc_tv_template),
+				.dec = __VECS(speck64_xts_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "xts(twofish)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0212e0ebcd0c..da72fd394f35 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -15138,6 +15138,677 @@ static const struct cipher_testvec speck64_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the result
+ * recomputed with Speck64 as the cipher, and key lengths adjusted
+ */
+
+static const struct cipher_testvec speck64_xts_enc_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.ilen	= 32,
+		.result	= "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+			  "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+			  "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+			  "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+			  "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+			  "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+			  "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+			  "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+			  "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+			  "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+			  "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+			  "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+			  "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+			  "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+			  "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+			  "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+			  "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+			  "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+			  "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+			  "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+			  "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+			  "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+			  "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+			  "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+			  "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+			  "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+			  "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+			  "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+			  "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+			  "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+			  "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+			  "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+			  "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+			  "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+			  "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+			  "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+			  "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+			  "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+			  "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+			  "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+			  "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+			  "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+			  "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+			  "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+			  "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+			  "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+			  "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+			  "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+			  "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+			  "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+			  "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+			  "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+			  "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+			  "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+			  "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+			  "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+			  "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+			  "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+			  "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+			  "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+			  "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+			  "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+			  "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+			  "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+			  "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+			  "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+			  "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+			  "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+			  "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+			  "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+			  "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+			  "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+			  "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27",
+		.klen	= 32,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+			  "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+			  "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+			  "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+			  "\x78\x16\xea\x80\xdb\x33\x75\x94"
+			  "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+			  "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+			  "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+			  "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+			  "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+			  "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+			  "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+			  "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+			  "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+			  "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+			  "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+			  "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+			  "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+			  "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+			  "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+			  "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+			  "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+			  "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+			  "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+			  "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+			  "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+			  "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+			  "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+			  "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+			  "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+			  "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+			  "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+			  "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+			  "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+			  "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+			  "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+			  "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+			  "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+			  "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+			  "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+			  "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+			  "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+			  "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+			  "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+			  "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+			  "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+			  "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+			  "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+			  "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+			  "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+			  "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+			  "\x63\x51\x34\x9d\x96\x12\xae\xce"
+			  "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+			  "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+			  "\x54\x27\x74\xbb\x10\x86\x57\x46"
+			  "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+			  "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+			  "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+			  "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+			  "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+			  "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+			  "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+			  "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+			  "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
+static const struct cipher_testvec speck64_xts_dec_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+			  "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+			  "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+			  "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+		.ilen	= 32,
+		.result	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+			  "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+			  "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+			  "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+			  "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+			  "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+			  "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+			  "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+			  "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+			  "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+			  "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+			  "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+			  "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+			  "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+			  "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+			  "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+			  "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+			  "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+			  "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+			  "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+			  "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+			  "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+			  "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+			  "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+			  "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+			  "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+			  "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+			  "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+			  "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+			  "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+			  "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+			  "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+			  "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+			  "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+			  "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+			  "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+			  "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+			  "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+			  "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+			  "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+			  "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+			  "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+			  "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+			  "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+			  "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+			  "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+			  "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+			  "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+			  "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+			  "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+			  "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+			  "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+			  "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+			  "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+			  "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+			  "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+			  "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+			  "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+			  "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+			  "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+			  "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+			  "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+			  "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+			  "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+			  "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+			  "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+			  "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+			  "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+			  "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+			  "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27",
+		.klen	= 32,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+			  "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+			  "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+			  "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+			  "\x78\x16\xea\x80\xdb\x33\x75\x94"
+			  "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+			  "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+			  "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+			  "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+			  "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+			  "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+			  "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+			  "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+			  "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+			  "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+			  "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+			  "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+			  "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+			  "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+			  "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+			  "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+			  "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+			  "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+			  "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+			  "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+			  "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+			  "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+			  "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+			  "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+			  "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+			  "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+			  "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+			  "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+			  "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+			  "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+			  "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+			  "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+			  "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+			  "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+			  "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+			  "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+			  "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+			  "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+			  "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+			  "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+			  "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+			  "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+			  "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+			  "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+			  "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+			  "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+			  "\x63\x51\x34\x9d\x96\x12\xae\xce"
+			  "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+			  "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+			  "\x54\x27\x74\xbb\x10\x86\x57\x46"
+			  "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+			  "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+			  "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+			  "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+			  "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+			  "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+			  "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+			  "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+			  "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
 /* Cast6 test vectors from RFC 2612 */
 static const struct cipher_testvec cast6_enc_tv_template[] = {
 	{
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 5/5] crypto: speck - add test vectors for Speck64-XTS
@ 2018-02-12 23:52   ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-12 23:52 UTC (permalink / raw)
  To: linux-arm-kernel

Add test vectors for Speck64-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors, with key lengths
adjusted.

xts-speck64-neon passes these tests.  However, they aren't currently
applicable for the generic XTS template, as that only supports a 128-bit
block size.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 671 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 680 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index e011a347d51b..9f82e7bc9c56 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3584,6 +3584,15 @@ static const struct alg_test_desc alg_test_descs[] = {
 				.dec = __VECS(speck128_xts_dec_tv_template)
 			}
 		}
+	}, {
+		.alg = "xts(speck64)",
+		.test = alg_test_skcipher,
+		.suite = {
+			.cipher = {
+				.enc = __VECS(speck64_xts_enc_tv_template),
+				.dec = __VECS(speck64_xts_dec_tv_template)
+			}
+		}
 	}, {
 		.alg = "xts(twofish)",
 		.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0212e0ebcd0c..da72fd394f35 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -15138,6 +15138,677 @@ static const struct cipher_testvec speck64_dec_tv_template[] = {
 	},
 };
 
+/*
+ * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the result
+ * recomputed with Speck64 as the cipher, and key lengths adjusted
+ */
+
+static const struct cipher_testvec speck64_xts_enc_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.ilen	= 32,
+		.result	= "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+			  "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+			  "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+			  "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+			  "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+			  "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+			  "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.ilen	= 32,
+		.result	= "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+			  "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+			  "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+			  "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+			  "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+			  "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+			  "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+			  "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+			  "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+			  "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+			  "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+			  "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+			  "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+			  "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+			  "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+			  "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+			  "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+			  "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+			  "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+			  "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+			  "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+			  "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+			  "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+			  "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+			  "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+			  "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+			  "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+			  "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+			  "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+			  "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+			  "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+			  "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+			  "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+			  "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+			  "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+			  "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+			  "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+			  "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+			  "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+			  "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+			  "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+			  "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+			  "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+			  "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+			  "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+			  "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+			  "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+			  "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+			  "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+			  "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+			  "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+			  "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+			  "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+			  "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+			  "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+			  "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+			  "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+			  "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+			  "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+			  "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+			  "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+			  "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+			  "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+			  "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+			  "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+			  "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+			  "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27",
+		.klen	= 32,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.ilen	= 512,
+		.result	= "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+			  "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+			  "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+			  "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+			  "\x78\x16\xea\x80\xdb\x33\x75\x94"
+			  "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+			  "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+			  "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+			  "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+			  "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+			  "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+			  "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+			  "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+			  "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+			  "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+			  "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+			  "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+			  "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+			  "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+			  "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+			  "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+			  "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+			  "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+			  "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+			  "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+			  "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+			  "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+			  "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+			  "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+			  "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+			  "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+			  "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+			  "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+			  "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+			  "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+			  "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+			  "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+			  "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+			  "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+			  "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+			  "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+			  "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+			  "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+			  "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+			  "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+			  "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+			  "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+			  "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+			  "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+			  "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+			  "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+			  "\x63\x51\x34\x9d\x96\x12\xae\xce"
+			  "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+			  "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+			  "\x54\x27\x74\xbb\x10\x86\x57\x46"
+			  "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+			  "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+			  "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+			  "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+			  "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+			  "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+			  "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+			  "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+			  "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
+static const struct cipher_testvec speck64_xts_dec_tv_template[] = {
+	{
+		.key	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+			  "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+			  "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+			  "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+		.ilen	= 32,
+		.result	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.rlen	= 32,
+	}, {
+		.key	= "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x11\x11\x11\x11\x11\x11\x11\x11"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+			  "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+			  "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+			  "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+			  "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+			  "\x22\x22\x22\x22\x22\x22\x22\x22",
+		.klen	= 24,
+		.iv	= "\x33\x33\x33\x33\x33\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+			  "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+			  "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+			  "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+		.ilen	= 32,
+		.result	= "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44"
+			  "\x44\x44\x44\x44\x44\x44\x44\x44",
+		.rlen	= 32,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x31\x41\x59\x26\x53\x58\x97\x93",
+		.klen	= 24,
+		.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+			  "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+			  "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+			  "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+			  "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+			  "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+			  "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+			  "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+			  "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+			  "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+			  "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+			  "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+			  "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+			  "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+			  "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+			  "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+			  "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+			  "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+			  "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+			  "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+			  "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+			  "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+			  "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+			  "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+			  "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+			  "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+			  "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+			  "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+			  "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+			  "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+			  "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+			  "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+			  "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+			  "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+			  "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+			  "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+			  "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+			  "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+			  "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+			  "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+			  "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+			  "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+			  "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+			  "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+			  "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+			  "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+			  "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+			  "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+			  "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+			  "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+			  "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+			  "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+			  "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+			  "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+			  "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+			  "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+			  "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+			  "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+			  "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+			  "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+			  "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+			  "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+			  "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+			  "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+	}, {
+		.key	= "\x27\x18\x28\x18\x28\x45\x90\x45"
+			  "\x23\x53\x60\x28\x74\x71\x35\x26"
+			  "\x62\x49\x77\x57\x24\x70\x93\x69"
+			  "\x99\x59\x57\x49\x66\x96\x76\x27",
+		.klen	= 32,
+		.iv	= "\xff\x00\x00\x00\x00\x00\x00\x00"
+			  "\x00\x00\x00\x00\x00\x00\x00\x00",
+		.input	= "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+			  "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+			  "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+			  "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+			  "\x78\x16\xea\x80\xdb\x33\x75\x94"
+			  "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+			  "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+			  "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+			  "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+			  "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+			  "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+			  "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+			  "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+			  "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+			  "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+			  "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+			  "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+			  "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+			  "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+			  "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+			  "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+			  "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+			  "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+			  "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+			  "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+			  "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+			  "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+			  "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+			  "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+			  "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+			  "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+			  "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+			  "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+			  "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+			  "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+			  "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+			  "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+			  "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+			  "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+			  "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+			  "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+			  "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+			  "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+			  "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+			  "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+			  "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+			  "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+			  "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+			  "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+			  "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+			  "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+			  "\x63\x51\x34\x9d\x96\x12\xae\xce"
+			  "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+			  "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+			  "\x54\x27\x74\xbb\x10\x86\x57\x46"
+			  "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+			  "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+			  "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+			  "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+			  "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+			  "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+			  "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+			  "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+			  "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+		.ilen	= 512,
+		.result	= "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+			  "\x00\x01\x02\x03\x04\x05\x06\x07"
+			  "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+			  "\x10\x11\x12\x13\x14\x15\x16\x17"
+			  "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+			  "\x20\x21\x22\x23\x24\x25\x26\x27"
+			  "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+			  "\x40\x41\x42\x43\x44\x45\x46\x47"
+			  "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+			  "\x50\x51\x52\x53\x54\x55\x56\x57"
+			  "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+			  "\x60\x61\x62\x63\x64\x65\x66\x67"
+			  "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+			  "\x70\x71\x72\x73\x74\x75\x76\x77"
+			  "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+			  "\x80\x81\x82\x83\x84\x85\x86\x87"
+			  "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+			  "\x90\x91\x92\x93\x94\x95\x96\x97"
+			  "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+			  "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+			  "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+			  "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+			  "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+			  "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+			  "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+			  "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+			  "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+			  "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+			  "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+			  "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+			  "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+		.rlen	= 512,
+		.also_non_np = 1,
+		.np	= 3,
+		.tap	= { 512 - 20, 4, 16 },
+	}
+};
+
 /* Cast6 test vectors from RFC 2612 */
 static const struct cipher_testvec cast6_enc_tv_template[] = {
 	{
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  2018-02-12 23:52   ` Eric Biggers
@ 2018-02-13 11:34     ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 11:34 UTC (permalink / raw)
  To: Eric Biggers
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

Hi Eric,

On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
> Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> encrypted/decrypted (doing one cipher round for all the blocks, then the
> next round, etc.), then goes through XTS postprocessing.
>
> The performance depends on the processor but can be about 3 times faster
> than the generic code.  For example, on an ARMv7 processor we observe
> the following performance with Speck128/256-XTS:
>
>     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
>     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>
> In comparison to AES-256-XTS without the Cryptography Extensions:
>
>     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
>     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
>     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
>
> Speck64/128-XTS is even faster:
>
>     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
>
> Note that as with the generic code, only the Speck128 and Speck64
> variants are supported.  Also, for now only the XTS mode of operation is
> supported, to target the disk and file encryption use cases.  The NEON
> code also only handles the portion of the data that is evenly divisible
> into 128-byte chunks, with any remainder handled by a C fallback.  Of
> course, other modes of operation could be added later if needed, and/or
> the NEON code could be updated to handle other buffer sizes.
>
> The XTS specification is only defined for AES which has a 128-bit block
> size, so for the GF(2^64) math needed for Speck64-XTS we use the
> reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> paper.  Of course, when possible users should use Speck128-XTS, but even
> that may be too slow on some processors; Speck64-XTS can be faster.
>

I think this is excellent work. Speck seems an appropriate solution to
this problem, and I'm glad we are not ending up with a stream cipher
for block encryption.

Also, I think an arm64 port would be nice. I may take a stab at this
if nobody else beats me to it.

I did run into an issue with this code though: On big-endian, I get

[    0.272381] alg: skcipher: Test 1 failed (invalid result) on
encryption for xts-speck64-neon
[    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
[    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b

so there may be a byte order corner case you missed in the rewrite (or
the issue existed before, as I did not test your v1)

-- 
Ard.


> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/arm/crypto/Kconfig           |   6 +
>  arch/arm/crypto/Makefile          |   2 +
>  arch/arm/crypto/speck-neon-core.S | 432 ++++++++++++++++++++++++++++++++++++++
>  arch/arm/crypto/speck-neon-glue.c | 290 +++++++++++++++++++++++++
>  4 files changed, 730 insertions(+)
>  create mode 100644 arch/arm/crypto/speck-neon-core.S
>  create mode 100644 arch/arm/crypto/speck-neon-glue.c
>
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index b8e69fe282b8..925d1364727a 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
>         select CRYPTO_BLKCIPHER
>         select CRYPTO_CHACHA20
>
> +config CRYPTO_SPECK_NEON
> +       tristate "NEON accelerated Speck cipher algorithms"
> +       depends on KERNEL_MODE_NEON
> +       select CRYPTO_BLKCIPHER
> +       select CRYPTO_SPECK
> +
>  endif
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 30ef8e291271..a758107c5525 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>
>  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
>  ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
> @@ -53,6 +54,7 @@ ghash-arm-ce-y        := ghash-ce-core.o ghash-ce-glue.o
>  crct10dif-arm-ce-y     := crct10dif-ce-core.o crct10dif-ce-glue.o
>  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
>  quiet_cmd_perl = PERL    $@
>        cmd_perl = $(PERL) $(<) > $(@)
> diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
> new file mode 100644
> index 000000000000..3c1e203e53b9
> --- /dev/null
> +++ b/arch/arm/crypto/speck-neon-core.S
> @@ -0,0 +1,432 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> + *
> + * Copyright (c) 2018 Google, Inc
> + *
> + * Author: Eric Biggers <ebiggers@google.com>
> + */
> +
> +#include <linux/linkage.h>
> +
> +       .text
> +       .fpu            neon
> +
> +       // arguments
> +       ROUND_KEYS      .req    r0      // const {u64,u32} *round_keys
> +       NROUNDS         .req    r1      // int nrounds
> +       DST             .req    r2      // void *dst
> +       SRC             .req    r3      // const void *src
> +       NBYTES          .req    r4      // unsigned int nbytes
> +       TWEAK           .req    r5      // void *tweak
> +
> +       // registers which hold the data being encrypted/decrypted
> +       X0              .req    q0
> +       X0_L            .req    d0
> +       X0_H            .req    d1
> +       Y0              .req    q1
> +       Y0_H            .req    d3
> +       X1              .req    q2
> +       X1_L            .req    d4
> +       X1_H            .req    d5
> +       Y1              .req    q3
> +       Y1_H            .req    d7
> +       X2              .req    q4
> +       X2_L            .req    d8
> +       X2_H            .req    d9
> +       Y2              .req    q5
> +       Y2_H            .req    d11
> +       X3              .req    q6
> +       X3_L            .req    d12
> +       X3_H            .req    d13
> +       Y3              .req    q7
> +       Y3_H            .req    d15
> +
> +       // the round key, duplicated in all lanes
> +       ROUND_KEY       .req    q8
> +       ROUND_KEY_L     .req    d16
> +       ROUND_KEY_H     .req    d17
> +
> +       // index vector for vtbl-based 8-bit rotates
> +       ROTATE_TABLE    .req    d18
> +
> +       // multiplication table for updating XTS tweaks
> +       GF128MUL_TABLE  .req    d19
> +       GF64MUL_TABLE   .req    d19
> +
> +       // current XTS tweak value(s)
> +       TWEAKV          .req    q10
> +       TWEAKV_L        .req    d20
> +       TWEAKV_H        .req    d21
> +
> +       TMP0            .req    q12
> +       TMP0_L          .req    d24
> +       TMP0_H          .req    d25
> +       TMP1            .req    q13
> +       TMP2            .req    q14
> +       TMP3            .req    q15
> +
> +       .align          4
> +.Lror64_8_table:
> +       .byte           1, 2, 3, 4, 5, 6, 7, 0
> +.Lror32_8_table:
> +       .byte           1, 2, 3, 0, 5, 6, 7, 4
> +.Lrol64_8_table:
> +       .byte           7, 0, 1, 2, 3, 4, 5, 6
> +.Lrol32_8_table:
> +       .byte           3, 0, 1, 2, 7, 4, 5, 6
> +.Lgf128mul_table:
> +       .byte           0, 0x87
> +       .fill           14
> +.Lgf64mul_table:
> +       .byte           0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
> +       .fill           12
> +
> +/*
> + * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
> + *
> + * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
> + * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
> + * of ROUND_KEY.  'n' is the lane size: 64 for Speck128, or 32 for Speck64.
> + *
> + * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
> + * the vtbl approach is faster on some processors and the same speed on others.
> + */
> +.macro _speck_round_128bytes   n
> +
> +       // x = ror(x, 8)
> +       vtbl.8          X0_L, {X0_L}, ROTATE_TABLE
> +       vtbl.8          X0_H, {X0_H}, ROTATE_TABLE
> +       vtbl.8          X1_L, {X1_L}, ROTATE_TABLE
> +       vtbl.8          X1_H, {X1_H}, ROTATE_TABLE
> +       vtbl.8          X2_L, {X2_L}, ROTATE_TABLE
> +       vtbl.8          X2_H, {X2_H}, ROTATE_TABLE
> +       vtbl.8          X3_L, {X3_L}, ROTATE_TABLE
> +       vtbl.8          X3_H, {X3_H}, ROTATE_TABLE
> +
> +       // x += y
> +       vadd.u\n        X0, Y0
> +       vadd.u\n        X1, Y1
> +       vadd.u\n        X2, Y2
> +       vadd.u\n        X3, Y3
> +
> +       // x ^= k
> +       veor            X0, ROUND_KEY
> +       veor            X1, ROUND_KEY
> +       veor            X2, ROUND_KEY
> +       veor            X3, ROUND_KEY
> +
> +       // y = rol(y, 3)
> +       vshl.u\n        TMP0, Y0, #3
> +       vshl.u\n        TMP1, Y1, #3
> +       vshl.u\n        TMP2, Y2, #3
> +       vshl.u\n        TMP3, Y3, #3
> +       vsri.u\n        TMP0, Y0, #(\n - 3)
> +       vsri.u\n        TMP1, Y1, #(\n - 3)
> +       vsri.u\n        TMP2, Y2, #(\n - 3)
> +       vsri.u\n        TMP3, Y3, #(\n - 3)
> +
> +       // y ^= x
> +       veor            Y0, TMP0, X0
> +       veor            Y1, TMP1, X1
> +       veor            Y2, TMP2, X2
> +       veor            Y3, TMP3, X3
> +.endm
> +
> +/*
> + * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
> + *
> + * This is the inverse of _speck_round_128bytes().
> + */
> +.macro _speck_unround_128bytes n
> +
> +       // y ^= x
> +       veor            TMP0, Y0, X0
> +       veor            TMP1, Y1, X1
> +       veor            TMP2, Y2, X2
> +       veor            TMP3, Y3, X3
> +
> +       // y = ror(y, 3)
> +       vshr.u\n        Y0, TMP0, #3
> +       vshr.u\n        Y1, TMP1, #3
> +       vshr.u\n        Y2, TMP2, #3
> +       vshr.u\n        Y3, TMP3, #3
> +       vsli.u\n        Y0, TMP0, #(\n - 3)
> +       vsli.u\n        Y1, TMP1, #(\n - 3)
> +       vsli.u\n        Y2, TMP2, #(\n - 3)
> +       vsli.u\n        Y3, TMP3, #(\n - 3)
> +
> +       // x ^= k
> +       veor            X0, ROUND_KEY
> +       veor            X1, ROUND_KEY
> +       veor            X2, ROUND_KEY
> +       veor            X3, ROUND_KEY
> +
> +       // x -= y
> +       vsub.u\n        X0, Y0
> +       vsub.u\n        X1, Y1
> +       vsub.u\n        X2, Y2
> +       vsub.u\n        X3, Y3
> +
> +       // x = rol(x, 8);
> +       vtbl.8          X0_L, {X0_L}, ROTATE_TABLE
> +       vtbl.8          X0_H, {X0_H}, ROTATE_TABLE
> +       vtbl.8          X1_L, {X1_L}, ROTATE_TABLE
> +       vtbl.8          X1_H, {X1_H}, ROTATE_TABLE
> +       vtbl.8          X2_L, {X2_L}, ROTATE_TABLE
> +       vtbl.8          X2_H, {X2_H}, ROTATE_TABLE
> +       vtbl.8          X3_L, {X3_L}, ROTATE_TABLE
> +       vtbl.8          X3_H, {X3_H}, ROTATE_TABLE
> +.endm
> +
> +.macro _xts128_precrypt_one    dst_reg, tweak_buf, tmp
> +
> +       // Load the next source block
> +       vld1.8          {\dst_reg}, [SRC]!
> +
> +       // Save the current tweak in the tweak buffer
> +       vst1.8          {TWEAKV}, [\tweak_buf:128]!
> +
> +       // XOR the next source block with the current tweak
> +       veor            \dst_reg, TWEAKV
> +
> +       /*
> +        * Calculate the next tweak by multiplying the current one by x,
> +        * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
> +        */
> +       vshr.u64        \tmp, TWEAKV, #63
> +       vshl.u64        TWEAKV, #1
> +       veor            TWEAKV_H, \tmp\()_L
> +       vtbl.8          \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
> +       veor            TWEAKV_L, \tmp\()_H
> +.endm
> +
> +.macro _xts64_precrypt_two     dst_reg, tweak_buf, tmp
> +
> +       // Load the next two source blocks
> +       vld1.8          {\dst_reg}, [SRC]!
> +
> +       // Save the current two tweaks in the tweak buffer
> +       vst1.8          {TWEAKV}, [\tweak_buf:128]!
> +
> +       // XOR the next two source blocks with the current two tweaks
> +       veor            \dst_reg, TWEAKV
> +
> +       /*
> +        * Calculate the next two tweaks by multiplying the current ones by x^2,
> +        * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
> +        */
> +       vshr.u64        \tmp, TWEAKV, #62
> +       vshl.u64        TWEAKV, #2
> +       vtbl.8          \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
> +       vtbl.8          \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
> +       veor            TWEAKV, \tmp
> +.endm
> +
> +/*
> + * _speck_xts_crypt() - Speck-XTS encryption/decryption
> + *
> + * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
> + * using Speck-XTS, specifically the variant with a block size of '2n' and round
> + * count given by NROUNDS.  The expanded round keys are given in ROUND_KEYS, and
> + * the current XTS tweak value is given in TWEAK.  It's assumed that NBYTES is a
> + * nonzero multiple of 128.
> + */
> +.macro _speck_xts_crypt        n, decrypting
> +       push            {r4-r7}
> +       mov             r7, sp
> +
> +       /*
> +        * The first four parameters were passed in registers r0-r3.  Load the
> +        * additional parameters, which were passed on the stack.
> +        */
> +       ldr             NBYTES, [sp, #16]
> +       ldr             TWEAK, [sp, #20]
> +
> +       /*
> +        * If decrypting, modify the ROUND_KEYS parameter to point to the last
> +        * round key rather than the first, since for decryption the round keys
> +        * are used in reverse order.
> +        */
> +.if \decrypting
> +.if \n == 64
> +       add             ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
> +       sub             ROUND_KEYS, #8
> +.else
> +       add             ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
> +       sub             ROUND_KEYS, #4
> +.endif
> +.endif
> +
> +       // Load the index vector for vtbl-based 8-bit rotates
> +.if \decrypting
> +       ldr             r12, =.Lrol\n\()_8_table
> +.else
> +       ldr             r12, =.Lror\n\()_8_table
> +.endif
> +       vld1.8          {ROTATE_TABLE}, [r12:64]
> +
> +       // One-time XTS preparation
> +
> +       /*
> +        * Allocate stack space to store 128 bytes worth of tweaks.  For
> +        * performance, this space is aligned to a 16-byte boundary so that we
> +        * can use the load/store instructions that declare 16-byte alignment.
> +        */
> +       sub             sp, #128
> +       bic             sp, #0xf
> +
> +.if \n == 64
> +       // Load first tweak
> +       vld1.8          {TWEAKV}, [TWEAK]
> +
> +       // Load GF(2^128) multiplication table
> +       ldr             r12, =.Lgf128mul_table
> +       vld1.8          {GF128MUL_TABLE}, [r12:64]
> +.else
> +       // Load first tweak
> +       vld1.8          {TWEAKV_L}, [TWEAK]
> +
> +       // Load GF(2^64) multiplication table
> +       ldr             r12, =.Lgf64mul_table
> +       vld1.8          {GF64MUL_TABLE}, [r12:64]
> +
> +       // Calculate second tweak, packing it together with the first
> +       vshr.u64        TMP0_L, TWEAKV_L, #63
> +       vtbl.u8         TMP0_L, {GF64MUL_TABLE}, TMP0_L
> +       vshl.u64        TWEAKV_H, TWEAKV_L, #1
> +       veor            TWEAKV_H, TMP0_L
> +.endif
> +
> +.Lnext_128bytes_\@:
> +
> +       /*
> +        * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
> +        * values, and save the tweaks on the stack for later.  Then
> +        * de-interleave the 'x' and 'y' elements of each block, i.e. make it so
> +        * that the X[0-3] registers contain only the second halves of blocks,
> +        * and the Y[0-3] registers contain only the first halves of blocks.
> +        * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
> +        */
> +       mov             r12, sp
> +.if \n == 64
> +       _xts128_precrypt_one    X0, r12, TMP0
> +       _xts128_precrypt_one    Y0, r12, TMP0
> +       _xts128_precrypt_one    X1, r12, TMP0
> +       _xts128_precrypt_one    Y1, r12, TMP0
> +       _xts128_precrypt_one    X2, r12, TMP0
> +       _xts128_precrypt_one    Y2, r12, TMP0
> +       _xts128_precrypt_one    X3, r12, TMP0
> +       _xts128_precrypt_one    Y3, r12, TMP0
> +       vswp            X0_L, Y0_H
> +       vswp            X1_L, Y1_H
> +       vswp            X2_L, Y2_H
> +       vswp            X3_L, Y3_H
> +.else
> +       _xts64_precrypt_two     X0, r12, TMP0
> +       _xts64_precrypt_two     Y0, r12, TMP0
> +       _xts64_precrypt_two     X1, r12, TMP0
> +       _xts64_precrypt_two     Y1, r12, TMP0
> +       _xts64_precrypt_two     X2, r12, TMP0
> +       _xts64_precrypt_two     Y2, r12, TMP0
> +       _xts64_precrypt_two     X3, r12, TMP0
> +       _xts64_precrypt_two     Y3, r12, TMP0
> +       vuzp.32         Y0, X0
> +       vuzp.32         Y1, X1
> +       vuzp.32         Y2, X2
> +       vuzp.32         Y3, X3
> +.endif
> +
> +       // Do the cipher rounds
> +
> +       mov             r12, ROUND_KEYS
> +       mov             r6, NROUNDS
> +
> +.Lnext_round_\@:
> +.if \decrypting
> +.if \n == 64
> +       vld1.64         ROUND_KEY_L, [r12]
> +       sub             r12, #8
> +       vmov            ROUND_KEY_H, ROUND_KEY_L
> +.else
> +       vld1.32         {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
> +       sub             r12, #4
> +.endif
> +       _speck_unround_128bytes \n
> +.else
> +.if \n == 64
> +       vld1.64         ROUND_KEY_L, [r12]!
> +       vmov            ROUND_KEY_H, ROUND_KEY_L
> +.else
> +       vld1.32         {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
> +.endif
> +       _speck_round_128bytes   \n
> +.endif
> +       subs            r6, r6, #1
> +       bne             .Lnext_round_\@
> +
> +       // Re-interleave the 'x' and 'y' elements of each block
> +.if \n == 64
> +       vswp            X0_L, Y0_H
> +       vswp            X1_L, Y1_H
> +       vswp            X2_L, Y2_H
> +       vswp            X3_L, Y3_H
> +.else
> +       vzip.32         Y0, X0
> +       vzip.32         Y1, X1
> +       vzip.32         Y2, X2
> +       vzip.32         Y3, X3
> +.endif
> +
> +       // XOR the encrypted/decrypted blocks with the tweaks we saved earlier
> +       mov             r12, sp
> +       vld1.8          {TMP0, TMP1}, [r12:128]!
> +       vld1.8          {TMP2, TMP3}, [r12:128]!
> +       veor            X0, TMP0
> +       veor            Y0, TMP1
> +       veor            X1, TMP2
> +       veor            Y1, TMP3
> +       vld1.8          {TMP0, TMP1}, [r12:128]!
> +       vld1.8          {TMP2, TMP3}, [r12:128]!
> +       veor            X2, TMP0
> +       veor            Y2, TMP1
> +       veor            X3, TMP2
> +       veor            Y3, TMP3
> +
> +       // Store the ciphertext in the destination buffer
> +       vst1.8          {X0, Y0}, [DST]!
> +       vst1.8          {X1, Y1}, [DST]!
> +       vst1.8          {X2, Y2}, [DST]!
> +       vst1.8          {X3, Y3}, [DST]!
> +
> +       // Continue if there are more 128-byte chunks remaining, else return
> +       subs            NBYTES, #128
> +       bne             .Lnext_128bytes_\@
> +
> +       // Store the next tweak
> +.if \n == 64
> +       vst1.8          {TWEAKV}, [TWEAK]
> +.else
> +       vst1.8          {TWEAKV_L}, [TWEAK]
> +.endif
> +
> +       mov             sp, r7
> +       pop             {r4-r7}
> +       bx              lr
> +.endm
> +
> +ENTRY(speck128_xts_encrypt_neon)
> +       _speck_xts_crypt        n=64, decrypting=0
> +ENDPROC(speck128_xts_encrypt_neon)
> +
> +ENTRY(speck128_xts_decrypt_neon)
> +       _speck_xts_crypt        n=64, decrypting=1
> +ENDPROC(speck128_xts_decrypt_neon)
> +
> +ENTRY(speck64_xts_encrypt_neon)
> +       _speck_xts_crypt        n=32, decrypting=0
> +ENDPROC(speck64_xts_encrypt_neon)
> +
> +ENTRY(speck64_xts_decrypt_neon)
> +       _speck_xts_crypt        n=32, decrypting=1
> +ENDPROC(speck64_xts_decrypt_neon)
> diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
> new file mode 100644
> index 000000000000..3987dd6e063e
> --- /dev/null
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> + *
> + * Copyright (c) 2018 Google, Inc
> + *
> + * Note: the NIST recommendation for XTS only specifies a 128-bit block size,
> + * but a 64-bit version (needed for Speck64) is fairly straightforward; the math
> + * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
> + * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
> + * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
> + * OCB and PMAC"), represented as 0x1B.
> + */
> +
> +#include <asm/hwcap.h>
> +#include <asm/neon.h>
> +#include <asm/simd.h>
> +#include <crypto/algapi.h>
> +#include <crypto/gf128mul.h>
> +#include <crypto/internal/skcipher.h>
> +#include <crypto/speck.h>
> +#include <crypto/xts.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +/* The assembly functions only handle multiples of 128 bytes */
> +#define SPECK_NEON_CHUNK_SIZE  128
> +
> +/* Speck128 */
> +
> +struct speck128_xts_tfm_ctx {
> +       struct speck128_tfm_ctx main_key;
> +       struct speck128_tfm_ctx tweak_key;
> +};
> +
> +asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
> +                                         void *dst, const void *src,
> +                                         unsigned int nbytes, void *tweak);
> +
> +asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
> +                                         void *dst, const void *src,
> +                                         unsigned int nbytes, void *tweak);
> +
> +typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
> +                                    u8 *, const u8 *);
> +typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
> +                                         const void *, unsigned int, void *);
> +
> +
> +static __always_inline int
> +__speck128_xts_crypt(struct skcipher_request *req,
> +                    speck128_crypt_one_t crypt_one,
> +                    speck128_xts_crypt_many_t crypt_many)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       struct skcipher_walk walk;
> +       le128 tweak;
> +       int err;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +
> +       crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
> +
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +               u8 *dst = walk.dst.virt.addr;
> +               const u8 *src = walk.src.virt.addr;
> +
> +               if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
> +                       unsigned int count;
> +
> +                       count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
> +                       kernel_neon_begin();
> +                       (*crypt_many)(ctx->main_key.round_keys,
> +                                     ctx->main_key.nrounds,
> +                                     dst, src, count, &tweak);
> +                       kernel_neon_end();
> +                       dst += count;
> +                       src += count;
> +                       nbytes -= count;
> +               }
> +
> +               /* Handle any remainder with generic code */
> +               while (nbytes >= sizeof(le128)) {
> +                       le128_xor((le128 *)dst, (const le128 *)src, &tweak);
> +                       (*crypt_one)(&ctx->main_key, dst, dst);
> +                       le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
> +                       gf128mul_x_ble(&tweak, &tweak);
> +
> +                       dst += sizeof(le128);
> +                       src += sizeof(le128);
> +                       nbytes -= sizeof(le128);
> +               }
> +               err = skcipher_walk_done(&walk, nbytes);
> +       }
> +
> +       return err;
> +}
> +
> +static int speck128_xts_encrypt(struct skcipher_request *req)
> +{
> +       return __speck128_xts_crypt(req, crypto_speck128_encrypt,
> +                                   speck128_xts_encrypt_neon);
> +
> +}
> +
> +static int speck128_xts_decrypt(struct skcipher_request *req)
> +{
> +       return __speck128_xts_crypt(req, crypto_speck128_decrypt,
> +                                   speck128_xts_decrypt_neon);
> +}
> +
> +static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                              unsigned int keylen)
> +{
> +       struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       int err;
> +
> +       err = xts_verify_key(tfm, key, keylen);
> +       if (err)
> +               return err;
> +
> +       keylen /= 2;
> +
> +       err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
> +       if (err)
> +               return err;
> +
> +       return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
> +}
> +
> +/* Speck64 */
> +
> +struct speck64_xts_tfm_ctx {
> +       struct speck64_tfm_ctx main_key;
> +       struct speck64_tfm_ctx tweak_key;
> +};
> +
> +asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
> +                                        void *dst, const void *src,
> +                                        unsigned int nbytes, void *tweak);
> +
> +asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
> +                                        void *dst, const void *src,
> +                                        unsigned int nbytes, void *tweak);
> +
> +typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
> +                                   u8 *, const u8 *);
> +typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
> +                                        const void *, unsigned int, void *);
> +
> +static __always_inline int
> +__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
> +                   speck64_xts_crypt_many_t crypt_many)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       struct skcipher_walk walk;
> +       u64 tweak;
> +       int err;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +
> +       crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
> +
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +               u8 *dst = walk.dst.virt.addr;
> +               const u8 *src = walk.src.virt.addr;
> +
> +               if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
> +                       unsigned int count;
> +
> +                       count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
> +                       kernel_neon_begin();
> +                       (*crypt_many)(ctx->main_key.round_keys,
> +                                     ctx->main_key.nrounds,
> +                                     dst, src, count, &tweak);
> +                       kernel_neon_end();
> +                       dst += count;
> +                       src += count;
> +                       nbytes -= count;
> +               }
> +
> +               /* Handle any remainder with generic code */
> +               while (nbytes >= sizeof(u64)) {
> +                       *(u64 *)dst = *(u64 *)src ^ tweak;
> +                       (*crypt_one)(&ctx->main_key, dst, dst);
> +                       *(u64 *)dst ^= tweak;
> +                       tweak = (tweak << 1) ^
> +                               ((tweak & (1ULL << 63)) ? 0x1B : 0);
> +
> +                       dst += sizeof(u64);
> +                       src += sizeof(u64);
> +                       nbytes -= sizeof(u64);
> +               }
> +               err = skcipher_walk_done(&walk, nbytes);
> +       }
> +
> +       return err;
> +}
> +
> +static int speck64_xts_encrypt(struct skcipher_request *req)
> +{
> +       return __speck64_xts_crypt(req, crypto_speck64_encrypt,
> +                                  speck64_xts_encrypt_neon);
> +}
> +
> +static int speck64_xts_decrypt(struct skcipher_request *req)
> +{
> +       return __speck64_xts_crypt(req, crypto_speck64_decrypt,
> +                                  speck64_xts_decrypt_neon);
> +}
> +
> +static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                             unsigned int keylen)
> +{
> +       struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       int err;
> +
> +       err = xts_verify_key(tfm, key, keylen);
> +       if (err)
> +               return err;
> +
> +       keylen /= 2;
> +
> +       err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
> +       if (err)
> +               return err;
> +
> +       return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
> +}
> +
> +static struct skcipher_alg speck_algs[] = {
> +       {
> +               .base.cra_name          = "xts(speck128)",
> +               .base.cra_driver_name   = "xts-speck128-neon",
> +               .base.cra_priority      = 300,
> +               .base.cra_blocksize     = SPECK128_BLOCK_SIZE,
> +               .base.cra_ctxsize       = sizeof(struct speck128_xts_tfm_ctx),
> +               .base.cra_alignmask     = 7,
> +               .base.cra_module        = THIS_MODULE,
> +               .min_keysize            = 2 * SPECK128_128_KEY_SIZE,
> +               .max_keysize            = 2 * SPECK128_256_KEY_SIZE,
> +               .ivsize                 = SPECK128_BLOCK_SIZE,
> +               .walksize               = SPECK_NEON_CHUNK_SIZE,
> +               .setkey                 = speck128_xts_setkey,
> +               .encrypt                = speck128_xts_encrypt,
> +               .decrypt                = speck128_xts_decrypt,
> +       }, {
> +               .base.cra_name          = "xts(speck64)",
> +               .base.cra_driver_name   = "xts-speck64-neon",
> +               .base.cra_priority      = 300,
> +               .base.cra_blocksize     = SPECK64_BLOCK_SIZE,
> +               .base.cra_ctxsize       = sizeof(struct speck64_xts_tfm_ctx),
> +               .base.cra_alignmask     = 7,
> +               .base.cra_module        = THIS_MODULE,
> +               .min_keysize            = 2 * SPECK64_96_KEY_SIZE,
> +               .max_keysize            = 2 * SPECK64_128_KEY_SIZE,
> +               .ivsize                 = SPECK64_BLOCK_SIZE,
> +               .walksize               = SPECK_NEON_CHUNK_SIZE,
> +               .setkey                 = speck64_xts_setkey,
> +               .encrypt                = speck64_xts_encrypt,
> +               .decrypt                = speck64_xts_decrypt,
> +       }
> +};
> +
> +static int __init speck_neon_module_init(void)
> +{
> +       if (!(elf_hwcap & HWCAP_NEON))
> +               return -ENODEV;
> +       return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
> +}
> +
> +static void __exit speck_neon_module_exit(void)
> +{
> +       crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
> +}
> +
> +module_init(speck_neon_module_init);
> +module_exit(speck_neon_module_exit);
> +
> +MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
> +MODULE_ALIAS_CRYPTO("xts(speck128)");
> +MODULE_ALIAS_CRYPTO("xts-speck128-neon");
> +MODULE_ALIAS_CRYPTO("xts(speck64)");
> +MODULE_ALIAS_CRYPTO("xts-speck64-neon");
> --
> 2.16.0.rc1.238.g530d649a79-goog
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
@ 2018-02-13 11:34     ` Ard Biesheuvel
  0 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
> Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> encrypted/decrypted (doing one cipher round for all the blocks, then the
> next round, etc.), then goes through XTS postprocessing.
>
> The performance depends on the processor but can be about 3 times faster
> than the generic code.  For example, on an ARMv7 processor we observe
> the following performance with Speck128/256-XTS:
>
>     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
>     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>
> In comparison to AES-256-XTS without the Cryptography Extensions:
>
>     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
>     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
>     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
>
> Speck64/128-XTS is even faster:
>
>     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
>
> Note that as with the generic code, only the Speck128 and Speck64
> variants are supported.  Also, for now only the XTS mode of operation is
> supported, to target the disk and file encryption use cases.  The NEON
> code also only handles the portion of the data that is evenly divisible
> into 128-byte chunks, with any remainder handled by a C fallback.  Of
> course, other modes of operation could be added later if needed, and/or
> the NEON code could be updated to handle other buffer sizes.
>
> The XTS specification is only defined for AES which has a 128-bit block
> size, so for the GF(2^64) math needed for Speck64-XTS we use the
> reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> paper.  Of course, when possible users should use Speck128-XTS, but even
> that may be too slow on some processors; Speck64-XTS can be faster.
>

I think this is excellent work. Speck seems an appropriate solution to
this problem, and I'm glad we are not ending up with a stream cipher
for block encryption.

Also, I think an arm64 port would be nice. I may take a stab at this
if nobody else beats me to it.

I did run into an issue with this code though: On big-endian, I get

[    0.272381] alg: skcipher: Test 1 failed (invalid result) on
encryption for xts-speck64-neon
[    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
[    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b

so there may be a byte order corner case you missed in the rewrite (or
the issue existed before, as I did not test your v1)

-- 
Ard.


> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/arm/crypto/Kconfig           |   6 +
>  arch/arm/crypto/Makefile          |   2 +
>  arch/arm/crypto/speck-neon-core.S | 432 ++++++++++++++++++++++++++++++++++++++
>  arch/arm/crypto/speck-neon-glue.c | 290 +++++++++++++++++++++++++
>  4 files changed, 730 insertions(+)
>  create mode 100644 arch/arm/crypto/speck-neon-core.S
>  create mode 100644 arch/arm/crypto/speck-neon-glue.c
>
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index b8e69fe282b8..925d1364727a 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
>         select CRYPTO_BLKCIPHER
>         select CRYPTO_CHACHA20
>
> +config CRYPTO_SPECK_NEON
> +       tristate "NEON accelerated Speck cipher algorithms"
> +       depends on KERNEL_MODE_NEON
> +       select CRYPTO_BLKCIPHER
> +       select CRYPTO_SPECK
> +
>  endif
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 30ef8e291271..a758107c5525 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>
>  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
>  ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
> @@ -53,6 +54,7 @@ ghash-arm-ce-y        := ghash-ce-core.o ghash-ce-glue.o
>  crct10dif-arm-ce-y     := crct10dif-ce-core.o crct10dif-ce-glue.o
>  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
>  quiet_cmd_perl = PERL    $@
>        cmd_perl = $(PERL) $(<) > $(@)
> diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
> new file mode 100644
> index 000000000000..3c1e203e53b9
> --- /dev/null
> +++ b/arch/arm/crypto/speck-neon-core.S
> @@ -0,0 +1,432 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> + *
> + * Copyright (c) 2018 Google, Inc
> + *
> + * Author: Eric Biggers <ebiggers@google.com>
> + */
> +
> +#include <linux/linkage.h>
> +
> +       .text
> +       .fpu            neon
> +
> +       // arguments
> +       ROUND_KEYS      .req    r0      // const {u64,u32} *round_keys
> +       NROUNDS         .req    r1      // int nrounds
> +       DST             .req    r2      // void *dst
> +       SRC             .req    r3      // const void *src
> +       NBYTES          .req    r4      // unsigned int nbytes
> +       TWEAK           .req    r5      // void *tweak
> +
> +       // registers which hold the data being encrypted/decrypted
> +       X0              .req    q0
> +       X0_L            .req    d0
> +       X0_H            .req    d1
> +       Y0              .req    q1
> +       Y0_H            .req    d3
> +       X1              .req    q2
> +       X1_L            .req    d4
> +       X1_H            .req    d5
> +       Y1              .req    q3
> +       Y1_H            .req    d7
> +       X2              .req    q4
> +       X2_L            .req    d8
> +       X2_H            .req    d9
> +       Y2              .req    q5
> +       Y2_H            .req    d11
> +       X3              .req    q6
> +       X3_L            .req    d12
> +       X3_H            .req    d13
> +       Y3              .req    q7
> +       Y3_H            .req    d15
> +
> +       // the round key, duplicated in all lanes
> +       ROUND_KEY       .req    q8
> +       ROUND_KEY_L     .req    d16
> +       ROUND_KEY_H     .req    d17
> +
> +       // index vector for vtbl-based 8-bit rotates
> +       ROTATE_TABLE    .req    d18
> +
> +       // multiplication table for updating XTS tweaks
> +       GF128MUL_TABLE  .req    d19
> +       GF64MUL_TABLE   .req    d19
> +
> +       // current XTS tweak value(s)
> +       TWEAKV          .req    q10
> +       TWEAKV_L        .req    d20
> +       TWEAKV_H        .req    d21
> +
> +       TMP0            .req    q12
> +       TMP0_L          .req    d24
> +       TMP0_H          .req    d25
> +       TMP1            .req    q13
> +       TMP2            .req    q14
> +       TMP3            .req    q15
> +
> +       .align          4
> +.Lror64_8_table:
> +       .byte           1, 2, 3, 4, 5, 6, 7, 0
> +.Lror32_8_table:
> +       .byte           1, 2, 3, 0, 5, 6, 7, 4
> +.Lrol64_8_table:
> +       .byte           7, 0, 1, 2, 3, 4, 5, 6
> +.Lrol32_8_table:
> +       .byte           3, 0, 1, 2, 7, 4, 5, 6
> +.Lgf128mul_table:
> +       .byte           0, 0x87
> +       .fill           14
> +.Lgf64mul_table:
> +       .byte           0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
> +       .fill           12
> +
> +/*
> + * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
> + *
> + * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
> + * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
> + * of ROUND_KEY.  'n' is the lane size: 64 for Speck128, or 32 for Speck64.
> + *
> + * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
> + * the vtbl approach is faster on some processors and the same speed on others.
> + */
> +.macro _speck_round_128bytes   n
> +
> +       // x = ror(x, 8)
> +       vtbl.8          X0_L, {X0_L}, ROTATE_TABLE
> +       vtbl.8          X0_H, {X0_H}, ROTATE_TABLE
> +       vtbl.8          X1_L, {X1_L}, ROTATE_TABLE
> +       vtbl.8          X1_H, {X1_H}, ROTATE_TABLE
> +       vtbl.8          X2_L, {X2_L}, ROTATE_TABLE
> +       vtbl.8          X2_H, {X2_H}, ROTATE_TABLE
> +       vtbl.8          X3_L, {X3_L}, ROTATE_TABLE
> +       vtbl.8          X3_H, {X3_H}, ROTATE_TABLE
> +
> +       // x += y
> +       vadd.u\n        X0, Y0
> +       vadd.u\n        X1, Y1
> +       vadd.u\n        X2, Y2
> +       vadd.u\n        X3, Y3
> +
> +       // x ^= k
> +       veor            X0, ROUND_KEY
> +       veor            X1, ROUND_KEY
> +       veor            X2, ROUND_KEY
> +       veor            X3, ROUND_KEY
> +
> +       // y = rol(y, 3)
> +       vshl.u\n        TMP0, Y0, #3
> +       vshl.u\n        TMP1, Y1, #3
> +       vshl.u\n        TMP2, Y2, #3
> +       vshl.u\n        TMP3, Y3, #3
> +       vsri.u\n        TMP0, Y0, #(\n - 3)
> +       vsri.u\n        TMP1, Y1, #(\n - 3)
> +       vsri.u\n        TMP2, Y2, #(\n - 3)
> +       vsri.u\n        TMP3, Y3, #(\n - 3)
> +
> +       // y ^= x
> +       veor            Y0, TMP0, X0
> +       veor            Y1, TMP1, X1
> +       veor            Y2, TMP2, X2
> +       veor            Y3, TMP3, X3
> +.endm
> +
> +/*
> + * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
> + *
> + * This is the inverse of _speck_round_128bytes().
> + */
> +.macro _speck_unround_128bytes n
> +
> +       // y ^= x
> +       veor            TMP0, Y0, X0
> +       veor            TMP1, Y1, X1
> +       veor            TMP2, Y2, X2
> +       veor            TMP3, Y3, X3
> +
> +       // y = ror(y, 3)
> +       vshr.u\n        Y0, TMP0, #3
> +       vshr.u\n        Y1, TMP1, #3
> +       vshr.u\n        Y2, TMP2, #3
> +       vshr.u\n        Y3, TMP3, #3
> +       vsli.u\n        Y0, TMP0, #(\n - 3)
> +       vsli.u\n        Y1, TMP1, #(\n - 3)
> +       vsli.u\n        Y2, TMP2, #(\n - 3)
> +       vsli.u\n        Y3, TMP3, #(\n - 3)
> +
> +       // x ^= k
> +       veor            X0, ROUND_KEY
> +       veor            X1, ROUND_KEY
> +       veor            X2, ROUND_KEY
> +       veor            X3, ROUND_KEY
> +
> +       // x -= y
> +       vsub.u\n        X0, Y0
> +       vsub.u\n        X1, Y1
> +       vsub.u\n        X2, Y2
> +       vsub.u\n        X3, Y3
> +
> +       // x = rol(x, 8);
> +       vtbl.8          X0_L, {X0_L}, ROTATE_TABLE
> +       vtbl.8          X0_H, {X0_H}, ROTATE_TABLE
> +       vtbl.8          X1_L, {X1_L}, ROTATE_TABLE
> +       vtbl.8          X1_H, {X1_H}, ROTATE_TABLE
> +       vtbl.8          X2_L, {X2_L}, ROTATE_TABLE
> +       vtbl.8          X2_H, {X2_H}, ROTATE_TABLE
> +       vtbl.8          X3_L, {X3_L}, ROTATE_TABLE
> +       vtbl.8          X3_H, {X3_H}, ROTATE_TABLE
> +.endm
> +
> +.macro _xts128_precrypt_one    dst_reg, tweak_buf, tmp
> +
> +       // Load the next source block
> +       vld1.8          {\dst_reg}, [SRC]!
> +
> +       // Save the current tweak in the tweak buffer
> +       vst1.8          {TWEAKV}, [\tweak_buf:128]!
> +
> +       // XOR the next source block with the current tweak
> +       veor            \dst_reg, TWEAKV
> +
> +       /*
> +        * Calculate the next tweak by multiplying the current one by x,
> +        * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
> +        */
> +       vshr.u64        \tmp, TWEAKV, #63
> +       vshl.u64        TWEAKV, #1
> +       veor            TWEAKV_H, \tmp\()_L
> +       vtbl.8          \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
> +       veor            TWEAKV_L, \tmp\()_H
> +.endm
> +
> +.macro _xts64_precrypt_two     dst_reg, tweak_buf, tmp
> +
> +       // Load the next two source blocks
> +       vld1.8          {\dst_reg}, [SRC]!
> +
> +       // Save the current two tweaks in the tweak buffer
> +       vst1.8          {TWEAKV}, [\tweak_buf:128]!
> +
> +       // XOR the next two source blocks with the current two tweaks
> +       veor            \dst_reg, TWEAKV
> +
> +       /*
> +        * Calculate the next two tweaks by multiplying the current ones by x^2,
> +        * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
> +        */
> +       vshr.u64        \tmp, TWEAKV, #62
> +       vshl.u64        TWEAKV, #2
> +       vtbl.8          \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
> +       vtbl.8          \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
> +       veor            TWEAKV, \tmp
> +.endm
> +
> +/*
> + * _speck_xts_crypt() - Speck-XTS encryption/decryption
> + *
> + * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
> + * using Speck-XTS, specifically the variant with a block size of '2n' and round
> + * count given by NROUNDS.  The expanded round keys are given in ROUND_KEYS, and
> + * the current XTS tweak value is given in TWEAK.  It's assumed that NBYTES is a
> + * nonzero multiple of 128.
> + */
> +.macro _speck_xts_crypt        n, decrypting
> +       push            {r4-r7}
> +       mov             r7, sp
> +
> +       /*
> +        * The first four parameters were passed in registers r0-r3.  Load the
> +        * additional parameters, which were passed on the stack.
> +        */
> +       ldr             NBYTES, [sp, #16]
> +       ldr             TWEAK, [sp, #20]
> +
> +       /*
> +        * If decrypting, modify the ROUND_KEYS parameter to point to the last
> +        * round key rather than the first, since for decryption the round keys
> +        * are used in reverse order.
> +        */
> +.if \decrypting
> +.if \n == 64
> +       add             ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
> +       sub             ROUND_KEYS, #8
> +.else
> +       add             ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
> +       sub             ROUND_KEYS, #4
> +.endif
> +.endif
> +
> +       // Load the index vector for vtbl-based 8-bit rotates
> +.if \decrypting
> +       ldr             r12, =.Lrol\n\()_8_table
> +.else
> +       ldr             r12, =.Lror\n\()_8_table
> +.endif
> +       vld1.8          {ROTATE_TABLE}, [r12:64]
> +
> +       // One-time XTS preparation
> +
> +       /*
> +        * Allocate stack space to store 128 bytes worth of tweaks.  For
> +        * performance, this space is aligned to a 16-byte boundary so that we
> +        * can use the load/store instructions that declare 16-byte alignment.
> +        */
> +       sub             sp, #128
> +       bic             sp, #0xf
> +
> +.if \n == 64
> +       // Load first tweak
> +       vld1.8          {TWEAKV}, [TWEAK]
> +
> +       // Load GF(2^128) multiplication table
> +       ldr             r12, =.Lgf128mul_table
> +       vld1.8          {GF128MUL_TABLE}, [r12:64]
> +.else
> +       // Load first tweak
> +       vld1.8          {TWEAKV_L}, [TWEAK]
> +
> +       // Load GF(2^64) multiplication table
> +       ldr             r12, =.Lgf64mul_table
> +       vld1.8          {GF64MUL_TABLE}, [r12:64]
> +
> +       // Calculate second tweak, packing it together with the first
> +       vshr.u64        TMP0_L, TWEAKV_L, #63
> +       vtbl.u8         TMP0_L, {GF64MUL_TABLE}, TMP0_L
> +       vshl.u64        TWEAKV_H, TWEAKV_L, #1
> +       veor            TWEAKV_H, TMP0_L
> +.endif
> +
> +.Lnext_128bytes_\@:
> +
> +       /*
> +        * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
> +        * values, and save the tweaks on the stack for later.  Then
> +        * de-interleave the 'x' and 'y' elements of each block, i.e. make it so
> +        * that the X[0-3] registers contain only the second halves of blocks,
> +        * and the Y[0-3] registers contain only the first halves of blocks.
> +        * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
> +        */
> +       mov             r12, sp
> +.if \n == 64
> +       _xts128_precrypt_one    X0, r12, TMP0
> +       _xts128_precrypt_one    Y0, r12, TMP0
> +       _xts128_precrypt_one    X1, r12, TMP0
> +       _xts128_precrypt_one    Y1, r12, TMP0
> +       _xts128_precrypt_one    X2, r12, TMP0
> +       _xts128_precrypt_one    Y2, r12, TMP0
> +       _xts128_precrypt_one    X3, r12, TMP0
> +       _xts128_precrypt_one    Y3, r12, TMP0
> +       vswp            X0_L, Y0_H
> +       vswp            X1_L, Y1_H
> +       vswp            X2_L, Y2_H
> +       vswp            X3_L, Y3_H
> +.else
> +       _xts64_precrypt_two     X0, r12, TMP0
> +       _xts64_precrypt_two     Y0, r12, TMP0
> +       _xts64_precrypt_two     X1, r12, TMP0
> +       _xts64_precrypt_two     Y1, r12, TMP0
> +       _xts64_precrypt_two     X2, r12, TMP0
> +       _xts64_precrypt_two     Y2, r12, TMP0
> +       _xts64_precrypt_two     X3, r12, TMP0
> +       _xts64_precrypt_two     Y3, r12, TMP0
> +       vuzp.32         Y0, X0
> +       vuzp.32         Y1, X1
> +       vuzp.32         Y2, X2
> +       vuzp.32         Y3, X3
> +.endif
> +
> +       // Do the cipher rounds
> +
> +       mov             r12, ROUND_KEYS
> +       mov             r6, NROUNDS
> +
> +.Lnext_round_\@:
> +.if \decrypting
> +.if \n == 64
> +       vld1.64         ROUND_KEY_L, [r12]
> +       sub             r12, #8
> +       vmov            ROUND_KEY_H, ROUND_KEY_L
> +.else
> +       vld1.32         {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
> +       sub             r12, #4
> +.endif
> +       _speck_unround_128bytes \n
> +.else
> +.if \n == 64
> +       vld1.64         ROUND_KEY_L, [r12]!
> +       vmov            ROUND_KEY_H, ROUND_KEY_L
> +.else
> +       vld1.32         {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
> +.endif
> +       _speck_round_128bytes   \n
> +.endif
> +       subs            r6, r6, #1
> +       bne             .Lnext_round_\@
> +
> +       // Re-interleave the 'x' and 'y' elements of each block
> +.if \n == 64
> +       vswp            X0_L, Y0_H
> +       vswp            X1_L, Y1_H
> +       vswp            X2_L, Y2_H
> +       vswp            X3_L, Y3_H
> +.else
> +       vzip.32         Y0, X0
> +       vzip.32         Y1, X1
> +       vzip.32         Y2, X2
> +       vzip.32         Y3, X3
> +.endif
> +
> +       // XOR the encrypted/decrypted blocks with the tweaks we saved earlier
> +       mov             r12, sp
> +       vld1.8          {TMP0, TMP1}, [r12:128]!
> +       vld1.8          {TMP2, TMP3}, [r12:128]!
> +       veor            X0, TMP0
> +       veor            Y0, TMP1
> +       veor            X1, TMP2
> +       veor            Y1, TMP3
> +       vld1.8          {TMP0, TMP1}, [r12:128]!
> +       vld1.8          {TMP2, TMP3}, [r12:128]!
> +       veor            X2, TMP0
> +       veor            Y2, TMP1
> +       veor            X3, TMP2
> +       veor            Y3, TMP3
> +
> +       // Store the ciphertext in the destination buffer
> +       vst1.8          {X0, Y0}, [DST]!
> +       vst1.8          {X1, Y1}, [DST]!
> +       vst1.8          {X2, Y2}, [DST]!
> +       vst1.8          {X3, Y3}, [DST]!
> +
> +       // Continue if there are more 128-byte chunks remaining, else return
> +       subs            NBYTES, #128
> +       bne             .Lnext_128bytes_\@
> +
> +       // Store the next tweak
> +.if \n == 64
> +       vst1.8          {TWEAKV}, [TWEAK]
> +.else
> +       vst1.8          {TWEAKV_L}, [TWEAK]
> +.endif
> +
> +       mov             sp, r7
> +       pop             {r4-r7}
> +       bx              lr
> +.endm
> +
> +ENTRY(speck128_xts_encrypt_neon)
> +       _speck_xts_crypt        n=64, decrypting=0
> +ENDPROC(speck128_xts_encrypt_neon)
> +
> +ENTRY(speck128_xts_decrypt_neon)
> +       _speck_xts_crypt        n=64, decrypting=1
> +ENDPROC(speck128_xts_decrypt_neon)
> +
> +ENTRY(speck64_xts_encrypt_neon)
> +       _speck_xts_crypt        n=32, decrypting=0
> +ENDPROC(speck64_xts_encrypt_neon)
> +
> +ENTRY(speck64_xts_decrypt_neon)
> +       _speck_xts_crypt        n=32, decrypting=1
> +ENDPROC(speck64_xts_decrypt_neon)
> diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
> new file mode 100644
> index 000000000000..3987dd6e063e
> --- /dev/null
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> + *
> + * Copyright (c) 2018 Google, Inc
> + *
> + * Note: the NIST recommendation for XTS only specifies a 128-bit block size,
> + * but a 64-bit version (needed for Speck64) is fairly straightforward; the math
> + * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
> + * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
> + * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
> + * OCB and PMAC"), represented as 0x1B.
> + */
> +
> +#include <asm/hwcap.h>
> +#include <asm/neon.h>
> +#include <asm/simd.h>
> +#include <crypto/algapi.h>
> +#include <crypto/gf128mul.h>
> +#include <crypto/internal/skcipher.h>
> +#include <crypto/speck.h>
> +#include <crypto/xts.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +/* The assembly functions only handle multiples of 128 bytes */
> +#define SPECK_NEON_CHUNK_SIZE  128
> +
> +/* Speck128 */
> +
> +struct speck128_xts_tfm_ctx {
> +       struct speck128_tfm_ctx main_key;
> +       struct speck128_tfm_ctx tweak_key;
> +};
> +
> +asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
> +                                         void *dst, const void *src,
> +                                         unsigned int nbytes, void *tweak);
> +
> +asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
> +                                         void *dst, const void *src,
> +                                         unsigned int nbytes, void *tweak);
> +
> +typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
> +                                    u8 *, const u8 *);
> +typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
> +                                         const void *, unsigned int, void *);
> +
> +
> +static __always_inline int
> +__speck128_xts_crypt(struct skcipher_request *req,
> +                    speck128_crypt_one_t crypt_one,
> +                    speck128_xts_crypt_many_t crypt_many)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       struct skcipher_walk walk;
> +       le128 tweak;
> +       int err;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +
> +       crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
> +
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +               u8 *dst = walk.dst.virt.addr;
> +               const u8 *src = walk.src.virt.addr;
> +
> +               if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
> +                       unsigned int count;
> +
> +                       count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
> +                       kernel_neon_begin();
> +                       (*crypt_many)(ctx->main_key.round_keys,
> +                                     ctx->main_key.nrounds,
> +                                     dst, src, count, &tweak);
> +                       kernel_neon_end();
> +                       dst += count;
> +                       src += count;
> +                       nbytes -= count;
> +               }
> +
> +               /* Handle any remainder with generic code */
> +               while (nbytes >= sizeof(le128)) {
> +                       le128_xor((le128 *)dst, (const le128 *)src, &tweak);
> +                       (*crypt_one)(&ctx->main_key, dst, dst);
> +                       le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
> +                       gf128mul_x_ble(&tweak, &tweak);
> +
> +                       dst += sizeof(le128);
> +                       src += sizeof(le128);
> +                       nbytes -= sizeof(le128);
> +               }
> +               err = skcipher_walk_done(&walk, nbytes);
> +       }
> +
> +       return err;
> +}
> +
> +static int speck128_xts_encrypt(struct skcipher_request *req)
> +{
> +       return __speck128_xts_crypt(req, crypto_speck128_encrypt,
> +                                   speck128_xts_encrypt_neon);
> +
> +}
> +
> +static int speck128_xts_decrypt(struct skcipher_request *req)
> +{
> +       return __speck128_xts_crypt(req, crypto_speck128_decrypt,
> +                                   speck128_xts_decrypt_neon);
> +}
> +
> +static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                              unsigned int keylen)
> +{
> +       struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       int err;
> +
> +       err = xts_verify_key(tfm, key, keylen);
> +       if (err)
> +               return err;
> +
> +       keylen /= 2;
> +
> +       err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
> +       if (err)
> +               return err;
> +
> +       return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
> +}
> +
> +/* Speck64 */
> +
> +struct speck64_xts_tfm_ctx {
> +       struct speck64_tfm_ctx main_key;
> +       struct speck64_tfm_ctx tweak_key;
> +};
> +
> +asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
> +                                        void *dst, const void *src,
> +                                        unsigned int nbytes, void *tweak);
> +
> +asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
> +                                        void *dst, const void *src,
> +                                        unsigned int nbytes, void *tweak);
> +
> +typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
> +                                   u8 *, const u8 *);
> +typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
> +                                        const void *, unsigned int, void *);
> +
> +static __always_inline int
> +__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
> +                   speck64_xts_crypt_many_t crypt_many)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       struct skcipher_walk walk;
> +       u64 tweak;
> +       int err;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +
> +       crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
> +
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +               u8 *dst = walk.dst.virt.addr;
> +               const u8 *src = walk.src.virt.addr;
> +
> +               if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
> +                       unsigned int count;
> +
> +                       count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
> +                       kernel_neon_begin();
> +                       (*crypt_many)(ctx->main_key.round_keys,
> +                                     ctx->main_key.nrounds,
> +                                     dst, src, count, &tweak);
> +                       kernel_neon_end();
> +                       dst += count;
> +                       src += count;
> +                       nbytes -= count;
> +               }
> +
> +               /* Handle any remainder with generic code */
> +               while (nbytes >= sizeof(u64)) {
> +                       *(u64 *)dst = *(u64 *)src ^ tweak;
> +                       (*crypt_one)(&ctx->main_key, dst, dst);
> +                       *(u64 *)dst ^= tweak;
> +                       tweak = (tweak << 1) ^
> +                               ((tweak & (1ULL << 63)) ? 0x1B : 0);
> +
> +                       dst += sizeof(u64);
> +                       src += sizeof(u64);
> +                       nbytes -= sizeof(u64);
> +               }
> +               err = skcipher_walk_done(&walk, nbytes);
> +       }
> +
> +       return err;
> +}
> +
> +static int speck64_xts_encrypt(struct skcipher_request *req)
> +{
> +       return __speck64_xts_crypt(req, crypto_speck64_encrypt,
> +                                  speck64_xts_encrypt_neon);
> +}
> +
> +static int speck64_xts_decrypt(struct skcipher_request *req)
> +{
> +       return __speck64_xts_crypt(req, crypto_speck64_decrypt,
> +                                  speck64_xts_decrypt_neon);
> +}
> +
> +static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                             unsigned int keylen)
> +{
> +       struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +       int err;
> +
> +       err = xts_verify_key(tfm, key, keylen);
> +       if (err)
> +               return err;
> +
> +       keylen /= 2;
> +
> +       err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
> +       if (err)
> +               return err;
> +
> +       return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
> +}
> +
> +static struct skcipher_alg speck_algs[] = {
> +       {
> +               .base.cra_name          = "xts(speck128)",
> +               .base.cra_driver_name   = "xts-speck128-neon",
> +               .base.cra_priority      = 300,
> +               .base.cra_blocksize     = SPECK128_BLOCK_SIZE,
> +               .base.cra_ctxsize       = sizeof(struct speck128_xts_tfm_ctx),
> +               .base.cra_alignmask     = 7,
> +               .base.cra_module        = THIS_MODULE,
> +               .min_keysize            = 2 * SPECK128_128_KEY_SIZE,
> +               .max_keysize            = 2 * SPECK128_256_KEY_SIZE,
> +               .ivsize                 = SPECK128_BLOCK_SIZE,
> +               .walksize               = SPECK_NEON_CHUNK_SIZE,
> +               .setkey                 = speck128_xts_setkey,
> +               .encrypt                = speck128_xts_encrypt,
> +               .decrypt                = speck128_xts_decrypt,
> +       }, {
> +               .base.cra_name          = "xts(speck64)",
> +               .base.cra_driver_name   = "xts-speck64-neon",
> +               .base.cra_priority      = 300,
> +               .base.cra_blocksize     = SPECK64_BLOCK_SIZE,
> +               .base.cra_ctxsize       = sizeof(struct speck64_xts_tfm_ctx),
> +               .base.cra_alignmask     = 7,
> +               .base.cra_module        = THIS_MODULE,
> +               .min_keysize            = 2 * SPECK64_96_KEY_SIZE,
> +               .max_keysize            = 2 * SPECK64_128_KEY_SIZE,
> +               .ivsize                 = SPECK64_BLOCK_SIZE,
> +               .walksize               = SPECK_NEON_CHUNK_SIZE,
> +               .setkey                 = speck64_xts_setkey,
> +               .encrypt                = speck64_xts_encrypt,
> +               .decrypt                = speck64_xts_decrypt,
> +       }
> +};
> +
> +static int __init speck_neon_module_init(void)
> +{
> +       if (!(elf_hwcap & HWCAP_NEON))
> +               return -ENODEV;
> +       return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
> +}
> +
> +static void __exit speck_neon_module_exit(void)
> +{
> +       crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
> +}
> +
> +module_init(speck_neon_module_init);
> +module_exit(speck_neon_module_exit);
> +
> +MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
> +MODULE_ALIAS_CRYPTO("xts(speck128)");
> +MODULE_ALIAS_CRYPTO("xts-speck128-neon");
> +MODULE_ALIAS_CRYPTO("xts(speck64)");
> +MODULE_ALIAS_CRYPTO("xts-speck64-neon");
> --
> 2.16.0.rc1.238.g530d649a79-goog
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  2018-02-13 11:34     ` Ard Biesheuvel
@ 2018-02-13 18:57       ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-13 18:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

Hi Ard,

On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote:
> Hi Eric,
> 
> On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> > encrypted/decrypted (doing one cipher round for all the blocks, then the
> > next round, etc.), then goes through XTS postprocessing.
> >
> > The performance depends on the processor but can be about 3 times faster
> > than the generic code.  For example, on an ARMv7 processor we observe
> > the following performance with Speck128/256-XTS:
> >
> >     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
> >     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
> >
> > In comparison to AES-256-XTS without the Cryptography Extensions:
> >
> >     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
> >     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
> >     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
> >
> > Speck64/128-XTS is even faster:
> >
> >     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
> >
> > Note that as with the generic code, only the Speck128 and Speck64
> > variants are supported.  Also, for now only the XTS mode of operation is
> > supported, to target the disk and file encryption use cases.  The NEON
> > code also only handles the portion of the data that is evenly divisible
> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
> > course, other modes of operation could be added later if needed, and/or
> > the NEON code could be updated to handle other buffer sizes.
> >
> > The XTS specification is only defined for AES which has a 128-bit block
> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> > paper.  Of course, when possible users should use Speck128-XTS, but even
> > that may be too slow on some processors; Speck64-XTS can be faster.
> >
> 
> I think this is excellent work. Speck seems an appropriate solution to
> this problem, and I'm glad we are not ending up with a stream cipher
> for block encryption.
> 
> Also, I think an arm64 port would be nice. I may take a stab at this
> if nobody else beats me to it.

We don't really want to encourage people to use Speck over AES with the
Cryptography Extensions, so that's why I didn't include an arm64 port.  That
being said, I suppose we can't stop people from adding an arm64 port if they
really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
Cryptography Extensions (though I thought that almost all do).

> 
> I did run into an issue with this code though: On big-endian, I get
> 
> [    0.272381] alg: skcipher: Test 1 failed (invalid result) on
> encryption for xts-speck64-neon
> [    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
> [    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
> 
> so there may be a byte order corner case you missed in the rewrite (or
> the issue existed before, as I did not test your v1)
> 

To be honest I haven't tested either version on a big endian ARM CPU yet.  I
don't really know how to do that currently; maybe it's possible with QEMU.

But assuming I haven't missed anything, in the assembly code everything is
treated as byte arrays with the exception of the round keys which are 32-bit or
64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
the assembly code *should* work correctly on a big endian CPU.

However, looking over it now, I think there is a bug in the glue code for
Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
Namely, the tweak is treated as CPU endian when it should be little endian.
Could you try the following patch?

diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
index 3987dd6e063e..960cc634b36f 100644
--- a/arch/arm/crypto/speck-neon-glue.c
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct skcipher_walk walk;
-	u64 tweak;
+	__le64 tweak;
 	int err;
 
 	err = skcipher_walk_virt(&walk, req, true);
@@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
 		}
 
 		/* Handle any remainder with generic code */
-		while (nbytes >= sizeof(u64)) {
-			*(u64 *)dst = *(u64 *)src ^ tweak;
+		while (nbytes >= sizeof(__le64)) {
+			*(__le64 *)dst = *(__le64 *)src ^ tweak;
 			(*crypt_one)(&ctx->main_key, dst, dst);
-			*(u64 *)dst ^= tweak;
-			tweak = (tweak << 1) ^
-				((tweak & (1ULL << 63)) ? 0x1B : 0);
-
-			dst += sizeof(u64);
-			src += sizeof(u64);
-			nbytes -= sizeof(u64);
+			*(__le64 *)dst ^= tweak;
+			tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
+					    ((tweak & cpu_to_le64(1ULL << 63)) ?
+					     0x1B : 0));
+			dst += sizeof(__le64);
+			src += sizeof(__le64);
+			nbytes -= sizeof(__le64);
 		}
 		err = skcipher_walk_done(&walk, nbytes);
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
@ 2018-02-13 18:57       ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-02-13 18:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote:
> Hi Eric,
> 
> On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> > encrypted/decrypted (doing one cipher round for all the blocks, then the
> > next round, etc.), then goes through XTS postprocessing.
> >
> > The performance depends on the processor but can be about 3 times faster
> > than the generic code.  For example, on an ARMv7 processor we observe
> > the following performance with Speck128/256-XTS:
> >
> >     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
> >     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
> >
> > In comparison to AES-256-XTS without the Cryptography Extensions:
> >
> >     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
> >     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
> >     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
> >
> > Speck64/128-XTS is even faster:
> >
> >     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
> >
> > Note that as with the generic code, only the Speck128 and Speck64
> > variants are supported.  Also, for now only the XTS mode of operation is
> > supported, to target the disk and file encryption use cases.  The NEON
> > code also only handles the portion of the data that is evenly divisible
> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
> > course, other modes of operation could be added later if needed, and/or
> > the NEON code could be updated to handle other buffer sizes.
> >
> > The XTS specification is only defined for AES which has a 128-bit block
> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> > paper.  Of course, when possible users should use Speck128-XTS, but even
> > that may be too slow on some processors; Speck64-XTS can be faster.
> >
> 
> I think this is excellent work. Speck seems an appropriate solution to
> this problem, and I'm glad we are not ending up with a stream cipher
> for block encryption.
> 
> Also, I think an arm64 port would be nice. I may take a stab at this
> if nobody else beats me to it.

We don't really want to encourage people to use Speck over AES with the
Cryptography Extensions, so that's why I didn't include an arm64 port.  That
being said, I suppose we can't stop people from adding an arm64 port if they
really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
Cryptography Extensions (though I thought that almost all do).

> 
> I did run into an issue with this code though: On big-endian, I get
> 
> [    0.272381] alg: skcipher: Test 1 failed (invalid result) on
> encryption for xts-speck64-neon
> [    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
> [    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
> 
> so there may be a byte order corner case you missed in the rewrite (or
> the issue existed before, as I did not test your v1)
> 

To be honest I haven't tested either version on a big endian ARM CPU yet.  I
don't really know how to do that currently; maybe it's possible with QEMU.

But assuming I haven't missed anything, in the assembly code everything is
treated as byte arrays with the exception of the round keys which are 32-bit or
64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
the assembly code *should* work correctly on a big endian CPU.

However, looking over it now, I think there is a bug in the glue code for
Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
Namely, the tweak is treated as CPU endian when it should be little endian.
Could you try the following patch?

diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
index 3987dd6e063e..960cc634b36f 100644
--- a/arch/arm/crypto/speck-neon-glue.c
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct skcipher_walk walk;
-	u64 tweak;
+	__le64 tweak;
 	int err;
 
 	err = skcipher_walk_virt(&walk, req, true);
@@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
 		}
 
 		/* Handle any remainder with generic code */
-		while (nbytes >= sizeof(u64)) {
-			*(u64 *)dst = *(u64 *)src ^ tweak;
+		while (nbytes >= sizeof(__le64)) {
+			*(__le64 *)dst = *(__le64 *)src ^ tweak;
 			(*crypt_one)(&ctx->main_key, dst, dst);
-			*(u64 *)dst ^= tweak;
-			tweak = (tweak << 1) ^
-				((tweak & (1ULL << 63)) ? 0x1B : 0);
-
-			dst += sizeof(u64);
-			src += sizeof(u64);
-			nbytes -= sizeof(u64);
+			*(__le64 *)dst ^= tweak;
+			tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
+					    ((tweak & cpu_to_le64(1ULL << 63)) ?
+					     0x1B : 0));
+			dst += sizeof(__le64);
+			src += sizeof(__le64);
+			nbytes -= sizeof(__le64);
 		}
 		err = skcipher_walk_done(&walk, nbytes);
 	}

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  2018-02-13 18:57       ` Eric Biggers
  (?)
@ 2018-02-13 19:04         ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 19:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Greg Kaiser, Herbert Xu, Michael Halcrow,
	Patrik Torstensson, Alex Cope, Paul Lawrence, linux-fscrypt,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	Greg Kroah-Hartman, linux-arm-kernel, Paul Crowley

On 13 February 2018 at 18:57, Eric Biggers <ebiggers@google.com> wrote:
> Hi Ard,
>
> On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote:
>> Hi Eric,
>>
>> On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
>> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
>> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
>> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
>> > encrypted/decrypted (doing one cipher round for all the blocks, then the
>> > next round, etc.), then goes through XTS postprocessing.
>> >
>> > The performance depends on the processor but can be about 3 times faster
>> > than the generic code.  For example, on an ARMv7 processor we observe
>> > the following performance with Speck128/256-XTS:
>> >
>> >     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
>> >     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>> >
>> > In comparison to AES-256-XTS without the Cryptography Extensions:
>> >
>> >     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
>> >     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
>> >     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
>> >
>> > Speck64/128-XTS is even faster:
>> >
>> >     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
>> >
>> > Note that as with the generic code, only the Speck128 and Speck64
>> > variants are supported.  Also, for now only the XTS mode of operation is
>> > supported, to target the disk and file encryption use cases.  The NEON
>> > code also only handles the portion of the data that is evenly divisible
>> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
>> > course, other modes of operation could be added later if needed, and/or
>> > the NEON code could be updated to handle other buffer sizes.
>> >
>> > The XTS specification is only defined for AES which has a 128-bit block
>> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
>> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
>> > paper.  Of course, when possible users should use Speck128-XTS, but even
>> > that may be too slow on some processors; Speck64-XTS can be faster.
>> >
>>
>> I think this is excellent work. Speck seems an appropriate solution to
>> this problem, and I'm glad we are not ending up with a stream cipher
>> for block encryption.
>>
>> Also, I think an arm64 port would be nice. I may take a stab at this
>> if nobody else beats me to it.
>
> We don't really want to encourage people to use Speck over AES with the
> Cryptography Extensions, so that's why I didn't include an arm64 port.  That
> being said, I suppose we can't stop people from adding an arm64 port if they
> really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
> Cryptography Extensions (though I thought that almost all do).
>

Many do, but not all of them. A notable exception is the Raspberry Pi 3.

>>
>> I did run into an issue with this code though: On big-endian, I get
>>
>> [ 0.272381] alg: skcipher: Test 1 failed (invalid result) on
>> encryption for xts-speck64-neon
>> [    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
>> [    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
>>
>> so there may be a byte order corner case you missed in the rewrite (or
>> the issue existed before, as I did not test your v1)
>>
>
> To be honest I haven't tested either version on a big endian ARM CPU yet.  I
> don't really know how to do that currently; maybe it's possible with QEMU.
>

I tested this on a big-endian 32-bit VM running under KVM on a 64-bit host.

> But assuming I haven't missed anything, in the assembly code everything is
> treated as byte arrays with the exception of the round keys which are 32-bit or
> 64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
> vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
> the assembly code *should* work correctly on a big endian CPU.
>

Indeed.

> However, looking over it now, I think there is a bug in the glue code for
> Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
> Namely, the tweak is treated as CPU endian when it should be little endian.
> Could you try the following patch?
>
> diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
> index 3987dd6e063e..960cc634b36f 100644
> --- a/arch/arm/crypto/speck-neon-glue.c
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>         const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
>         struct skcipher_walk walk;
> -       u64 tweak;
> +       __le64 tweak;
>         int err;
>
>         err = skcipher_walk_virt(&walk, req, true);
> @@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>                 }
>
>                 /* Handle any remainder with generic code */
> -               while (nbytes >= sizeof(u64)) {
> -                       *(u64 *)dst = *(u64 *)src ^ tweak;
> +               while (nbytes >= sizeof(__le64)) {
> +                       *(__le64 *)dst = *(__le64 *)src ^ tweak;
>                         (*crypt_one)(&ctx->main_key, dst, dst);
> -                       *(u64 *)dst ^= tweak;
> -                       tweak = (tweak << 1) ^
> -                               ((tweak & (1ULL << 63)) ? 0x1B : 0);
> -
> -                       dst += sizeof(u64);
> -                       src += sizeof(u64);
> -                       nbytes -= sizeof(u64);
> +                       *(__le64 *)dst ^= tweak;
> +                       tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
> +                                           ((tweak & cpu_to_le64(1ULL << 63)) ?
> +                                            0x1B : 0));
> +                       dst += sizeof(__le64);
> +                       src += sizeof(__le64);
> +                       nbytes -= sizeof(__le64);
>                 }
>                 err = skcipher_walk_done(&walk, nbytes);
>         }

This fixes it.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
@ 2018-02-13 19:04         ` Ard Biesheuvel
  0 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 19:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

On 13 February 2018 at 18:57, Eric Biggers <ebiggers@google.com> wrote:
> Hi Ard,
>
> On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote:
>> Hi Eric,
>>
>> On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
>> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
>> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
>> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
>> > encrypted/decrypted (doing one cipher round for all the blocks, then the
>> > next round, etc.), then goes through XTS postprocessing.
>> >
>> > The performance depends on the processor but can be about 3 times faster
>> > than the generic code.  For example, on an ARMv7 processor we observe
>> > the following performance with Speck128/256-XTS:
>> >
>> >     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
>> >     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>> >
>> > In comparison to AES-256-XTS without the Cryptography Extensions:
>> >
>> >     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
>> >     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
>> >     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
>> >
>> > Speck64/128-XTS is even faster:
>> >
>> >     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
>> >
>> > Note that as with the generic code, only the Speck128 and Speck64
>> > variants are supported.  Also, for now only the XTS mode of operation is
>> > supported, to target the disk and file encryption use cases.  The NEON
>> > code also only handles the portion of the data that is evenly divisible
>> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
>> > course, other modes of operation could be added later if needed, and/or
>> > the NEON code could be updated to handle other buffer sizes.
>> >
>> > The XTS specification is only defined for AES which has a 128-bit block
>> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
>> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
>> > paper.  Of course, when possible users should use Speck128-XTS, but even
>> > that may be too slow on some processors; Speck64-XTS can be faster.
>> >
>>
>> I think this is excellent work. Speck seems an appropriate solution to
>> this problem, and I'm glad we are not ending up with a stream cipher
>> for block encryption.
>>
>> Also, I think an arm64 port would be nice. I may take a stab at this
>> if nobody else beats me to it.
>
> We don't really want to encourage people to use Speck over AES with the
> Cryptography Extensions, so that's why I didn't include an arm64 port.  That
> being said, I suppose we can't stop people from adding an arm64 port if they
> really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
> Cryptography Extensions (though I thought that almost all do).
>

Many do, but not all of them. A notable exception is the Raspberry Pi 3.

>>
>> I did run into an issue with this code though: On big-endian, I get
>>
>> [ 0.272381] alg: skcipher: Test 1 failed (invalid result) on
>> encryption for xts-speck64-neon
>> [    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
>> [    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
>>
>> so there may be a byte order corner case you missed in the rewrite (or
>> the issue existed before, as I did not test your v1)
>>
>
> To be honest I haven't tested either version on a big endian ARM CPU yet.  I
> don't really know how to do that currently; maybe it's possible with QEMU.
>

I tested this on a big-endian 32-bit VM running under KVM on a 64-bit host.

> But assuming I haven't missed anything, in the assembly code everything is
> treated as byte arrays with the exception of the round keys which are 32-bit or
> 64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
> vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
> the assembly code *should* work correctly on a big endian CPU.
>

Indeed.

> However, looking over it now, I think there is a bug in the glue code for
> Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
> Namely, the tweak is treated as CPU endian when it should be little endian.
> Could you try the following patch?
>
> diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
> index 3987dd6e063e..960cc634b36f 100644
> --- a/arch/arm/crypto/speck-neon-glue.c
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>         const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
>         struct skcipher_walk walk;
> -       u64 tweak;
> +       __le64 tweak;
>         int err;
>
>         err = skcipher_walk_virt(&walk, req, true);
> @@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>                 }
>
>                 /* Handle any remainder with generic code */
> -               while (nbytes >= sizeof(u64)) {
> -                       *(u64 *)dst = *(u64 *)src ^ tweak;
> +               while (nbytes >= sizeof(__le64)) {
> +                       *(__le64 *)dst = *(__le64 *)src ^ tweak;
>                         (*crypt_one)(&ctx->main_key, dst, dst);
> -                       *(u64 *)dst ^= tweak;
> -                       tweak = (tweak << 1) ^
> -                               ((tweak & (1ULL << 63)) ? 0x1B : 0);
> -
> -                       dst += sizeof(u64);
> -                       src += sizeof(u64);
> -                       nbytes -= sizeof(u64);
> +                       *(__le64 *)dst ^= tweak;
> +                       tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
> +                                           ((tweak & cpu_to_le64(1ULL << 63)) ?
> +                                            0x1B : 0));
> +                       dst += sizeof(__le64);
> +                       src += sizeof(__le64);
> +                       nbytes -= sizeof(__le64);
>                 }
>                 err = skcipher_walk_done(&walk, nbytes);
>         }

This fixes it.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
@ 2018-02-13 19:04         ` Ard Biesheuvel
  0 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2018-02-13 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 February 2018 at 18:57, Eric Biggers <ebiggers@google.com> wrote:
> Hi Ard,
>
> On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote:
>> Hi Eric,
>>
>> On 12 February 2018 at 23:52, Eric Biggers <ebiggers@google.com> wrote:
>> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
>> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
>> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
>> > encrypted/decrypted (doing one cipher round for all the blocks, then the
>> > next round, etc.), then goes through XTS postprocessing.
>> >
>> > The performance depends on the processor but can be about 3 times faster
>> > than the generic code.  For example, on an ARMv7 processor we observe
>> > the following performance with Speck128/256-XTS:
>> >
>> >     xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
>> >     xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>> >
>> > In comparison to AES-256-XTS without the Cryptography Extensions:
>> >
>> >     xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
>> >     xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
>> >     xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
>> >
>> > Speck64/128-XTS is even faster:
>> >
>> >     xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
>> >
>> > Note that as with the generic code, only the Speck128 and Speck64
>> > variants are supported.  Also, for now only the XTS mode of operation is
>> > supported, to target the disk and file encryption use cases.  The NEON
>> > code also only handles the portion of the data that is evenly divisible
>> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
>> > course, other modes of operation could be added later if needed, and/or
>> > the NEON code could be updated to handle other buffer sizes.
>> >
>> > The XTS specification is only defined for AES which has a 128-bit block
>> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
>> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
>> > paper.  Of course, when possible users should use Speck128-XTS, but even
>> > that may be too slow on some processors; Speck64-XTS can be faster.
>> >
>>
>> I think this is excellent work. Speck seems an appropriate solution to
>> this problem, and I'm glad we are not ending up with a stream cipher
>> for block encryption.
>>
>> Also, I think an arm64 port would be nice. I may take a stab at this
>> if nobody else beats me to it.
>
> We don't really want to encourage people to use Speck over AES with the
> Cryptography Extensions, so that's why I didn't include an arm64 port.  That
> being said, I suppose we can't stop people from adding an arm64 port if they
> really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
> Cryptography Extensions (though I thought that almost all do).
>

Many do, but not all of them. A notable exception is the Raspberry Pi 3.

>>
>> I did run into an issue with this code though: On big-endian, I get
>>
>> [ 0.272381] alg: skcipher: Test 1 failed (invalid result) on
>> encryption for xts-speck64-neon
>> [    0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
>> [    0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
>>
>> so there may be a byte order corner case you missed in the rewrite (or
>> the issue existed before, as I did not test your v1)
>>
>
> To be honest I haven't tested either version on a big endian ARM CPU yet.  I
> don't really know how to do that currently; maybe it's possible with QEMU.
>

I tested this on a big-endian 32-bit VM running under KVM on a 64-bit host.

> But assuming I haven't missed anything, in the assembly code everything is
> treated as byte arrays with the exception of the round keys which are 32-bit or
> 64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
> vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
> the assembly code *should* work correctly on a big endian CPU.
>

Indeed.

> However, looking over it now, I think there is a bug in the glue code for
> Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
> Namely, the tweak is treated as CPU endian when it should be little endian.
> Could you try the following patch?
>
> diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
> index 3987dd6e063e..960cc634b36f 100644
> --- a/arch/arm/crypto/speck-neon-glue.c
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>         const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
>         struct skcipher_walk walk;
> -       u64 tweak;
> +       __le64 tweak;
>         int err;
>
>         err = skcipher_walk_virt(&walk, req, true);
> @@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
>                 }
>
>                 /* Handle any remainder with generic code */
> -               while (nbytes >= sizeof(u64)) {
> -                       *(u64 *)dst = *(u64 *)src ^ tweak;
> +               while (nbytes >= sizeof(__le64)) {
> +                       *(__le64 *)dst = *(__le64 *)src ^ tweak;
>                         (*crypt_one)(&ctx->main_key, dst, dst);
> -                       *(u64 *)dst ^= tweak;
> -                       tweak = (tweak << 1) ^
> -                               ((tweak & (1ULL << 63)) ? 0x1B : 0);
> -
> -                       dst += sizeof(u64);
> -                       src += sizeof(u64);
> -                       nbytes -= sizeof(u64);
> +                       *(__le64 *)dst ^= tweak;
> +                       tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
> +                                           ((tweak & cpu_to_le64(1ULL << 63)) ?
> +                                            0x1B : 0));
> +                       dst += sizeof(__le64);
> +                       src += sizeof(__le64);
> +                       nbytes -= sizeof(__le64);
>                 }
>                 err = skcipher_walk_done(&walk, nbytes);
>         }

This fixes it.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-02-12 23:52 ` Eric Biggers
  (?)
@ 2018-04-24 16:11   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 16:11 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Greg Kaiser, Herbert Xu, Ard Biesheuvel,
	Michael Halcrow, Patrik Torstensson, Alex Cope, Paul Lawrence,
	linux-fscrypt, Linux Crypto Mailing List, Greg Kroah-Hartman,
	linux-arm-kernel, Paul Crowley

Can we please not Speck?

It was just rejected by the ISO/IEC.

https://twitter.com/TomerAshur/status/988659711091228673

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 16:11   ` Jason A. Donenfeld
  0 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 16:11 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, Herbert Xu, linux-fscrypt,
	linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

Can we please not Speck?

It was just rejected by the ISO/IEC.

https://twitter.com/TomerAshur/status/988659711091228673

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 16:11   ` Jason A. Donenfeld
  0 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

Can we please not Speck?

It was just rejected by the ISO/IEC.

https://twitter.com/TomerAshur/status/988659711091228673

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 16:11   ` Jason A. Donenfeld
  (?)
@ 2018-04-24 18:16     ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 18:16 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Jeffrey Walton, Greg Kaiser, Herbert Xu, Ard Biesheuvel,
	Michael Halcrow, Patrik Torstensson, Alex Cope, Paul Lawrence,
	linux-fscrypt, Linux Crypto Mailing List, Greg Kroah-Hartman,
	linux-arm-kernel, Paul Crowley

Hi Jason,

On Tue, Apr 24, 2018 at 06:11:26PM +0200, Jason A. Donenfeld wrote:
> Can we please not Speck?
> 
> It was just rejected by the ISO/IEC.
> 
> https://twitter.com/TomerAshur/status/988659711091228673

So, what do you propose replacing it with?

As I explained in the patch, the purpose of adding Speck is to allow low-end
Android devices -- ones that have CPUs without the ARMv8 Cryptography Extensions
-- to start using dm-crypt or fscrypt.  Currently such devices are unencrypted.
So, Speck is replacing *no encryption*, not another encryption algorithm.  By
removing Speck, you are removing encryption.  It's great that people are
enthusiastic about debating choices of crypto algorithms.  But it's unfortunate
that "no crypto" tends to pass by without comment from the same people.

We really wanted to use ChaCha20 instead.  But it would have been used in a
context where IVs are reused (f2fs encryption on flash storage), which
catastrophically breaks stream ciphers, but is less bad for a block cipher
operating in XTS mode.  Thus, we had to use either a block cipher, or a
wide-block encryption mode (pseudorandom permutation over the whole input).  Of
course, we would have liked to store nonces instead, but that is not currently
feasible with either dm-crypt or fscrypt.  It can be done with dm-crypt on top
of dm-integrity, but that performs very poorly and would be especially
inappropriate for low end mobile devices.

Paul Crowley actually designed a very neat wide-block encryption mode based on
ChaCha20 and Poly1305, which we considered too.  But it would have been harder
to implement, and we'd have had to be pushing it with zero or very little
outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
respect the controversy about Speck has actually become an advantage, as it has
received much more cryptanalysis than other lightweight block ciphers.)

The reason we chose Speck had nothing to do with the proposed ISO standard or
any sociopolitical factors, but rather because it was the only algorithm we
could find that met the performance and security requirements.  Note that Linux
doesn't bow down to any particular standards organization, and it offers
algorithms that were specified in various places, even some with no more than a
publication by the author.  In fact, support for SM4 was just added too, which
is a Chinese government standard.  Are you going to send a patch to remove that
too, or is it just NSA designed algorithms that are not okay?

It is unfortunate that we had to choose an algorithm that has some
emotional/political baggage attached, and of course we did expect some pushback.
But I hope you can understand that all *technical* indicators are that Speck is
secure enough, and not really backdoor-able other than a new cryptoanalytical
technique that would likely apply to other ARX ciphers as well (in fact, you'd
probably have a different opinion of it if the authors had simply worked
somewhere else and published the exact same algorithm); and also, the trend
towards stream ciphers such as ChaCha20 has mostly ignored the disk encryption
use case, where a block cipher is still needed.

Thanks,

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 18:16     ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 18:16 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Crypto Mailing List, Herbert Xu, linux-fscrypt,
	linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

Hi Jason,

On Tue, Apr 24, 2018 at 06:11:26PM +0200, Jason A. Donenfeld wrote:
> Can we please not Speck?
> 
> It was just rejected by the ISO/IEC.
> 
> https://twitter.com/TomerAshur/status/988659711091228673

So, what do you propose replacing it with?

As I explained in the patch, the purpose of adding Speck is to allow low-end
Android devices -- ones that have CPUs without the ARMv8 Cryptography Extensions
-- to start using dm-crypt or fscrypt.  Currently such devices are unencrypted.
So, Speck is replacing *no encryption*, not another encryption algorithm.  By
removing Speck, you are removing encryption.  It's great that people are
enthusiastic about debating choices of crypto algorithms.  But it's unfortunate
that "no crypto" tends to pass by without comment from the same people.

We really wanted to use ChaCha20 instead.  But it would have been used in a
context where IVs are reused (f2fs encryption on flash storage), which
catastrophically breaks stream ciphers, but is less bad for a block cipher
operating in XTS mode.  Thus, we had to use either a block cipher, or a
wide-block encryption mode (pseudorandom permutation over the whole input).  Of
course, we would have liked to store nonces instead, but that is not currently
feasible with either dm-crypt or fscrypt.  It can be done with dm-crypt on top
of dm-integrity, but that performs very poorly and would be especially
inappropriate for low end mobile devices.

Paul Crowley actually designed a very neat wide-block encryption mode based on
ChaCha20 and Poly1305, which we considered too.  But it would have been harder
to implement, and we'd have had to be pushing it with zero or very little
outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
respect the controversy about Speck has actually become an advantage, as it has
received much more cryptanalysis than other lightweight block ciphers.)

The reason we chose Speck had nothing to do with the proposed ISO standard or
any sociopolitical factors, but rather because it was the only algorithm we
could find that met the performance and security requirements.  Note that Linux
doesn't bow down to any particular standards organization, and it offers
algorithms that were specified in various places, even some with no more than a
publication by the author.  In fact, support for SM4 was just added too, which
is a Chinese government standard.  Are you going to send a patch to remove that
too, or is it just NSA designed algorithms that are not okay?

It is unfortunate that we had to choose an algorithm that has some
emotional/political baggage attached, and of course we did expect some pushback.
But I hope you can understand that all *technical* indicators are that Speck is
secure enough, and not really backdoor-able other than a new cryptoanalytical
technique that would likely apply to other ARX ciphers as well (in fact, you'd
probably have a different opinion of it if the authors had simply worked
somewhere else and published the exact same algorithm); and also, the trend
towards stream ciphers such as ChaCha20 has mostly ignored the disk encryption
use case, where a block cipher is still needed.

Thanks,

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 18:16     ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 18:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jason,

On Tue, Apr 24, 2018 at 06:11:26PM +0200, Jason A. Donenfeld wrote:
> Can we please not Speck?
> 
> It was just rejected by the ISO/IEC.
> 
> https://twitter.com/TomerAshur/status/988659711091228673

So, what do you propose replacing it with?

As I explained in the patch, the purpose of adding Speck is to allow low-end
Android devices -- ones that have CPUs without the ARMv8 Cryptography Extensions
-- to start using dm-crypt or fscrypt.  Currently such devices are unencrypted.
So, Speck is replacing *no encryption*, not another encryption algorithm.  By
removing Speck, you are removing encryption.  It's great that people are
enthusiastic about debating choices of crypto algorithms.  But it's unfortunate
that "no crypto" tends to pass by without comment from the same people.

We really wanted to use ChaCha20 instead.  But it would have been used in a
context where IVs are reused (f2fs encryption on flash storage), which
catastrophically breaks stream ciphers, but is less bad for a block cipher
operating in XTS mode.  Thus, we had to use either a block cipher, or a
wide-block encryption mode (pseudorandom permutation over the whole input).  Of
course, we would have liked to store nonces instead, but that is not currently
feasible with either dm-crypt or fscrypt.  It can be done with dm-crypt on top
of dm-integrity, but that performs very poorly and would be especially
inappropriate for low end mobile devices.

Paul Crowley actually designed a very neat wide-block encryption mode based on
ChaCha20 and Poly1305, which we considered too.  But it would have been harder
to implement, and we'd have had to be pushing it with zero or very little
outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
respect the controversy about Speck has actually become an advantage, as it has
received much more cryptanalysis than other lightweight block ciphers.)

The reason we chose Speck had nothing to do with the proposed ISO standard or
any sociopolitical factors, but rather because it was the only algorithm we
could find that met the performance and security requirements.  Note that Linux
doesn't bow down to any particular standards organization, and it offers
algorithms that were specified in various places, even some with no more than a
publication by the author.  In fact, support for SM4 was just added too, which
is a Chinese government standard.  Are you going to send a patch to remove that
too, or is it just NSA designed algorithms that are not okay?

It is unfortunate that we had to choose an algorithm that has some
emotional/political baggage attached, and of course we did expect some pushback.
But I hope you can understand that all *technical* indicators are that Speck is
secure enough, and not really backdoor-able other than a new cryptoanalytical
technique that would likely apply to other ARX ciphers as well (in fact, you'd
probably have a different opinion of it if the authors had simply worked
somewhere else and published the exact same algorithm); and also, the trend
towards stream ciphers such as ChaCha20 has mostly ignored the disk encryption
use case, where a block cipher is still needed.

Thanks,

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 18:16     ` Eric Biggers
  (?)
@ 2018-04-24 20:58       ` Jason A. Donenfeld
  -1 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 20:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Greg Kaiser, Herbert Xu, Ard Biesheuvel,
	Michael Halcrow, tashur, Patrik Torstensson, Alex Cope,
	Paul Lawrence, linux-fscrypt, Linux Crypto Mailing List,
	Greg Kroah-Hartman, linux-arm-kernel, Paul Crowley

Hi Eric,

On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> So, what do you propose replacing it with?

Something more cryptographically justifiable.

> outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> respect the controversy about Speck has actually become an advantage, as it has
> received much more cryptanalysis than other lightweight block ciphers.)

That's the thing that worries me, actually. Many of the design
decisions behind Speck haven't been justified.

> The reason we chose Speck had nothing to do with the proposed ISO standard or
> any sociopolitical factors, but rather because it was the only algorithm we
> could find that met the performance and security requirements.

> Note that Linux
> doesn't bow down to any particular standards organization, and it offers
> algorithms that were specified in various places, even some with no more than a
> publication by the author.  In fact, support for SM4 was just added too, which
> is a Chinese government standard.  Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?

No need to be belittling; I have much less tinfoil strapped around my
head than perhaps you think. I'm not blindly opposed to
government-designed algorithms. Take SHA2, for example -- built by the
NSA.

But I do care quite a bit about using ciphers that have acceptance of
the academic community and a large body of literature documenting its
design decisions and analyzing it. Some of the best symmetric
cryptographers in academia have expressed reservations about it, and
it was just rejected from a major standard's body. Linux, of course,
is free to disagree -- or "bow down" as you oddly put it -- but I'd
make sure you've got a pretty large bucket of justifications for that
disagreement.

> (in fact, you'd
> probably have a different opinion of it if the authors had simply worked
> somewhere else and published the exact same algorithm);

Again, no need to patronize. I don't actually have a bias like that.

> But I hope you can understand that all *technical* indicators are that Speck is
> secure enough

That's the thing I'm worried about.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 20:58       ` Jason A. Donenfeld
  0 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 20:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Linux Crypto Mailing List, Herbert Xu, linux-fscrypt,
	linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman, tashur

Hi Eric,

On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> So, what do you propose replacing it with?

Something more cryptographically justifiable.

> outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> respect the controversy about Speck has actually become an advantage, as it has
> received much more cryptanalysis than other lightweight block ciphers.)

That's the thing that worries me, actually. Many of the design
decisions behind Speck haven't been justified.

> The reason we chose Speck had nothing to do with the proposed ISO standard or
> any sociopolitical factors, but rather because it was the only algorithm we
> could find that met the performance and security requirements.

> Note that Linux
> doesn't bow down to any particular standards organization, and it offers
> algorithms that were specified in various places, even some with no more than a
> publication by the author.  In fact, support for SM4 was just added too, which
> is a Chinese government standard.  Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?

No need to be belittling; I have much less tinfoil strapped around my
head than perhaps you think. I'm not blindly opposed to
government-designed algorithms. Take SHA2, for example -- built by the
NSA.

But I do care quite a bit about using ciphers that have acceptance of
the academic community and a large body of literature documenting its
design decisions and analyzing it. Some of the best symmetric
cryptographers in academia have expressed reservations about it, and
it was just rejected from a major standard's body. Linux, of course,
is free to disagree -- or "bow down" as you oddly put it -- but I'd
make sure you've got a pretty large bucket of justifications for that
disagreement.

> (in fact, you'd
> probably have a different opinion of it if the authors had simply worked
> somewhere else and published the exact same algorithm);

Again, no need to patronize. I don't actually have a bias like that.

> But I hope you can understand that all *technical* indicators are that Speck is
> secure enough

That's the thing I'm worried about.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 20:58       ` Jason A. Donenfeld
  0 siblings, 0 replies; 56+ messages in thread
From: Jason A. Donenfeld @ 2018-04-24 20:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Eric,

On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> So, what do you propose replacing it with?

Something more cryptographically justifiable.

> outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> respect the controversy about Speck has actually become an advantage, as it has
> received much more cryptanalysis than other lightweight block ciphers.)

That's the thing that worries me, actually. Many of the design
decisions behind Speck haven't been justified.

> The reason we chose Speck had nothing to do with the proposed ISO standard or
> any sociopolitical factors, but rather because it was the only algorithm we
> could find that met the performance and security requirements.

> Note that Linux
> doesn't bow down to any particular standards organization, and it offers
> algorithms that were specified in various places, even some with no more than a
> publication by the author.  In fact, support for SM4 was just added too, which
> is a Chinese government standard.  Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?

No need to be belittling; I have much less tinfoil strapped around my
head than perhaps you think. I'm not blindly opposed to
government-designed algorithms. Take SHA2, for example -- built by the
NSA.

But I do care quite a bit about using ciphers that have acceptance of
the academic community and a large body of literature documenting its
design decisions and analyzing it. Some of the best symmetric
cryptographers in academia have expressed reservations about it, and
it was just rejected from a major standard's body. Linux, of course,
is free to disagree -- or "bow down" as you oddly put it -- but I'd
make sure you've got a pretty large bucket of justifications for that
disagreement.

> (in fact, you'd
> probably have a different opinion of it if the authors had simply worked
> somewhere else and published the exact same algorithm);

Again, no need to patronize. I don't actually have a bias like that.

> But I hope you can understand that all *technical* indicators are that Speck is
> secure enough

That's the thing I'm worried about.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 20:58       ` Jason A. Donenfeld
  (?)
@ 2018-04-24 21:58         ` Paul Crowley
  -1 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-24 21:58 UTC (permalink / raw)
  To: Jason
  Cc: noloader, Greg Kaiser, herbert, ard.biesheuvel, Michael Halcrow,
	Eric Biggers, Patrik Torstensson, Alex Cope, Paul Lawrence,
	linux-fscrypt, linux-crypto, gregkh, tashur, linux-arm-kernel

On Tue, 24 Apr 2018 at 13:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?

> Something more cryptographically justifiable.

I'm keen to hear recommendations here, if there are options we should be
considering I'd like to know about them.

> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.

It seems to me justified about as well as one would hope for a new cipher -
  "Notes on the design and analysis of Simon and Speck" seems to me to give
more detail on the reasoning than went into eg Salsa20, which I think also
hit a perfectly acceptable bar and was a good choice for adding to the
Linux kernel. Of course it's building on the fairly detailed understanding
we now have of how to build a secure ARX cipher. Given what a prize
cryptanalysis of an NSA-designed block cipher would be for anyone in the
field, the sheer simplicity and straightforwardness of the design, taken
with the very large gap between the full cipher and the best cryptanalysis,
and drawing on my own experience attacking Salsa20, I feel pretty good
about fielding this design. But if you have a specific alternative in mind
- a 128-bit block cipher (so we can use it in XTS mode) which is fast and
side-channel-free on ARM processors with NEON but without ARM CE - I'm very
keen to hear about it.

Could you say a little more about what it is that separates Speck from SM4
for you?

Thanks!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 21:58         ` Paul Crowley
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-24 21:58 UTC (permalink / raw)
  To: Jason
  Cc: Eric Biggers, linux-crypto, herbert, linux-fscrypt,
	linux-arm-kernel, ard.biesheuvel, noloader, Patrik Torstensson,
	Greg Kaiser, Paul Lawrence, Michael Halcrow, Alex Cope, gregkh,
	tashur

On Tue, 24 Apr 2018 at 13:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?

> Something more cryptographically justifiable.

I'm keen to hear recommendations here, if there are options we should be
considering I'd like to know about them.

> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.

It seems to me justified about as well as one would hope for a new cipher -
  "Notes on the design and analysis of Simon and Speck" seems to me to give
more detail on the reasoning than went into eg Salsa20, which I think also
hit a perfectly acceptable bar and was a good choice for adding to the
Linux kernel. Of course it's building on the fairly detailed understanding
we now have of how to build a secure ARX cipher. Given what a prize
cryptanalysis of an NSA-designed block cipher would be for anyone in the
field, the sheer simplicity and straightforwardness of the design, taken
with the very large gap between the full cipher and the best cryptanalysis,
and drawing on my own experience attacking Salsa20, I feel pretty good
about fielding this design. But if you have a specific alternative in mind
- a 128-bit block cipher (so we can use it in XTS mode) which is fast and
side-channel-free on ARM processors with NEON but without ARM CE - I'm very
keen to hear about it.

Could you say a little more about what it is that separates Speck from SM4
for you?

Thanks!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 21:58         ` Paul Crowley
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-24 21:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 24 Apr 2018 at 13:58, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?

> Something more cryptographically justifiable.

I'm keen to hear recommendations here, if there are options we should be
considering I'd like to know about them.

> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.

It seems to me justified about as well as one would hope for a new cipher -
  "Notes on the design and analysis of Simon and Speck" seems to me to give
more detail on the reasoning than went into eg Salsa20, which I think also
hit a perfectly acceptable bar and was a good choice for adding to the
Linux kernel. Of course it's building on the fairly detailed understanding
we now have of how to build a secure ARX cipher. Given what a prize
cryptanalysis of an NSA-designed block cipher would be for anyone in the
field, the sheer simplicity and straightforwardness of the design, taken
with the very large gap between the full cipher and the best cryptanalysis,
and drawing on my own experience attacking Salsa20, I feel pretty good
about fielding this design. But if you have a specific alternative in mind
- a 128-bit block cipher (so we can use it in XTS mode) which is fast and
side-channel-free on ARM processors with NEON but without ARM CE - I'm very
keen to hear about it.

Could you say a little more about what it is that separates Speck from SM4
for you?

Thanks!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 16:11   ` Jason A. Donenfeld
  (?)
@ 2018-04-24 22:43     ` Jeffrey Walton
  -1 siblings, 0 replies; 56+ messages in thread
From: Jeffrey Walton @ 2018-04-24 22:43 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Greg Kaiser, Herbert Xu, Ard Biesheuvel, Michael Halcrow,
	Eric Biggers, Patrik Torstensson, Alex Cope, Paul Lawrence,
	linux-fscrypt, Linux Crypto Mailing List, Greg Kroah-Hartman,
	linux-arm-kernel, Paul Crowley

On Tue, Apr 24, 2018 at 12:11 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Can we please not Speck?
>
> It was just rejected by the ISO/IEC.
>
> https://twitter.com/TomerAshur/status/988659711091228673

Yeah, but here was the reason given
(https://www.wikitribune.com/story/2018/04/20/internet/67004/67004/):

    A source at an International Organization for Standardization (ISO)
    meeting of expert delegations in Wuhan, China, told WikiTribune
    that the U.S. delegation, including NSA officials, refused to provide
    the standard level of technical information to proceed.

Jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 22:43     ` Jeffrey Walton
  0 siblings, 0 replies; 56+ messages in thread
From: Jeffrey Walton @ 2018-04-24 22:43 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Eric Biggers, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman

On Tue, Apr 24, 2018 at 12:11 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Can we please not Speck?
>
> It was just rejected by the ISO/IEC.
>
> https://twitter.com/TomerAshur/status/988659711091228673

Yeah, but here was the reason given
(https://www.wikitribune.com/story/2018/04/20/internet/67004/67004/):

    A source at an International Organization for Standardization (ISO)
    meeting of expert delegations in Wuhan, China, told WikiTribune
    that the U.S. delegation, including NSA officials, refused to provide
    the standard level of technical information to proceed.

Jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 22:43     ` Jeffrey Walton
  0 siblings, 0 replies; 56+ messages in thread
From: Jeffrey Walton @ 2018-04-24 22:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 24, 2018 at 12:11 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Can we please not Speck?
>
> It was just rejected by the ISO/IEC.
>
> https://twitter.com/TomerAshur/status/988659711091228673

Yeah, but here was the reason given
(https://www.wikitribune.com/story/2018/04/20/internet/67004/67004/):

    A source at an International Organization for Standardization (ISO)
    meeting of expert delegations in Wuhan, China, told WikiTribune
    that the U.S. delegation, including NSA officials, refused to provide
    the standard level of technical information to proceed.

Jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 20:58       ` Jason A. Donenfeld
  (?)
@ 2018-04-24 22:47         ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 22:47 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Jeffrey Walton, Greg Kaiser, Herbert Xu, Ard Biesheuvel,
	Michael Halcrow, tashur, Patrik Torstensson, Alex Cope,
	Paul Lawrence, linux-fscrypt, Linux Crypto Mailing List,
	Greg Kroah-Hartman, linux-arm-kernel, Paul Crowley

Hi Jason,

On Tue, Apr 24, 2018 at 10:58:35PM +0200, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?
> 
> Something more cryptographically justifiable.
> 

It's easy to say that, but do you have an actual suggestion?  As I mentioned,
for disk encryption without AES instructions the main alternatives we've
considered are ChaCha20 with reused nonces, an unpublished wide-block mode based
on ChaCha20 and Poly1305 (with no external cryptanalysis yet, and probably
actually using ChaCha8 or ChaCha12 to meet performance requirements), or the
status quo of no encryption at all.

It *might* be possible to add per-block metadata support to f2fs, in which it
could be used with ChaCha20 in fscrypt.  But if feasible at all it would be
quite difficult (requiring some significant filesystem surgery, and disabling
conflicting filesystem features that allow data to be updated in-place) and
would not cover dm-crypt, nor ext4.

Note also that many other lightweight block ciphers are designed for hardware
and perform poorly in software, e.g. PRESENT is even slower than AES.  Thus
there really weren't many options.

Any concrete suggestions are greatly appreciated!

> > outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> > respect the controversy about Speck has actually become an advantage, as it has
> > received much more cryptanalysis than other lightweight block ciphers.)
> 
> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.
> 

Originally that was true, but later there were significant clarifications
released, e.g. the paper "Notes on the design and analysis of Simon and Speck"
(https://eprint.iacr.org/2017/560.pdf).  In fact, from what I can see, many
competing lightweight block ciphers don't have as much design justification
available as Speck.  Daniel Bernstein's papers are excellent, but unfortunately
he has only designed a stream cipher, not a block cipher or another algorithm
that is applicable for disk encryption.

> > The reason we chose Speck had nothing to do with the proposed ISO standard or
> > any sociopolitical factors, but rather because it was the only algorithm we
> > could find that met the performance and security requirements.
> 
> > Note that Linux
> > doesn't bow down to any particular standards organization, and it offers
> > algorithms that were specified in various places, even some with no more than a
> > publication by the author.  In fact, support for SM4 was just added too, which
> > is a Chinese government standard.  Are you going to send a patch to remove that
> > too, or is it just NSA designed algorithms that are not okay?
> 
> No need to be belittling; I have much less tinfoil strapped around my
> head than perhaps you think. I'm not blindly opposed to
> government-designed algorithms. Take SHA2, for example -- built by the
> NSA.
> 
> But I do care quite a bit about using ciphers that have acceptance of
> the academic community and a large body of literature documenting its
> design decisions and analyzing it. Some of the best symmetric
> cryptographers in academia have expressed reservations about it, and
> it was just rejected from a major standard's body. Linux, of course,
> is free to disagree -- or "bow down" as you oddly put it -- but I'd
> make sure you've got a pretty large bucket of justifications for that
> disagreement.
> 

There have actually been many papers analyzing Speck.  As with other ciphers,
reduced-round variants have been successfully attacked, while the full variants
have held up.  This is expected.  It's true that some other ciphers such as
ChaCha20 have a higher security margin, which has resulted in some criticism of
Speck.  But the correct security margin is always debatable, and in a
performance-oriented cipher it's desirable to not have an excessive number of
rounds.  In fact it was even looking like ChaCha20 was not going to be fast
enough on some CPUs, so if we went the ChaCha route we may have actually have
had to use ChaCha12 or ChaCha8 instead.

Also, some papers present results for just the weakest variants of Speck
(Speck32 and Speck48) while omitting the strongest (Speck128, the one that's
planned to be offered for Android), presumably because the authors weren't able
to attack it as successfully.  I think that's causing some confusion.

I don't see how the ISO standarization process means much for crypto algorithms.
It seems very political, and actually it seems that people involved were pretty
clear that Speck was rejected primarily for political reasons.  Interestingly,
ChaCha20 is not an ISO standard either.  Does that mean ChaCha20 shouldn't be
used?

In any case, given that the status quo on low-end Android devices is no
encryption, it would be great to start having them be encrypted.  It would be a
shame if pushback for non-technical reasons prevents that from happening.

> > (in fact, you'd
> > probably have a different opinion of it if the authors had simply worked
> > somewhere else and published the exact same algorithm);
> 
> Again, no need to patronize. I don't actually have a bias like that.
> 
> > But I hope you can understand that all *technical* indicators are that Speck is
> > secure enough
> 
> That's the thing I'm worried about.
>

Thanks,

- Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 22:47         ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 22:47 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Crypto Mailing List, Herbert Xu, linux-fscrypt,
	linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton, Paul Crowley,
	Patrik Torstensson, Greg Kaiser, Paul Lawrence, Michael Halcrow,
	Alex Cope, Greg Kroah-Hartman, tashur

Hi Jason,

On Tue, Apr 24, 2018 at 10:58:35PM +0200, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?
> 
> Something more cryptographically justifiable.
> 

It's easy to say that, but do you have an actual suggestion?  As I mentioned,
for disk encryption without AES instructions the main alternatives we've
considered are ChaCha20 with reused nonces, an unpublished wide-block mode based
on ChaCha20 and Poly1305 (with no external cryptanalysis yet, and probably
actually using ChaCha8 or ChaCha12 to meet performance requirements), or the
status quo of no encryption at all.

It *might* be possible to add per-block metadata support to f2fs, in which it
could be used with ChaCha20 in fscrypt.  But if feasible at all it would be
quite difficult (requiring some significant filesystem surgery, and disabling
conflicting filesystem features that allow data to be updated in-place) and
would not cover dm-crypt, nor ext4.

Note also that many other lightweight block ciphers are designed for hardware
and perform poorly in software, e.g. PRESENT is even slower than AES.  Thus
there really weren't many options.

Any concrete suggestions are greatly appreciated!

> > outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> > respect the controversy about Speck has actually become an advantage, as it has
> > received much more cryptanalysis than other lightweight block ciphers.)
> 
> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.
> 

Originally that was true, but later there were significant clarifications
released, e.g. the paper "Notes on the design and analysis of Simon and Speck"
(https://eprint.iacr.org/2017/560.pdf).  In fact, from what I can see, many
competing lightweight block ciphers don't have as much design justification
available as Speck.  Daniel Bernstein's papers are excellent, but unfortunately
he has only designed a stream cipher, not a block cipher or another algorithm
that is applicable for disk encryption.

> > The reason we chose Speck had nothing to do with the proposed ISO standard or
> > any sociopolitical factors, but rather because it was the only algorithm we
> > could find that met the performance and security requirements.
> 
> > Note that Linux
> > doesn't bow down to any particular standards organization, and it offers
> > algorithms that were specified in various places, even some with no more than a
> > publication by the author.  In fact, support for SM4 was just added too, which
> > is a Chinese government standard.  Are you going to send a patch to remove that
> > too, or is it just NSA designed algorithms that are not okay?
> 
> No need to be belittling; I have much less tinfoil strapped around my
> head than perhaps you think. I'm not blindly opposed to
> government-designed algorithms. Take SHA2, for example -- built by the
> NSA.
> 
> But I do care quite a bit about using ciphers that have acceptance of
> the academic community and a large body of literature documenting its
> design decisions and analyzing it. Some of the best symmetric
> cryptographers in academia have expressed reservations about it, and
> it was just rejected from a major standard's body. Linux, of course,
> is free to disagree -- or "bow down" as you oddly put it -- but I'd
> make sure you've got a pretty large bucket of justifications for that
> disagreement.
> 

There have actually been many papers analyzing Speck.  As with other ciphers,
reduced-round variants have been successfully attacked, while the full variants
have held up.  This is expected.  It's true that some other ciphers such as
ChaCha20 have a higher security margin, which has resulted in some criticism of
Speck.  But the correct security margin is always debatable, and in a
performance-oriented cipher it's desirable to not have an excessive number of
rounds.  In fact it was even looking like ChaCha20 was not going to be fast
enough on some CPUs, so if we went the ChaCha route we may have actually have
had to use ChaCha12 or ChaCha8 instead.

Also, some papers present results for just the weakest variants of Speck
(Speck32 and Speck48) while omitting the strongest (Speck128, the one that's
planned to be offered for Android), presumably because the authors weren't able
to attack it as successfully.  I think that's causing some confusion.

I don't see how the ISO standarization process means much for crypto algorithms.
It seems very political, and actually it seems that people involved were pretty
clear that Speck was rejected primarily for political reasons.  Interestingly,
ChaCha20 is not an ISO standard either.  Does that mean ChaCha20 shouldn't be
used?

In any case, given that the status quo on low-end Android devices is no
encryption, it would be great to start having them be encrypted.  It would be a
shame if pushback for non-technical reasons prevents that from happening.

> > (in fact, you'd
> > probably have a different opinion of it if the authors had simply worked
> > somewhere else and published the exact same algorithm);
> 
> Again, no need to patronize. I don't actually have a bias like that.
> 
> > But I hope you can understand that all *technical* indicators are that Speck is
> > secure enough
> 
> That's the thing I'm worried about.
>

Thanks,

- Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-24 22:47         ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-24 22:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jason,

On Tue, Apr 24, 2018 at 10:58:35PM +0200, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> On Tue, Apr 24, 2018 at 8:16 PM, Eric Biggers <ebiggers@google.com> wrote:
> > So, what do you propose replacing it with?
> 
> Something more cryptographically justifiable.
> 

It's easy to say that, but do you have an actual suggestion?  As I mentioned,
for disk encryption without AES instructions the main alternatives we've
considered are ChaCha20 with reused nonces, an unpublished wide-block mode based
on ChaCha20 and Poly1305 (with no external cryptanalysis yet, and probably
actually using ChaCha8 or ChaCha12 to meet performance requirements), or the
status quo of no encryption at all.

It *might* be possible to add per-block metadata support to f2fs, in which it
could be used with ChaCha20 in fscrypt.  But if feasible at all it would be
quite difficult (requiring some significant filesystem surgery, and disabling
conflicting filesystem features that allow data to be updated in-place) and
would not cover dm-crypt, nor ext4.

Note also that many other lightweight block ciphers are designed for hardware
and perform poorly in software, e.g. PRESENT is even slower than AES.  Thus
there really weren't many options.

Any concrete suggestions are greatly appreciated!

> > outside crypto review, vs. the many cryptanalysis papers on Speck.  (In that
> > respect the controversy about Speck has actually become an advantage, as it has
> > received much more cryptanalysis than other lightweight block ciphers.)
> 
> That's the thing that worries me, actually. Many of the design
> decisions behind Speck haven't been justified.
> 

Originally that was true, but later there were significant clarifications
released, e.g. the paper "Notes on the design and analysis of Simon and Speck"
(https://eprint.iacr.org/2017/560.pdf).  In fact, from what I can see, many
competing lightweight block ciphers don't have as much design justification
available as Speck.  Daniel Bernstein's papers are excellent, but unfortunately
he has only designed a stream cipher, not a block cipher or another algorithm
that is applicable for disk encryption.

> > The reason we chose Speck had nothing to do with the proposed ISO standard or
> > any sociopolitical factors, but rather because it was the only algorithm we
> > could find that met the performance and security requirements.
> 
> > Note that Linux
> > doesn't bow down to any particular standards organization, and it offers
> > algorithms that were specified in various places, even some with no more than a
> > publication by the author.  In fact, support for SM4 was just added too, which
> > is a Chinese government standard.  Are you going to send a patch to remove that
> > too, or is it just NSA designed algorithms that are not okay?
> 
> No need to be belittling; I have much less tinfoil strapped around my
> head than perhaps you think. I'm not blindly opposed to
> government-designed algorithms. Take SHA2, for example -- built by the
> NSA.
> 
> But I do care quite a bit about using ciphers that have acceptance of
> the academic community and a large body of literature documenting its
> design decisions and analyzing it. Some of the best symmetric
> cryptographers in academia have expressed reservations about it, and
> it was just rejected from a major standard's body. Linux, of course,
> is free to disagree -- or "bow down" as you oddly put it -- but I'd
> make sure you've got a pretty large bucket of justifications for that
> disagreement.
> 

There have actually been many papers analyzing Speck.  As with other ciphers,
reduced-round variants have been successfully attacked, while the full variants
have held up.  This is expected.  It's true that some other ciphers such as
ChaCha20 have a higher security margin, which has resulted in some criticism of
Speck.  But the correct security margin is always debatable, and in a
performance-oriented cipher it's desirable to not have an excessive number of
rounds.  In fact it was even looking like ChaCha20 was not going to be fast
enough on some CPUs, so if we went the ChaCha route we may have actually have
had to use ChaCha12 or ChaCha8 instead.

Also, some papers present results for just the weakest variants of Speck
(Speck32 and Speck48) while omitting the strongest (Speck128, the one that's
planned to be offered for Android), presumably because the authors weren't able
to attack it as successfully.  I think that's causing some confusion.

I don't see how the ISO standarization process means much for crypto algorithms.
It seems very political, and actually it seems that people involved were pretty
clear that Speck was rejected primarily for political reasons.  Interestingly,
ChaCha20 is not an ISO standard either.  Does that mean ChaCha20 shouldn't be
used?

In any case, given that the status quo on low-end Android devices is no
encryption, it would be great to start having them be encrypted.  It would be a
shame if pushback for non-technical reasons prevents that from happening.

> > (in fact, you'd
> > probably have a different opinion of it if the authors had simply worked
> > somewhere else and published the exact same algorithm);
> 
> Again, no need to patronize. I don't actually have a bias like that.
> 
> > But I hope you can understand that all *technical* indicators are that Speck is
> > secure enough
> 
> That's the thing I'm worried about.
>

Thanks,

- Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 20:58       ` Jason A. Donenfeld
                         ` (3 preceding siblings ...)
  (?)
@ 2018-04-25  5:30       ` Theodore Y. Ts'o
  -1 siblings, 0 replies; 56+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-25  5:30 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Eric Biggers, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, tashur

On Tue, Apr 24, 2018 at 10:58:35PM +0200, Jason A. Donenfeld wrote:
> > Note that Linux
> > doesn't bow down to any particular standards organization, and it offers
> > algorithms that were specified in various places, even some with no more than a
> > publication by the author.  In fact, support for SM4 was just added too, which
> > is a Chinese government standard.  Are you going to send a patch to remove that
> > too, or is it just NSA designed algorithms that are not okay?
> 
> No need to be belittling; I have much less tinfoil strapped around my
> head than perhaps you think. I'm not blindly opposed to
> government-designed algorithms. Take SHA2, for example -- built by the
> NSA.
> 
> But I do care quite a bit about using ciphers that have acceptance of
> the academic community and a large body of literature documenting its
> design decisions and analyzing it.....

So where is the large body of literature documenting the design
decisions of SM2?  Has it received as much analysis as Speck?  And if
not, why aren't you gunning after it?

					- Ted

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-24 22:47         ` Eric Biggers
  (?)
@ 2018-04-25 14:33           ` Samuel Neves
  -1 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-25 14:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Jason A. Donenfeld, Greg Kaiser, Herbert Xu,
	Ard Biesheuvel, Michael Halcrow, tashur, Patrik Torstensson,
	Alex Cope, Paul Lawrence, linux-fscrypt,
	Linux Crypto Mailing List, Greg Kroah-Hartman, linux-arm-kernel,
	Paul Crowley

Let's put the provenance of Speck aside for a moment, and suppose that
it is an ideal block cipher. There are still some issues with this
patch as it stands.

 - The rationale seems off. Consider this bit from the commit message:

> Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> on these devices.

One of these things is very much not like the others. Threefish _is_ a
modern ARX cipher---a tweakable block cipher in fact, precluding the
need for XEX-style masking. Is it too slow? Does it not have the
correct block size?

> We've also considered a novel length-preserving encryption mode based on
> ChaCha20 and Poly1305.

I'm very curious about this, namely as to what the role of Poly1305
would be here. ChaCha20's underlying permutation could, of course, be
transformed into a 512-bit tweakable block cipher relatively
painlessly, retaining the performance of regular ChaCha20 with
marginal additional overhead. This would not be a standard
construction, but clearly that is not an issue.

But the biggest problem here, in my mind, is that for all the talk of
using 128-bit block Speck, this patch tacks on the 64-bit block
variant of Speck into the kernel, and speck64-xts as well! As far as I
can tell, this is the _only_ instance of a 64-bit XTS instance in the
entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
the kernel already had XTEA. Instead, this is adding yet another
64-bit block cipher into the crypto API, in a disk-encryption mode no
less, so that it can be misused later. In the disk encryption setting,
it's particularly concerning to be using such a small block size, as
data volumes can quickly add up to the birthday bound.

> It's easy to say that, but do you have an actual suggestion?

I don't know how seriously you are actually asking this, but some
128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
RC6. SPARX, in particular, has similarities to Speck but has some
further AES-like design guarantees that other prior ARX block ciphers
did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
also work well with NEON, but I don't know much about their
performance there.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-25 14:33           ` Samuel Neves
  0 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-25 14:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jason A. Donenfeld, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, tashur

Let's put the provenance of Speck aside for a moment, and suppose that
it is an ideal block cipher. There are still some issues with this
patch as it stands.

 - The rationale seems off. Consider this bit from the commit message:

> Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> on these devices.

One of these things is very much not like the others. Threefish _is_ a
modern ARX cipher---a tweakable block cipher in fact, precluding the
need for XEX-style masking. Is it too slow? Does it not have the
correct block size?

> We've also considered a novel length-preserving encryption mode based on
> ChaCha20 and Poly1305.

I'm very curious about this, namely as to what the role of Poly1305
would be here. ChaCha20's underlying permutation could, of course, be
transformed into a 512-bit tweakable block cipher relatively
painlessly, retaining the performance of regular ChaCha20 with
marginal additional overhead. This would not be a standard
construction, but clearly that is not an issue.

But the biggest problem here, in my mind, is that for all the talk of
using 128-bit block Speck, this patch tacks on the 64-bit block
variant of Speck into the kernel, and speck64-xts as well! As far as I
can tell, this is the _only_ instance of a 64-bit XTS instance in the
entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
the kernel already had XTEA. Instead, this is adding yet another
64-bit block cipher into the crypto API, in a disk-encryption mode no
less, so that it can be misused later. In the disk encryption setting,
it's particularly concerning to be using such a small block size, as
data volumes can quickly add up to the birthday bound.

> It's easy to say that, but do you have an actual suggestion?

I don't know how seriously you are actually asking this, but some
128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
RC6. SPARX, in particular, has similarities to Speck but has some
further AES-like design guarantees that other prior ARX block ciphers
did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
also work well with NEON, but I don't know much about their
performance there.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-25 14:33           ` Samuel Neves
  0 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-25 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

Let's put the provenance of Speck aside for a moment, and suppose that
it is an ideal block cipher. There are still some issues with this
patch as it stands.

 - The rationale seems off. Consider this bit from the commit message:

> Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> on these devices.

One of these things is very much not like the others. Threefish _is_ a
modern ARX cipher---a tweakable block cipher in fact, precluding the
need for XEX-style masking. Is it too slow? Does it not have the
correct block size?

> We've also considered a novel length-preserving encryption mode based on
> ChaCha20 and Poly1305.

I'm very curious about this, namely as to what the role of Poly1305
would be here. ChaCha20's underlying permutation could, of course, be
transformed into a 512-bit tweakable block cipher relatively
painlessly, retaining the performance of regular ChaCha20 with
marginal additional overhead. This would not be a standard
construction, but clearly that is not an issue.

But the biggest problem here, in my mind, is that for all the talk of
using 128-bit block Speck, this patch tacks on the 64-bit block
variant of Speck into the kernel, and speck64-xts as well! As far as I
can tell, this is the _only_ instance of a 64-bit XTS instance in the
entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
the kernel already had XTEA. Instead, this is adding yet another
64-bit block cipher into the crypto API, in a disk-encryption mode no
less, so that it can be misused later. In the disk encryption setting,
it's particularly concerning to be using such a small block size, as
data volumes can quickly add up to the birthday bound.

> It's easy to say that, but do you have an actual suggestion?

I don't know how seriously you are actually asking this, but some
128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
RC6. SPARX, in particular, has similarities to Speck but has some
further AES-like design guarantees that other prior ARX block ciphers
did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
also work well with NEON, but I don't know much about their
performance there.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-25 14:33           ` Samuel Neves
  (?)
@ 2018-04-25 19:49             ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-25 19:49 UTC (permalink / raw)
  To: Samuel Neves
  Cc: Jeffrey Walton, Jason A. Donenfeld, Greg Kaiser, Herbert Xu,
	Ard Biesheuvel, Michael Halcrow, tashur, Patrik Torstensson,
	Alex Cope, Paul Lawrence, linux-fscrypt,
	Linux Crypto Mailing List, Greg Kroah-Hartman, linux-arm-kernel,
	Paul Crowley

Hi Samuel,

On Wed, Apr 25, 2018 at 03:33:16PM +0100, Samuel Neves wrote:
> Let's put the provenance of Speck aside for a moment, and suppose that
> it is an ideal block cipher. There are still some issues with this
> patch as it stands.
> 
>  - The rationale seems off. Consider this bit from the commit message:
> 
> > Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> > fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> > on these devices.
> 
> One of these things is very much not like the others. Threefish _is_ a
> modern ARX cipher---a tweakable block cipher in fact, precluding the
> need for XEX-style masking. Is it too slow? Does it not have the
> correct block size?
> 
> > We've also considered a novel length-preserving encryption mode based on
> > ChaCha20 and Poly1305.
> 
> I'm very curious about this, namely as to what the role of Poly1305
> would be here. ChaCha20's underlying permutation could, of course, be
> transformed into a 512-bit tweakable block cipher relatively
> painlessly, retaining the performance of regular ChaCha20 with
> marginal additional overhead. This would not be a standard
> construction, but clearly that is not an issue.
> 
> But the biggest problem here, in my mind, is that for all the talk of
> using 128-bit block Speck, this patch tacks on the 64-bit block
> variant of Speck into the kernel, and speck64-xts as well! As far as I
> can tell, this is the _only_ instance of a 64-bit XTS instance in the
> entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
> the kernel already had XTEA. Instead, this is adding yet another
> 64-bit block cipher into the crypto API, in a disk-encryption mode no
> less, so that it can be misused later. In the disk encryption setting,
> it's particularly concerning to be using such a small block size, as
> data volumes can quickly add up to the birthday bound.
> 
> > It's easy to say that, but do you have an actual suggestion?
> 
> I don't know how seriously you are actually asking this, but some
> 128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
> RC6. SPARX, in particular, has similarities to Speck but has some
> further AES-like design guarantees that other prior ARX block ciphers
> did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
> also work well with NEON, but I don't know much about their
> performance there.
> 

I agree that my explanation should have been better, and should have considered
more crypto algorithms.  The main difficulty is that we have extreme performance
requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
performance exceeding that after much optimization, we've been getting a lot of
pushback as people want closer to 100 MB/s.

That's why I also included Speck64-XTS in the patches, since it was
straightforward to include, and some devices may really need that last 20-30% of
performance for encryption to be feasible at all.  (And when the choice is
between unencrypted and a 64-bit block cipher, used in a context where the
weakest points in the cryptosystem are actually elsewhere such as the user's
low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
that continues to be the case I'd be fine with Speck64 being removed, leaving
just Speck128.

Note that in practice, to have any chance at meeting the performance requirement
the cipher needed to be NEON accelerated.  That made benchmarking really hard
and time-consuming, since to definitely know how an algorithm performs it can
take upwards of a week to implement a NEON version.  It needs to be very well
optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
performance improvement on some CPUs just by changing the NEON instructions used
to implement the 8-bit rotates, an optimization that is not possible with
ciphers that don't use rotate amounts that are multiples of 8.  (This was an
intentional design choice by the Speck designers; they do know what they're
doing, actually.)

Thus, we had to be pretty aggressive about dropping algorithms from
consideration if there were preliminary indications that they wouldn't perform
well, or had too little cryptanalysis, or had other issues such as an unclear
patent situation.  Threefish for example I did test the C implementation at
https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
that it could be improved over 4x with NEON, if at all, so I did not take the
long time it would have taken to write an optimized NEON implementation to
benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.

RC5 and RC6 use data-dependent rotates which won't perform too well on NEON,
also historically those algorithms have been patented.  It sounds like the last
patents expired last year, but we'd need to double check and be very sure that's
really the case.

As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
Crowley to explain it properly, but briefly it's actually a pseudorandom
permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
would operate on a whole 512-byte sector, and if any bit of the 512-byte
plaintext is changed, then every bit in the 512-byte ciphertext would change
with 50% probability.  To make this possible, the construction uses a polynomial
evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
mode.

Using ChaCha20's underlying 512-bit permutation to build a tweakable block
cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
obvious to me how to do so.  Do you have references to any relevant papers?
Remember that we strongly prefer a published cipher to a custom one -- even if
the core is reused, a mistake may be made in the way it is used.  Thus,
similarly to Paul's wide-block mode, I'd be concerned that we'd have to
self-publish a new construction, then use it with no outside crypto review.
*Maybe* it would be straightforward enough to be okay, but to know I'd need to
see the details of how it would actually work.

But in the end, Speck seemed like the clear choice because it had multiple NEON
implementations available already which showed it could be implemented very
efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
ciphers) yet the security margin is still similar to AES; it has no intellectual
property concerns; there is a paper clearly explaining the design decisions; it
is naturally resistant to timing attacks; it supports a 128-bit block size, so
it can be easily used in XTS mode; it supports the same key sizes as AES; and it
has a simple and understandable design with no "magic numbers" besides 8 and 3
(compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
had a public key embedded in the algorithm).  Also as Paul mentioned he is
confident in the construction, and he has published cryptanalysis on Salsa20, so
his opinion is probably more significant than mine :-)

But I will definitely take a closer look at SPARX and some of the other ciphers
you mentioned in case I missed something.  I really do appreciate the
suggestions, by the way, and in any case we do need to be very well prepared to
justify our choices.  I just hope that people can understand that we are
implementing real-world crypto which must operate under *very* tight performance
constraints on ARM processors, and it must be compatible with dm-crypt and
fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
at first seem reasonable choices had to (unfortunately) be excluded.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-25 19:49             ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-25 19:49 UTC (permalink / raw)
  To: Samuel Neves
  Cc: Jason A. Donenfeld, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, tashur

Hi Samuel,

On Wed, Apr 25, 2018 at 03:33:16PM +0100, Samuel Neves wrote:
> Let's put the provenance of Speck aside for a moment, and suppose that
> it is an ideal block cipher. There are still some issues with this
> patch as it stands.
> 
>  - The rationale seems off. Consider this bit from the commit message:
> 
> > Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> > fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> > on these devices.
> 
> One of these things is very much not like the others. Threefish _is_ a
> modern ARX cipher---a tweakable block cipher in fact, precluding the
> need for XEX-style masking. Is it too slow? Does it not have the
> correct block size?
> 
> > We've also considered a novel length-preserving encryption mode based on
> > ChaCha20 and Poly1305.
> 
> I'm very curious about this, namely as to what the role of Poly1305
> would be here. ChaCha20's underlying permutation could, of course, be
> transformed into a 512-bit tweakable block cipher relatively
> painlessly, retaining the performance of regular ChaCha20 with
> marginal additional overhead. This would not be a standard
> construction, but clearly that is not an issue.
> 
> But the biggest problem here, in my mind, is that for all the talk of
> using 128-bit block Speck, this patch tacks on the 64-bit block
> variant of Speck into the kernel, and speck64-xts as well! As far as I
> can tell, this is the _only_ instance of a 64-bit XTS instance in the
> entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
> the kernel already had XTEA. Instead, this is adding yet another
> 64-bit block cipher into the crypto API, in a disk-encryption mode no
> less, so that it can be misused later. In the disk encryption setting,
> it's particularly concerning to be using such a small block size, as
> data volumes can quickly add up to the birthday bound.
> 
> > It's easy to say that, but do you have an actual suggestion?
> 
> I don't know how seriously you are actually asking this, but some
> 128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
> RC6. SPARX, in particular, has similarities to Speck but has some
> further AES-like design guarantees that other prior ARX block ciphers
> did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
> also work well with NEON, but I don't know much about their
> performance there.
> 

I agree that my explanation should have been better, and should have considered
more crypto algorithms.  The main difficulty is that we have extreme performance
requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
performance exceeding that after much optimization, we've been getting a lot of
pushback as people want closer to 100 MB/s.

That's why I also included Speck64-XTS in the patches, since it was
straightforward to include, and some devices may really need that last 20-30% of
performance for encryption to be feasible at all.  (And when the choice is
between unencrypted and a 64-bit block cipher, used in a context where the
weakest points in the cryptosystem are actually elsewhere such as the user's
low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
that continues to be the case I'd be fine with Speck64 being removed, leaving
just Speck128.

Note that in practice, to have any chance at meeting the performance requirement
the cipher needed to be NEON accelerated.  That made benchmarking really hard
and time-consuming, since to definitely know how an algorithm performs it can
take upwards of a week to implement a NEON version.  It needs to be very well
optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
performance improvement on some CPUs just by changing the NEON instructions used
to implement the 8-bit rotates, an optimization that is not possible with
ciphers that don't use rotate amounts that are multiples of 8.  (This was an
intentional design choice by the Speck designers; they do know what they're
doing, actually.)

Thus, we had to be pretty aggressive about dropping algorithms from
consideration if there were preliminary indications that they wouldn't perform
well, or had too little cryptanalysis, or had other issues such as an unclear
patent situation.  Threefish for example I did test the C implementation at
https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
that it could be improved over 4x with NEON, if at all, so I did not take the
long time it would have taken to write an optimized NEON implementation to
benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.

RC5 and RC6 use data-dependent rotates which won't perform too well on NEON,
also historically those algorithms have been patented.  It sounds like the last
patents expired last year, but we'd need to double check and be very sure that's
really the case.

As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
Crowley to explain it properly, but briefly it's actually a pseudorandom
permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
would operate on a whole 512-byte sector, and if any bit of the 512-byte
plaintext is changed, then every bit in the 512-byte ciphertext would change
with 50% probability.  To make this possible, the construction uses a polynomial
evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
mode.

Using ChaCha20's underlying 512-bit permutation to build a tweakable block
cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
obvious to me how to do so.  Do you have references to any relevant papers?
Remember that we strongly prefer a published cipher to a custom one -- even if
the core is reused, a mistake may be made in the way it is used.  Thus,
similarly to Paul's wide-block mode, I'd be concerned that we'd have to
self-publish a new construction, then use it with no outside crypto review.
*Maybe* it would be straightforward enough to be okay, but to know I'd need to
see the details of how it would actually work.

But in the end, Speck seemed like the clear choice because it had multiple NEON
implementations available already which showed it could be implemented very
efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
ciphers) yet the security margin is still similar to AES; it has no intellectual
property concerns; there is a paper clearly explaining the design decisions; it
is naturally resistant to timing attacks; it supports a 128-bit block size, so
it can be easily used in XTS mode; it supports the same key sizes as AES; and it
has a simple and understandable design with no "magic numbers" besides 8 and 3
(compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
had a public key embedded in the algorithm).  Also as Paul mentioned he is
confident in the construction, and he has published cryptanalysis on Salsa20, so
his opinion is probably more significant than mine :-)

But I will definitely take a closer look at SPARX and some of the other ciphers
you mentioned in case I missed something.  I really do appreciate the
suggestions, by the way, and in any case we do need to be very well prepared to
justify our choices.  I just hope that people can understand that we are
implementing real-world crypto which must operate under *very* tight performance
constraints on ARM processors, and it must be compatible with dm-crypt and
fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
at first seem reasonable choices had to (unfortunately) be excluded.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-25 19:49             ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-04-25 19:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Samuel,

On Wed, Apr 25, 2018 at 03:33:16PM +0100, Samuel Neves wrote:
> Let's put the provenance of Speck aside for a moment, and suppose that
> it is an ideal block cipher. There are still some issues with this
> patch as it stands.
> 
>  - The rationale seems off. Consider this bit from the commit message:
> 
> > Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't
> > fast enough either; it seems that only a modern ARX cipher can provide sufficient performance
> > on these devices.
> 
> One of these things is very much not like the others. Threefish _is_ a
> modern ARX cipher---a tweakable block cipher in fact, precluding the
> need for XEX-style masking. Is it too slow? Does it not have the
> correct block size?
> 
> > We've also considered a novel length-preserving encryption mode based on
> > ChaCha20 and Poly1305.
> 
> I'm very curious about this, namely as to what the role of Poly1305
> would be here. ChaCha20's underlying permutation could, of course, be
> transformed into a 512-bit tweakable block cipher relatively
> painlessly, retaining the performance of regular ChaCha20 with
> marginal additional overhead. This would not be a standard
> construction, but clearly that is not an issue.
> 
> But the biggest problem here, in my mind, is that for all the talk of
> using 128-bit block Speck, this patch tacks on the 64-bit block
> variant of Speck into the kernel, and speck64-xts as well! As far as I
> can tell, this is the _only_ instance of a 64-bit XTS instance in the
> entire codebase. Now, if you wanted a fast 64-bit ARX block cipher,
> the kernel already had XTEA. Instead, this is adding yet another
> 64-bit block cipher into the crypto API, in a disk-encryption mode no
> less, so that it can be misused later. In the disk encryption setting,
> it's particularly concerning to be using such a small block size, as
> data volumes can quickly add up to the birthday bound.
> 
> > It's easy to say that, but do you have an actual suggestion?
> 
> I don't know how seriously you are actually asking this, but some
> 128-bit software-friendly block ciphers could be SPARX, LEA, RC5, or
> RC6. SPARX, in particular, has similarities to Speck but has some
> further AES-like design guarantees that other prior ARX block ciphers
> did not. Some other bitsliced designs, such as Noekeon or SKINNY, may
> also work well with NEON, but I don't know much about their
> performance there.
> 

I agree that my explanation should have been better, and should have considered
more crypto algorithms.  The main difficulty is that we have extreme performance
requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
performance exceeding that after much optimization, we've been getting a lot of
pushback as people want closer to 100 MB/s.

That's why I also included Speck64-XTS in the patches, since it was
straightforward to include, and some devices may really need that last 20-30% of
performance for encryption to be feasible at all.  (And when the choice is
between unencrypted and a 64-bit block cipher, used in a context where the
weakest points in the cryptosystem are actually elsewhere such as the user's
low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
that continues to be the case I'd be fine with Speck64 being removed, leaving
just Speck128.

Note that in practice, to have any chance at meeting the performance requirement
the cipher needed to be NEON accelerated.  That made benchmarking really hard
and time-consuming, since to definitely know how an algorithm performs it can
take upwards of a week to implement a NEON version.  It needs to be very well
optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
performance improvement on some CPUs just by changing the NEON instructions used
to implement the 8-bit rotates, an optimization that is not possible with
ciphers that don't use rotate amounts that are multiples of 8.  (This was an
intentional design choice by the Speck designers; they do know what they're
doing, actually.)

Thus, we had to be pretty aggressive about dropping algorithms from
consideration if there were preliminary indications that they wouldn't perform
well, or had too little cryptanalysis, or had other issues such as an unclear
patent situation.  Threefish for example I did test the C implementation at
https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
that it could be improved over 4x with NEON, if at all, so I did not take the
long time it would have taken to write an optimized NEON implementation to
benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.

RC5 and RC6 use data-dependent rotates which won't perform too well on NEON,
also historically those algorithms have been patented.  It sounds like the last
patents expired last year, but we'd need to double check and be very sure that's
really the case.

As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
Crowley to explain it properly, but briefly it's actually a pseudorandom
permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
would operate on a whole 512-byte sector, and if any bit of the 512-byte
plaintext is changed, then every bit in the 512-byte ciphertext would change
with 50% probability.  To make this possible, the construction uses a polynomial
evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
mode.

Using ChaCha20's underlying 512-bit permutation to build a tweakable block
cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
obvious to me how to do so.  Do you have references to any relevant papers?
Remember that we strongly prefer a published cipher to a custom one -- even if
the core is reused, a mistake may be made in the way it is used.  Thus,
similarly to Paul's wide-block mode, I'd be concerned that we'd have to
self-publish a new construction, then use it with no outside crypto review.
*Maybe* it would be straightforward enough to be okay, but to know I'd need to
see the details of how it would actually work.

But in the end, Speck seemed like the clear choice because it had multiple NEON
implementations available already which showed it could be implemented very
efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
ciphers) yet the security margin is still similar to AES; it has no intellectual
property concerns; there is a paper clearly explaining the design decisions; it
is naturally resistant to timing attacks; it supports a 128-bit block size, so
it can be easily used in XTS mode; it supports the same key sizes as AES; and it
has a simple and understandable design with no "magic numbers" besides 8 and 3
(compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
had a public key embedded in the algorithm).  Also as Paul mentioned he is
confident in the construction, and he has published cryptanalysis on Salsa20, so
his opinion is probably more significant than mine :-)

But I will definitely take a closer look at SPARX and some of the other ciphers
you mentioned in case I missed something.  I really do appreciate the
suggestions, by the way, and in any case we do need to be very well prepared to
justify our choices.  I just hope that people can understand that we are
implementing real-world crypto which must operate under *very* tight performance
constraints on ARM processors, and it must be compatible with dm-crypt and
fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
at first seem reasonable choices had to (unfortunately) be excluded.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-25 19:49             ` Eric Biggers
  (?)
@ 2018-04-26  2:05               ` Samuel Neves
  -1 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-26  2:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Jason A. Donenfeld, Greg Kaiser, Herbert Xu,
	Ard Biesheuvel, Michael Halcrow, tashur, Patrik Torstensson,
	Alex Cope, Paul Lawrence, linux-fscrypt,
	Linux Crypto Mailing List, Greg Kroah-Hartman, linux-arm-kernel,
	Paul Crowley

On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> I agree that my explanation should have been better, and should have considered
> more crypto algorithms.  The main difficulty is that we have extreme performance
> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> performance exceeding that after much optimization, we've been getting a lot of
> pushback as people want closer to 100 MB/s.
>

I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
would put the performance upper bound around 15 cycles per byte, with
the comfortable number being ~7. That's indeed tough, though not
impossible.

>
> That's why I also included Speck64-XTS in the patches, since it was
> straightforward to include, and some devices may really need that last 20-30% of
> performance for encryption to be feasible at all.  (And when the choice is
> between unencrypted and a 64-bit block cipher, used in a context where the
> weakest points in the cryptosystem are actually elsewhere such as the user's
> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> that continues to be the case I'd be fine with Speck64 being removed, leaving
> just Speck128.
>

I would very much prefer that to be the case. As many of us know,
"it's better than nothing" has been often used to justify other bad
choices, like RC4, that end up preventing better ones from being
adopted. At a time where we're trying to get rid of 64-bit ciphers in
TLS, where data volumes per session are comparatively low, it would be
unfortunate if the opposite starts happening on encryption at rest.

>
> Note that in practice, to have any chance at meeting the performance requirement
> the cipher needed to be NEON accelerated.  That made benchmarking really hard
> and time-consuming, since to definitely know how an algorithm performs it can
> take upwards of a week to implement a NEON version.  It needs to be very well
> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> performance improvement on some CPUs just by changing the NEON instructions used
> to implement the 8-bit rotates, an optimization that is not possible with
> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> intentional design choice by the Speck designers; they do know what they're
> doing, actually.)
>
> Thus, we had to be pretty aggressive about dropping algorithms from
> consideration if there were preliminary indications that they wouldn't perform
> well, or had too little cryptanalysis, or had other issues such as an unclear
> patent situation.  Threefish for example I did test the C implementation at
> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> that it could be improved over 4x with NEON, if at all, so I did not take the
> long time it would have taken to write an optimized NEON implementation to
> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>

In my limited experience with NEON and 64-bit ARX, there's usually a
~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
The extra speedup from encrypting 2 block in parallel is then
somewhere between 1x and 2x, depending on various details. Getting
near 4x might be feasible, but it is indeed time-consuming to get
there.

>
> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> Crowley to explain it properly, but briefly it's actually a pseudorandom
> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> would operate on a whole 512-byte sector, and if any bit of the 512-byte
> plaintext is changed, then every bit in the 512-byte ciphertext would change
> with 50% probability.  To make this possible, the construction uses a polynomial
> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> mode.
>

Oh, OK, that sounds like something resembling Naor-Reingold or its
relatives. That would work, but with 3 or 4 passes I guess it wouldn't
be very fast.

>
> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> obvious to me how to do so.  Do you have references to any relevant papers?
> Remember that we strongly prefer a published cipher to a custom one -- even if
> the core is reused, a mistake may be made in the way it is used.  Thus,
> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> self-publish a new construction, then use it with no outside crypto review.
> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> see the details of how it would actually work.
>

This would be the 'tweakable Even-Mansour' construction and its
variants. The variant I'm most familiar with would be MEM [1],
focusing on software friendliness, but there is other provable
security work in the same vein, including [3, 4, 5]. It's very similar
to how the XEX mode turns a block cipher into a tweakable block
cipher.

In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
ChaCha20's, but everything translates easily from one to the other. We
also included cheap masks for 512-bit permutations, just in case.

[1] https://eprint.iacr.org/2015/999
[2] https://github.com/MEM-AEAD/mem-aead
[3] https://eprint.iacr.org/2015/539
[4] https://eprint.iacr.org/2015/476
[5] https://competitions.cr.yp.to/round2/minalpherv11.pdf

>
> But in the end, Speck seemed like the clear choice because it had multiple NEON
> implementations available already which showed it could be implemented very
> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> ciphers) yet the security margin is still similar to AES; it has no intellectual
> property concerns; there is a paper clearly explaining the design decisions; it
> is naturally resistant to timing attacks; it supports a 128-bit block size, so
> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> has a simple and understandable design with no "magic numbers" besides 8 and 3
> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> had a public key embedded in the algorithm).  Also as Paul mentioned he is
> confident in the construction, and he has published cryptanalysis on Salsa20, so
> his opinion is probably more significant than mine :-)
>
> But I will definitely take a closer look at SPARX and some of the other ciphers
> you mentioned in case I missed something.  I really do appreciate the
> suggestions, by the way, and in any case we do need to be very well prepared to
> justify our choices.  I just hope that people can understand that we are
> implementing real-world crypto which must operate under *very* tight performance
> constraints on ARM processors, and it must be compatible with dm-crypt and
> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> at first seem reasonable choices had to (unfortunately) be excluded.
>

I understand it is a tough choice, and it's unfortunate that many of
the algorithms we have cater mostly to either the
high-hardware-accelerated-end or the extremely low-end, without a lot
of good options at the middle-end.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-26  2:05               ` Samuel Neves
  0 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-26  2:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jason A. Donenfeld, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, tashur

On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> I agree that my explanation should have been better, and should have considered
> more crypto algorithms.  The main difficulty is that we have extreme performance
> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> performance exceeding that after much optimization, we've been getting a lot of
> pushback as people want closer to 100 MB/s.
>

I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
would put the performance upper bound around 15 cycles per byte, with
the comfortable number being ~7. That's indeed tough, though not
impossible.

>
> That's why I also included Speck64-XTS in the patches, since it was
> straightforward to include, and some devices may really need that last 20-30% of
> performance for encryption to be feasible at all.  (And when the choice is
> between unencrypted and a 64-bit block cipher, used in a context where the
> weakest points in the cryptosystem are actually elsewhere such as the user's
> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> that continues to be the case I'd be fine with Speck64 being removed, leaving
> just Speck128.
>

I would very much prefer that to be the case. As many of us know,
"it's better than nothing" has been often used to justify other bad
choices, like RC4, that end up preventing better ones from being
adopted. At a time where we're trying to get rid of 64-bit ciphers in
TLS, where data volumes per session are comparatively low, it would be
unfortunate if the opposite starts happening on encryption at rest.

>
> Note that in practice, to have any chance at meeting the performance requirement
> the cipher needed to be NEON accelerated.  That made benchmarking really hard
> and time-consuming, since to definitely know how an algorithm performs it can
> take upwards of a week to implement a NEON version.  It needs to be very well
> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> performance improvement on some CPUs just by changing the NEON instructions used
> to implement the 8-bit rotates, an optimization that is not possible with
> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> intentional design choice by the Speck designers; they do know what they're
> doing, actually.)
>
> Thus, we had to be pretty aggressive about dropping algorithms from
> consideration if there were preliminary indications that they wouldn't perform
> well, or had too little cryptanalysis, or had other issues such as an unclear
> patent situation.  Threefish for example I did test the C implementation at
> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> that it could be improved over 4x with NEON, if at all, so I did not take the
> long time it would have taken to write an optimized NEON implementation to
> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>

In my limited experience with NEON and 64-bit ARX, there's usually a
~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
The extra speedup from encrypting 2 block in parallel is then
somewhere between 1x and 2x, depending on various details. Getting
near 4x might be feasible, but it is indeed time-consuming to get
there.

>
> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> Crowley to explain it properly, but briefly it's actually a pseudorandom
> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> would operate on a whole 512-byte sector, and if any bit of the 512-byte
> plaintext is changed, then every bit in the 512-byte ciphertext would change
> with 50% probability.  To make this possible, the construction uses a polynomial
> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> mode.
>

Oh, OK, that sounds like something resembling Naor-Reingold or its
relatives. That would work, but with 3 or 4 passes I guess it wouldn't
be very fast.

>
> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> obvious to me how to do so.  Do you have references to any relevant papers?
> Remember that we strongly prefer a published cipher to a custom one -- even if
> the core is reused, a mistake may be made in the way it is used.  Thus,
> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> self-publish a new construction, then use it with no outside crypto review.
> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> see the details of how it would actually work.
>

This would be the 'tweakable Even-Mansour' construction and its
variants. The variant I'm most familiar with would be MEM [1],
focusing on software friendliness, but there is other provable
security work in the same vein, including [3, 4, 5]. It's very similar
to how the XEX mode turns a block cipher into a tweakable block
cipher.

In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
ChaCha20's, but everything translates easily from one to the other. We
also included cheap masks for 512-bit permutations, just in case.

[1] https://eprint.iacr.org/2015/999
[2] https://github.com/MEM-AEAD/mem-aead
[3] https://eprint.iacr.org/2015/539
[4] https://eprint.iacr.org/2015/476
[5] https://competitions.cr.yp.to/round2/minalpherv11.pdf

>
> But in the end, Speck seemed like the clear choice because it had multiple NEON
> implementations available already which showed it could be implemented very
> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> ciphers) yet the security margin is still similar to AES; it has no intellectual
> property concerns; there is a paper clearly explaining the design decisions; it
> is naturally resistant to timing attacks; it supports a 128-bit block size, so
> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> has a simple and understandable design with no "magic numbers" besides 8 and 3
> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> had a public key embedded in the algorithm).  Also as Paul mentioned he is
> confident in the construction, and he has published cryptanalysis on Salsa20, so
> his opinion is probably more significant than mine :-)
>
> But I will definitely take a closer look at SPARX and some of the other ciphers
> you mentioned in case I missed something.  I really do appreciate the
> suggestions, by the way, and in any case we do need to be very well prepared to
> justify our choices.  I just hope that people can understand that we are
> implementing real-world crypto which must operate under *very* tight performance
> constraints on ARM processors, and it must be compatible with dm-crypt and
> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> at first seem reasonable choices had to (unfortunately) be excluded.
>

I understand it is a tough choice, and it's unfortunate that many of
the algorithms we have cater mostly to either the
high-hardware-accelerated-end or the extremely low-end, without a lot
of good options at the middle-end.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-26  2:05               ` Samuel Neves
  0 siblings, 0 replies; 56+ messages in thread
From: Samuel Neves @ 2018-04-26  2:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> I agree that my explanation should have been better, and should have considered
> more crypto algorithms.  The main difficulty is that we have extreme performance
> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> performance exceeding that after much optimization, we've been getting a lot of
> pushback as people want closer to 100 MB/s.
>

I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
would put the performance upper bound around 15 cycles per byte, with
the comfortable number being ~7. That's indeed tough, though not
impossible.

>
> That's why I also included Speck64-XTS in the patches, since it was
> straightforward to include, and some devices may really need that last 20-30% of
> performance for encryption to be feasible at all.  (And when the choice is
> between unencrypted and a 64-bit block cipher, used in a context where the
> weakest points in the cryptosystem are actually elsewhere such as the user's
> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> that continues to be the case I'd be fine with Speck64 being removed, leaving
> just Speck128.
>

I would very much prefer that to be the case. As many of us know,
"it's better than nothing" has been often used to justify other bad
choices, like RC4, that end up preventing better ones from being
adopted. At a time where we're trying to get rid of 64-bit ciphers in
TLS, where data volumes per session are comparatively low, it would be
unfortunate if the opposite starts happening on encryption at rest.

>
> Note that in practice, to have any chance at meeting the performance requirement
> the cipher needed to be NEON accelerated.  That made benchmarking really hard
> and time-consuming, since to definitely know how an algorithm performs it can
> take upwards of a week to implement a NEON version.  It needs to be very well
> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> performance improvement on some CPUs just by changing the NEON instructions used
> to implement the 8-bit rotates, an optimization that is not possible with
> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> intentional design choice by the Speck designers; they do know what they're
> doing, actually.)
>
> Thus, we had to be pretty aggressive about dropping algorithms from
> consideration if there were preliminary indications that they wouldn't perform
> well, or had too little cryptanalysis, or had other issues such as an unclear
> patent situation.  Threefish for example I did test the C implementation at
> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> that it could be improved over 4x with NEON, if at all, so I did not take the
> long time it would have taken to write an optimized NEON implementation to
> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>

In my limited experience with NEON and 64-bit ARX, there's usually a
~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
The extra speedup from encrypting 2 block in parallel is then
somewhere between 1x and 2x, depending on various details. Getting
near 4x might be feasible, but it is indeed time-consuming to get
there.

>
> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> Crowley to explain it properly, but briefly it's actually a pseudorandom
> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> would operate on a whole 512-byte sector, and if any bit of the 512-byte
> plaintext is changed, then every bit in the 512-byte ciphertext would change
> with 50% probability.  To make this possible, the construction uses a polynomial
> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> mode.
>

Oh, OK, that sounds like something resembling Naor-Reingold or its
relatives. That would work, but with 3 or 4 passes I guess it wouldn't
be very fast.

>
> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> obvious to me how to do so.  Do you have references to any relevant papers?
> Remember that we strongly prefer a published cipher to a custom one -- even if
> the core is reused, a mistake may be made in the way it is used.  Thus,
> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> self-publish a new construction, then use it with no outside crypto review.
> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> see the details of how it would actually work.
>

This would be the 'tweakable Even-Mansour' construction and its
variants. The variant I'm most familiar with would be MEM [1],
focusing on software friendliness, but there is other provable
security work in the same vein, including [3, 4, 5]. It's very similar
to how the XEX mode turns a block cipher into a tweakable block
cipher.

In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
ChaCha20's, but everything translates easily from one to the other. We
also included cheap masks for 512-bit permutations, just in case.

[1] https://eprint.iacr.org/2015/999
[2] https://github.com/MEM-AEAD/mem-aead
[3] https://eprint.iacr.org/2015/539
[4] https://eprint.iacr.org/2015/476
[5] https://competitions.cr.yp.to/round2/minalpherv11.pdf

>
> But in the end, Speck seemed like the clear choice because it had multiple NEON
> implementations available already which showed it could be implemented very
> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> ciphers) yet the security margin is still similar to AES; it has no intellectual
> property concerns; there is a paper clearly explaining the design decisions; it
> is naturally resistant to timing attacks; it supports a 128-bit block size, so
> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> has a simple and understandable design with no "magic numbers" besides 8 and 3
> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> had a public key embedded in the algorithm).  Also as Paul mentioned he is
> confident in the construction, and he has published cryptanalysis on Salsa20, so
> his opinion is probably more significant than mine :-)
>
> But I will definitely take a closer look at SPARX and some of the other ciphers
> you mentioned in case I missed something.  I really do appreciate the
> suggestions, by the way, and in any case we do need to be very well prepared to
> justify our choices.  I just hope that people can understand that we are
> implementing real-world crypto which must operate under *very* tight performance
> constraints on ARM processors, and it must be compatible with dm-crypt and
> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> at first seem reasonable choices had to (unfortunately) be excluded.
>

I understand it is a tough choice, and it's unfortunate that many of
the algorithms we have cater mostly to either the
high-hardware-accelerated-end or the extremely low-end, without a lot
of good options at the middle-end.

Best regards,
Samuel Neves

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-26  2:05               ` Samuel Neves
  (?)
@ 2018-04-26 16:30                 ` Paul Crowley
  -1 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-26 16:30 UTC (permalink / raw)
  To: samuel.c.p.neves
  Cc: noloader, Jason, Greg Kaiser, Herbert Xu, Eric Biggers,
	Michael Halcrow, Ard Biesheuvel, Patrik Torstensson, Alex Cope,
	Paul Lawrence, linux-fscrypt, linux-crypto, gregkh, tashur,
	linux-arm-kernel

> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.

It most resembles HCH mode https://eprint.iacr.org/2007/028 using two
passes of Poly1305, one pass of ChaCha20, and one invocation of a 128-bit
block cipher for the entire block. I have a writeup with a proof that it's
a secure tweakable SPRP, but we haven't actually implemented it yet so the
"Performance" section is a bit thin. From published benchmarks, Poly1305 is
around 2.3 cpb and ChaCha12 around 4.5 cbp on our target platform, so we're
hoping to achieve something a little over 7.1 cpb.

Right now we're in a situation where the people who can afford higher-end
devices with ARM CE get AES encryption, and the rest of the world gets no
encryption, or optional encryption that is rarely enabled because of the
performance cost. It's important to me to change that, and right now Speck
still looks like a good choice for achieving that end.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-04-26 16:30                 ` Paul Crowley
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-26 16:30 UTC (permalink / raw)
  To: samuel.c.p.neves
  Cc: Eric Biggers, Jason, linux-crypto, Herbert Xu, linux-fscrypt,
	linux-arm-kernel, Ard Biesheuvel, noloader, Patrik Torstensson,
	Greg Kaiser, Paul Lawrence, Michael Halcrow, Alex Cope, gregkh,
	tashur

> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.

It most resembles HCH mode https://eprint.iacr.org/2007/028 using two
passes of Poly1305, one pass of ChaCha20, and one invocation of a 128-bit
block cipher for the entire block. I have a writeup with a proof that it's
a secure tweakable SPRP, but we haven't actually implemented it yet so the
"Performance" section is a bit thin. From published benchmarks, Poly1305 is
around 2.3 cpb and ChaCha12 around 4.5 cbp on our target platform, so we're
hoping to achieve something a little over 7.1 cpb.

Right now we're in a situation where the people who can afford higher-end
devices with ARM CE get AES encryption, and the rest of the world gets no
encryption, or optional encryption that is rarely enabled because of the
performance cost. It's important to me to change that, and right now Speck
still looks like a good choice for achieving that end.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-04-26 16:30                 ` Paul Crowley
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Crowley @ 2018-04-26 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.

It most resembles HCH mode https://eprint.iacr.org/2007/028 using two
passes of Poly1305, one pass of ChaCha20, and one invocation of a 128-bit
block cipher for the entire block. I have a writeup with a proof that it's
a secure tweakable SPRP, but we haven't actually implemented it yet so the
"Performance" section is a bit thin. From published benchmarks, Poly1305 is
around 2.3 cpb and ChaCha12 around 4.5 cbp on our target platform, so we're
hoping to achieve something a little over 7.1 cpb.

Right now we're in a situation where the people who can afford higher-end
devices with ARM CE get AES encryption, and the rest of the world gets no
encryption, or optional encryption that is rarely enabled because of the
performance cost. It's important to me to change that, and right now Speck
still looks like a good choice for achieving that end.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
  2018-04-26  2:05               ` Samuel Neves
  (?)
@ 2018-05-07 23:20                 ` Eric Biggers
  -1 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-05-07 23:20 UTC (permalink / raw)
  To: Samuel Neves
  Cc: Jeffrey Walton, Jason A. Donenfeld, Greg Kaiser, Herbert Xu,
	Ard Biesheuvel, Michael Halcrow, tashur, Patrik Torstensson,
	Alex Cope, Paul Lawrence, linux-fscrypt,
	Linux Crypto Mailing List, Greg Kroah-Hartman, linux-arm-kernel,
	Paul Crowley

Hi Samuel,

On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> > I agree that my explanation should have been better, and should have considered
> > more crypto algorithms.  The main difficulty is that we have extreme performance
> > requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> > devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> > performance exceeding that after much optimization, we've been getting a lot of
> > pushback as people want closer to 100 MB/s.
> >
> 
> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
> would put the performance upper bound around 15 cycles per byte, with
> the comfortable number being ~7. That's indeed tough, though not
> impossible.
> 
> >
> > That's why I also included Speck64-XTS in the patches, since it was
> > straightforward to include, and some devices may really need that last 20-30% of
> > performance for encryption to be feasible at all.  (And when the choice is
> > between unencrypted and a 64-bit block cipher, used in a context where the
> > weakest points in the cryptosystem are actually elsewhere such as the user's
> > low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> > the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> > that continues to be the case I'd be fine with Speck64 being removed, leaving
> > just Speck128.
> >
> 
> I would very much prefer that to be the case. As many of us know,
> "it's better than nothing" has been often used to justify other bad
> choices, like RC4, that end up preventing better ones from being
> adopted. At a time where we're trying to get rid of 64-bit ciphers in
> TLS, where data volumes per session are comparatively low, it would be
> unfortunate if the opposite starts happening on encryption at rest.
> 
> >
> > Note that in practice, to have any chance at meeting the performance requirement
> > the cipher needed to be NEON accelerated.  That made benchmarking really hard
> > and time-consuming, since to definitely know how an algorithm performs it can
> > take upwards of a week to implement a NEON version.  It needs to be very well
> > optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> > performance improvement on some CPUs just by changing the NEON instructions used
> > to implement the 8-bit rotates, an optimization that is not possible with
> > ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> > intentional design choice by the Speck designers; they do know what they're
> > doing, actually.)
> >
> > Thus, we had to be pretty aggressive about dropping algorithms from
> > consideration if there were preliminary indications that they wouldn't perform
> > well, or had too little cryptanalysis, or had other issues such as an unclear
> > patent situation.  Threefish for example I did test the C implementation at
> > https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> > than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> > that it could be improved over 4x with NEON, if at all, so I did not take the
> > long time it would have taken to write an optimized NEON implementation to
> > benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
> >
> 
> In my limited experience with NEON and 64-bit ARX, there's usually a
> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
> The extra speedup from encrypting 2 block in parallel is then
> somewhere between 1x and 2x, depending on various details. Getting
> near 4x might be feasible, but it is indeed time-consuming to get
> there.
> 
> >
> > As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> > Crowley to explain it properly, but briefly it's actually a pseudorandom
> > permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> > would operate on a whole 512-byte sector, and if any bit of the 512-byte
> > plaintext is changed, then every bit in the 512-byte ciphertext would change
> > with 50% probability.  To make this possible, the construction uses a polynomial
> > evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> > mode.
> >
> 
> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.
> 
> >
> > Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> > cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> > obvious to me how to do so.  Do you have references to any relevant papers?
> > Remember that we strongly prefer a published cipher to a custom one -- even if
> > the core is reused, a mistake may be made in the way it is used.  Thus,
> > similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> > self-publish a new construction, then use it with no outside crypto review.
> > *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> > see the details of how it would actually work.
> >
> 
> This would be the 'tweakable Even-Mansour' construction and its
> variants. The variant I'm most familiar with would be MEM [1],
> focusing on software friendliness, but there is other provable
> security work in the same vein, including [3, 4, 5]. It's very similar
> to how the XEX mode turns a block cipher into a tweakable block
> cipher.
> 
> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
> ChaCha20's, but everything translates easily from one to the other. We
> also included cheap masks for 512-bit permutations, just in case.
> 
> [1] https://eprint.iacr.org/2015/999
> [2] https://github.com/MEM-AEAD/mem-aead
> [3] https://eprint.iacr.org/2015/539
> [4] https://eprint.iacr.org/2015/476
> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
> 
> >
> > But in the end, Speck seemed like the clear choice because it had multiple NEON
> > implementations available already which showed it could be implemented very
> > efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> > ciphers) yet the security margin is still similar to AES; it has no intellectual
> > property concerns; there is a paper clearly explaining the design decisions; it
> > is naturally resistant to timing attacks; it supports a 128-bit block size, so
> > it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> > has a simple and understandable design with no "magic numbers" besides 8 and 3
> > (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> > had a public key embedded in the algorithm).  Also as Paul mentioned he is
> > confident in the construction, and he has published cryptanalysis on Salsa20, so
> > his opinion is probably more significant than mine :-)
> >
> > But I will definitely take a closer look at SPARX and some of the other ciphers
> > you mentioned in case I missed something.  I really do appreciate the
> > suggestions, by the way, and in any case we do need to be very well prepared to
> > justify our choices.  I just hope that people can understand that we are
> > implementing real-world crypto which must operate under *very* tight performance
> > constraints on ARM processors, and it must be compatible with dm-crypt and
> > fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> > at first seem reasonable choices had to (unfortunately) be excluded.
> >
> 
> I understand it is a tough choice, and it's unfortunate that many of
> the algorithms we have cater mostly to either the
> high-hardware-accelerated-end or the extremely low-end, without a lot
> of good options at the middle-end.
> 

First, we're planning a publication which explains our choices in more detail,
so please treat this as some more preliminary notes.

To make sure we've exhausted as many alternatives as possible, I wrote NEON
implementations of all the block ciphers you suggested with the exception of
SKINNY (which looked very hardware-oriented and not efficient in software), as
well as some that others have suggested.  (It was tough, but after doing a
couple, it got much easier...)  The following shows the decryption performance
I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
in our use case we care much more about decryption, as that affects the most
critical metrics such as the time to launch applications.

	ChaCha8-MEM: 183256 KB/s
	ChaCha12-MEM: 134833 KB/s
	Chaskey-LTS-XTS: 99097 KB/s
	ChaCha20-MEM: 87875 KB/s
	Speck64/128-XTS: 85332 KB/s
	Speck128/128-XTS: 73404 KB/s
	RC5-128/12/256-XTS: 69887 KB/s
	Speck128/256-XTS: 69597 KB/s
	RC5-64/12/128-XTS: 69267 KB/s
	LEA-128-XTS: 67986 KB/s
	CHAM128/128-XTS: 52982 KB/s
	LEA-256-XTS: 50429 KB/s
	Threefish-256: 48349 KB/s
	RC6-XTS: 46855 KB/s
	RC5-128/20/256-XTS: 44291 KB/s
	RC5-64/20/128-XTS: 43924 KB/s
	NOEKEON-XTS: 40705 KB/s
	Sparx128/128-XTS: 39191 KB/s
	XTEA-XTS: 38239 KB/s
	AES-128-XTS: 25549 KB/s
	AES-256-XTS: 18640 KB/s

Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
encryption, and we need every bit of performance available.  Anything below
50 MB/s is definitely unacceptable.  But even at that speed we get many
complaints, so in practice we need something faster.  That means that the
algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.

ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
the best performance, especially if we allow for the 12 or 8-round variants.  My
code for it is based roughly on the existing
arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
and customized construction, and Paul Crowley and I have concerns about its
security.  The problem is that the MEM security proof assumes that the
underlying permutation has no more detectable structural properties than a
randomly selected permutation.  However, the ChaCha permutation is known to have
certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
f, g, g, g, g, h, h, h, h).

For the MEM mask generation, we can use the "expand 32-byte k" constant to break
the symmetry, like is done in the ChaCha stream cipher.  However, that's not
possible for the inner application of the permutation.  So, we'd be using the
ChaCha permutation in a manner in which it wasn't intended, and the security of
the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
impossible that it would be secure, but at the present time it doesn't seem like
a good choice to actually field.

Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
option because it has only a 64-bit security level, due to its use of the
Even-Mansour construction with a 128-bit key.  Of course, it would still be
better than nothing, but we prefer a cipher that has a security level in line
with what is accepted for modern crypto.

RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
known differential attack on that number of rounds.  So if we choose RC5 we'd
almost certainly have to use the 20-round variant, which is much slower.

That leaves LEA-128-XTS as the only other algorithm that might meet the
performance requirement, as it is only slightly slower than Speck128-XTS.  It
may be the most viable alternative, but beyond the slight performance loss it
still has some disadvantages compared to Speck:

- Importantly, the LEA authors forgot to include test vectors, so I'm not yet
  100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
  make the endianness of their test vectors clear in their initial publication,
  but at least they actually provided test vectors!)
- LEA has received some cryptanalysis, but not nearly as much as Speck.
- It took some very heavy optimization to get good LEA performance, much more
  than I had to do for Speck.  My final LEA code has separate code paths for
  128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
  and reordered the operations.  As a result, it's harder to see how it maps to
  the original paper.  In contrast, my Speck code is more straightforward and
  maintainable.
- LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
  33% more rounds.  LEA-256 would not be fast enough, so we would have to use
  LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
  We're willing to accept a 128-bit security level, but 256-bit is preferable.
  (I think the Speck designers took a more informed approach to setting
  appropriate security margins for a lightweight cipher; it seems that other
  designers often choose too few or too many rounds, especially as the key
  length is varied.)
- LEA encryption is also a bit slower than decryption, while with Speck
  encryption and decryption are almost exactly the same speed.

Note that like Speck, LEA doesn't appear to be approved by a standards
organization either; it's just specified in a research paper.

Thus, from a technical perspective, and given the current state of the art in
lightweight cryptography, currently Speck128-XTS seems to be the best choice for
the problem domain.  It's unfortunate that there are so few good options and
that the field is so politicized, but it is what it is.

Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
wide-block mode), and eventually we hope to offer it as an option as well.  But
it's not yet published, and it's a more complex algorithm that is harder to
implement so I haven't yet had a chance to implement and benchmark it.  And we
don't want to continue to leave users unprotected while we spend a long time
coming up with the perfect algorithm, or for hardware AES support to arrive to
all low-end CPUs when it's unclear if/when that will happen.

Again, we're planning a publication which will explain all this in more detail.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/5] crypto: Speck support
@ 2018-05-07 23:20                 ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-05-07 23:20 UTC (permalink / raw)
  To: Samuel Neves
  Cc: Jason A. Donenfeld, Linux Crypto Mailing List, Herbert Xu,
	linux-fscrypt, linux-arm-kernel, Ard Biesheuvel, Jeffrey Walton,
	Paul Crowley, Patrik Torstensson, Greg Kaiser, Paul Lawrence,
	Michael Halcrow, Alex Cope, Greg Kroah-Hartman, tashur

Hi Samuel,

On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> > I agree that my explanation should have been better, and should have considered
> > more crypto algorithms.  The main difficulty is that we have extreme performance
> > requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> > devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> > performance exceeding that after much optimization, we've been getting a lot of
> > pushback as people want closer to 100 MB/s.
> >
> 
> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
> would put the performance upper bound around 15 cycles per byte, with
> the comfortable number being ~7. That's indeed tough, though not
> impossible.
> 
> >
> > That's why I also included Speck64-XTS in the patches, since it was
> > straightforward to include, and some devices may really need that last 20-30% of
> > performance for encryption to be feasible at all.  (And when the choice is
> > between unencrypted and a 64-bit block cipher, used in a context where the
> > weakest points in the cryptosystem are actually elsewhere such as the user's
> > low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> > the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> > that continues to be the case I'd be fine with Speck64 being removed, leaving
> > just Speck128.
> >
> 
> I would very much prefer that to be the case. As many of us know,
> "it's better than nothing" has been often used to justify other bad
> choices, like RC4, that end up preventing better ones from being
> adopted. At a time where we're trying to get rid of 64-bit ciphers in
> TLS, where data volumes per session are comparatively low, it would be
> unfortunate if the opposite starts happening on encryption at rest.
> 
> >
> > Note that in practice, to have any chance at meeting the performance requirement
> > the cipher needed to be NEON accelerated.  That made benchmarking really hard
> > and time-consuming, since to definitely know how an algorithm performs it can
> > take upwards of a week to implement a NEON version.  It needs to be very well
> > optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> > performance improvement on some CPUs just by changing the NEON instructions used
> > to implement the 8-bit rotates, an optimization that is not possible with
> > ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> > intentional design choice by the Speck designers; they do know what they're
> > doing, actually.)
> >
> > Thus, we had to be pretty aggressive about dropping algorithms from
> > consideration if there were preliminary indications that they wouldn't perform
> > well, or had too little cryptanalysis, or had other issues such as an unclear
> > patent situation.  Threefish for example I did test the C implementation at
> > https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> > than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> > that it could be improved over 4x with NEON, if at all, so I did not take the
> > long time it would have taken to write an optimized NEON implementation to
> > benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
> >
> 
> In my limited experience with NEON and 64-bit ARX, there's usually a
> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
> The extra speedup from encrypting 2 block in parallel is then
> somewhere between 1x and 2x, depending on various details. Getting
> near 4x might be feasible, but it is indeed time-consuming to get
> there.
> 
> >
> > As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> > Crowley to explain it properly, but briefly it's actually a pseudorandom
> > permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> > would operate on a whole 512-byte sector, and if any bit of the 512-byte
> > plaintext is changed, then every bit in the 512-byte ciphertext would change
> > with 50% probability.  To make this possible, the construction uses a polynomial
> > evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> > mode.
> >
> 
> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.
> 
> >
> > Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> > cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> > obvious to me how to do so.  Do you have references to any relevant papers?
> > Remember that we strongly prefer a published cipher to a custom one -- even if
> > the core is reused, a mistake may be made in the way it is used.  Thus,
> > similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> > self-publish a new construction, then use it with no outside crypto review.
> > *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> > see the details of how it would actually work.
> >
> 
> This would be the 'tweakable Even-Mansour' construction and its
> variants. The variant I'm most familiar with would be MEM [1],
> focusing on software friendliness, but there is other provable
> security work in the same vein, including [3, 4, 5]. It's very similar
> to how the XEX mode turns a block cipher into a tweakable block
> cipher.
> 
> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
> ChaCha20's, but everything translates easily from one to the other. We
> also included cheap masks for 512-bit permutations, just in case.
> 
> [1] https://eprint.iacr.org/2015/999
> [2] https://github.com/MEM-AEAD/mem-aead
> [3] https://eprint.iacr.org/2015/539
> [4] https://eprint.iacr.org/2015/476
> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
> 
> >
> > But in the end, Speck seemed like the clear choice because it had multiple NEON
> > implementations available already which showed it could be implemented very
> > efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> > ciphers) yet the security margin is still similar to AES; it has no intellectual
> > property concerns; there is a paper clearly explaining the design decisions; it
> > is naturally resistant to timing attacks; it supports a 128-bit block size, so
> > it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> > has a simple and understandable design with no "magic numbers" besides 8 and 3
> > (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> > had a public key embedded in the algorithm).  Also as Paul mentioned he is
> > confident in the construction, and he has published cryptanalysis on Salsa20, so
> > his opinion is probably more significant than mine :-)
> >
> > But I will definitely take a closer look at SPARX and some of the other ciphers
> > you mentioned in case I missed something.  I really do appreciate the
> > suggestions, by the way, and in any case we do need to be very well prepared to
> > justify our choices.  I just hope that people can understand that we are
> > implementing real-world crypto which must operate under *very* tight performance
> > constraints on ARM processors, and it must be compatible with dm-crypt and
> > fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> > at first seem reasonable choices had to (unfortunately) be excluded.
> >
> 
> I understand it is a tough choice, and it's unfortunate that many of
> the algorithms we have cater mostly to either the
> high-hardware-accelerated-end or the extremely low-end, without a lot
> of good options at the middle-end.
> 

First, we're planning a publication which explains our choices in more detail,
so please treat this as some more preliminary notes.

To make sure we've exhausted as many alternatives as possible, I wrote NEON
implementations of all the block ciphers you suggested with the exception of
SKINNY (which looked very hardware-oriented and not efficient in software), as
well as some that others have suggested.  (It was tough, but after doing a
couple, it got much easier...)  The following shows the decryption performance
I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
in our use case we care much more about decryption, as that affects the most
critical metrics such as the time to launch applications.

	ChaCha8-MEM: 183256 KB/s
	ChaCha12-MEM: 134833 KB/s
	Chaskey-LTS-XTS: 99097 KB/s
	ChaCha20-MEM: 87875 KB/s
	Speck64/128-XTS: 85332 KB/s
	Speck128/128-XTS: 73404 KB/s
	RC5-128/12/256-XTS: 69887 KB/s
	Speck128/256-XTS: 69597 KB/s
	RC5-64/12/128-XTS: 69267 KB/s
	LEA-128-XTS: 67986 KB/s
	CHAM128/128-XTS: 52982 KB/s
	LEA-256-XTS: 50429 KB/s
	Threefish-256: 48349 KB/s
	RC6-XTS: 46855 KB/s
	RC5-128/20/256-XTS: 44291 KB/s
	RC5-64/20/128-XTS: 43924 KB/s
	NOEKEON-XTS: 40705 KB/s
	Sparx128/128-XTS: 39191 KB/s
	XTEA-XTS: 38239 KB/s
	AES-128-XTS: 25549 KB/s
	AES-256-XTS: 18640 KB/s

Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
encryption, and we need every bit of performance available.  Anything below
50 MB/s is definitely unacceptable.  But even at that speed we get many
complaints, so in practice we need something faster.  That means that the
algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.

ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
the best performance, especially if we allow for the 12 or 8-round variants.  My
code for it is based roughly on the existing
arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
and customized construction, and Paul Crowley and I have concerns about its
security.  The problem is that the MEM security proof assumes that the
underlying permutation has no more detectable structural properties than a
randomly selected permutation.  However, the ChaCha permutation is known to have
certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
f, g, g, g, g, h, h, h, h).

For the MEM mask generation, we can use the "expand 32-byte k" constant to break
the symmetry, like is done in the ChaCha stream cipher.  However, that's not
possible for the inner application of the permutation.  So, we'd be using the
ChaCha permutation in a manner in which it wasn't intended, and the security of
the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
impossible that it would be secure, but at the present time it doesn't seem like
a good choice to actually field.

Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
option because it has only a 64-bit security level, due to its use of the
Even-Mansour construction with a 128-bit key.  Of course, it would still be
better than nothing, but we prefer a cipher that has a security level in line
with what is accepted for modern crypto.

RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
known differential attack on that number of rounds.  So if we choose RC5 we'd
almost certainly have to use the 20-round variant, which is much slower.

That leaves LEA-128-XTS as the only other algorithm that might meet the
performance requirement, as it is only slightly slower than Speck128-XTS.  It
may be the most viable alternative, but beyond the slight performance loss it
still has some disadvantages compared to Speck:

- Importantly, the LEA authors forgot to include test vectors, so I'm not yet
  100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
  make the endianness of their test vectors clear in their initial publication,
  but at least they actually provided test vectors!)
- LEA has received some cryptanalysis, but not nearly as much as Speck.
- It took some very heavy optimization to get good LEA performance, much more
  than I had to do for Speck.  My final LEA code has separate code paths for
  128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
  and reordered the operations.  As a result, it's harder to see how it maps to
  the original paper.  In contrast, my Speck code is more straightforward and
  maintainable.
- LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
  33% more rounds.  LEA-256 would not be fast enough, so we would have to use
  LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
  We're willing to accept a 128-bit security level, but 256-bit is preferable.
  (I think the Speck designers took a more informed approach to setting
  appropriate security margins for a lightweight cipher; it seems that other
  designers often choose too few or too many rounds, especially as the key
  length is varied.)
- LEA encryption is also a bit slower than decryption, while with Speck
  encryption and decryption are almost exactly the same speed.

Note that like Speck, LEA doesn't appear to be approved by a standards
organization either; it's just specified in a research paper.

Thus, from a technical perspective, and given the current state of the art in
lightweight cryptography, currently Speck128-XTS seems to be the best choice for
the problem domain.  It's unfortunate that there are so few good options and
that the field is so politicized, but it is what it is.

Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
wide-block mode), and eventually we hope to offer it as an option as well.  But
it's not yet published, and it's a more complex algorithm that is harder to
implement so I haven't yet had a chance to implement and benchmark it.  And we
don't want to continue to leave users unprotected while we spend a long time
coming up with the perfect algorithm, or for hardware AES support to arrive to
all low-end CPUs when it's unclear if/when that will happen.

Again, we're planning a publication which will explain all this in more detail.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-05-07 23:20                 ` Eric Biggers
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Biggers @ 2018-05-07 23:20 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Samuel,

On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
> > I agree that my explanation should have been better, and should have considered
> > more crypto algorithms.  The main difficulty is that we have extreme performance
> > requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
> > devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
> > performance exceeding that after much optimization, we've been getting a lot of
> > pushback as people want closer to 100 MB/s.
> >
> 
> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
> would put the performance upper bound around 15 cycles per byte, with
> the comfortable number being ~7. That's indeed tough, though not
> impossible.
> 
> >
> > That's why I also included Speck64-XTS in the patches, since it was
> > straightforward to include, and some devices may really need that last 20-30% of
> > performance for encryption to be feasible at all.  (And when the choice is
> > between unencrypted and a 64-bit block cipher, used in a context where the
> > weakest points in the cryptosystem are actually elsewhere such as the user's
> > low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
> > the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
> > that continues to be the case I'd be fine with Speck64 being removed, leaving
> > just Speck128.
> >
> 
> I would very much prefer that to be the case. As many of us know,
> "it's better than nothing" has been often used to justify other bad
> choices, like RC4, that end up preventing better ones from being
> adopted. At a time where we're trying to get rid of 64-bit ciphers in
> TLS, where data volumes per session are comparatively low, it would be
> unfortunate if the opposite starts happening on encryption at rest.
> 
> >
> > Note that in practice, to have any chance at meeting the performance requirement
> > the cipher needed to be NEON accelerated.  That made benchmarking really hard
> > and time-consuming, since to definitely know how an algorithm performs it can
> > take upwards of a week to implement a NEON version.  It needs to be very well
> > optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
> > performance improvement on some CPUs just by changing the NEON instructions used
> > to implement the 8-bit rotates, an optimization that is not possible with
> > ciphers that don't use rotate amounts that are multiples of 8.  (This was an
> > intentional design choice by the Speck designers; they do know what they're
> > doing, actually.)
> >
> > Thus, we had to be pretty aggressive about dropping algorithms from
> > consideration if there were preliminary indications that they wouldn't perform
> > well, or had too little cryptanalysis, or had other issues such as an unclear
> > patent situation.  Threefish for example I did test the C implementation at
> > https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
> > than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
> > that it could be improved over 4x with NEON, if at all, so I did not take the
> > long time it would have taken to write an optimized NEON implementation to
> > benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
> >
> 
> In my limited experience with NEON and 64-bit ARX, there's usually a
> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
> The extra speedup from encrypting 2 block in parallel is then
> somewhere between 1x and 2x, depending on various details. Getting
> near 4x might be feasible, but it is indeed time-consuming to get
> there.
> 
> >
> > As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
> > Crowley to explain it properly, but briefly it's actually a pseudorandom
> > permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
> > would operate on a whole 512-byte sector, and if any bit of the 512-byte
> > plaintext is changed, then every bit in the 512-byte ciphertext would change
> > with 50% probability.  To make this possible, the construction uses a polynomial
> > evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
> > mode.
> >
> 
> Oh, OK, that sounds like something resembling Naor-Reingold or its
> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
> be very fast.
> 
> >
> > Using ChaCha20's underlying 512-bit permutation to build a tweakable block
> > cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
> > obvious to me how to do so.  Do you have references to any relevant papers?
> > Remember that we strongly prefer a published cipher to a custom one -- even if
> > the core is reused, a mistake may be made in the way it is used.  Thus,
> > similarly to Paul's wide-block mode, I'd be concerned that we'd have to
> > self-publish a new construction, then use it with no outside crypto review.
> > *Maybe* it would be straightforward enough to be okay, but to know I'd need to
> > see the details of how it would actually work.
> >
> 
> This would be the 'tweakable Even-Mansour' construction and its
> variants. The variant I'm most familiar with would be MEM [1],
> focusing on software friendliness, but there is other provable
> security work in the same vein, including [3, 4, 5]. It's very similar
> to how the XEX mode turns a block cipher into a tweakable block
> cipher.
> 
> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
> ChaCha20's, but everything translates easily from one to the other. We
> also included cheap masks for 512-bit permutations, just in case.
> 
> [1] https://eprint.iacr.org/2015/999
> [2] https://github.com/MEM-AEAD/mem-aead
> [3] https://eprint.iacr.org/2015/539
> [4] https://eprint.iacr.org/2015/476
> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
> 
> >
> > But in the end, Speck seemed like the clear choice because it had multiple NEON
> > implementations available already which showed it could be implemented very
> > efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
> > ciphers) yet the security margin is still similar to AES; it has no intellectual
> > property concerns; there is a paper clearly explaining the design decisions; it
> > is naturally resistant to timing attacks; it supports a 128-bit block size, so
> > it can be easily used in XTS mode; it supports the same key sizes as AES; and it
> > has a simple and understandable design with no "magic numbers" besides 8 and 3
> > (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
> > had a public key embedded in the algorithm).  Also as Paul mentioned he is
> > confident in the construction, and he has published cryptanalysis on Salsa20, so
> > his opinion is probably more significant than mine :-)
> >
> > But I will definitely take a closer look at SPARX and some of the other ciphers
> > you mentioned in case I missed something.  I really do appreciate the
> > suggestions, by the way, and in any case we do need to be very well prepared to
> > justify our choices.  I just hope that people can understand that we are
> > implementing real-world crypto which must operate under *very* tight performance
> > constraints on ARM processors, and it must be compatible with dm-crypt and
> > fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
> > at first seem reasonable choices had to (unfortunately) be excluded.
> >
> 
> I understand it is a tough choice, and it's unfortunate that many of
> the algorithms we have cater mostly to either the
> high-hardware-accelerated-end or the extremely low-end, without a lot
> of good options at the middle-end.
> 

First, we're planning a publication which explains our choices in more detail,
so please treat this as some more preliminary notes.

To make sure we've exhausted as many alternatives as possible, I wrote NEON
implementations of all the block ciphers you suggested with the exception of
SKINNY (which looked very hardware-oriented and not efficient in software), as
well as some that others have suggested.  (It was tough, but after doing a
couple, it got much easier...)  The following shows the decryption performance
I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
in our use case we care much more about decryption, as that affects the most
critical metrics such as the time to launch applications.

	ChaCha8-MEM: 183256 KB/s
	ChaCha12-MEM: 134833 KB/s
	Chaskey-LTS-XTS: 99097 KB/s
	ChaCha20-MEM: 87875 KB/s
	Speck64/128-XTS: 85332 KB/s
	Speck128/128-XTS: 73404 KB/s
	RC5-128/12/256-XTS: 69887 KB/s
	Speck128/256-XTS: 69597 KB/s
	RC5-64/12/128-XTS: 69267 KB/s
	LEA-128-XTS: 67986 KB/s
	CHAM128/128-XTS: 52982 KB/s
	LEA-256-XTS: 50429 KB/s
	Threefish-256: 48349 KB/s
	RC6-XTS: 46855 KB/s
	RC5-128/20/256-XTS: 44291 KB/s
	RC5-64/20/128-XTS: 43924 KB/s
	NOEKEON-XTS: 40705 KB/s
	Sparx128/128-XTS: 39191 KB/s
	XTEA-XTS: 38239 KB/s
	AES-128-XTS: 25549 KB/s
	AES-256-XTS: 18640 KB/s

Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
encryption, and we need every bit of performance available.  Anything below
50 MB/s is definitely unacceptable.  But even at that speed we get many
complaints, so in practice we need something faster.  That means that the
algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.

ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
the best performance, especially if we allow for the 12 or 8-round variants.  My
code for it is based roughly on the existing
arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
and customized construction, and Paul Crowley and I have concerns about its
security.  The problem is that the MEM security proof assumes that the
underlying permutation has no more detectable structural properties than a
randomly selected permutation.  However, the ChaCha permutation is known to have
certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
f, g, g, g, g, h, h, h, h).

For the MEM mask generation, we can use the "expand 32-byte k" constant to break
the symmetry, like is done in the ChaCha stream cipher.  However, that's not
possible for the inner application of the permutation.  So, we'd be using the
ChaCha permutation in a manner in which it wasn't intended, and the security of
the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
impossible that it would be secure, but at the present time it doesn't seem like
a good choice to actually field.

Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
option because it has only a 64-bit security level, due to its use of the
Even-Mansour construction with a 128-bit key.  Of course, it would still be
better than nothing, but we prefer a cipher that has a security level in line
with what is accepted for modern crypto.

RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
known differential attack on that number of rounds.  So if we choose RC5 we'd
almost certainly have to use the 20-round variant, which is much slower.

That leaves LEA-128-XTS as the only other algorithm that might meet the
performance requirement, as it is only slightly slower than Speck128-XTS.  It
may be the most viable alternative, but beyond the slight performance loss it
still has some disadvantages compared to Speck:

- Importantly, the LEA authors forgot to include test vectors, so I'm not yet
  100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
  make the endianness of their test vectors clear in their initial publication,
  but at least they actually provided test vectors!)
- LEA has received some cryptanalysis, but not nearly as much as Speck.
- It took some very heavy optimization to get good LEA performance, much more
  than I had to do for Speck.  My final LEA code has separate code paths for
  128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
  and reordered the operations.  As a result, it's harder to see how it maps to
  the original paper.  In contrast, my Speck code is more straightforward and
  maintainable.
- LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
  33% more rounds.  LEA-256 would not be fast enough, so we would have to use
  LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
  We're willing to accept a 128-bit security level, but 256-bit is preferable.
  (I think the Speck designers took a more informed approach to setting
  appropriate security margins for a lightweight cipher; it seems that other
  designers often choose too few or too many rounds, especially as the key
  length is varied.)
- LEA encryption is also a bit slower than decryption, while with Speck
  encryption and decryption are almost exactly the same speed.

Note that like Speck, LEA doesn't appear to be approved by a standards
organization either; it's just specified in a research paper.

Thus, from a technical perspective, and given the current state of the art in
lightweight cryptography, currently Speck128-XTS seems to be the best choice for
the problem domain.  It's unfortunate that there are so few good options and
that the field is so politicized, but it is what it is.

Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
wide-block mode), and eventually we hope to offer it as an option as well.  But
it's not yet published, and it's a more complex algorithm that is harder to
implement so I haven't yet had a chance to implement and benchmark it.  And we
don't want to continue to leave users unprotected while we spend a long time
coming up with the perfect algorithm, or for hardware AES support to arrive to
all low-end CPUs when it's unclear if/when that will happen.

Again, we're planning a publication which will explain all this in more detail.

Thanks!

Eric

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
       [not found] <8c9dc804-1f59-a245-57ba-51db3c234621@esat.kuleuven.be>
  2018-06-01 19:23   ` Tomer Ashur
@ 2018-06-01 19:23   ` Tomer Ashur
  0 siblings, 0 replies; 56+ messages in thread
From: Tomer Ashur @ 2018-06-01 19:23 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Jeffrey Walton, Jason A. Donenfeld, Greg Kaiser, Herbert Xu,
	Ard Biesheuvel, Michael Halcrow, Samuel Neves,
	Patrik Torstensson, Alex Cope, Paul Lawrence, linux-fscrypt,
	Linux Crypto Mailing List, Greg Kroah-Hartman, linux-arm-kernel,
	Paul Crowley


[-- Attachment #1.1.1: Type: text/plain, Size: 32255 bytes --]

[Resending because the email bounced back from all 3 mailing lists.
Sorry if you get this email twice]
Hi Eric et al.,
I know that this thread is already stale, and I'm sorry I couldn't join
earlier but maybe late is better than never. Allow me to first introduce
myself: my name is Tomer Ashur and I'm a post-doctoral fellow in KU
Leuven. I am part of symmetric-key group led by Vincent Rijmen where I'm
mostly involved in cryptanalysis. I am also part of ISO/IEC JTC 1/SC
27/WG 2, the group which decided to reject Simon and Speck from ISO. If
it's okay with you, I'd like to give my perspective on what happened in
ISO and what is Speck's real standing with the academic community.

First, I'd like to say that the NSA has done quite extensive work in
muddying the waters, arguing that Simon & Speck are secure and that all
objections are political. This is not true, as I will now show with
examples. The bottom line is that there are still many open questions
about their security, questions that the NSA has, on multiple occasions,
refused to answer.

> It seems to me justified about as well as one would hope for a new cipher - 
>   "Notes on the design and analysis of Simon and Speck" seems to me to give ... detail on the reasoning
This is actually an optical illusion. First you need to understand the
context for this document. The NSA (in particular, the exact same person
who previously promoted DUAL_EC in ISO) proposed to include Simon &
Speck in ISO/IEC 29192-2 back in 2015. For obvious reasons they were met
with skepticism. A main concern was the lack of any design rationale and
internal cryptanalytic results. The NSA people fought tooth and nail for
a year and a half simultaneously arguing two almost mutually-exclusive
points: (i) they employ the most talented cryptographers and hence, we
should trust them when they say that an algorithm is secure; and (ii)
they are average cryptographers and hence they would not be able to
insert a backdoor into the algorithm.

More than once they argued in a meeting that the cryptanalysis for the
ciphers has been stabilized (i.e., that attacks will not improve) just
to be proved wrong in the next meeting (their answer: "well, _now_ it
has fully stabilized", which was again proven wrong in the next
meeting). One of them even had a bet with Tanja Lange that no attack on
either Simon or Speck would be extended by 3 rounds or more in the
upcoming year. He lost this bet. They were very uncooperative, and made
it a point to let us know that they will not be providing more
information about the algorithms.

So, in this climate, you can imagine how surprised we all were when in
one of the meetings (after not getting the votes they needed in order to
proceed to the next stage) they announced that they will provide a
design rationale. At first they distributed it to us in ISO, but per my
suggestion they then uploaded it to ePrint (see ePrint 2017/560).

But our joy was short-lived. Once you read this so-called design
rationale you can immediately notice two things. Firstly, that they
explain in length all decisions affecting performance (in particular,
rotation amounts - which in one of the meetings they described as
"most-efficient; secure-enough"). The second thing is that when it comes
to cryptanalysis this document is merely a literature review. There is
literally nothing new there - all they do is to cite published works by
academics, something wrongly.

Now, there is no nice way to say that, but this document includes
omissions, falsehoods, half-truths and outright lies. I will not go into
the full analysis of the document, but here are some examples:

 1. Omissions - I already said that this document does not provide any
    new information. This becomes apparent when you try to find out how
    they chose the number of rounds. The document remains quite vague on
    this question. There is a lot of hand waving about "Matsui-like
    techniques", "multipath effect", etc. but nowhere you can find (in
    the old version, they recently uploaded a new version which I didn't
    have time to read yet) a place where they say: "this is how we set
    the number of rounds".

    Another omission is about the key schedule - you won't find any
    useful information about the design decisions leading to these
    particular key schedules. Simon uses 3 matrices U,V, and W which are
    not explained, not does the constant c. Speck's key schedule is more
    straightforward but a discussion about the symmetries that may arise
    from using the round function for the key schedule would still be
    appropriate here. Not discussing the combined security of the cipher
    with its key schedule goes against the current trend in linear
    cryptanalysis (see e.g., [2] and many follow up papers).
 2. Half-truths -  take a look at page 16 where they explain how they
    avoided rotation/slide attacks. They give the standard explanation
    that using round-constants would thwart these attacks. This could
    have been fine if the last sentence wasn't "/Also see [AL16]/". From
    the text it seems as if /AL16/ supports the claims made in this
    paragraph. However, /AL16/ is a paper I co-authored which is how I
    know that not only that it doesn't support the claim, it actually
    shows how to adapt rotational cryptanalysis to algorithms using
    round constants.

    As a side note, the goal of /AL16/ was to present a novel way to use
    rotational cryptanalysis in the presence of round constants. This
    paper was published in FSE'17 and we followed up on it with a paper
    in FSE'18 using this attack against Speck{32,48,64} [1]. The reason
    we focused on these versions and not the larger one is not, as was
    suggested in this thread, that they are somehow more secure. The
    actual reason is much less prosaic: these are the resources we had
    at our disposal. This is also the reason the weak-key classes are so
    small. But the fact that my publicly funded university cannot afford
    a better number cruncher doesn't mean that someone with access to
    such won't be able to find better results. In fact, I am quite
    convinced that if you give our tool the resources it needs, it would
    penetrate way more than the currently best known distinguisher of 19
    rounds for Speck128 (translating to better key recovery attacks).

    What is important to understand here is in the same way you do
    "real-world crypto", academics often do "proofs of concept". After
    publishing the attack technique and the attack on (reduced-)Speck, I
    moved to my next project because the scientific marginal benefit is
    small. There is of course the personal gain of being known as the
    guy who broke Speck, but I'm not particularly interested in such
    fame. All of that being said, if anyone has the firepower to run
    this tool and to improve the existing attacks for Speck128, feel
    free to drop me an email.
 3. Falsehoods - with this word I refer to claims in the so-called
    design rationale that are wrong. We can argue whether they were
    included on purpose or if they are simply mistakes. But in either
    case, they are exist and they are worrisome. I would only give one
    example: "/the design team’s early analytic efforts led us to
    believe that the limiting cryptanalytic features for Simon and
    Speck-type block ciphers would be of the linear and differential
    sort"/ (see Page 4). Believing that differential and linear attacks
    would be the most dangerous attacks is reasonable, but as we can see
    from [1], it is wrong.
 4. Lies - this is the most troubling part. The NSA lies to the public
    (including the American people) on official documents. I already
    wrote that the choice for the exact number of rounds is only
    motivated through some hand waving. This makes it hard to tell what
    the real security margin is. But even if you interpret the hand
    waving conservatively, the math results in much smaller security
    margins than what is claimed. I gave a rump session talk about this
    in Crypto 2017 which you can view here [3]. The talk focuses on
    Simon but the story for Speck is similar and results in security
    margins of 15.6%, 15.6%, and 14.7% for Speck128 with key sizes 128,
    192, and 256, respectively. According to the NSA, that is, and only
    if you accept the claim that attacks have stabilized.

    the choice for the number of rounds was heavily discussed in the ISO
    meeting in Berlin about 6 months ago. When confronted with this
    question, the NSA answered (again) that they will not be providing
    further information, added that anyone with a decent level of
    English would immediately understand what they meant, and called me
    an incompetent cryptographer. Nevertheless, a few months after the
    meeting they updated the so-called design rationale and added a
    footnote that reads:
>     "The original version of this paper said 50% here, but noted that
>     this was “very conser-
>     vative.” This led to confusion by some, who interpreted 50% as an
>     exact value, rather than
>     the very conservative upper bound we intended it to be. This is
>     supported by the literature
>     (see, e.g., [CW15]) and by our internal analysis. Indeed 50% is a
>     significant overestimate;
>     25% appears to be a more accurate estimate. We apologize for the
>     lack of clarity here, and
>     note that even if future advances increased the 25% to 50% Simon
>     would still be secure." (Page 11)
    This is a fine clarification except that it is an outrageous lie.
    For example, for Simon32 the so-called design rationale reports that
    the best linear trail can penetrate at most 12 rounds. As part of my
    research I found an 18-round linear hull which _was confirmed, in
    writing,_ by the NSA (I should have the email somewhere and can find
    it if anyone is interested). The difference between 12 and 18 rounds
    is indeed 50% and not 25% as they argue in the updated document.

These are only part of the problems I and others found with the
so-called design rationale. Having so many problems in a document meant
to convince people that you're not doing anything sinister is either an
indication for some serious incompetence, or an indication that
something sinister is actually happening. Either way, it is clear that
this document is meant for PR and has no scientific value. It surely
does not inspire confidence in the algorithms.

All of this was known to the people in the room when ISO made its
decision to reject Simon and Speck (after deliberating about this for
more than 3 years. Not because there were disagreements but because we
wanted to give the NSA a fair chance). These people also got a first
hand impression of how poorly the people the NSA sent fare with
_technical_ questions, basically refusing to answer all, and throwing
tantrums instead. And then, the ISO people also saw another thing.
During the discussions I asked the NSA two non-technical questions (from
a crypto point of view. These are technical questions from a
standardization point of view): 
    - Q: You claim that third party analysis is indicative of the
algorithm's real security. Were you aware of all these results when you
published the algorithms, or are any of them better than what you knew of?
    - A: I refuse to answer that
    -Q: Are you aware of any cryptanalytic results better than those
already found by academia?
    -A: I refuse to answer that either.

Now, there seem to be some notion that the people in ISO are bureaucrats
with limited understanding in cryptography. The truth is that WG 2 (the
cryptography experts) includes people like Kan Yasuda, Shiho Moriai, Dan
Berenstein, Pascal Paillier, Tanja Lange, Orr Dunkelman and Jian Guo
(partial list). You can't say that they don't know what they're doing.
Which is why, having all this information, we decided that including
these algorithms in one of our standards would undermine the trust
people have in ISO and the work it is doing.

Note that in parallel to the Simon and Speck process, people from the
NSA (different from those involved in Simon and Speck) are successfully
promoting at least two other projects. So you can't say that there
really is a significant anti-NSA bias either. No, these algorithms seem
insecure, attacks against them keep improving, their designers either
refuse to answer basic questions about their security or lie... What
other conclusion could we have reached except that there might be a
security problem with these algorithms?

This of course brings us back to the question asked early in this thread:

> support for SM4 was just added too, which is a Chinese government standard. Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?
This seems pretty obvious to me. If you don't feel comfortable with SM4,
don't add it either. There are at least that many reasons to distrust
the Chinese government as there are to distrust the NSA.

However, the answer to the question
> Could you say a little more about what it is that separates Speck from SM4
> for you?
is a bit different. There are two main things that separate Speck from
SM4. Firstly, it seems more secure. This is either because it actually
is more secure, or because the Chinese did a better job in hiding their
backdoors; but at least it doesn't scream "something strange is going on
here!!!". Second, SM4 is also being standardized in ISO these days and
the Chinese are very cooperative with the process. Whatever question you
have about this algorithm, I can get you an answer from the person
promoting SM4. This inspires confidence in the algorithm and the
process. Is this enough? I don't think so. But being a member of ISO I'm
bound by certain rules that don't allow me to reject algorithms based on
my intuition, so it seems that SM4 (as well as LEA and Kuznyechik) would
probably find their way into the respective standards.

That being said, if you ask for my opinion, just don't include SM4.

Which bring us to the million dollar question:
> So, what do you propose replacing it with?
Nothing. I am usually not one to argue for maintaining the status quo
and I sure am in favor of encryption-for-all but this case is the text
book example for employing the Precautionary Principle. You yourself are
not fully convinced that Speck is secure and does not contain any
backdoors. If it was really secure, it could have been used in all cases
and not only on low-end devices where AES is too slow. AES is slower
than Speck on most platforms.

Now, I'm a sort of a mathematician which doesn't know much about
processor generations and implementation efficiency. Things like 134833
KB/s are Chinese to me. But the way I understand it, these devices that
are to weak to support AES would not be around in 2-5 years which would
make the problem go away. In the foreseeable future, even if the
crypto-extension isn't added to low-end processors, they would still
improve to a degree they can run some of the efficient-but-not-enough
algorithms of today, no?

I would also like to point out that including an algorithm because "it's
better than nothing" result in something that is not
better-than-nothing, but stands in the way of good solutions. Since
there is no acute problem, why do we need to solve it? This is from the
cryptographers' point of view. From the end-user point of view when they
get something bundled into Android, they don't know that it was included
there as something that is "better than nothing". They think of it as
"good enough; endorsed by Android/Google/Linux". What you give them is a
false sense of security because they don't know of all the question
marks surrounding Speck (both technical and political).

So I think that as a first step, no-encryption is better than using
Speck. Then we can move for a longer term solution. Since this is an
important enough issue I asked around and people are happily willing to
help. For example, Dan Berenstein seems to believe that a solution can
be built using a generic construction along the lines of your discussion
with Samuel (with or without a variant of ChaCha). Even if a generic
construction cannot be used Berenstein told me he's willing to help
design a solution. I also asked Vincent Rijmen and Orr Dunkelman and
they both told me they'd be willing to work in a team to find (or
design) a solution. This is already an impressive cadre and I'm sure it
would not be too much of a problem to solicit other notable
cryptographer because basically, no one in this community thinks it's a
good idea to use Speck.

Sorry for the long post and Shabbat Shalom,

Tomer Ashur, PhD
Senior Researcher
COSIC, KU Leuven

[1] https://eprint.iacr.org/2017/1036
[2] https://eprint.iacr.org/2012/303
[3] https://www.youtube.com/watch?v=3d-xruyR89g&t=2s




On 05/08/2018 01:20 AM, Eric Biggers wrote:
> Hi Samuel,
>
> On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
>> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
>>> I agree that my explanation should have been better, and should have considered
>>> more crypto algorithms.  The main difficulty is that we have extreme performance
>>> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
>>> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
>>> performance exceeding that after much optimization, we've been getting a lot of
>>> pushback as people want closer to 100 MB/s.
>>>
>> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
>> would put the performance upper bound around 15 cycles per byte, with
>> the comfortable number being ~7. That's indeed tough, though not
>> impossible.
>>
>>> That's why I also included Speck64-XTS in the patches, since it was
>>> straightforward to include, and some devices may really need that last 20-30% of
>>> performance for encryption to be feasible at all.  (And when the choice is
>>> between unencrypted and a 64-bit block cipher, used in a context where the
>>> weakest points in the cryptosystem are actually elsewhere such as the user's
>>> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
>>> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
>>> that continues to be the case I'd be fine with Speck64 being removed, leaving
>>> just Speck128.
>>>
>> I would very much prefer that to be the case. As many of us know,
>> "it's better than nothing" has been often used to justify other bad
>> choices, like RC4, that end up preventing better ones from being
>> adopted. At a time where we're trying to get rid of 64-bit ciphers in
>> TLS, where data volumes per session are comparatively low, it would be
>> unfortunate if the opposite starts happening on encryption at rest.
>>
>>> Note that in practice, to have any chance at meeting the performance requirement
>>> the cipher needed to be NEON accelerated.  That made benchmarking really hard
>>> and time-consuming, since to definitely know how an algorithm performs it can
>>> take upwards of a week to implement a NEON version.  It needs to be very well
>>> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
>>> performance improvement on some CPUs just by changing the NEON instructions used
>>> to implement the 8-bit rotates, an optimization that is not possible with
>>> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
>>> intentional design choice by the Speck designers; they do know what they're
>>> doing, actually.)
>>>
>>> Thus, we had to be pretty aggressive about dropping algorithms from
>>> consideration if there were preliminary indications that they wouldn't perform
>>> well, or had too little cryptanalysis, or had other issues such as an unclear
>>> patent situation.  Threefish for example I did test the C implementation at
>>> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
>>> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
>>> that it could be improved over 4x with NEON, if at all, so I did not take the
>>> long time it would have taken to write an optimized NEON implementation to
>>> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>>>
>> In my limited experience with NEON and 64-bit ARX, there's usually a
>> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
>> The extra speedup from encrypting 2 block in parallel is then
>> somewhere between 1x and 2x, depending on various details. Getting
>> near 4x might be feasible, but it is indeed time-consuming to get
>> there.
>>
>>> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
>>> Crowley to explain it properly, but briefly it's actually a pseudorandom
>>> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
>>> would operate on a whole 512-byte sector, and if any bit of the 512-byte
>>> plaintext is changed, then every bit in the 512-byte ciphertext would change
>>> with 50% probability.  To make this possible, the construction uses a polynomial
>>> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
>>> mode.
>>>
>> Oh, OK, that sounds like something resembling Naor-Reingold or its
>> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
>> be very fast.
>>
>>> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
>>> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
>>> obvious to me how to do so.  Do you have references to any relevant papers?
>>> Remember that we strongly prefer a published cipher to a custom one -- even if
>>> the core is reused, a mistake may be made in the way it is used.  Thus,
>>> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
>>> self-publish a new construction, then use it with no outside crypto review.
>>> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
>>> see the details of how it would actually work.
>>>
>> This would be the 'tweakable Even-Mansour' construction and its
>> variants. The variant I'm most familiar with would be MEM [1],
>> focusing on software friendliness, but there is other provable
>> security work in the same vein, including [3, 4, 5]. It's very similar
>> to how the XEX mode turns a block cipher into a tweakable block
>> cipher.
>>
>> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
>> ChaCha20's, but everything translates easily from one to the other. We
>> also included cheap masks for 512-bit permutations, just in case.
>>
>> [1] https://eprint.iacr.org/2015/999
>> [2] https://github.com/MEM-AEAD/mem-aead
>> [3] https://eprint.iacr.org/2015/539
>> [4] https://eprint.iacr.org/2015/476
>> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
>>
>>> But in the end, Speck seemed like the clear choice because it had multiple NEON
>>> implementations available already which showed it could be implemented very
>>> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
>>> ciphers) yet the security margin is still similar to AES; it has no intellectual
>>> property concerns; there is a paper clearly explaining the design decisions; it
>>> is naturally resistant to timing attacks; it supports a 128-bit block size, so
>>> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
>>> has a simple and understandable design with no "magic numbers" besides 8 and 3
>>> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
>>> had a public key embedded in the algorithm).  Also as Paul mentioned he is
>>> confident in the construction, and he has published cryptanalysis on Salsa20, so
>>> his opinion is probably more significant than mine :-)
>>>
>>> But I will definitely take a closer look at SPARX and some of the other ciphers
>>> you mentioned in case I missed something.  I really do appreciate the
>>> suggestions, by the way, and in any case we do need to be very well prepared to
>>> justify our choices.  I just hope that people can understand that we are
>>> implementing real-world crypto which must operate under *very* tight performance
>>> constraints on ARM processors, and it must be compatible with dm-crypt and
>>> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
>>> at first seem reasonable choices had to (unfortunately) be excluded.
>>>
>> I understand it is a tough choice, and it's unfortunate that many of
>> the algorithms we have cater mostly to either the
>> high-hardware-accelerated-end or the extremely low-end, without a lot
>> of good options at the middle-end.
>>
> First, we're planning a publication which explains our choices in more detail,
> so please treat this as some more preliminary notes.
>
> To make sure we've exhausted as many alternatives as possible, I wrote NEON
> implementations of all the block ciphers you suggested with the exception of
> SKINNY (which looked very hardware-oriented and not efficient in software), as
> well as some that others have suggested.  (It was tough, but after doing a
> couple, it got much easier...)  The following shows the decryption performance
> I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
> in our use case we care much more about decryption, as that affects the most
> critical metrics such as the time to launch applications.
>
> 	ChaCha8-MEM: 183256 KB/s
> 	ChaCha12-MEM: 134833 KB/s
> 	Chaskey-LTS-XTS: 99097 KB/s
> 	ChaCha20-MEM: 87875 KB/s
> 	Speck64/128-XTS: 85332 KB/s
> 	Speck128/128-XTS: 73404 KB/s
> 	RC5-128/12/256-XTS: 69887 KB/s
> 	Speck128/256-XTS: 69597 KB/s
> 	RC5-64/12/128-XTS: 69267 KB/s
> 	LEA-128-XTS: 67986 KB/s
> 	CHAM128/128-XTS: 52982 KB/s
> 	LEA-256-XTS: 50429 KB/s
> 	Threefish-256: 48349 KB/s
> 	RC6-XTS: 46855 KB/s
> 	RC5-128/20/256-XTS: 44291 KB/s
> 	RC5-64/20/128-XTS: 43924 KB/s
> 	NOEKEON-XTS: 40705 KB/s
> 	Sparx128/128-XTS: 39191 KB/s
> 	XTEA-XTS: 38239 KB/s
> 	AES-128-XTS: 25549 KB/s
> 	AES-256-XTS: 18640 KB/s
>
> Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
> cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
> encryption, and we need every bit of performance available.  Anything below
> 50 MB/s is definitely unacceptable.  But even at that speed we get many
> complaints, so in practice we need something faster.  That means that the
> algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.
>
> ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
> the best performance, especially if we allow for the 12 or 8-round variants.  My
> code for it is based roughly on the existing
> arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
> permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
> masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
> and customized construction, and Paul Crowley and I have concerns about its
> security.  The problem is that the MEM security proof assumes that the
> underlying permutation has no more detectable structural properties than a
> randomly selected permutation.  However, the ChaCha permutation is known to have
> certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
> b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
> f, g, g, g, g, h, h, h, h).
>
> For the MEM mask generation, we can use the "expand 32-byte k" constant to break
> the symmetry, like is done in the ChaCha stream cipher.  However, that's not
> possible for the inner application of the permutation.  So, we'd be using the
> ChaCha permutation in a manner in which it wasn't intended, and the security of
> the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
> impossible that it would be secure, but at the present time it doesn't seem like
> a good choice to actually field.
>
> Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
> option because it has only a 64-bit security level, due to its use of the
> Even-Mansour construction with a 128-bit key.  Of course, it would still be
> better than nothing, but we prefer a cipher that has a security level in line
> with what is accepted for modern crypto.
>
> RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
> known differential attack on that number of rounds.  So if we choose RC5 we'd
> almost certainly have to use the 20-round variant, which is much slower.
>
> That leaves LEA-128-XTS as the only other algorithm that might meet the
> performance requirement, as it is only slightly slower than Speck128-XTS.  It
> may be the most viable alternative, but beyond the slight performance loss it
> still has some disadvantages compared to Speck:
>
> - Importantly, the LEA authors forgot to include test vectors, so I'm not yet
>   100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
>   make the endianness of their test vectors clear in their initial publication,
>   but at least they actually provided test vectors!)
> - LEA has received some cryptanalysis, but not nearly as much as Speck.
> - It took some very heavy optimization to get good LEA performance, much more
>   than I had to do for Speck.  My final LEA code has separate code paths for
>   128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
>   and reordered the operations.  As a result, it's harder to see how it maps to
>   the original paper.  In contrast, my Speck code is more straightforward and
>   maintainable.
> - LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
>   33% more rounds.  LEA-256 would not be fast enough, so we would have to use
>   LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
>   We're willing to accept a 128-bit security level, but 256-bit is preferable.
>   (I think the Speck designers took a more informed approach to setting
>   appropriate security margins for a lightweight cipher; it seems that other
>   designers often choose too few or too many rounds, especially as the key
>   length is varied.)
> - LEA encryption is also a bit slower than decryption, while with Speck
>   encryption and decryption are almost exactly the same speed.
>
> Note that like Speck, LEA doesn't appear to be approved by a standards
> organization either; it's just specified in a research paper.
>
> Thus, from a technical perspective, and given the current state of the art in
> lightweight cryptography, currently Speck128-XTS seems to be the best choice for
> the problem domain.  It's unfortunate that there are so few good options and
> that the field is so politicized, but it is what it is.
>
> Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
> wide-block mode), and eventually we hope to offer it as an option as well.  But
> it's not yet published, and it's a more complex algorithm that is harder to
> implement so I haven't yet had a chance to implement and benchmark it.  And we
> don't want to continue to leave users unprotected while we spend a long time
> coming up with the perfect algorithm, or for hardware AES support to arrive to
> all low-end CPUs when it's unclear if/when that will happen.
>
> Again, we're planning a publication which will explain all this in more detail.
>
> Thanks!
>
> Eric



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-06-01 19:23   ` Tomer Ashur
  0 siblings, 0 replies; 56+ messages in thread
From: Tomer Ashur @ 2018-06-01 19:23 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Samuel Neves, Jason A. Donenfeld, Linux Crypto Mailing List,
	Herbert Xu, linux-fscrypt, linux-arm-kernel, Ard Biesheuvel,
	Jeffrey Walton, Paul Crowley, Patrik Torstensson, Greg Kaiser,
	Paul Lawrence, Michael Halcrow, Alex Cope, Greg Kroah-Hartman


[-- Attachment #1.1: Type: text/plain, Size: 32255 bytes --]

[Resending because the email bounced back from all 3 mailing lists.
Sorry if you get this email twice]
Hi Eric et al.,
I know that this thread is already stale, and I'm sorry I couldn't join
earlier but maybe late is better than never. Allow me to first introduce
myself: my name is Tomer Ashur and I'm a post-doctoral fellow in KU
Leuven. I am part of symmetric-key group led by Vincent Rijmen where I'm
mostly involved in cryptanalysis. I am also part of ISO/IEC JTC 1/SC
27/WG 2, the group which decided to reject Simon and Speck from ISO. If
it's okay with you, I'd like to give my perspective on what happened in
ISO and what is Speck's real standing with the academic community.

First, I'd like to say that the NSA has done quite extensive work in
muddying the waters, arguing that Simon & Speck are secure and that all
objections are political. This is not true, as I will now show with
examples. The bottom line is that there are still many open questions
about their security, questions that the NSA has, on multiple occasions,
refused to answer.

> It seems to me justified about as well as one would hope for a new cipher - 
>   "Notes on the design and analysis of Simon and Speck" seems to me to give ... detail on the reasoning
This is actually an optical illusion. First you need to understand the
context for this document. The NSA (in particular, the exact same person
who previously promoted DUAL_EC in ISO) proposed to include Simon &
Speck in ISO/IEC 29192-2 back in 2015. For obvious reasons they were met
with skepticism. A main concern was the lack of any design rationale and
internal cryptanalytic results. The NSA people fought tooth and nail for
a year and a half simultaneously arguing two almost mutually-exclusive
points: (i) they employ the most talented cryptographers and hence, we
should trust them when they say that an algorithm is secure; and (ii)
they are average cryptographers and hence they would not be able to
insert a backdoor into the algorithm.

More than once they argued in a meeting that the cryptanalysis for the
ciphers has been stabilized (i.e., that attacks will not improve) just
to be proved wrong in the next meeting (their answer: "well, _now_ it
has fully stabilized", which was again proven wrong in the next
meeting). One of them even had a bet with Tanja Lange that no attack on
either Simon or Speck would be extended by 3 rounds or more in the
upcoming year. He lost this bet. They were very uncooperative, and made
it a point to let us know that they will not be providing more
information about the algorithms.

So, in this climate, you can imagine how surprised we all were when in
one of the meetings (after not getting the votes they needed in order to
proceed to the next stage) they announced that they will provide a
design rationale. At first they distributed it to us in ISO, but per my
suggestion they then uploaded it to ePrint (see ePrint 2017/560).

But our joy was short-lived. Once you read this so-called design
rationale you can immediately notice two things. Firstly, that they
explain in length all decisions affecting performance (in particular,
rotation amounts - which in one of the meetings they described as
"most-efficient; secure-enough"). The second thing is that when it comes
to cryptanalysis this document is merely a literature review. There is
literally nothing new there - all they do is to cite published works by
academics, something wrongly.

Now, there is no nice way to say that, but this document includes
omissions, falsehoods, half-truths and outright lies. I will not go into
the full analysis of the document, but here are some examples:

 1. Omissions - I already said that this document does not provide any
    new information. This becomes apparent when you try to find out how
    they chose the number of rounds. The document remains quite vague on
    this question. There is a lot of hand waving about "Matsui-like
    techniques", "multipath effect", etc. but nowhere you can find (in
    the old version, they recently uploaded a new version which I didn't
    have time to read yet) a place where they say: "this is how we set
    the number of rounds".

    Another omission is about the key schedule - you won't find any
    useful information about the design decisions leading to these
    particular key schedules. Simon uses 3 matrices U,V, and W which are
    not explained, not does the constant c. Speck's key schedule is more
    straightforward but a discussion about the symmetries that may arise
    from using the round function for the key schedule would still be
    appropriate here. Not discussing the combined security of the cipher
    with its key schedule goes against the current trend in linear
    cryptanalysis (see e.g., [2] and many follow up papers).
 2. Half-truths -  take a look at page 16 where they explain how they
    avoided rotation/slide attacks. They give the standard explanation
    that using round-constants would thwart these attacks. This could
    have been fine if the last sentence wasn't "/Also see [AL16]/". From
    the text it seems as if /AL16/ supports the claims made in this
    paragraph. However, /AL16/ is a paper I co-authored which is how I
    know that not only that it doesn't support the claim, it actually
    shows how to adapt rotational cryptanalysis to algorithms using
    round constants.

    As a side note, the goal of /AL16/ was to present a novel way to use
    rotational cryptanalysis in the presence of round constants. This
    paper was published in FSE'17 and we followed up on it with a paper
    in FSE'18 using this attack against Speck{32,48,64} [1]. The reason
    we focused on these versions and not the larger one is not, as was
    suggested in this thread, that they are somehow more secure. The
    actual reason is much less prosaic: these are the resources we had
    at our disposal. This is also the reason the weak-key classes are so
    small. But the fact that my publicly funded university cannot afford
    a better number cruncher doesn't mean that someone with access to
    such won't be able to find better results. In fact, I am quite
    convinced that if you give our tool the resources it needs, it would
    penetrate way more than the currently best known distinguisher of 19
    rounds for Speck128 (translating to better key recovery attacks).

    What is important to understand here is in the same way you do
    "real-world crypto", academics often do "proofs of concept". After
    publishing the attack technique and the attack on (reduced-)Speck, I
    moved to my next project because the scientific marginal benefit is
    small. There is of course the personal gain of being known as the
    guy who broke Speck, but I'm not particularly interested in such
    fame. All of that being said, if anyone has the firepower to run
    this tool and to improve the existing attacks for Speck128, feel
    free to drop me an email.
 3. Falsehoods - with this word I refer to claims in the so-called
    design rationale that are wrong. We can argue whether they were
    included on purpose or if they are simply mistakes. But in either
    case, they are exist and they are worrisome. I would only give one
    example: "/the design team’s early analytic efforts led us to
    believe that the limiting cryptanalytic features for Simon and
    Speck-type block ciphers would be of the linear and differential
    sort"/ (see Page 4). Believing that differential and linear attacks
    would be the most dangerous attacks is reasonable, but as we can see
    from [1], it is wrong.
 4. Lies - this is the most troubling part. The NSA lies to the public
    (including the American people) on official documents. I already
    wrote that the choice for the exact number of rounds is only
    motivated through some hand waving. This makes it hard to tell what
    the real security margin is. But even if you interpret the hand
    waving conservatively, the math results in much smaller security
    margins than what is claimed. I gave a rump session talk about this
    in Crypto 2017 which you can view here [3]. The talk focuses on
    Simon but the story for Speck is similar and results in security
    margins of 15.6%, 15.6%, and 14.7% for Speck128 with key sizes 128,
    192, and 256, respectively. According to the NSA, that is, and only
    if you accept the claim that attacks have stabilized.

    the choice for the number of rounds was heavily discussed in the ISO
    meeting in Berlin about 6 months ago. When confronted with this
    question, the NSA answered (again) that they will not be providing
    further information, added that anyone with a decent level of
    English would immediately understand what they meant, and called me
    an incompetent cryptographer. Nevertheless, a few months after the
    meeting they updated the so-called design rationale and added a
    footnote that reads:
>     "The original version of this paper said 50% here, but noted that
>     this was “very conser-
>     vative.” This led to confusion by some, who interpreted 50% as an
>     exact value, rather than
>     the very conservative upper bound we intended it to be. This is
>     supported by the literature
>     (see, e.g., [CW15]) and by our internal analysis. Indeed 50% is a
>     significant overestimate;
>     25% appears to be a more accurate estimate. We apologize for the
>     lack of clarity here, and
>     note that even if future advances increased the 25% to 50% Simon
>     would still be secure." (Page 11)
    This is a fine clarification except that it is an outrageous lie.
    For example, for Simon32 the so-called design rationale reports that
    the best linear trail can penetrate at most 12 rounds. As part of my
    research I found an 18-round linear hull which _was confirmed, in
    writing,_ by the NSA (I should have the email somewhere and can find
    it if anyone is interested). The difference between 12 and 18 rounds
    is indeed 50% and not 25% as they argue in the updated document.

These are only part of the problems I and others found with the
so-called design rationale. Having so many problems in a document meant
to convince people that you're not doing anything sinister is either an
indication for some serious incompetence, or an indication that
something sinister is actually happening. Either way, it is clear that
this document is meant for PR and has no scientific value. It surely
does not inspire confidence in the algorithms.

All of this was known to the people in the room when ISO made its
decision to reject Simon and Speck (after deliberating about this for
more than 3 years. Not because there were disagreements but because we
wanted to give the NSA a fair chance). These people also got a first
hand impression of how poorly the people the NSA sent fare with
_technical_ questions, basically refusing to answer all, and throwing
tantrums instead. And then, the ISO people also saw another thing.
During the discussions I asked the NSA two non-technical questions (from
a crypto point of view. These are technical questions from a
standardization point of view): 
    - Q: You claim that third party analysis is indicative of the
algorithm's real security. Were you aware of all these results when you
published the algorithms, or are any of them better than what you knew of?
    - A: I refuse to answer that
    -Q: Are you aware of any cryptanalytic results better than those
already found by academia?
    -A: I refuse to answer that either.

Now, there seem to be some notion that the people in ISO are bureaucrats
with limited understanding in cryptography. The truth is that WG 2 (the
cryptography experts) includes people like Kan Yasuda, Shiho Moriai, Dan
Berenstein, Pascal Paillier, Tanja Lange, Orr Dunkelman and Jian Guo
(partial list). You can't say that they don't know what they're doing.
Which is why, having all this information, we decided that including
these algorithms in one of our standards would undermine the trust
people have in ISO and the work it is doing.

Note that in parallel to the Simon and Speck process, people from the
NSA (different from those involved in Simon and Speck) are successfully
promoting at least two other projects. So you can't say that there
really is a significant anti-NSA bias either. No, these algorithms seem
insecure, attacks against them keep improving, their designers either
refuse to answer basic questions about their security or lie... What
other conclusion could we have reached except that there might be a
security problem with these algorithms?

This of course brings us back to the question asked early in this thread:

> support for SM4 was just added too, which is a Chinese government standard. Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?
This seems pretty obvious to me. If you don't feel comfortable with SM4,
don't add it either. There are at least that many reasons to distrust
the Chinese government as there are to distrust the NSA.

However, the answer to the question
> Could you say a little more about what it is that separates Speck from SM4
> for you?
is a bit different. There are two main things that separate Speck from
SM4. Firstly, it seems more secure. This is either because it actually
is more secure, or because the Chinese did a better job in hiding their
backdoors; but at least it doesn't scream "something strange is going on
here!!!". Second, SM4 is also being standardized in ISO these days and
the Chinese are very cooperative with the process. Whatever question you
have about this algorithm, I can get you an answer from the person
promoting SM4. This inspires confidence in the algorithm and the
process. Is this enough? I don't think so. But being a member of ISO I'm
bound by certain rules that don't allow me to reject algorithms based on
my intuition, so it seems that SM4 (as well as LEA and Kuznyechik) would
probably find their way into the respective standards.

That being said, if you ask for my opinion, just don't include SM4.

Which bring us to the million dollar question:
> So, what do you propose replacing it with?
Nothing. I am usually not one to argue for maintaining the status quo
and I sure am in favor of encryption-for-all but this case is the text
book example for employing the Precautionary Principle. You yourself are
not fully convinced that Speck is secure and does not contain any
backdoors. If it was really secure, it could have been used in all cases
and not only on low-end devices where AES is too slow. AES is slower
than Speck on most platforms.

Now, I'm a sort of a mathematician which doesn't know much about
processor generations and implementation efficiency. Things like 134833
KB/s are Chinese to me. But the way I understand it, these devices that
are to weak to support AES would not be around in 2-5 years which would
make the problem go away. In the foreseeable future, even if the
crypto-extension isn't added to low-end processors, they would still
improve to a degree they can run some of the efficient-but-not-enough
algorithms of today, no?

I would also like to point out that including an algorithm because "it's
better than nothing" result in something that is not
better-than-nothing, but stands in the way of good solutions. Since
there is no acute problem, why do we need to solve it? This is from the
cryptographers' point of view. From the end-user point of view when they
get something bundled into Android, they don't know that it was included
there as something that is "better than nothing". They think of it as
"good enough; endorsed by Android/Google/Linux". What you give them is a
false sense of security because they don't know of all the question
marks surrounding Speck (both technical and political).

So I think that as a first step, no-encryption is better than using
Speck. Then we can move for a longer term solution. Since this is an
important enough issue I asked around and people are happily willing to
help. For example, Dan Berenstein seems to believe that a solution can
be built using a generic construction along the lines of your discussion
with Samuel (with or without a variant of ChaCha). Even if a generic
construction cannot be used Berenstein told me he's willing to help
design a solution. I also asked Vincent Rijmen and Orr Dunkelman and
they both told me they'd be willing to work in a team to find (or
design) a solution. This is already an impressive cadre and I'm sure it
would not be too much of a problem to solicit other notable
cryptographer because basically, no one in this community thinks it's a
good idea to use Speck.

Sorry for the long post and Shabbat Shalom,

Tomer Ashur, PhD
Senior Researcher
COSIC, KU Leuven

[1] https://eprint.iacr.org/2017/1036
[2] https://eprint.iacr.org/2012/303
[3] https://www.youtube.com/watch?v=3d-xruyR89g&t=2s




On 05/08/2018 01:20 AM, Eric Biggers wrote:
> Hi Samuel,
>
> On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
>> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
>>> I agree that my explanation should have been better, and should have considered
>>> more crypto algorithms.  The main difficulty is that we have extreme performance
>>> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
>>> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
>>> performance exceeding that after much optimization, we've been getting a lot of
>>> pushback as people want closer to 100 MB/s.
>>>
>> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
>> would put the performance upper bound around 15 cycles per byte, with
>> the comfortable number being ~7. That's indeed tough, though not
>> impossible.
>>
>>> That's why I also included Speck64-XTS in the patches, since it was
>>> straightforward to include, and some devices may really need that last 20-30% of
>>> performance for encryption to be feasible at all.  (And when the choice is
>>> between unencrypted and a 64-bit block cipher, used in a context where the
>>> weakest points in the cryptosystem are actually elsewhere such as the user's
>>> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
>>> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
>>> that continues to be the case I'd be fine with Speck64 being removed, leaving
>>> just Speck128.
>>>
>> I would very much prefer that to be the case. As many of us know,
>> "it's better than nothing" has been often used to justify other bad
>> choices, like RC4, that end up preventing better ones from being
>> adopted. At a time where we're trying to get rid of 64-bit ciphers in
>> TLS, where data volumes per session are comparatively low, it would be
>> unfortunate if the opposite starts happening on encryption at rest.
>>
>>> Note that in practice, to have any chance at meeting the performance requirement
>>> the cipher needed to be NEON accelerated.  That made benchmarking really hard
>>> and time-consuming, since to definitely know how an algorithm performs it can
>>> take upwards of a week to implement a NEON version.  It needs to be very well
>>> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
>>> performance improvement on some CPUs just by changing the NEON instructions used
>>> to implement the 8-bit rotates, an optimization that is not possible with
>>> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
>>> intentional design choice by the Speck designers; they do know what they're
>>> doing, actually.)
>>>
>>> Thus, we had to be pretty aggressive about dropping algorithms from
>>> consideration if there were preliminary indications that they wouldn't perform
>>> well, or had too little cryptanalysis, or had other issues such as an unclear
>>> patent situation.  Threefish for example I did test the C implementation at
>>> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
>>> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
>>> that it could be improved over 4x with NEON, if at all, so I did not take the
>>> long time it would have taken to write an optimized NEON implementation to
>>> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>>>
>> In my limited experience with NEON and 64-bit ARX, there's usually a
>> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
>> The extra speedup from encrypting 2 block in parallel is then
>> somewhere between 1x and 2x, depending on various details. Getting
>> near 4x might be feasible, but it is indeed time-consuming to get
>> there.
>>
>>> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
>>> Crowley to explain it properly, but briefly it's actually a pseudorandom
>>> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
>>> would operate on a whole 512-byte sector, and if any bit of the 512-byte
>>> plaintext is changed, then every bit in the 512-byte ciphertext would change
>>> with 50% probability.  To make this possible, the construction uses a polynomial
>>> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
>>> mode.
>>>
>> Oh, OK, that sounds like something resembling Naor-Reingold or its
>> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
>> be very fast.
>>
>>> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
>>> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
>>> obvious to me how to do so.  Do you have references to any relevant papers?
>>> Remember that we strongly prefer a published cipher to a custom one -- even if
>>> the core is reused, a mistake may be made in the way it is used.  Thus,
>>> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
>>> self-publish a new construction, then use it with no outside crypto review.
>>> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
>>> see the details of how it would actually work.
>>>
>> This would be the 'tweakable Even-Mansour' construction and its
>> variants. The variant I'm most familiar with would be MEM [1],
>> focusing on software friendliness, but there is other provable
>> security work in the same vein, including [3, 4, 5]. It's very similar
>> to how the XEX mode turns a block cipher into a tweakable block
>> cipher.
>>
>> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
>> ChaCha20's, but everything translates easily from one to the other. We
>> also included cheap masks for 512-bit permutations, just in case.
>>
>> [1] https://eprint.iacr.org/2015/999
>> [2] https://github.com/MEM-AEAD/mem-aead
>> [3] https://eprint.iacr.org/2015/539
>> [4] https://eprint.iacr.org/2015/476
>> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
>>
>>> But in the end, Speck seemed like the clear choice because it had multiple NEON
>>> implementations available already which showed it could be implemented very
>>> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
>>> ciphers) yet the security margin is still similar to AES; it has no intellectual
>>> property concerns; there is a paper clearly explaining the design decisions; it
>>> is naturally resistant to timing attacks; it supports a 128-bit block size, so
>>> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
>>> has a simple and understandable design with no "magic numbers" besides 8 and 3
>>> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
>>> had a public key embedded in the algorithm).  Also as Paul mentioned he is
>>> confident in the construction, and he has published cryptanalysis on Salsa20, so
>>> his opinion is probably more significant than mine :-)
>>>
>>> But I will definitely take a closer look at SPARX and some of the other ciphers
>>> you mentioned in case I missed something.  I really do appreciate the
>>> suggestions, by the way, and in any case we do need to be very well prepared to
>>> justify our choices.  I just hope that people can understand that we are
>>> implementing real-world crypto which must operate under *very* tight performance
>>> constraints on ARM processors, and it must be compatible with dm-crypt and
>>> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
>>> at first seem reasonable choices had to (unfortunately) be excluded.
>>>
>> I understand it is a tough choice, and it's unfortunate that many of
>> the algorithms we have cater mostly to either the
>> high-hardware-accelerated-end or the extremely low-end, without a lot
>> of good options at the middle-end.
>>
> First, we're planning a publication which explains our choices in more detail,
> so please treat this as some more preliminary notes.
>
> To make sure we've exhausted as many alternatives as possible, I wrote NEON
> implementations of all the block ciphers you suggested with the exception of
> SKINNY (which looked very hardware-oriented and not efficient in software), as
> well as some that others have suggested.  (It was tough, but after doing a
> couple, it got much easier...)  The following shows the decryption performance
> I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
> in our use case we care much more about decryption, as that affects the most
> critical metrics such as the time to launch applications.
>
> 	ChaCha8-MEM: 183256 KB/s
> 	ChaCha12-MEM: 134833 KB/s
> 	Chaskey-LTS-XTS: 99097 KB/s
> 	ChaCha20-MEM: 87875 KB/s
> 	Speck64/128-XTS: 85332 KB/s
> 	Speck128/128-XTS: 73404 KB/s
> 	RC5-128/12/256-XTS: 69887 KB/s
> 	Speck128/256-XTS: 69597 KB/s
> 	RC5-64/12/128-XTS: 69267 KB/s
> 	LEA-128-XTS: 67986 KB/s
> 	CHAM128/128-XTS: 52982 KB/s
> 	LEA-256-XTS: 50429 KB/s
> 	Threefish-256: 48349 KB/s
> 	RC6-XTS: 46855 KB/s
> 	RC5-128/20/256-XTS: 44291 KB/s
> 	RC5-64/20/128-XTS: 43924 KB/s
> 	NOEKEON-XTS: 40705 KB/s
> 	Sparx128/128-XTS: 39191 KB/s
> 	XTEA-XTS: 38239 KB/s
> 	AES-128-XTS: 25549 KB/s
> 	AES-256-XTS: 18640 KB/s
>
> Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
> cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
> encryption, and we need every bit of performance available.  Anything below
> 50 MB/s is definitely unacceptable.  But even at that speed we get many
> complaints, so in practice we need something faster.  That means that the
> algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.
>
> ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
> the best performance, especially if we allow for the 12 or 8-round variants.  My
> code for it is based roughly on the existing
> arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
> permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
> masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
> and customized construction, and Paul Crowley and I have concerns about its
> security.  The problem is that the MEM security proof assumes that the
> underlying permutation has no more detectable structural properties than a
> randomly selected permutation.  However, the ChaCha permutation is known to have
> certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
> b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
> f, g, g, g, g, h, h, h, h).
>
> For the MEM mask generation, we can use the "expand 32-byte k" constant to break
> the symmetry, like is done in the ChaCha stream cipher.  However, that's not
> possible for the inner application of the permutation.  So, we'd be using the
> ChaCha permutation in a manner in which it wasn't intended, and the security of
> the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
> impossible that it would be secure, but at the present time it doesn't seem like
> a good choice to actually field.
>
> Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
> option because it has only a 64-bit security level, due to its use of the
> Even-Mansour construction with a 128-bit key.  Of course, it would still be
> better than nothing, but we prefer a cipher that has a security level in line
> with what is accepted for modern crypto.
>
> RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
> known differential attack on that number of rounds.  So if we choose RC5 we'd
> almost certainly have to use the 20-round variant, which is much slower.
>
> That leaves LEA-128-XTS as the only other algorithm that might meet the
> performance requirement, as it is only slightly slower than Speck128-XTS.  It
> may be the most viable alternative, but beyond the slight performance loss it
> still has some disadvantages compared to Speck:
>
> - Importantly, the LEA authors forgot to include test vectors, so I'm not yet
>   100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
>   make the endianness of their test vectors clear in their initial publication,
>   but at least they actually provided test vectors!)
> - LEA has received some cryptanalysis, but not nearly as much as Speck.
> - It took some very heavy optimization to get good LEA performance, much more
>   than I had to do for Speck.  My final LEA code has separate code paths for
>   128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
>   and reordered the operations.  As a result, it's harder to see how it maps to
>   the original paper.  In contrast, my Speck code is more straightforward and
>   maintainable.
> - LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
>   33% more rounds.  LEA-256 would not be fast enough, so we would have to use
>   LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
>   We're willing to accept a 128-bit security level, but 256-bit is preferable.
>   (I think the Speck designers took a more informed approach to setting
>   appropriate security margins for a lightweight cipher; it seems that other
>   designers often choose too few or too many rounds, especially as the key
>   length is varied.)
> - LEA encryption is also a bit slower than decryption, while with Speck
>   encryption and decryption are almost exactly the same speed.
>
> Note that like Speck, LEA doesn't appear to be approved by a standards
> organization either; it's just specified in a research paper.
>
> Thus, from a technical perspective, and given the current state of the art in
> lightweight cryptography, currently Speck128-XTS seems to be the best choice for
> the problem domain.  It's unfortunate that there are so few good options and
> that the field is so politicized, but it is what it is.
>
> Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
> wide-block mode), and eventually we hope to offer it as an option as well.  But
> it's not yet published, and it's a more complex algorithm that is harder to
> implement so I haven't yet had a chance to implement and benchmark it.  And we
> don't want to continue to leave users unprotected while we spend a long time
> coming up with the perfect algorithm, or for hardware AES support to arrive to
> all low-end CPUs when it's unclear if/when that will happen.
>
> Again, we're planning a publication which will explain all this in more detail.
>
> Thanks!
>
> Eric



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/5] crypto: Speck support
@ 2018-06-01 19:23   ` Tomer Ashur
  0 siblings, 0 replies; 56+ messages in thread
From: Tomer Ashur @ 2018-06-01 19:23 UTC (permalink / raw)
  To: linux-arm-kernel

[Resending because the email bounced back from all 3 mailing lists.
Sorry if you get this email twice]
Hi Eric et al.,
I know that this thread is already stale, and I'm sorry I couldn't join
earlier but maybe late is better than never. Allow me to first introduce
myself: my name is Tomer Ashur and I'm a post-doctoral fellow in KU
Leuven. I am part of symmetric-key group led by Vincent Rijmen where I'm
mostly involved in cryptanalysis. I am also part of ISO/IEC JTC 1/SC
27/WG 2, the group which decided to reject Simon and Speck from ISO. If
it's okay with you, I'd like to give my perspective on what happened in
ISO and what is Speck's real standing with the academic community.

First, I'd like to say that the NSA has done quite extensive work in
muddying the waters, arguing that Simon & Speck are secure and that all
objections are political. This is not true, as I will now show with
examples. The bottom line is that there are still many open questions
about their security, questions that the NSA has, on multiple occasions,
refused to answer.

> It seems to me justified about as well as one would hope for a new cipher - 
>   "Notes on the design and analysis of Simon and Speck" seems to me to give ... detail on the reasoning
This is actually an optical illusion. First you need to understand the
context for this document. The NSA (in particular, the exact same person
who previously promoted DUAL_EC in ISO) proposed to include Simon &
Speck in ISO/IEC 29192-2 back in 2015. For obvious reasons they were met
with skepticism. A main concern was the lack of any design rationale and
internal cryptanalytic results. The NSA people fought tooth and nail for
a year and a half simultaneously arguing two almost mutually-exclusive
points: (i) they employ the most talented cryptographers and hence, we
should trust them when they say that an algorithm is secure; and (ii)
they are average cryptographers and hence they would not be able to
insert a backdoor into the algorithm.

More than once they argued in a meeting that the cryptanalysis for the
ciphers has been stabilized (i.e., that attacks will not improve) just
to be proved wrong in the next meeting (their answer: "well, _now_ it
has fully stabilized", which was again proven wrong in the next
meeting). One of them even had a bet with Tanja Lange that no attack on
either Simon or Speck would be extended by 3 rounds or more in the
upcoming year. He lost this bet. They were very uncooperative, and made
it a point to let us know that they will not be providing more
information about the algorithms.

So, in this climate, you can imagine how surprised we all were when in
one of the meetings (after not getting the votes they needed in order to
proceed to the next stage) they announced that they will provide a
design rationale. At first they distributed it to us in ISO, but per my
suggestion they then uploaded it to ePrint (see ePrint 2017/560).

But our joy was short-lived. Once you read this so-called design
rationale you can immediately notice two things. Firstly, that they
explain in length all decisions affecting performance (in particular,
rotation amounts - which in one of the meetings they described as
"most-efficient; secure-enough"). The second thing is that when it comes
to cryptanalysis this document is merely a literature review. There is
literally nothing new there - all they do is to cite published works by
academics, something wrongly.

Now, there is no nice way to say that, but this document includes
omissions, falsehoods, half-truths and outright lies. I will not go into
the full analysis of the document, but here are some examples:

 1. Omissions - I already said that this document does not provide any
    new information. This becomes apparent when you try to find out how
    they chose the number of rounds. The document remains quite vague on
    this question. There is a lot of hand waving about "Matsui-like
    techniques", "multipath effect", etc. but nowhere you can find (in
    the old version, they recently uploaded a new version which I didn't
    have time to read yet) a place where they say: "this is how we set
    the number of rounds".

    Another omission is about the key schedule - you won't find any
    useful information about the design decisions leading to these
    particular key schedules. Simon uses 3 matrices U,V, and W which are
    not explained, not does the constant c. Speck's key schedule is more
    straightforward but a discussion about the symmetries that may arise
    from using the round function for the key schedule would still be
    appropriate here. Not discussing the combined security of the cipher
    with its key schedule goes against the current trend in linear
    cryptanalysis (see e.g., [2] and many follow up papers).
 2. Half-truths -? take a look at page 16 where they explain how they
    avoided rotation/slide attacks. They give the standard explanation
    that using round-constants would thwart these attacks. This could
    have been fine if the last sentence wasn't "/Also see [AL16]/". From
    the text it seems as if /AL16/ supports the claims made in this
    paragraph. However, /AL16/ is a paper I co-authored which is how I
    know that not only that it doesn't support the claim, it actually
    shows how to adapt rotational cryptanalysis to algorithms using
    round constants.

    As a side note, the goal of /AL16/ was to present a novel way to use
    rotational cryptanalysis in the presence of round constants. This
    paper was published in FSE'17 and we followed up on it with a paper
    in FSE'18 using this attack against Speck{32,48,64} [1]. The reason
    we focused on these versions and not the larger one is not, as was
    suggested in this thread, that they are somehow more secure. The
    actual reason is much less prosaic: these are the resources we had
    at our disposal. This is also the reason the weak-key classes are so
    small. But the fact that my publicly funded university cannot afford
    a better number cruncher doesn't mean that someone with access to
    such won't be able to find better results. In fact, I am quite
    convinced that if you give our tool the resources it needs, it would
    penetrate way more than the currently best known distinguisher of 19
    rounds for Speck128 (translating to better key recovery attacks).

    What is important to understand here is in the same way you do
    "real-world crypto", academics often do "proofs of concept". After
    publishing the attack technique and the attack on (reduced-)Speck, I
    moved to my next project because the scientific marginal benefit is
    small. There is of course the personal gain of being known as the
    guy who broke Speck, but I'm not particularly interested in such
    fame. All of that being said, if anyone has the firepower to run
    this tool and to improve the existing attacks for Speck128, feel
    free to drop me an email.
 3. Falsehoods - with this word I refer to claims in the so-called
    design rationale that are wrong. We can argue whether they were
    included on purpose or if they are simply mistakes. But in either
    case, they are exist and they are worrisome. I would only give one
    example: "/the design team?s early analytic efforts led us to
    believe that the limiting cryptanalytic features for Simon and
    Speck-type block ciphers would be of the linear and differential
    sort"/ (see Page 4). Believing that differential and linear attacks
    would be the most dangerous attacks is reasonable, but as we can see
    from [1], it is wrong.
 4. Lies - this is the most troubling part. The NSA lies to the public
    (including the American people) on official documents. I already
    wrote that the choice for the exact number of rounds is only
    motivated through some hand waving. This makes it hard to tell what
    the real security margin is. But even if you interpret the hand
    waving conservatively, the math results in much smaller security
    margins than what is claimed. I gave a rump session talk about this
    in Crypto 2017 which you can view here [3]. The talk focuses on
    Simon but the story for Speck is similar and results in security
    margins of 15.6%, 15.6%, and 14.7% for Speck128 with key sizes 128,
    192, and 256, respectively. According to the NSA, that is, and only
    if you accept the claim that attacks have stabilized.

    the choice for the number of rounds was heavily discussed in the ISO
    meeting in Berlin about 6 months ago. When confronted with this
    question, the NSA answered (again) that they will not be providing
    further information, added that anyone with a decent level of
    English would immediately understand what they meant, and called me
    an incompetent cryptographer. Nevertheless, a few months after the
    meeting they updated the so-called design rationale and added a
    footnote that reads:
>     "The original version of this paper said 50% here, but noted that
>     this was ?very conser-
>     vative.? This led to confusion by some, who interpreted 50% as an
>     exact value, rather than
>     the very conservative upper bound we intended it to be. This is
>     supported by the literature
>     (see, e.g., [CW15]) and by our internal analysis. Indeed 50% is a
>     significant overestimate;
>     25% appears to be a more accurate estimate. We apologize for the
>     lack of clarity here, and
>     note that even if future advances increased the 25% to 50% Simon
>     would still be secure." (Page 11)
    This is a fine clarification except that it is an outrageous lie.
    For example, for Simon32 the so-called design rationale reports that
    the best linear trail can penetrate at most 12 rounds. As part of my
    research I found an 18-round linear hull which _was confirmed, in
    writing,_ by the NSA (I should have the email somewhere and can find
    it if anyone is interested). The difference between 12 and 18 rounds
    is indeed 50% and not 25% as they argue in the updated document.

These are only part of the problems I and others found with the
so-called design rationale. Having so many problems in a document meant
to convince people that you're not doing anything sinister is either an
indication for some serious incompetence, or an indication that
something sinister is actually happening. Either way, it is clear that
this document is meant for PR and has no scientific value. It surely
does not inspire confidence in the algorithms.

All of this was known to the people in the room when ISO made its
decision to reject Simon and Speck (after deliberating about this for
more than 3 years. Not because there were disagreements but because we
wanted to give the NSA a fair chance). These people also got a first
hand impression of how poorly the people the NSA sent fare with
_technical_ questions, basically refusing to answer all, and throwing
tantrums instead. And then, the ISO people also saw another thing.
During the discussions I asked the NSA two non-technical questions (from
a crypto point of view. These are technical questions from a
standardization point of view):?
??? - Q: You claim that third party analysis is indicative of the
algorithm's real security. Were you aware of all these results when you
published the algorithms, or are any of them better than what you knew of?
??? - A: I refuse to answer that
??? -Q: Are you aware of any cryptanalytic results better than those
already found by academia?
??? -A: I refuse to answer that either.

Now, there seem to be some notion that the people in ISO are bureaucrats
with limited understanding in cryptography. The truth is that WG 2 (the
cryptography experts) includes people like Kan Yasuda, Shiho Moriai, Dan
Berenstein, Pascal Paillier, Tanja Lange, Orr Dunkelman and Jian Guo
(partial list). You can't say that they don't know what they're doing.
Which is why, having all this information, we decided that including
these algorithms in one of our standards would undermine the trust
people have in ISO and the work it is doing.

Note that in parallel to the Simon and Speck process, people from the
NSA (different from those involved in Simon and Speck) are successfully
promoting at least two other projects. So you can't say that there
really is a significant anti-NSA bias either. No, these algorithms seem
insecure, attacks against them keep improving, their designers either
refuse to answer basic questions about their security or lie... What
other conclusion could we have reached except that there might be a
security problem with these algorithms?

This of course brings us back to the question asked early in this thread:

> support for SM4 was just added too, which is a Chinese government standard. Are you going to send a patch to remove that
> too, or is it just NSA designed algorithms that are not okay?
This seems pretty obvious to me. If you don't feel comfortable with SM4,
don't add it either. There are at least that many reasons to distrust
the Chinese government as there are to distrust the NSA.

However, the answer to the question
> Could you say a little more about what it is that separates Speck from SM4
> for you?
is a bit different. There are two main things that separate Speck from
SM4. Firstly, it seems more secure. This is either because it actually
is more secure, or because the Chinese did a better job in hiding their
backdoors; but at least it doesn't scream "something strange is going on
here!!!". Second, SM4 is also being standardized in ISO these days and
the Chinese are very cooperative with the process. Whatever question you
have about this algorithm, I can get you an answer from the person
promoting SM4. This inspires confidence in the algorithm and the
process. Is this enough? I don't think so. But being a member of ISO I'm
bound by certain rules that don't allow me to reject algorithms based on
my intuition, so it seems that SM4 (as well as LEA and Kuznyechik) would
probably find their way into the respective standards.

That being said, if you ask for my opinion, just don't include SM4.

Which bring us to the million dollar question:
> So, what do you propose replacing it with?
Nothing. I am usually not one to argue for maintaining the status quo
and I sure am in favor of encryption-for-all but this case is the text
book example for employing the Precautionary Principle. You yourself are
not fully convinced that Speck is secure and does not contain any
backdoors. If it was really secure, it could have been used in all cases
and not only on low-end devices where AES is too slow. AES is slower
than Speck on most platforms.

Now, I'm a sort of a mathematician which doesn't know much about
processor generations and implementation efficiency. Things like 134833
KB/s are Chinese to me. But the way I understand it, these devices that
are to weak to support AES would not be around in 2-5 years which would
make the problem go away. In the foreseeable future, even if the
crypto-extension isn't added to low-end processors, they would still
improve to a degree they can run some of the efficient-but-not-enough
algorithms of today, no?

I would also like to point out that including an algorithm because "it's
better than nothing" result in something that is not
better-than-nothing, but stands in the way of good solutions. Since
there is no acute problem, why do we need to solve it? This is from the
cryptographers' point of view. From the end-user point of view when they
get something bundled into Android, they don't know that it was included
there as something that is "better than nothing". They think of it as
"good enough; endorsed by Android/Google/Linux". What you give them is a
false sense of security because they don't know of all the question
marks surrounding Speck (both technical and political).

So I think that as a first step, no-encryption is better than using
Speck. Then we can move for a longer term solution. Since this is an
important enough issue I asked around and people are happily willing to
help. For example, Dan Berenstein seems to believe that a solution can
be built using a generic construction along the lines of your discussion
with Samuel (with or without a variant of ChaCha). Even if a generic
construction cannot be used Berenstein told me he's willing to help
design a solution. I also asked Vincent Rijmen and Orr Dunkelman and
they both told me they'd be willing to work in a team to find (or
design) a solution. This is already an impressive cadre and I'm sure it
would not be too much of a problem to solicit other notable
cryptographer because basically, no one in this community thinks it's a
good idea to use Speck.

Sorry for the long post and Shabbat Shalom,

Tomer Ashur, PhD
Senior Researcher
COSIC, KU Leuven

[1] https://eprint.iacr.org/2017/1036
[2] https://eprint.iacr.org/2012/303
[3] https://www.youtube.com/watch?v=3d-xruyR89g&t=2s




On 05/08/2018 01:20 AM, Eric Biggers wrote:
> Hi Samuel,
>
> On Thu, Apr 26, 2018 at 03:05:44AM +0100, Samuel Neves wrote:
>> On Wed, Apr 25, 2018 at 8:49 PM, Eric Biggers <ebiggers@google.com> wrote:
>>> I agree that my explanation should have been better, and should have considered
>>> more crypto algorithms.  The main difficulty is that we have extreme performance
>>> requirements -- it needs to be 50 MB/s at the very least on even low-end ARM
>>> devices like smartwatches.  And even with the NEON-accelerated Speck128-XTS
>>> performance exceeding that after much optimization, we've been getting a lot of
>>> pushback as people want closer to 100 MB/s.
>>>
>> I couldn't find any NEON-capable ARMv7 chip below 800 MHz, so this
>> would put the performance upper bound around 15 cycles per byte, with
>> the comfortable number being ~7. That's indeed tough, though not
>> impossible.
>>
>>> That's why I also included Speck64-XTS in the patches, since it was
>>> straightforward to include, and some devices may really need that last 20-30% of
>>> performance for encryption to be feasible at all.  (And when the choice is
>>> between unencrypted and a 64-bit block cipher, used in a context where the
>>> weakest points in the cryptosystem are actually elsewhere such as the user's
>>> low-entropy PIN and the flash storage doing wear-leveling, I'd certainly take
>>> the 64-bit block cipher.)  So far we haven't had to use Speck64 though, and if
>>> that continues to be the case I'd be fine with Speck64 being removed, leaving
>>> just Speck128.
>>>
>> I would very much prefer that to be the case. As many of us know,
>> "it's better than nothing" has been often used to justify other bad
>> choices, like RC4, that end up preventing better ones from being
>> adopted. At a time where we're trying to get rid of 64-bit ciphers in
>> TLS, where data volumes per session are comparatively low, it would be
>> unfortunate if the opposite starts happening on encryption at rest.
>>
>>> Note that in practice, to have any chance at meeting the performance requirement
>>> the cipher needed to be NEON accelerated.  That made benchmarking really hard
>>> and time-consuming, since to definitely know how an algorithm performs it can
>>> take upwards of a week to implement a NEON version.  It needs to be very well
>>> optimized too, to compare the algorithms fairly -- e.g. with Speck I got a 20%
>>> performance improvement on some CPUs just by changing the NEON instructions used
>>> to implement the 8-bit rotates, an optimization that is not possible with
>>> ciphers that don't use rotate amounts that are multiples of 8.  (This was an
>>> intentional design choice by the Speck designers; they do know what they're
>>> doing, actually.)
>>>
>>> Thus, we had to be pretty aggressive about dropping algorithms from
>>> consideration if there were preliminary indications that they wouldn't perform
>>> well, or had too little cryptanalysis, or had other issues such as an unclear
>>> patent situation.  Threefish for example I did test the C implementation at
>>> https://github.com/wernerd/Skein3Fish, but on ARM32 it was over 4 times slower
>>> than my NEON implementation of Speck128/256-XTS.  And I did not see a clear way
>>> that it could be improved over 4x with NEON, if at all, so I did not take the
>>> long time it would have taken to write an optimized NEON implementation to
>>> benchmark it properly.  Perhaps that was a mistake.  But, time is not unlimited.
>>>
>> In my limited experience with NEON and 64-bit ARX, there's usually a
>> ~2x speedup solely from NEON's native 64-bit operations on ARMv7-A.
>> The extra speedup from encrypting 2 block in parallel is then
>> somewhere between 1x and 2x, depending on various details. Getting
>> near 4x might be feasible, but it is indeed time-consuming to get
>> there.
>>
>>> As for the wide-block mode using ChaCha20 and Poly1305, you'd have to ask Paul
>>> Crowley to explain it properly, but briefly it's actually a pseudorandom
>>> permutation over an arbitrarily-sized message.  So with dm-crypt for example, it
>>> would operate on a whole 512-byte sector, and if any bit of the 512-byte
>>> plaintext is changed, then every bit in the 512-byte ciphertext would change
>>> with 50% probability.  To make this possible, the construction uses a polynomial
>>> evalution in GF(2^130-5) as a universal hash function, similar to the Poly1305
>>> mode.
>>>
>> Oh, OK, that sounds like something resembling Naor-Reingold or its
>> relatives. That would work, but with 3 or 4 passes I guess it wouldn't
>> be very fast.
>>
>>> Using ChaCha20's underlying 512-bit permutation to build a tweakable block
>>> cipher is an interesting idea.  But maybe in my crypto-naivety, it is not
>>> obvious to me how to do so.  Do you have references to any relevant papers?
>>> Remember that we strongly prefer a published cipher to a custom one -- even if
>>> the core is reused, a mistake may be made in the way it is used.  Thus,
>>> similarly to Paul's wide-block mode, I'd be concerned that we'd have to
>>> self-publish a new construction, then use it with no outside crypto review.
>>> *Maybe* it would be straightforward enough to be okay, but to know I'd need to
>>> see the details of how it would actually work.
>>>
>> This would be the 'tweakable Even-Mansour' construction and its
>> variants. The variant I'm most familiar with would be MEM [1],
>> focusing on software friendliness, but there is other provable
>> security work in the same vein, including [3, 4, 5]. It's very similar
>> to how the XEX mode turns a block cipher into a tweakable block
>> cipher.
>>
>> In [1, 2] we used a 1024-bit permutation out of BLAKE2 instead of
>> ChaCha20's, but everything translates easily from one to the other. We
>> also included cheap masks for 512-bit permutations, just in case.
>>
>> [1] https://eprint.iacr.org/2015/999
>> [2] https://github.com/MEM-AEAD/mem-aead
>> [3] https://eprint.iacr.org/2015/539
>> [4] https://eprint.iacr.org/2015/476
>> [5] https://competitions.cr.yp.to/round2/minalpherv11.pdf
>>
>>> But in the end, Speck seemed like the clear choice because it had multiple NEON
>>> implementations available already which showed it could be implemented very
>>> efficiently in NEON; it has over 70 cryptanalysis papers (far more than most
>>> ciphers) yet the security margin is still similar to AES; it has no intellectual
>>> property concerns; there is a paper clearly explaining the design decisions; it
>>> is naturally resistant to timing attacks; it supports a 128-bit block size, so
>>> it can be easily used in XTS mode; it supports the same key sizes as AES; and it
>>> has a simple and understandable design with no "magic numbers" besides 8 and 3
>>> (compare to an actual backdoored algorithm like Dual_EC_DRGB, which basically
>>> had a public key embedded in the algorithm).  Also as Paul mentioned he is
>>> confident in the construction, and he has published cryptanalysis on Salsa20, so
>>> his opinion is probably more significant than mine :-)
>>>
>>> But I will definitely take a closer look at SPARX and some of the other ciphers
>>> you mentioned in case I missed something.  I really do appreciate the
>>> suggestions, by the way, and in any case we do need to be very well prepared to
>>> justify our choices.  I just hope that people can understand that we are
>>> implementing real-world crypto which must operate under *very* tight performance
>>> constraints on ARM processors, and it must be compatible with dm-crypt and
>>> fscrypt with no room for ciphertext expansion.  Thus, many algorithms which may
>>> at first seem reasonable choices had to (unfortunately) be excluded.
>>>
>> I understand it is a tough choice, and it's unfortunate that many of
>> the algorithms we have cater mostly to either the
>> high-hardware-accelerated-end or the extremely low-end, without a lot
>> of good options at the middle-end.
>>
> First, we're planning a publication which explains our choices in more detail,
> so please treat this as some more preliminary notes.
>
> To make sure we've exhausted as many alternatives as possible, I wrote NEON
> implementations of all the block ciphers you suggested with the exception of
> SKINNY (which looked very hardware-oriented and not efficient in software), as
> well as some that others have suggested.  (It was tough, but after doing a
> couple, it got much easier...)  The following shows the decryption performance
> I'm getting on an ARMv7 platform.  Encryption speeds were usually similar, but
> in our use case we care much more about decryption, as that affects the most
> critical metrics such as the time to launch applications.
>
> 	ChaCha8-MEM: 183256 KB/s
> 	ChaCha12-MEM: 134833 KB/s
> 	Chaskey-LTS-XTS: 99097 KB/s
> 	ChaCha20-MEM: 87875 KB/s
> 	Speck64/128-XTS: 85332 KB/s
> 	Speck128/128-XTS: 73404 KB/s
> 	RC5-128/12/256-XTS: 69887 KB/s
> 	Speck128/256-XTS: 69597 KB/s
> 	RC5-64/12/128-XTS: 69267 KB/s
> 	LEA-128-XTS: 67986 KB/s
> 	CHAM128/128-XTS: 52982 KB/s
> 	LEA-256-XTS: 50429 KB/s
> 	Threefish-256: 48349 KB/s
> 	RC6-XTS: 46855 KB/s
> 	RC5-128/20/256-XTS: 44291 KB/s
> 	RC5-64/20/128-XTS: 43924 KB/s
> 	NOEKEON-XTS: 40705 KB/s
> 	Sparx128/128-XTS: 39191 KB/s
> 	XTEA-XTS: 38239 KB/s
> 	AES-128-XTS: 25549 KB/s
> 	AES-256-XTS: 18640 KB/s
>
> Remember that for dm-crypt or fscrypt over flash storage and/or f2fs, a stream
> cipher is insecure.  Moreover, on these (low-end) devices the status quo is no
> encryption, and we need every bit of performance available.  Anything below
> 50 MB/s is definitely unacceptable.  But even at that speed we get many
> complaints, so in practice we need something faster.  That means that the
> algorithms close to 50 MB/s, such as Threefish, still aren't fast enough.
>
> ChaCha-MEM (based roughly on your paper: https://eprint.iacr.org/2015/999), has
> the best performance, especially if we allow for the 12 or 8-round variants.  My
> code for it is based roughly on the existing
> arch/arm/crypto/chacha20-neon-core.S, but updated to support the inverse
> permutation (on 4 blocks at a time, using all 16 NEON registers) and do the
> masking required by MEM.  However, ChaCha-MEM would be a pretty bleeding-edge
> and customized construction, and Paul Crowley and I have concerns about its
> security.  The problem is that the MEM security proof assumes that the
> underlying permutation has no more detectable structural properties than a
> randomly selected permutation.  However, the ChaCha permutation is known to have
> certain symmetries, e.g. if the sixteen 32-bit words are (a, a, a, a, b, b, b,
> b, c, c, c, c, d, d, d, d), then they always map to some (e, e, e, e, f, f, f,
> f, g, g, g, g, h, h, h, h).
>
> For the MEM mask generation, we can use the "expand 32-byte k" constant to break
> the symmetry, like is done in the ChaCha stream cipher.  However, that's not
> possible for the inner application of the permutation.  So, we'd be using the
> ChaCha permutation in a manner in which it wasn't intended, and the security of
> the ChaCha stream cipher wouldn't directly carry over.  Granted, it's not
> impossible that it would be secure, but at the present time it doesn't seem like
> a good choice to actually field.
>
> Chaskey-LTS is faster than Speck, but unfortunately it's not really a viable
> option because it has only a 64-bit security level, due to its use of the
> Even-Mansour construction with a 128-bit key.  Of course, it would still be
> better than nothing, but we prefer a cipher that has a security level in line
> with what is accepted for modern crypto.
>
> RC5 with the traditional 12 rounds is about as fast as Speck, but there is a
> known differential attack on that number of rounds.  So if we choose RC5 we'd
> almost certainly have to use the 20-round variant, which is much slower.
>
> That leaves LEA-128-XTS as the only other algorithm that might meet the
> performance requirement, as it is only slightly slower than Speck128-XTS.  It
> may be the most viable alternative, but beyond the slight performance loss it
> still has some disadvantages compared to Speck:
>
> - Importantly, the LEA authors forgot to include test vectors, so I'm not yet
>   100% sure I implemented it correctly.  (The Speck authors unfortunately didn't
>   make the endianness of their test vectors clear in their initial publication,
>   but at least they actually provided test vectors!)
> - LEA has received some cryptanalysis, but not nearly as much as Speck.
> - It took some very heavy optimization to get good LEA performance, much more
>   than I had to do for Speck.  My final LEA code has separate code paths for
>   128-bit and 256-bit keys, and has reordered and preprocessed the round keys,
>   and reordered the operations.  As a result, it's harder to see how it maps to
>   the original paper.  In contrast, my Speck code is more straightforward and
>   maintainable.
> - LEA-256 (256-bit key) is much slower than LEA-128 (128-bit key), as it has
>   33% more rounds.  LEA-256 would not be fast enough, so we would have to use
>   LEA-128.  In contrast, with Speck we can use Speck128/256 (256-bit key).
>   We're willing to accept a 128-bit security level, but 256-bit is preferable.
>   (I think the Speck designers took a more informed approach to setting
>   appropriate security margins for a lightweight cipher; it seems that other
>   designers often choose too few or too many rounds, especially as the key
>   length is varied.)
> - LEA encryption is also a bit slower than decryption, while with Speck
>   encryption and decryption are almost exactly the same speed.
>
> Note that like Speck, LEA doesn't appear to be approved by a standards
> organization either; it's just specified in a research paper.
>
> Thus, from a technical perspective, and given the current state of the art in
> lightweight cryptography, currently Speck128-XTS seems to be the best choice for
> the problem domain.  It's unfortunate that there are so few good options and
> that the field is so politicized, but it is what it is.
>
> Still, we don't want to abandon HPolyC (Paul's new ChaCha and Poly1305-based
> wide-block mode), and eventually we hope to offer it as an option as well.  But
> it's not yet published, and it's a more complex algorithm that is harder to
> implement so I haven't yet had a chance to implement and benchmark it.  And we
> don't want to continue to leave users unprotected while we spend a long time
> coming up with the perfect algorithm, or for hardware AES support to arrive to
> all low-end CPUs when it's unclear if/when that will happen.
>
> Again, we're planning a publication which will explain all this in more detail.
>
> Thanks!
>
> Eric


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180601/c1da80f3/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2018-06-01 19:23 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-12 23:52 [PATCH v2 0/5] crypto: Speck support Eric Biggers
2018-02-12 23:52 ` Eric Biggers
2018-02-12 23:52 ` [PATCH v2 1/5] crypto: add support for the Speck block cipher Eric Biggers
2018-02-12 23:52   ` Eric Biggers
2018-02-12 23:52 ` [PATCH v2 2/5] crypto: speck - export common helpers Eric Biggers
2018-02-12 23:52   ` Eric Biggers
2018-02-12 23:52 ` [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS Eric Biggers
2018-02-12 23:52   ` Eric Biggers
2018-02-13 11:34   ` Ard Biesheuvel
2018-02-13 11:34     ` Ard Biesheuvel
2018-02-13 18:57     ` Eric Biggers
2018-02-13 18:57       ` Eric Biggers
2018-02-13 19:04       ` Ard Biesheuvel
2018-02-13 19:04         ` Ard Biesheuvel
2018-02-13 19:04         ` Ard Biesheuvel
2018-02-12 23:52 ` [PATCH v2 4/5] crypto: speck - add test vectors for Speck128-XTS Eric Biggers
2018-02-12 23:52   ` Eric Biggers
2018-02-12 23:52 ` [PATCH v2 5/5] crypto: speck - add test vectors for Speck64-XTS Eric Biggers
2018-02-12 23:52   ` Eric Biggers
2018-04-24 16:11 ` [PATCH v2 0/5] crypto: Speck support Jason A. Donenfeld
2018-04-24 16:11   ` Jason A. Donenfeld
2018-04-24 16:11   ` Jason A. Donenfeld
2018-04-24 18:16   ` Eric Biggers
2018-04-24 18:16     ` Eric Biggers
2018-04-24 18:16     ` Eric Biggers
2018-04-24 20:58     ` Jason A. Donenfeld
2018-04-24 20:58       ` Jason A. Donenfeld
2018-04-24 20:58       ` Jason A. Donenfeld
2018-04-24 21:58       ` Paul Crowley
2018-04-24 21:58         ` Paul Crowley
2018-04-24 21:58         ` Paul Crowley
2018-04-24 22:47       ` Eric Biggers
2018-04-24 22:47         ` Eric Biggers
2018-04-24 22:47         ` Eric Biggers
2018-04-25 14:33         ` Samuel Neves
2018-04-25 14:33           ` Samuel Neves
2018-04-25 14:33           ` Samuel Neves
2018-04-25 19:49           ` Eric Biggers
2018-04-25 19:49             ` Eric Biggers
2018-04-25 19:49             ` Eric Biggers
2018-04-26  2:05             ` Samuel Neves
2018-04-26  2:05               ` Samuel Neves
2018-04-26  2:05               ` Samuel Neves
2018-04-26 16:30               ` Paul Crowley
2018-04-26 16:30                 ` Paul Crowley
2018-04-26 16:30                 ` Paul Crowley
2018-05-07 23:20               ` Eric Biggers
2018-05-07 23:20                 ` Eric Biggers
2018-05-07 23:20                 ` Eric Biggers
2018-04-25  5:30       ` Theodore Y. Ts'o
2018-04-24 22:43   ` Jeffrey Walton
2018-04-24 22:43     ` Jeffrey Walton
2018-04-24 22:43     ` Jeffrey Walton
     [not found] <8c9dc804-1f59-a245-57ba-51db3c234621@esat.kuleuven.be>
2018-06-01 19:23 ` Tomer Ashur
2018-06-01 19:23   ` Tomer Ashur
2018-06-01 19:23   ` Tomer Ashur

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.