* [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel
@ 2018-09-25 14:55 Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 01/23] asm: simd context helper API Jason A. Donenfeld
` (23 more replies)
0 siblings, 24 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:55 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh; +Cc: Jason A. Donenfeld
Changes v5->v6, along with who suggested each change.
--------------------------------------------
- [Eric Biggers] Cleaner and more efficient blake2s_final function.
- [Eric Biggers] Remove modulo trick that was working around a gcc 8.1 bug.
- [Eric Biggers] Use crypto_xor_cpy to avoid a memmove in chacha20.
- In the unlikely event that we transition from SIMD back to scalar
instructions for Poly1305, it's important that we also convert back
from base 2^26 to 2^64 (a rough sketch of that recombination follows
this list). This is super rare and weird, and it's hard to imagine
somebody actually doing this for whatever bad reason, but it is a
possibility, so we shouldn't calculate things wrong in that instance.
- [Andrew Lunn] Change BUG_ON to WARN_ON in choice places.
- [Thomas Gleixner] Explicitly dual-license files under X, where X≠GPL2,
so that they are under X && GPL2.
- [Andy Lutomirski] Use separate symbols for selftests that are of
variable length, so that we neither waste space on disk nor waste space
in memory (via __initconst).
- [Thomas Gleixner] Make SPDX headers a standalone comment, using the //
comment style in .c files and /* */ in .h files, per the kernel style guide.
- [Paul Burton] Allow assembler to fill branch delay slots on MIPS32r2,
so that the implementation is more future-proof.
- [Eric Biggers] Use Eric's scalar implementation for ChaCha20 ARM, which
performs very well on Cortex-A7/A5 and ARMv6, while using Andy
Polyakov's NEON implementation for other ARMv7 cores.
- [Ard Biesheuvel] Reduce optimization from -O3 to -O2.
- [Arnd Bergmann, Ard Biesheuvel] Make sure stack frames stay within 1024
bytes on 32-bit and 1280 bytes on 64-bit.
- [Ard Biesheuvel] CRYPTOGAMS-related commit messages now have the
corresponding OpenSSL commit written in them.
- [Eric Biggers] Keep HChaCha20 key in native endian, since in some
cases it can trivially be passed into the ChaCha20 block later.
- Make constants follow a consistent naming scheme.
- Workaround gcc 8 stack mis-optimization on m68k.
- [Arnd Bergmann] Workaround gcc 8 stack mis-optimization with KASAN.
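For illustration only, here is a rough sketch of the limb recombination
that the Poly1305 item above refers to. The names are mine, and it assumes
the five 26-bit limbs have already been fully carried (each below 2^26),
which the real code in this series has to guarantee first:

  u32 h[5];       /* base 2^26 accumulator, fully carried */
  u64 d0, d1, d2; /* the same ~130-bit value in base 2^64 */

  d0 = h[0] | ((u64)h[1] << 26) | ((u64)h[2] << 52);
  d1 = (h[2] >> 12) | ((u64)h[3] << 14) | ((u64)h[4] << 40);
  d2 = h[4] >> 24; /* top two bits of the accumulator */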
Many of the above ARM changes are a result of discussions between Ard,
Eric, AndyL, AndyP, Samuel, and me. These discussions are ongoing, and so
it's possible we'll do another revision with further ARM tweaks. But perhaps
this one will be sufficient for merging now, and we can continue to refine
later in the cycle.
-----------------------------------------------------------
This patchset is available on git.kernel.org in this branch, where it may be
pulled directly for inclusion into net-next:
* https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=jd/wireguard
-----------------------------------------------------------
WireGuard is a secure network tunnel written especially for Linux, which
has seen around three years of serious development, deployment, and
scrutiny. It delivers excellent performance and is extremely easy to
use and configure. It has been designed with the primary goal of being
both easy to audit by virtue of being small and highly secure from a
cryptography and systems security perspective. WireGuard is used by some
massive companies pushing enormous amounts of traffic, and you have likely
already consumed bytes today that at some point transited through
a WireGuard tunnel. Even as an out-of-tree module, WireGuard has been
integrated into various userspace tools, Linux distributions, mobile
phones, and data centers. There are ports in several languages to
several operating systems, and even commercial hardware and services
sold integrating WireGuard. It is time, therefore, for WireGuard to be
properly integrated into Linux.
Ample information, including documentation, installation instructions,
and project details, is available at:
* https://www.wireguard.com/
* https://www.wireguard.com/papers/wireguard.pdf
As it is currently an out-of-tree module, it lives in its own git repo
and has its own mailing list, and every commit for the module is tested
against every stable kernel since 3.10 on a variety of architectures
using an extensive test suite:
* https://git.zx2c4.com/WireGuard
https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/WireGuard.git/
* https://lists.zx2c4.com/mailman/listinfo/wireguard
* https://www.wireguard.com/build-status/
The project has been broadly discussed at conferences, and was presented
to the Netdev developers in Seoul last November, where a paper was
released detailing some interesting aspects of the project. Dave asked
me after the talk if I would consider sending in a v1 "sooner rather
than later", hence this patchset. A decision is still waiting from the
Linux Plumbers Conference, but an update on these topics may be presented
in Vancouver in a few months. Prior presentations:
* https://www.wireguard.com/presentations/
* https://www.wireguard.com/papers/wireguard-netdev22.pdf
The cryptography in the protocol itself has been formally verified by
several independent academic teams with positive results, and I know of
two additional efforts on their way to further corroborate those
findings. The version 1 protocol is "complete", and so the purpose of
this review is to assess the implementation of the protocol. However, it
still may be of interest to know that the thing you're reviewing uses a
protocol with various nice security properties:
* https://www.wireguard.com/formal-verification/
This patchset is divided into four segments. The first introduces a very
simple helper for working with the FPU state for the purposes of amortizing
SIMD operations. The second segment is a small collection of cryptographic
primitives, split up into several commits by primitive and by hardware. The
third shows usage of Zinc within the existing crypto API and as a replacement
for the existing crypto API. The last is WireGuard itself, presented as an
unintrusive and self-contained virtual network driver.
It is intended that this entire patch series enter the kernel through
DaveM's net-next tree. Subsequently, WireGuard patches will go through
DaveM's net-next tree, while Zinc patches will go through Greg KH's tree.
Enjoy,
Jason
* [PATCH net-next v6 01/23] asm: simd context helper API
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-28 8:28 ` Ard Biesheuvel
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 1 reply; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Thomas Gleixner, linux-arch
Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
FPU/SIMD functions over a number of calls, because FPU restoration is
quite expensive. This adds a simple header for carrying out this pattern:
  simd_context_t simd_context;

  simd_get(&simd_context);
  while ((item = get_item_from_queue()) != NULL) {
          encrypt_item(item, &simd_context);
          simd_relax(&simd_context);
  }
  simd_put(&simd_context);
The relaxation step ensures that we don't trample over preemption, and
the get/put API should be a familiar paradigm in the kernel.
On the other end, code that actually wants to use SIMD instructions can
accept this as a parameter and check it via:
  void encrypt_item(struct item *item, simd_context_t *simd_context)
  {
          if (item->len > LARGE_FOR_SIMD && simd_use(simd_context))
                  wild_simd_code(item);
          else
                  boring_scalar_code(item);
  }
The actual XSAVE happens during simd_use (and only the first time),
so that if the context is never actually used, no performance penalty is
hit.
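Putting the two halves together, here is a hedged sketch of the full
lifecycle; encrypt_simd(), encrypt_scalar(), the buffers, and the 256-byte
cutoff are made-up names for illustration and not part of this patch:

  simd_context_t ctx;

  simd_get(&ctx);                   /* cheap: records whether SIMD is usable */
  if (len >= 256 && simd_use(&ctx)) /* the first simd_use() does the XSAVE */
          encrypt_simd(dst, src, len);
  else
          encrypt_scalar(dst, src, len);
  simd_put(&ctx);                   /* releases the FPU only if simd_use() succeeded */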
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-arch@vger.kernel.org
---
arch/alpha/include/asm/Kbuild | 5 ++-
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/simd.h | 63 ++++++++++++++++++++++++++++++
arch/arm64/include/asm/simd.h | 51 +++++++++++++++++++++---
arch/c6x/include/asm/Kbuild | 3 +-
arch/h8300/include/asm/Kbuild | 3 +-
arch/hexagon/include/asm/Kbuild | 1 +
arch/ia64/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nds32/include/asm/Kbuild | 7 ++--
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 7 ++--
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 3 +-
arch/riscv/include/asm/Kbuild | 3 +-
arch/s390/include/asm/Kbuild | 3 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 3 +-
arch/unicore32/include/asm/Kbuild | 1 +
arch/x86/include/asm/simd.h | 44 ++++++++++++++++++++-
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/simd.h | 20 ++++++++++
include/linux/simd.h | 28 +++++++++++++
26 files changed, 234 insertions(+), 21 deletions(-)
create mode 100644 arch/arm/include/asm/simd.h
create mode 100644 include/linux/simd.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 0580cb8c84b2..07b2c1025d34 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -2,14 +2,15 @@
generic-y += compat.h
+generic-y += current.h
generic-y += exec.h
generic-y += export.h
generic-y += fb.h
generic-y += irq_work.h
+generic-y += kprobes.h
generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
generic-y += preempt.h
generic-y += sections.h
+generic-y += simd.h
generic-y += trace_clock.h
-generic-y += current.h
-generic-y += kprobes.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index feed50ce89fa..a7f4255f1649 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += parport.h
generic-y += pci.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += simd.h
generic-y += topology.h
generic-y += trace_clock.h
generic-y += user.h
diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
new file mode 100644
index 000000000000..263950dd69cb
--- /dev/null
+++ b/arch/arm/include/asm/simd.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+
+static __must_check inline bool may_use_simd(void)
+{
+ return !in_interrupt();
+}
+
+static inline void simd_get(simd_context_t *ctx)
+{
+ *ctx = may_use_simd() ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+ if (*ctx & HAVE_SIMD_IN_USE)
+ kernel_neon_end();
+ *ctx = HAVE_NO_SIMD;
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
+ if (!(*ctx & HAVE_FULL_SIMD))
+ return false;
+ if (*ctx & HAVE_SIMD_IN_USE)
+ return true;
+ kernel_neon_begin();
+ *ctx |= HAVE_SIMD_IN_USE;
+ return true;
+}
+
+#else
+
+static __must_check inline bool may_use_simd(void)
+{
+ return false;
+}
+
+static inline void simd_get(simd_context_t *ctx)
+{
+ *ctx = HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
+ return false;
+}
+#endif
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6495cc51246f..a45ff1600040 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -1,11 +1,10 @@
-/*
- * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+/* SPDX-License-Identifier: GPL-2.0
*
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License version 2 as published
- * by the Free Software Foundation.
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
*/
+#include <linux/simd.h>
#ifndef __ASM_SIMD_H
#define __ASM_SIMD_H
@@ -16,6 +15,8 @@
#include <linux/types.h>
#ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+#include <asm/simd.h>
DECLARE_PER_CPU(bool, kernel_neon_busy);
@@ -40,9 +41,47 @@ static __must_check inline bool may_use_simd(void)
!this_cpu_read(kernel_neon_busy);
}
+static inline void simd_get(simd_context_t *ctx)
+{
+ *ctx = may_use_simd() ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+ if (*ctx & HAVE_SIMD_IN_USE)
+ kernel_neon_end();
+ *ctx = HAVE_NO_SIMD;
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
+ if (!(*ctx & HAVE_FULL_SIMD))
+ return false;
+ if (*ctx & HAVE_SIMD_IN_USE)
+ return true;
+ kernel_neon_begin();
+ *ctx |= HAVE_SIMD_IN_USE;
+ return true;
+}
+
#else /* ! CONFIG_KERNEL_MODE_NEON */
-static __must_check inline bool may_use_simd(void) {
+static __must_check inline bool may_use_simd(void)
+{
+ return false;
+}
+
+static inline void simd_get(simd_context_t *ctx)
+{
+ *ctx = HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
return false;
}
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index 33a2c94fed0d..22f3d8333c74 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -5,8 +5,8 @@ generic-y += compat.h
generic-y += current.h
generic-y += device.h
generic-y += div64.h
-generic-y += dma.h
generic-y += dma-mapping.h
+generic-y += dma.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
@@ -30,6 +30,7 @@ generic-y += pgalloc.h
generic-y += preempt.h
generic-y += segment.h
generic-y += serial.h
+generic-y += simd.h
generic-y += tlbflush.h
generic-y += topology.h
generic-y += trace_clock.h
diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild
index a5d0b2991f47..f5c2f12d593e 100644
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -8,8 +8,8 @@ generic-y += current.h
generic-y += delay.h
generic-y += device.h
generic-y += div64.h
-generic-y += dma.h
generic-y += dma-mapping.h
+generic-y += dma.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
@@ -39,6 +39,7 @@ generic-y += preempt.h
generic-y += scatterlist.h
generic-y += sections.h
generic-y += serial.h
+generic-y += simd.h
generic-y += sizes.h
generic-y += spinlock.h
generic-y += timex.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index dd2fd9c0d292..217d4695fd8a 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -29,6 +29,7 @@ generic-y += rwsem.h
generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
+generic-y += simd.h
generic-y += sizes.h
generic-y += topology.h
generic-y += trace_clock.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 557bbc8ba9f5..41c5ebdf79e5 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
generic-y += preempt.h
+generic-y += simd.h
generic-y += trace_clock.h
generic-y += vtime.h
generic-y += word-at-a-time.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index a4b8d3331a9e..73898dd1a4d0 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -19,6 +19,7 @@ generic-y += mm-arch-hooks.h
generic-y += percpu.h
generic-y += preempt.h
generic-y += sections.h
+generic-y += simd.h
generic-y += spinlock.h
generic-y += topology.h
generic-y += trace_clock.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 569ba9e670c1..7a877eea99d3 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -25,6 +25,7 @@ generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
generic-y += serial.h
+generic-y += simd.h
generic-y += syscalls.h
generic-y += topology.h
generic-y += trace_clock.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 58351e48421e..e8868e0fb2c3 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += sections.h
generic-y += segment.h
+generic-y += simd.h
generic-y += trace_clock.h
generic-y += unaligned.h
generic-y += user.h
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index dbc4e5422550..603c1d020620 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -7,14 +7,14 @@ generic-y += bug.h
generic-y += bugs.h
generic-y += checksum.h
generic-y += clkdev.h
-generic-y += cmpxchg.h
generic-y += cmpxchg-local.h
+generic-y += cmpxchg.h
generic-y += compat.h
generic-y += cputime.h
generic-y += device.h
generic-y += div64.h
-generic-y += dma.h
generic-y += dma-mapping.h
+generic-y += dma.h
generic-y += emergency-restart.h
generic-y += errno.h
generic-y += exec.h
@@ -46,14 +46,15 @@ generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
generic-y += shmbuf.h
+generic-y += simd.h
generic-y += sizes.h
generic-y += stat.h
generic-y += switch_to.h
generic-y += timex.h
generic-y += topology.h
generic-y += trace_clock.h
-generic-y += xor.h
generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += word-at-a-time.h
+generic-y += xor.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 8fde4fa2c34f..571a9d9ad107 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -33,6 +33,7 @@ generic-y += preempt.h
generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
+generic-y += simd.h
generic-y += spinlock.h
generic-y += topology.h
generic-y += trace_clock.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index eb87cd8327c8..5e9f2f4c4d39 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -28,12 +28,13 @@ generic-y += module.h
generic-y += pci.h
generic-y += percpu.h
generic-y += preempt.h
-generic-y += qspinlock_types.h
-generic-y += qspinlock.h
-generic-y += qrwlock_types.h
generic-y += qrwlock.h
+generic-y += qrwlock_types.h
+generic-y += qspinlock.h
+generic-y += qspinlock_types.h
generic-y += sections.h
generic-y += segment.h
+generic-y += simd.h
generic-y += string.h
generic-y += switch_to.h
generic-y += topology.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 2013d639e735..97970b4d05ab 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += percpu.h
generic-y += preempt.h
generic-y += seccomp.h
generic-y += segment.h
+generic-y += simd.h
generic-y += topology.h
generic-y += trace_clock.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d227e351..64290f48e733 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,7 +4,8 @@ generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += local64.h
generic-y += mcs_spinlock.h
+generic-y += msi.h
generic-y += preempt.h
generic-y += rwsem.h
+generic-y += simd.h
generic-y += vtime.h
-generic-y += msi.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index efdbe311e936..6669b7374c0a 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -5,9 +5,9 @@ generic-y += compat.h
generic-y += cputime.h
generic-y += device.h
generic-y += div64.h
-generic-y += dma.h
generic-y += dma-contiguous.h
generic-y += dma-mapping.h
+generic-y += dma.h
generic-y += emergency-restart.h
generic-y += errno.h
generic-y += exec.h
@@ -46,6 +46,7 @@ generic-y += setup.h
generic-y += shmbuf.h
generic-y += shmparam.h
generic-y += signal.h
+generic-y += simd.h
generic-y += socket.h
generic-y += sockios.h
generic-y += stat.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index e3239772887a..7a26dc6ce815 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,9 +7,9 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += cacheflush.h
generic-y += device.h
+generic-y += div64.h
generic-y += dma-contiguous.h
generic-y += dma-mapping.h
-generic-y += div64.h
generic-y += emergency-restart.h
generic-y += export.h
generic-y += fb.h
@@ -22,6 +22,7 @@ generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
generic-y += preempt.h
generic-y += rwsem.h
+generic-y += simd.h
generic-y += trace_clock.h
generic-y += unaligned.h
generic-y += word-at-a-time.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 6a5609a55965..8e64ff35a933 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += percpu.h
generic-y += preempt.h
generic-y += rwsem.h
generic-y += serial.h
+generic-y += simd.h
generic-y += sizes.h
generic-y += trace_clock.h
generic-y += xor.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 410b263ef5c8..72b9e08fb350 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -17,5 +17,6 @@ generic-y += msi.h
generic-y += preempt.h
generic-y += rwsem.h
generic-y += serial.h
+generic-y += simd.h
generic-y += trace_clock.h
generic-y += word-at-a-time.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index b10dde6cb793..d37288b08dd2 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,15 +16,16 @@ generic-y += io.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
+generic-y += kprobes.h
generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
generic-y += param.h
generic-y += pci.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += simd.h
generic-y += switch_to.h
generic-y += topology.h
generic-y += trace_clock.h
generic-y += word-at-a-time.h
generic-y += xor.h
-generic-y += kprobes.h
diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild
index bfc7abe77905..98a908720bbd 100644
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -27,6 +27,7 @@ generic-y += preempt.h
generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
+generic-y += simd.h
generic-y += sizes.h
generic-y += syscalls.h
generic-y += topology.h
diff --git a/arch/x86/include/asm/simd.h b/arch/x86/include/asm/simd.h
index a341c878e977..4aad7f158dcb 100644
--- a/arch/x86/include/asm/simd.h
+++ b/arch/x86/include/asm/simd.h
@@ -1,4 +1,11 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
#include <asm/fpu/api.h>
@@ -10,3 +17,38 @@ static __must_check inline bool may_use_simd(void)
{
return irq_fpu_usable();
}
+
+static inline void simd_get(simd_context_t *ctx)
+{
+#if !defined(CONFIG_UML)
+ *ctx = may_use_simd() ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+#else
+ *ctx = HAVE_NO_SIMD;
+#endif
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+#if !defined(CONFIG_UML)
+ if (*ctx & HAVE_SIMD_IN_USE)
+ kernel_fpu_end();
+#endif
+ *ctx = HAVE_NO_SIMD;
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
+#if !defined(CONFIG_UML)
+ if (!(*ctx & HAVE_FULL_SIMD))
+ return false;
+ if (*ctx & HAVE_SIMD_IN_USE)
+ return true;
+ kernel_fpu_begin();
+ *ctx |= HAVE_SIMD_IN_USE;
+ return true;
+#else
+ return false;
+#endif
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 82c756431b49..7950f359649d 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += percpu.h
generic-y += preempt.h
generic-y += rwsem.h
generic-y += sections.h
+generic-y += simd.h
generic-y += topology.h
generic-y += trace_clock.h
generic-y += word-at-a-time.h
diff --git a/include/asm-generic/simd.h b/include/asm-generic/simd.h
index d0343d58a74a..b3dd61ac010e 100644
--- a/include/asm-generic/simd.h
+++ b/include/asm-generic/simd.h
@@ -1,5 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
#include <linux/hardirq.h>
/*
@@ -13,3 +17,19 @@ static __must_check inline bool may_use_simd(void)
{
return !in_interrupt();
}
+
+static inline void simd_get(simd_context_t *ctx)
+{
+ *ctx = HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t *ctx)
+{
+}
+
+static __must_check inline bool simd_use(simd_context_t *ctx)
+{
+ return false;
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/include/linux/simd.h b/include/linux/simd.h
new file mode 100644
index 000000000000..33bba21012ff
--- /dev/null
+++ b/include/linux/simd.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _SIMD_H
+#define _SIMD_H
+
+typedef enum {
+ HAVE_NO_SIMD = 1 << 0,
+ HAVE_FULL_SIMD = 1 << 1,
+ HAVE_SIMD_IN_USE = 1 << 31
+} simd_context_t;
+
+#include <linux/sched.h>
+#include <asm/simd.h>
+
+static inline void simd_relax(simd_context_t *ctx)
+{
+#ifdef CONFIG_PREEMPT
+ if ((*ctx & HAVE_SIMD_IN_USE) && need_resched()) {
+ simd_put(ctx);
+ simd_get(ctx);
+ }
+#endif
+}
+
+#endif /* _SIMD_H */
--
2.19.0
* [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 01/23] asm: simd context helper API Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 18:33 ` Joe Perches
2018-09-25 14:56 ` [PATCH net-next v6 03/23] zinc: ChaCha20 generic C implementation and selftest Jason A. Donenfeld
` (21 subsequent siblings)
23 siblings, 1 reply; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson
Zinc stands for "Zinc Is Neat Crypto" or "Zinc as IN Crypto" or maybe
just "Zx2c4's INsane Cryptolib." It's also short, easy to type, and
plays nicely with the recent trend of naming crypto libraries after
elements. The guiding principle is "don't overdo it". It's less of a
library and more of a directory tree for organizing well-curated direct
implementations of cryptography primitives.
Zinc is a new cryptography API that is much more minimal and lower-level
than the current one. It intends to complement it and provide a basis
upon which the current crypto API might build, as the provider of
software implementations of cryptographic primitives. It is motivated by
three primary observations in crypto API design:
* Highly composable "cipher modes" and related abstractions from
90s cryptographers did not turn out to be as terrific an idea as
hoped, leading to a host of API misuse problems.
* Most programmers are afraid of crypto code, and so prefer to
integrate it into libraries in a highly abstracted manner, so as to
shield themselves from implementation details. Cryptographers, on
the other hand, prefer simple direct implementations, which they're
able to verify for high assurance and optimize in accordance with
their expertise.
* Overly abstracted and flexible cryptography APIs lead to a host of
dangerous problems and performance issues. The kernel is usually not in
the business of coming up with new uses of crypto, but rather of
implementing various constructions, which means it essentially
needs a library of primitives, not a highly abstracted enterprise-ready
pluggable system, with a few particular exceptions.
This last observation has played out over and over again within the
kernel:
* The perennial move of actual primitives away from crypto/ and into
lib/, so that users can actually call these functions directly with
no overhead and without lots of allocations, function pointers,
string specifier parsing, and general clunkiness. For example:
sha256, chacha20, siphash, sha1, and so forth live in lib/ rather
than in crypto/. Zinc intends to stop the cluttering of lib/ and
introduce these direct primitives into their proper place, lib/zinc/.
* An abundance of misuse bugs with the present crypto API that have
been very unpleasant to clean up.
* A hesitance to even use cryptography, because of the overhead and
headaches involved in accessing the routines.
Zinc goes in a rather different direction. Rather than providing a
thoroughly designed and abstracted API, Zinc gives you simple functions,
which implement some primitive, or some particular and specific
construction of primitives. It is not dynamic in the least, though one
could imagine implementing a complex dynamic dispatch mechanism (such as
the current crypto API) on top of these basic functions. After all,
dynamic dispatch is usually needed for applications with cipher agility,
such as IPsec, dm-crypt, AF_ALG, and so forth, and the existing crypto
API will continue to play that role. However, Zinc will provide a non-
haphazard way of directly utilizing crypto routines in applications
that have neither the need nor the desire for abstraction and dynamic
dispatch.
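As a concrete sketch of what that looks like with the ChaCha20 functions
added later in this series (only a sketch; the key and buffer here are
placeholders), note that everything lives on the stack and no cipher
handle is ever allocated:

  struct chacha20_ctx ctx;
  simd_context_t simd;
  u8 key[CHACHA20_KEY_SIZE]; /* filled in by the caller */
  u8 buf[128];               /* plaintext, encrypted in place below */

  chacha20_init(&ctx, key, 0 /* nonce */);
  simd_get(&simd);
  chacha20(&ctx, buf, buf, sizeof(buf), &simd); /* XOR with keystream, no allocations */
  simd_put(&simd);
  memzero_explicit(&ctx, sizeof(ctx));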
It also organizes the implementations in a simple, straightforward,
and direct manner, making it enjoyable and intuitive to work on.
Rather than moving optimized assembly implementations into arch/, it
keeps them all together in lib/zinc/, making it simple and obvious to
compare and contrast what's happening. This is, notably, exactly what
the lib/raid6/ tree does, and that seems to work out rather well. It's
also the pattern of most successful crypto libraries. The architecture-
specific glue-code is made a part of each translation unit, rather than
being in a separate one, so that generic and architecture-optimized code
are combined at compile-time, and incompatibility branches compiled out by
the optimizer.
All implementations have been extensively tested and fuzzed, and are
selected for their quality, trustworthiness, and performance. Wherever
possible and performant, formally verified implementations are used,
such as those from HACL* [1] and Fiat-Crypto [2]. The routines also take
special care to zero out secrets using memzero_explicit (and future work
is planned to have gcc do this more reliably and performantly with
compiler plugins). The performance of the selected implementations is
state-of-the-art and unrivaled on a broad array of hardware, though of
course we will continue to fine-tune these to the hardware and performance
demands of kernel contributors. Each implementation also comes with
extensive self-tests and crafted test vectors, pulled from various
places such as Wycheproof [9].
Regularity of function signatures is important, so that users can easily
"guess" the name of the function they want. However, individual
primitives are often not trivially interchangeable, having been
designed for different things and requiring different parameters and
semantics, and so the function signatures they provide will directly
reflect the realities of the primitives' usage, rather than hiding them
behind (inevitably leaky) abstractions. Also, in contrast to the current
crypto API, Zinc functions can work on stack buffers, and can be called
with different keys, without requiring allocations or locking.
SIMD is used automatically when available, though some routines may
benefit either from having their SIMD disabled for particular
invocations, or from having the SIMD initialization calls amortized over
several invocations of the function, and so Zinc utilizes function
signatures enabling that in conjunction with the recently introduced
simd_context_t.
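For example, here is a hedged sketch (ctx, pkt, and dequeue_packet() are
hypothetical) of a caller amortizing the SIMD state across many small
ChaCha20 invocations by threading one simd_context_t through them:

  simd_context_t simd;

  simd_get(&simd);
  while ((pkt = dequeue_packet()) != NULL) {
          chacha20(&ctx, pkt->data, pkt->data, pkt->len, &simd);
          simd_relax(&simd); /* briefly drop the FPU if we need to reschedule */
  }
  simd_put(&simd);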
More generally, Zinc provides function signatures that allow just what
is required by the various callers. This isn't to say that users of the
functions will be permitted to pollute the function semantics with weird
particular needs, but we are trying very hard not to overdo it, and that
means looking carefully at what's actually necessary, and doing just that,
and not much more than that. Remember: practicality and cleanliness rather
than over-zealous infrastructure.
Zinc also provides an opening for the best implementers in academia to
contribute their time and effort to the kernel, by being sufficiently
simple and inviting. In discussing this commit with some of the best and
brightest over the last few years, I have found many who are eager to
devote rare talent and energy to this effort.
Following the merging of this, I expect the primitives that
currently exist in lib/ to work their way into lib/zinc/, after intense
scrutiny of each implementation, potentially replacing them with either
formally-verified implementations, or better studied and faster
state-of-the-art implementations.
Also following the merging of this, I expect the old crypto API
implementations to be ported over to use Zinc for their software-based
implementations.
As Zinc is simply library code, its config options are un-menued, with
the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
BUG_ONs.
[1] https://github.com/project-everest/hacl-star
[2] https://github.com/mit-plv/fiat-crypto
[3] https://cr.yp.to/ecdh.html
[4] https://cr.yp.to/chacha.html
[5] https://cr.yp.to/snuffle/xsalsa-20081128.pdf
[6] https://cr.yp.to/mac.html
[7] https://blake2.net/
[8] https://tools.ietf.org/html/rfc8439
[9] https://github.com/google/wycheproof
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
MAINTAINERS | 8 ++++++++
lib/Kconfig | 2 ++
lib/Makefile | 2 ++
lib/zinc/Kconfig | 32 ++++++++++++++++++++++++++++++++
lib/zinc/Makefile | 3 +++
5 files changed, 47 insertions(+)
create mode 100644 lib/zinc/Kconfig
create mode 100644 lib/zinc/Makefile
diff --git a/MAINTAINERS b/MAINTAINERS
index 4327777dce57..5967c737f3ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16170,6 +16170,14 @@ Q: https://patchwork.linuxtv.org/project/linux-media/list/
S: Maintained
F: drivers/media/dvb-frontends/zd1301_demod*
+ZINC CRYPTOGRAPHY LIBRARY
+M: Jason A. Donenfeld <Jason@zx2c4.com>
+M: Samuel Neves <sneves@dei.uc.pt>
+S: Maintained
+F: lib/zinc/
+F: include/zinc/
+L: linux-crypto@vger.kernel.org
+
ZPOOL COMPRESSED PAGE STORAGE API
M: Dan Streetman <ddstreet@ieee.org>
L: linux-mm@kvack.org
diff --git a/lib/Kconfig b/lib/Kconfig
index a3928d4438b5..3e6848269c66 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -485,6 +485,8 @@ config GLOB_SELFTEST
module load) by a small amount, so you're welcome to play with
it, but you probably don't need it.
+source "lib/zinc/Kconfig"
+
#
# Netlink attribute parsing support is select'ed if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..571d28d3f143 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -214,6 +214,8 @@ obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
obj-$(CONFIG_ASN1) += asn1_decoder.o
+obj-y += zinc/
+
obj-$(CONFIG_FONT_SUPPORT) += fonts/
obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
new file mode 100644
index 000000000000..4e2e59126a67
--- /dev/null
+++ b/lib/zinc/Kconfig
@@ -0,0 +1,32 @@
+config ZINC_DEBUG
+ bool "Zinc cryptography library debugging and self-tests"
+ help
+ This builds a series of self-tests for the Zinc crypto library, which
+ help diagnose any cryptographic algorithm implementation issues that
+ might be at the root cause of potential bugs. It also adds various
+ debugging traps.
+
+ Unless you're developing and testing cryptographic routines, or are
+ especially paranoid about correctness on your hardware, you may say
+ N here.
+
+config ZINC_ARCH_ARM
+ def_bool y
+ depends on ARM && !64BIT
+
+config ZINC_ARCH_ARM64
+ def_bool y
+ depends on ARM64 && 64BIT
+
+config ZINC_ARCH_X86_64
+ def_bool y
+ depends on X86_64 && 64BIT
+ depends on !UML
+
+config ZINC_ARCH_MIPS
+ def_bool y
+ depends on MIPS && CPU_MIPS32_R2 && !64BIT
+
+config ZINC_ARCH_MIPS64
+ def_bool y
+ depends on MIPS && 64BIT
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
new file mode 100644
index 000000000000..a61c80d676cb
--- /dev/null
+++ b/lib/zinc/Makefile
@@ -0,0 +1,3 @@
+ccflags-y := -O2
+ccflags-y += -D'pr_fmt(fmt)="zinc: " fmt'
+ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
--
2.19.0
* [PATCH net-next v6 03/23] zinc: ChaCha20 generic C implementation and selftest
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 01/23] asm: simd context helper API Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-28 15:40 ` Ard Biesheuvel
2018-10-02 3:15 ` Herbert Xu
2018-09-25 14:56 ` [PATCH net-next v6 04/23] zinc: ChaCha20 x86_64 implementation Jason A. Donenfeld
` (20 subsequent siblings)
23 siblings, 2 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson
This implements the ChaCha20 permutation as a single C statement, by way
of the comma operator, which the compiler is able to simplify
terrifically.
Information: https://cr.yp.to/chacha.html
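As a quick, simplified illustration of why the comma operator is used
rather than the usual do { } while (0) macro style (these are demo macros,
not the ones in this patch): each round macro is itself an expression, so
rounds chain into one larger expression that the optimizer treats as a
single straight-line block:

  #define DEMO_ROUND(x) (x[0] += x[1], x[3] = rol32(x[3] ^ x[0], 16))
  #define DEMO_TWO_ROUNDS(x) (DEMO_ROUND(x), DEMO_ROUND(x))

  DEMO_TWO_ROUNDS(state); /* still just one C statement */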
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
include/zinc/chacha20.h | 65 +
lib/zinc/Kconfig | 4 +
lib/zinc/Makefile | 3 +
lib/zinc/chacha20/chacha20.c | 179 +++
lib/zinc/selftest/chacha20.h | 2676 ++++++++++++++++++++++++++++++++++
5 files changed, 2927 insertions(+)
create mode 100644 include/zinc/chacha20.h
create mode 100644 lib/zinc/chacha20/chacha20.c
create mode 100644 lib/zinc/selftest/chacha20.h
diff --git a/include/zinc/chacha20.h b/include/zinc/chacha20.h
new file mode 100644
index 000000000000..14bbadd242c9
--- /dev/null
+++ b/include/zinc/chacha20.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CHACHA20_H
+#define _ZINC_CHACHA20_H
+
+#include <asm/unaligned.h>
+#include <linux/simd.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+enum {
+ CHACHA20_NONCE_SIZE = 16,
+ CHACHA20_KEY_SIZE = 32,
+ CHACHA20_KEY_WORDS = CHACHA20_KEY_SIZE / sizeof(u32),
+ CHACHA20_BLOCK_SIZE = 64,
+ CHACHA20_BLOCK_WORDS = CHACHA20_BLOCK_SIZE / sizeof(u32),
+ HCHACHA20_NONCE_SIZE = CHACHA20_NONCE_SIZE,
+ HCHACHA20_KEY_SIZE = CHACHA20_KEY_SIZE
+};
+
+enum { /* expand 32-byte k */
+ CHACHA20_CONSTANT_EXPA = 0x61707865U,
+ CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
+ CHACHA20_CONSTANT_2_BY = 0x79622d32U,
+ CHACHA20_CONSTANT_TE_K = 0x6b206574U
+};
+
+struct chacha20_ctx {
+ u32 constant[4];
+ u32 key[8];
+ u32 counter[4];
+} __aligned(32);
+
+static inline void chacha20_init(struct chacha20_ctx *state,
+ const u8 key[CHACHA20_KEY_SIZE],
+ const u64 nonce)
+{
+ state->constant[0] = CHACHA20_CONSTANT_EXPA;
+ state->constant[1] = CHACHA20_CONSTANT_ND_3;
+ state->constant[2] = CHACHA20_CONSTANT_2_BY;
+ state->constant[3] = CHACHA20_CONSTANT_TE_K;
+ state->key[0] = get_unaligned_le32(key + 0);
+ state->key[1] = get_unaligned_le32(key + 4);
+ state->key[2] = get_unaligned_le32(key + 8);
+ state->key[3] = get_unaligned_le32(key + 12);
+ state->key[4] = get_unaligned_le32(key + 16);
+ state->key[5] = get_unaligned_le32(key + 20);
+ state->key[6] = get_unaligned_le32(key + 24);
+ state->key[7] = get_unaligned_le32(key + 28);
+ state->counter[0] = 0;
+ state->counter[1] = 0;
+ state->counter[2] = nonce & U32_MAX;
+ state->counter[3] = nonce >> 32;
+}
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len,
+ simd_context_t *simd_context);
+
+void hchacha20(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE], simd_context_t *simd_context);
+
+#endif /* _ZINC_CHACHA20_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index 4e2e59126a67..1ca1ae1e9ea9 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -1,3 +1,7 @@
+config ZINC_CHACHA20
+ tristate
+ select CRYPTO_ALGAPI
+
config ZINC_DEBUG
bool "Zinc cryptography library debugging and self-tests"
help
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index a61c80d676cb..3d80144d55a6 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -1,3 +1,6 @@
ccflags-y := -O2
ccflags-y += -D'pr_fmt(fmt)="zinc: " fmt'
ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
+
+zinc_chacha20-y := chacha20/chacha20.o
+obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
new file mode 100644
index 000000000000..c82d9fc71f21
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Implementation of the ChaCha20 stream cipher.
+ *
+ * Information: https://cr.yp.to/chacha.html
+ */
+
+#include <zinc/chacha20.h>
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <crypto/algapi.h>
+
+#ifndef HAVE_CHACHA20_ARCH_IMPLEMENTATION
+void __init chacha20_fpu_init(void)
+{
+}
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *out,
+ const u8 *in, const size_t len,
+ simd_context_t *simd_context)
+{
+ return false;
+}
+static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ return false;
+}
+#endif
+
+#define QUARTER_ROUND(x, a, b, c, d) ( \
+ x[a] += x[b], \
+ x[d] = rol32((x[d] ^ x[a]), 16), \
+ x[c] += x[d], \
+ x[b] = rol32((x[b] ^ x[c]), 12), \
+ x[a] += x[b], \
+ x[d] = rol32((x[d] ^ x[a]), 8), \
+ x[c] += x[d], \
+ x[b] = rol32((x[b] ^ x[c]), 7) \
+)
+
+#define C(i, j) (i * 4 + j)
+
+#define DOUBLE_ROUND(x) ( \
+ /* Column Round */ \
+ QUARTER_ROUND(x, C(0, 0), C(1, 0), C(2, 0), C(3, 0)), \
+ QUARTER_ROUND(x, C(0, 1), C(1, 1), C(2, 1), C(3, 1)), \
+ QUARTER_ROUND(x, C(0, 2), C(1, 2), C(2, 2), C(3, 2)), \
+ QUARTER_ROUND(x, C(0, 3), C(1, 3), C(2, 3), C(3, 3)), \
+ /* Diagonal Round */ \
+ QUARTER_ROUND(x, C(0, 0), C(1, 1), C(2, 2), C(3, 3)), \
+ QUARTER_ROUND(x, C(0, 1), C(1, 2), C(2, 3), C(3, 0)), \
+ QUARTER_ROUND(x, C(0, 2), C(1, 3), C(2, 0), C(3, 1)), \
+ QUARTER_ROUND(x, C(0, 3), C(1, 0), C(2, 1), C(3, 2)) \
+)
+
+#define TWENTY_ROUNDS(x) ( \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x) \
+)
+
+static void chacha20_block_generic(__le32 *stream, u32 *state)
+{
+ u32 x[CHACHA20_BLOCK_WORDS];
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(x); ++i)
+ x[i] = state[i];
+
+ TWENTY_ROUNDS(x);
+
+ for (i = 0; i < ARRAY_SIZE(x); ++i)
+ stream[i] = cpu_to_le32(x[i] + state[i]);
+
+ ++state[12];
+}
+
+static void chacha20_generic(struct chacha20_ctx *state, u8 *out, const u8 *in,
+ u32 len)
+{
+ __le32 buf[CHACHA20_BLOCK_WORDS];
+
+ while (len >= CHACHA20_BLOCK_SIZE) {
+ chacha20_block_generic(buf, (u32 *)state);
+ crypto_xor_cpy(out, in, (u8 *)buf, CHACHA20_BLOCK_SIZE);
+ len -= CHACHA20_BLOCK_SIZE;
+ out += CHACHA20_BLOCK_SIZE;
+ in += CHACHA20_BLOCK_SIZE;
+ }
+ if (len) {
+ chacha20_block_generic(buf, (u32 *)state);
+ crypto_xor_cpy(out, in, (u8 *)buf, len);
+ }
+}
+
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len,
+ simd_context_t *simd_context)
+{
+ if (!chacha20_arch(state, dst, src, len, simd_context))
+ chacha20_generic(state, dst, src, len);
+}
+EXPORT_SYMBOL(chacha20);
+
+static void hchacha20_generic(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE])
+{
+ u32 x[] = { CHACHA20_CONSTANT_EXPA,
+ CHACHA20_CONSTANT_ND_3,
+ CHACHA20_CONSTANT_2_BY,
+ CHACHA20_CONSTANT_TE_K,
+ get_unaligned_le32(key + 0),
+ get_unaligned_le32(key + 4),
+ get_unaligned_le32(key + 8),
+ get_unaligned_le32(key + 12),
+ get_unaligned_le32(key + 16),
+ get_unaligned_le32(key + 20),
+ get_unaligned_le32(key + 24),
+ get_unaligned_le32(key + 28),
+ get_unaligned_le32(nonce + 0),
+ get_unaligned_le32(nonce + 4),
+ get_unaligned_le32(nonce + 8),
+ get_unaligned_le32(nonce + 12)
+ };
+
+ TWENTY_ROUNDS(x);
+
+ memcpy(derived_key + 0, x + 0, sizeof(u32) * 4);
+ memcpy(derived_key + 4, x + 12, sizeof(u32) * 4);
+}
+
+/* Derived key should be 32-bit aligned */
+void hchacha20(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE], simd_context_t *simd_context)
+{
+ if (!hchacha20_arch(derived_key, nonce, key, simd_context))
+ hchacha20_generic(derived_key, nonce, key);
+}
+EXPORT_SYMBOL(hchacha20);
+
+#include "../selftest/chacha20.h"
+
+static bool nosimd __initdata = false;
+
+static int __init mod_init(void)
+{
+ if (!nosimd)
+ chacha20_fpu_init();
+#ifdef DEBUG
+ if (!chacha20_selftest())
+ return -ENOTRECOVERABLE;
+#endif
+ return 0;
+}
+
+static void __exit mod_exit(void)
+{
+}
+
+module_param(nosimd, bool, 0);
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("ChaCha20 stream cipher");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
diff --git a/lib/zinc/selftest/chacha20.h b/lib/zinc/selftest/chacha20.h
new file mode 100644
index 000000000000..397a4b930fd7
--- /dev/null
+++ b/lib/zinc/selftest/chacha20.h
@@ -0,0 +1,2676 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+
+struct chacha20_testvec {
+ const u8 *input, *output, *key;
+ u64 nonce;
+ size_t ilen;
+};
+
+struct hchacha20_testvec {
+ u8 key[HCHACHA20_KEY_SIZE];
+ u8 nonce[HCHACHA20_NONCE_SIZE];
+ u8 output[CHACHA20_KEY_SIZE];
+};
+
+/* These test vectors are generated by reference implementations and are
+ * designed to check chacha20 implementation block handling, as well as from
+ * the draft-arciszewski-xchacha-01 document.
+ */
+
+static const u8 input01[] __initconst = { };
+static const u8 output01[] __initconst = { };
+static const u8 key01[] __initconst = {
+ 0x09, 0xf4, 0xe8, 0x57, 0x10, 0xf2, 0x12, 0xc3,
+ 0xc6, 0x91, 0xc4, 0x09, 0x97, 0x46, 0xef, 0xfe,
+ 0x02, 0x00, 0xe4, 0x5c, 0x82, 0xed, 0x16, 0xf3,
+ 0x32, 0xbe, 0xec, 0x7a, 0xe6, 0x68, 0x12, 0x26
+};
+enum { nonce01 = 0x3834e2afca3c66d3ULL };
+
+static const u8 input02[] __initconst = {
+ 0x9d
+};
+static const u8 output02[] __initconst = {
+ 0x94
+};
+static const u8 key02[] __initconst = {
+ 0x8c, 0x01, 0xac, 0xaf, 0x62, 0x63, 0x56, 0x7a,
+ 0xad, 0x23, 0x4c, 0x58, 0x29, 0x29, 0xbe, 0xab,
+ 0xe9, 0xf8, 0xdf, 0x6c, 0x8c, 0x74, 0x4d, 0x7d,
+ 0x13, 0x94, 0x10, 0x02, 0x3d, 0x8e, 0x9f, 0x94
+};
+enum { nonce02 = 0x5d1b3bfdedd9f73aULL };
+
+static const u8 input03[] __initconst = {
+ 0x04, 0x16
+};
+static const u8 output03[] __initconst = {
+ 0x92, 0x07
+};
+static const u8 key03[] __initconst = {
+ 0x22, 0x0c, 0x79, 0x2c, 0x38, 0x51, 0xbe, 0x99,
+ 0xa9, 0x59, 0x24, 0x50, 0xef, 0x87, 0x38, 0xa6,
+ 0xa0, 0x97, 0x20, 0xcb, 0xb4, 0x0c, 0x94, 0x67,
+ 0x1f, 0x98, 0xdc, 0xc4, 0x83, 0xbc, 0x35, 0x4d
+};
+enum { nonce03 = 0x7a3353ad720a3e2eULL };
+
+static const u8 input04[] __initconst = {
+ 0xc7, 0xcc, 0xd0
+};
+static const u8 output04[] __initconst = {
+ 0xd8, 0x41, 0x80
+};
+static const u8 key04[] __initconst = {
+ 0x81, 0x5e, 0x12, 0x01, 0xc4, 0x36, 0x15, 0x03,
+ 0x11, 0xa0, 0xe9, 0x86, 0xbb, 0x5a, 0xdc, 0x45,
+ 0x7d, 0x5e, 0x98, 0xf8, 0x06, 0x76, 0x1c, 0xec,
+ 0xc0, 0xf7, 0xca, 0x4e, 0x99, 0xd9, 0x42, 0x38
+};
+enum { nonce04 = 0x6816e2fc66176da2ULL };
+
+static const u8 input05[] __initconst = {
+ 0x48, 0xf1, 0x31, 0x5f
+};
+static const u8 output05[] __initconst = {
+ 0x48, 0xf7, 0x13, 0x67
+};
+static const u8 key05[] __initconst = {
+ 0x3f, 0xd6, 0xb6, 0x5e, 0x2f, 0xda, 0x82, 0x39,
+ 0x97, 0x06, 0xd3, 0x62, 0x4f, 0xbd, 0xcb, 0x9b,
+ 0x1d, 0xe6, 0x4a, 0x76, 0xab, 0xdd, 0x14, 0x50,
+ 0x59, 0x21, 0xe3, 0xb2, 0xc7, 0x95, 0xbc, 0x45
+};
+enum { nonce05 = 0xc41a7490e228cc42ULL };
+
+static const u8 input06[] __initconst = {
+ 0xae, 0xa2, 0x85, 0x1d, 0xc8
+};
+static const u8 output06[] __initconst = {
+ 0xfa, 0xff, 0x45, 0x6b, 0x6f
+};
+static const u8 key06[] __initconst = {
+ 0x04, 0x8d, 0xea, 0x67, 0x20, 0x78, 0xfb, 0x8f,
+ 0x49, 0x80, 0x35, 0xb5, 0x7b, 0xe4, 0x31, 0x74,
+ 0x57, 0x43, 0x3a, 0x64, 0x64, 0xb9, 0xe6, 0x23,
+ 0x4d, 0xfe, 0xb8, 0x7b, 0x71, 0x4d, 0x9d, 0x21
+};
+enum { nonce06 = 0x251366db50b10903ULL };
+
+static const u8 input07[] __initconst = {
+ 0x1a, 0x32, 0x85, 0xb6, 0xe8, 0x52
+};
+static const u8 output07[] __initconst = {
+ 0xd3, 0x5f, 0xf0, 0x07, 0x69, 0xec
+};
+static const u8 key07[] __initconst = {
+ 0xbf, 0x2d, 0x42, 0x99, 0x97, 0x76, 0x04, 0xad,
+ 0xd3, 0x8f, 0x6e, 0x6a, 0x34, 0x85, 0xaf, 0x81,
+ 0xef, 0x36, 0x33, 0xd5, 0x43, 0xa2, 0xaa, 0x08,
+ 0x0f, 0x77, 0x42, 0x83, 0x58, 0xc5, 0x42, 0x2a
+};
+enum { nonce07 = 0xe0796da17dba9b58ULL };
+
+static const u8 input08[] __initconst = {
+ 0x40, 0xae, 0xcd, 0xe4, 0x3d, 0x22, 0xe0
+};
+static const u8 output08[] __initconst = {
+ 0xfd, 0x8a, 0x9f, 0x3d, 0x05, 0xc9, 0xd3
+};
+static const u8 key08[] __initconst = {
+ 0xdc, 0x3f, 0x41, 0xe3, 0x23, 0x2a, 0x8d, 0xf6,
+ 0x41, 0x2a, 0xa7, 0x66, 0x05, 0x68, 0xe4, 0x7b,
+ 0xc4, 0x58, 0xd6, 0xcc, 0xdf, 0x0d, 0xc6, 0x25,
+ 0x1b, 0x61, 0x32, 0x12, 0x4e, 0xf1, 0xe6, 0x29
+};
+enum { nonce08 = 0xb1d2536d9e159832ULL };
+
+static const u8 input09[] __initconst = {
+ 0xba, 0x1d, 0x14, 0x16, 0x9f, 0x83, 0x67, 0x24
+};
+static const u8 output09[] __initconst = {
+ 0x7c, 0xe3, 0x78, 0x1d, 0xa2, 0xe7, 0xe9, 0x39
+};
+static const u8 key09[] __initconst = {
+ 0x17, 0x55, 0x90, 0x52, 0xa4, 0xce, 0x12, 0xae,
+ 0xd4, 0xfd, 0xd4, 0xfb, 0xd5, 0x18, 0x59, 0x50,
+ 0x4e, 0x51, 0x99, 0x32, 0x09, 0x31, 0xfc, 0xf7,
+ 0x27, 0x10, 0x8e, 0xa2, 0x4b, 0xa5, 0xf5, 0x62
+};
+enum { nonce09 = 0x495fc269536d003ULL };
+
+static const u8 input10[] __initconst = {
+ 0x09, 0xfd, 0x3c, 0x0b, 0x3d, 0x0e, 0xf3, 0x9d,
+ 0x27
+};
+static const u8 output10[] __initconst = {
+ 0xdc, 0xe4, 0x33, 0x60, 0x0c, 0x07, 0xcb, 0x51,
+ 0x6b
+};
+static const u8 key10[] __initconst = {
+ 0x4e, 0x00, 0x72, 0x37, 0x0f, 0x52, 0x4d, 0x6f,
+ 0x37, 0x50, 0x3c, 0xb3, 0x51, 0x81, 0x49, 0x16,
+ 0x7e, 0xfd, 0xb1, 0x51, 0x72, 0x2e, 0xe4, 0x16,
+ 0x68, 0x5c, 0x5b, 0x8a, 0xc3, 0x90, 0x70, 0x04
+};
+enum { nonce10 = 0x1ad9d1114d88cbbdULL };
+
+static const u8 input11[] __initconst = {
+ 0x70, 0x18, 0x52, 0x85, 0xba, 0x66, 0xff, 0x2c,
+ 0x9a, 0x46
+};
+static const u8 output11[] __initconst = {
+ 0xf5, 0x2a, 0x7a, 0xfd, 0x31, 0x7c, 0x91, 0x41,
+ 0xb1, 0xcf
+};
+static const u8 key11[] __initconst = {
+ 0x48, 0xb4, 0xd0, 0x7c, 0x88, 0xd1, 0x96, 0x0d,
+ 0x80, 0x33, 0xb4, 0xd5, 0x31, 0x9a, 0x88, 0xca,
+ 0x14, 0xdc, 0xf0, 0xa8, 0xf3, 0xac, 0xb8, 0x47,
+ 0x75, 0x86, 0x7c, 0x88, 0x50, 0x11, 0x43, 0x40
+};
+enum { nonce11 = 0x47c35dd1f4f8aa4fULL };
+
+static const u8 input12[] __initconst = {
+ 0x9e, 0x8e, 0x3d, 0x2a, 0x05, 0xfd, 0xe4, 0x90,
+ 0x24, 0x1c, 0xd3
+};
+static const u8 output12[] __initconst = {
+ 0x97, 0x72, 0x40, 0x9f, 0xc0, 0x6b, 0x05, 0x33,
+ 0x42, 0x7e, 0x28
+};
+static const u8 key12[] __initconst = {
+ 0xee, 0xff, 0x33, 0x33, 0xe0, 0x28, 0xdf, 0xa2,
+ 0xb6, 0x5e, 0x25, 0x09, 0x52, 0xde, 0xa5, 0x9c,
+ 0x8f, 0x95, 0xa9, 0x03, 0x77, 0x0f, 0xbe, 0xa1,
+ 0xd0, 0x7d, 0x73, 0x2f, 0xf8, 0x7e, 0x51, 0x44
+};
+enum { nonce12 = 0xc22d044dc6ea4af3ULL };
+
+static const u8 input13[] __initconst = {
+ 0x9c, 0x16, 0xa2, 0x22, 0x4d, 0xbe, 0x04, 0x9a,
+ 0xb3, 0xb5, 0xc6, 0x58
+};
+static const u8 output13[] __initconst = {
+ 0xf0, 0x81, 0xdb, 0x6d, 0xa3, 0xe9, 0xb2, 0xc6,
+ 0x32, 0x50, 0x16, 0x9f
+};
+static const u8 key13[] __initconst = {
+ 0x96, 0xb3, 0x01, 0xd2, 0x7a, 0x8c, 0x94, 0x09,
+ 0x4f, 0x58, 0xbe, 0x80, 0xcc, 0xa9, 0x7e, 0x2d,
+ 0xad, 0x58, 0x3b, 0x63, 0xb8, 0x5c, 0x17, 0xce,
+ 0xbf, 0x43, 0x33, 0x7a, 0x7b, 0x82, 0x28, 0x2f
+};
+enum { nonce13 = 0x2a5d05d88cd7b0daULL };
+
+static const u8 input14[] __initconst = {
+ 0x57, 0x4f, 0xaa, 0x30, 0xe6, 0x23, 0x50, 0x86,
+ 0x91, 0xa5, 0x60, 0x96, 0x2b
+};
+static const u8 output14[] __initconst = {
+ 0x6c, 0x1f, 0x3b, 0x42, 0xb6, 0x2f, 0xf0, 0xbd,
+ 0x76, 0x60, 0xc7, 0x7e, 0x8d
+};
+static const u8 key14[] __initconst = {
+ 0x22, 0x85, 0xaf, 0x8f, 0xa3, 0x53, 0xa0, 0xc4,
+ 0xb5, 0x75, 0xc0, 0xba, 0x30, 0x92, 0xc3, 0x32,
+ 0x20, 0x5a, 0x8f, 0x7e, 0x93, 0xda, 0x65, 0x18,
+ 0xd1, 0xf6, 0x9a, 0x9b, 0x8f, 0x85, 0x30, 0xe6
+};
+enum { nonce14 = 0xf9946c166aa4475fULL };
+
+static const u8 input15[] __initconst = {
+ 0x89, 0x81, 0xc7, 0xe2, 0x00, 0xac, 0x52, 0x70,
+ 0xa4, 0x79, 0xab, 0xeb, 0x74, 0xf7
+};
+static const u8 output15[] __initconst = {
+ 0xb4, 0xd0, 0xa9, 0x9d, 0x15, 0x5f, 0x48, 0xd6,
+ 0x00, 0x7e, 0x4c, 0x77, 0x5a, 0x46
+};
+static const u8 key15[] __initconst = {
+ 0x0a, 0x66, 0x36, 0xca, 0x5d, 0x82, 0x23, 0xb6,
+ 0xe4, 0x9b, 0xad, 0x5e, 0xd0, 0x7f, 0xf6, 0x7a,
+ 0x7b, 0x03, 0xa7, 0x4c, 0xfd, 0xec, 0xd5, 0xa1,
+ 0xfc, 0x25, 0x54, 0xda, 0x5a, 0x5c, 0xf0, 0x2c
+};
+enum { nonce15 = 0x9ab2b87a35e772c8ULL };
+
+static const u8 input16[] __initconst = {
+ 0x5f, 0x09, 0xc0, 0x8b, 0x1e, 0xde, 0xca, 0xd9,
+ 0xb7, 0x5c, 0x23, 0xc9, 0x55, 0x1e, 0xcf
+};
+static const u8 output16[] __initconst = {
+ 0x76, 0x9b, 0x53, 0xf3, 0x66, 0x88, 0x28, 0x60,
+ 0x98, 0x80, 0x2c, 0xa8, 0x80, 0xa6, 0x48
+};
+static const u8 key16[] __initconst = {
+ 0x80, 0xb5, 0x51, 0xdf, 0x17, 0x5b, 0xb0, 0xef,
+ 0x8b, 0x5b, 0x2e, 0x3e, 0xc5, 0xe3, 0xa5, 0x86,
+ 0xac, 0x0d, 0x8e, 0x32, 0x90, 0x9d, 0x82, 0x27,
+ 0xf1, 0x23, 0x26, 0xc3, 0xea, 0x55, 0xb6, 0x63
+};
+enum { nonce16 = 0xa82e9d39e4d02ef5ULL };
+
+static const u8 input17[] __initconst = {
+ 0x87, 0x0b, 0x36, 0x71, 0x7c, 0xb9, 0x0b, 0x80,
+ 0x4d, 0x77, 0x5c, 0x4f, 0xf5, 0x51, 0x0e, 0x1a
+};
+static const u8 output17[] __initconst = {
+ 0xf1, 0x12, 0x4a, 0x8a, 0xd9, 0xd0, 0x08, 0x67,
+ 0x66, 0xd7, 0x34, 0xea, 0x32, 0x3b, 0x54, 0x0e
+};
+static const u8 key17[] __initconst = {
+ 0xfb, 0x71, 0x5f, 0x3f, 0x7a, 0xc0, 0x9a, 0xc8,
+ 0xc8, 0xcf, 0xe8, 0xbc, 0xfb, 0x09, 0xbf, 0x89,
+ 0x6a, 0xef, 0xd5, 0xe5, 0x36, 0x87, 0x14, 0x76,
+ 0x00, 0xb9, 0x32, 0x28, 0xb2, 0x00, 0x42, 0x53
+};
+enum { nonce17 = 0x229b87e73d557b96ULL };
+
+static const u8 input18[] __initconst = {
+ 0x38, 0x42, 0xb5, 0x37, 0xb4, 0x3d, 0xfe, 0x59,
+ 0x38, 0x68, 0x88, 0xfa, 0x89, 0x8a, 0x5f, 0x90,
+ 0x3c
+};
+static const u8 output18[] __initconst = {
+ 0xac, 0xad, 0x14, 0xe8, 0x7e, 0xd7, 0xce, 0x96,
+ 0x3d, 0xb3, 0x78, 0x85, 0x22, 0x5a, 0xcb, 0x39,
+ 0xd4
+};
+static const u8 key18[] __initconst = {
+ 0xe1, 0xc1, 0xa8, 0xe0, 0x91, 0xe7, 0x38, 0x66,
+ 0x80, 0x17, 0x12, 0x3c, 0x5e, 0x2d, 0xbb, 0xea,
+ 0xeb, 0x6c, 0x8b, 0xc8, 0x1b, 0x6f, 0x7c, 0xea,
+ 0x50, 0x57, 0x23, 0x1e, 0x65, 0x6f, 0x6d, 0x81
+};
+enum { nonce18 = 0xfaf5fcf8f30e57a9ULL };
+
+static const u8 input19[] __initconst = {
+ 0x1c, 0x4a, 0x30, 0x26, 0xef, 0x9a, 0x32, 0xa7,
+ 0x8f, 0xe5, 0xc0, 0x0f, 0x30, 0x3a, 0xbf, 0x38,
+ 0x54, 0xba
+};
+static const u8 output19[] __initconst = {
+ 0x57, 0x67, 0x54, 0x4f, 0x31, 0xd6, 0xef, 0x35,
+ 0x0b, 0xd9, 0x52, 0xa7, 0x46, 0x7d, 0x12, 0x17,
+ 0x1e, 0xe3
+};
+static const u8 key19[] __initconst = {
+ 0x5a, 0x79, 0xc1, 0xea, 0x33, 0xb3, 0xc7, 0x21,
+ 0xec, 0xf8, 0xcb, 0xd2, 0x58, 0x96, 0x23, 0xd6,
+ 0x4d, 0xed, 0x2f, 0xdf, 0x8a, 0x79, 0xe6, 0x8b,
+ 0x38, 0xa3, 0xc3, 0x7a, 0x33, 0xda, 0x02, 0xc7
+};
+enum { nonce19 = 0x2b23b61840429604ULL };
+
+static const u8 input20[] __initconst = {
+ 0xab, 0xe9, 0x32, 0xbb, 0x35, 0x17, 0xe0, 0x60,
+ 0x80, 0xb1, 0x27, 0xdc, 0xe6, 0x62, 0x9e, 0x0c,
+ 0x77, 0xf4, 0x50
+};
+static const u8 output20[] __initconst = {
+ 0x54, 0x6d, 0xaa, 0xfc, 0x08, 0xfb, 0x71, 0xa8,
+ 0xd6, 0x1d, 0x7d, 0xf3, 0x45, 0x10, 0xb5, 0x4c,
+ 0xcc, 0x4b, 0x45
+};
+static const u8 key20[] __initconst = {
+ 0xa3, 0xfd, 0x3d, 0xa9, 0xeb, 0xea, 0x2c, 0x69,
+ 0xcf, 0x59, 0x38, 0x13, 0x5b, 0xa7, 0x53, 0x8f,
+ 0x5e, 0xa2, 0x33, 0x86, 0x4c, 0x75, 0x26, 0xaf,
+ 0x35, 0x12, 0x09, 0x71, 0x81, 0xea, 0x88, 0x66
+};
+enum { nonce20 = 0x7459667a8fadff58ULL };
+
+static const u8 input21[] __initconst = {
+ 0xa6, 0x82, 0x21, 0x23, 0xad, 0x27, 0x3f, 0xc6,
+ 0xd7, 0x16, 0x0d, 0x6d, 0x24, 0x15, 0x54, 0xc5,
+ 0x96, 0x72, 0x59, 0x8a
+};
+static const u8 output21[] __initconst = {
+ 0x5f, 0x34, 0x32, 0xea, 0x06, 0xd4, 0x9e, 0x01,
+ 0xdc, 0x32, 0x32, 0x40, 0x66, 0x73, 0x6d, 0x4a,
+ 0x6b, 0x12, 0x20, 0xe8
+};
+static const u8 key21[] __initconst = {
+ 0x96, 0xfd, 0x13, 0x23, 0xa9, 0x89, 0x04, 0xe6,
+ 0x31, 0xa5, 0x2c, 0xc1, 0x40, 0xd5, 0x69, 0x5c,
+ 0x32, 0x79, 0x56, 0xe0, 0x29, 0x93, 0x8f, 0xe8,
+ 0x5f, 0x65, 0x53, 0x7f, 0xc1, 0xe9, 0xaf, 0xaf
+};
+enum { nonce21 = 0xba8defee9d8e13b5ULL };
+
+static const u8 input22[] __initconst = {
+ 0xb8, 0x32, 0x1a, 0x81, 0xd8, 0x38, 0x89, 0x5a,
+ 0xb0, 0x05, 0xbe, 0xf4, 0xd2, 0x08, 0xc6, 0xee,
+ 0x79, 0x7b, 0x3a, 0x76, 0x59
+};
+static const u8 output22[] __initconst = {
+ 0xb7, 0xba, 0xae, 0x80, 0xe4, 0x9f, 0x79, 0x84,
+ 0x5a, 0x48, 0x50, 0x6d, 0xcb, 0xd0, 0x06, 0x0c,
+ 0x15, 0x63, 0xa7, 0x5e, 0xbd
+};
+static const u8 key22[] __initconst = {
+ 0x0f, 0x35, 0x3d, 0xeb, 0x5f, 0x0a, 0x82, 0x0d,
+ 0x24, 0x59, 0x71, 0xd8, 0xe6, 0x2d, 0x5f, 0xe1,
+ 0x7e, 0x0c, 0xae, 0xf6, 0xdc, 0x2c, 0xc5, 0x4a,
+ 0x38, 0x88, 0xf2, 0xde, 0xd9, 0x5f, 0x76, 0x7c
+};
+enum { nonce22 = 0xe77f1760e9f5e192ULL };
+
+static const u8 input23[] __initconst = {
+ 0x4b, 0x1e, 0x79, 0x99, 0xcf, 0xef, 0x64, 0x4b,
+ 0xb0, 0x66, 0xae, 0x99, 0x2e, 0x68, 0x97, 0xf5,
+ 0x5d, 0x9b, 0x3f, 0x7a, 0xa9, 0xd9
+};
+static const u8 output23[] __initconst = {
+ 0x5f, 0xa4, 0x08, 0x39, 0xca, 0xfa, 0x2b, 0x83,
+ 0x5d, 0x95, 0x70, 0x7c, 0x2e, 0xd4, 0xae, 0xfa,
+ 0x45, 0x4a, 0x77, 0x7f, 0xa7, 0x65
+};
+static const u8 key23[] __initconst = {
+ 0x4a, 0x06, 0x83, 0x64, 0xaa, 0xe3, 0x38, 0x32,
+ 0x28, 0x5d, 0xa4, 0xb2, 0x5a, 0xee, 0xcf, 0x8e,
+ 0x19, 0x67, 0xf1, 0x09, 0xe8, 0xc9, 0xf6, 0x40,
+ 0x02, 0x6d, 0x0b, 0xde, 0xfa, 0x81, 0x03, 0xb1
+};
+enum { nonce23 = 0x9b3f349158709849ULL };
+
+static const u8 input24[] __initconst = {
+ 0xc6, 0xfc, 0x47, 0x5e, 0xd8, 0xed, 0xa9, 0xe5,
+ 0x4f, 0x82, 0x79, 0x35, 0xee, 0x3e, 0x7e, 0x3e,
+ 0x35, 0x70, 0x6e, 0xfa, 0x6d, 0x08, 0xe8
+};
+static const u8 output24[] __initconst = {
+ 0x3b, 0xc5, 0xf8, 0xc2, 0xbf, 0x2b, 0x90, 0x33,
+ 0xa6, 0xae, 0xf5, 0x5a, 0x65, 0xb3, 0x3d, 0xe1,
+ 0xcd, 0x5f, 0x55, 0xfa, 0xe7, 0xa5, 0x4a
+};
+static const u8 key24[] __initconst = {
+ 0x00, 0x24, 0xc3, 0x65, 0x5f, 0xe6, 0x31, 0xbb,
+ 0x6d, 0xfc, 0x20, 0x7b, 0x1b, 0xa8, 0x96, 0x26,
+ 0x55, 0x21, 0x62, 0x25, 0x7e, 0xba, 0x23, 0x97,
+ 0xc9, 0xb8, 0x53, 0xa8, 0xef, 0xab, 0xad, 0x61
+};
+enum { nonce24 = 0x13ee0b8f526177c3ULL };
+
+static const u8 input25[] __initconst = {
+ 0x33, 0x07, 0x16, 0xb1, 0x34, 0x33, 0x67, 0x04,
+ 0x9b, 0x0a, 0xce, 0x1b, 0xe9, 0xde, 0x1a, 0xec,
+ 0xd0, 0x55, 0xfb, 0xc6, 0x33, 0xaf, 0x2d, 0xe3
+};
+static const u8 output25[] __initconst = {
+ 0x05, 0x93, 0x10, 0xd1, 0x58, 0x6f, 0x68, 0x62,
+ 0x45, 0xdb, 0x91, 0xae, 0x70, 0xcf, 0xd4, 0x5f,
+ 0xee, 0xdf, 0xd5, 0xba, 0x9e, 0xde, 0x68, 0xe6
+};
+static const u8 key25[] __initconst = {
+ 0x83, 0xa9, 0x4f, 0x5d, 0x74, 0xd5, 0x91, 0xb3,
+ 0xc9, 0x97, 0x19, 0x15, 0xdb, 0x0d, 0x0b, 0x4a,
+ 0x3d, 0x55, 0xcf, 0xab, 0xb2, 0x05, 0x21, 0x35,
+ 0x45, 0x50, 0xeb, 0xf8, 0xf5, 0xbf, 0x36, 0x35
+};
+enum { nonce25 = 0x7c6f459e49ebfebcULL };
+
+static const u8 input26[] __initconst = {
+ 0xc2, 0xd4, 0x7a, 0xa3, 0x92, 0xe1, 0xac, 0x46,
+ 0x1a, 0x15, 0x38, 0xc9, 0xb5, 0xfd, 0xdf, 0x84,
+ 0x38, 0xbc, 0x6b, 0x1d, 0xb0, 0x83, 0x43, 0x04,
+ 0x39
+};
+static const u8 output26[] __initconst = {
+ 0x7f, 0xde, 0xd6, 0x87, 0xcc, 0x34, 0xf4, 0x12,
+ 0xae, 0x55, 0xa5, 0x89, 0x95, 0x29, 0xfc, 0x18,
+ 0xd8, 0xc7, 0x7c, 0xd3, 0xcb, 0x85, 0x95, 0x21,
+ 0xd2
+};
+static const u8 key26[] __initconst = {
+ 0xe4, 0xd0, 0x54, 0x1d, 0x7d, 0x47, 0xa8, 0xc1,
+ 0x08, 0xca, 0xe2, 0x42, 0x52, 0x95, 0x16, 0x43,
+ 0xa3, 0x01, 0x23, 0x03, 0xcc, 0x3b, 0x81, 0x78,
+ 0x23, 0xcc, 0xa7, 0x36, 0xd7, 0xa0, 0x97, 0x8d
+};
+enum { nonce26 = 0x524401012231683ULL };
+
+static const u8 input27[] __initconst = {
+ 0x0d, 0xb0, 0xcf, 0xec, 0xfc, 0x38, 0x9d, 0x9d,
+ 0x89, 0x00, 0x96, 0xf2, 0x79, 0x8a, 0xa1, 0x8d,
+ 0x32, 0x5e, 0xc6, 0x12, 0x22, 0xec, 0xf6, 0x52,
+ 0xc1, 0x0b
+};
+static const u8 output27[] __initconst = {
+ 0xef, 0xe1, 0xf2, 0x67, 0x8e, 0x2c, 0x00, 0x9f,
+ 0x1d, 0x4c, 0x66, 0x1f, 0x94, 0x58, 0xdc, 0xbb,
+ 0xb9, 0x11, 0x8f, 0x74, 0xfd, 0x0e, 0x14, 0x01,
+ 0xa8, 0x21
+};
+static const u8 key27[] __initconst = {
+ 0x78, 0x71, 0xa4, 0xe6, 0xb2, 0x95, 0x44, 0x12,
+ 0x81, 0xaa, 0x7e, 0x94, 0xa7, 0x8d, 0x44, 0xea,
+ 0xc4, 0xbc, 0x01, 0xb7, 0x9e, 0xf7, 0x82, 0x9e,
+ 0x3b, 0x23, 0x9f, 0x31, 0xdd, 0xb8, 0x0d, 0x18
+};
+enum { nonce27 = 0xd58fe0e58fb254d6ULL };
+
+static const u8 input28[] __initconst = {
+ 0xaa, 0xb7, 0xaa, 0xd9, 0xa8, 0x91, 0xd7, 0x8a,
+ 0x97, 0x9b, 0xdb, 0x7c, 0x47, 0x2b, 0xdb, 0xd2,
+ 0xda, 0x77, 0xb1, 0xfa, 0x2d, 0x12, 0xe3, 0xe9,
+ 0xc4, 0x7f, 0x54
+};
+static const u8 output28[] __initconst = {
+ 0x87, 0x84, 0xa9, 0xa6, 0xad, 0x8f, 0xe6, 0x0f,
+ 0x69, 0xf8, 0x21, 0xc3, 0x54, 0x95, 0x0f, 0xb0,
+ 0x4e, 0xc7, 0x02, 0xe4, 0x04, 0xb0, 0x6c, 0x42,
+ 0x8c, 0x63, 0xe3
+};
+static const u8 key28[] __initconst = {
+ 0x12, 0x23, 0x37, 0x95, 0x04, 0xb4, 0x21, 0xe8,
+ 0xbc, 0x65, 0x46, 0x7a, 0xf4, 0x01, 0x05, 0x3f,
+ 0xb1, 0x34, 0x73, 0xd2, 0x49, 0xbf, 0x6f, 0x20,
+ 0xbd, 0x23, 0x58, 0x5f, 0xd1, 0x73, 0x57, 0xa6
+};
+enum { nonce28 = 0x3a04d51491eb4e07ULL };
+
+static const u8 input29[] __initconst = {
+ 0x55, 0xd0, 0xd4, 0x4b, 0x17, 0xc8, 0xc4, 0x2b,
+ 0xc0, 0x28, 0xbd, 0x9d, 0x65, 0x4d, 0xaf, 0x77,
+ 0x72, 0x7c, 0x36, 0x68, 0xa7, 0xb6, 0x87, 0x4d,
+ 0xb9, 0x27, 0x25, 0x6c
+};
+static const u8 output29[] __initconst = {
+ 0x0e, 0xac, 0x4c, 0xf5, 0x12, 0xb5, 0x56, 0xa5,
+ 0x00, 0x9a, 0xd6, 0xe5, 0x1a, 0x59, 0x2c, 0xf6,
+ 0x42, 0x22, 0xcf, 0x23, 0x98, 0x34, 0x29, 0xac,
+ 0x6e, 0xe3, 0x37, 0x6d
+};
+static const u8 key29[] __initconst = {
+ 0xda, 0x9d, 0x05, 0x0c, 0x0c, 0xba, 0x75, 0xb9,
+ 0x9e, 0xb1, 0x8d, 0xd9, 0x73, 0x26, 0x2c, 0xa9,
+ 0x3a, 0xb5, 0xcb, 0x19, 0x49, 0xa7, 0x4f, 0xf7,
+ 0x64, 0x35, 0x23, 0x20, 0x2a, 0x45, 0x78, 0xc7
+};
+enum { nonce29 = 0xc25ac9982431cbfULL };
+
+static const u8 input30[] __initconst = {
+ 0x4e, 0xd6, 0x85, 0xbb, 0xe7, 0x99, 0xfa, 0x04,
+ 0x33, 0x24, 0xfd, 0x75, 0x18, 0xe3, 0xd3, 0x25,
+ 0xcd, 0xca, 0xae, 0x00, 0xbe, 0x52, 0x56, 0x4a,
+ 0x31, 0xe9, 0x4f, 0xae, 0x8a
+};
+static const u8 output30[] __initconst = {
+ 0x30, 0x36, 0x32, 0xa2, 0x3c, 0xb6, 0xf9, 0xf9,
+ 0x76, 0x70, 0xad, 0xa6, 0x10, 0x41, 0x00, 0x4a,
+ 0xfa, 0xce, 0x1b, 0x86, 0x05, 0xdb, 0x77, 0x96,
+ 0xb3, 0xb7, 0x8f, 0x61, 0x24
+};
+static const u8 key30[] __initconst = {
+ 0x49, 0x35, 0x4c, 0x15, 0x98, 0xfb, 0xc6, 0x57,
+ 0x62, 0x6d, 0x06, 0xc3, 0xd4, 0x79, 0x20, 0x96,
+ 0x05, 0x2a, 0x31, 0x63, 0xc0, 0x44, 0x42, 0x09,
+ 0x13, 0x13, 0xff, 0x1b, 0xc8, 0x63, 0x1f, 0x0b
+};
+enum { nonce30 = 0x4967f9c08e41568bULL };
+
+static const u8 input31[] __initconst = {
+ 0x91, 0x04, 0x20, 0x47, 0x59, 0xee, 0xa6, 0x0f,
+ 0x04, 0x75, 0xc8, 0x18, 0x95, 0x44, 0x01, 0x28,
+ 0x20, 0x6f, 0x73, 0x68, 0x66, 0xb5, 0x03, 0xb3,
+ 0x58, 0x27, 0x6e, 0x7a, 0x76, 0xb8
+};
+static const u8 output31[] __initconst = {
+ 0xe8, 0x03, 0x78, 0x9d, 0x13, 0x15, 0x98, 0xef,
+ 0x64, 0x68, 0x12, 0x41, 0xb0, 0x29, 0x94, 0x0c,
+ 0x83, 0x35, 0x46, 0xa9, 0x74, 0xe1, 0x75, 0xf0,
+ 0xb6, 0x96, 0xc3, 0x6f, 0xd7, 0x70
+};
+static const u8 key31[] __initconst = {
+ 0xef, 0xcd, 0x5a, 0x4a, 0xf4, 0x7e, 0x6a, 0x3a,
+ 0x11, 0x88, 0x72, 0x94, 0xb8, 0xae, 0x84, 0xc3,
+ 0x66, 0xe0, 0xde, 0x4b, 0x00, 0xa5, 0xd6, 0x2d,
+ 0x50, 0xb7, 0x28, 0xff, 0x76, 0x57, 0x18, 0x1f
+};
+enum { nonce31 = 0xcb6f428fa4192e19ULL };
+
+static const u8 input32[] __initconst = {
+ 0x90, 0x06, 0x50, 0x4b, 0x98, 0x14, 0x30, 0xf1,
+ 0xb8, 0xd7, 0xf0, 0xa4, 0x3e, 0x4e, 0xd8, 0x00,
+ 0xea, 0xdb, 0x4f, 0x93, 0x05, 0xef, 0x02, 0x71,
+ 0x1a, 0xcd, 0xa3, 0xb1, 0xae, 0xd3, 0x18
+};
+static const u8 output32[] __initconst = {
+ 0xcb, 0x4a, 0x37, 0x3f, 0xea, 0x40, 0xab, 0x86,
+ 0xfe, 0xcc, 0x07, 0xd5, 0xdc, 0xb2, 0x25, 0xb6,
+ 0xfd, 0x2a, 0x72, 0xbc, 0x5e, 0xd4, 0x75, 0xff,
+ 0x71, 0xfc, 0xce, 0x1e, 0x6f, 0x22, 0xc1
+};
+static const u8 key32[] __initconst = {
+ 0xfc, 0x6d, 0xc3, 0x80, 0xce, 0xa4, 0x31, 0xa1,
+ 0xcc, 0xfa, 0x9d, 0x10, 0x0b, 0xc9, 0x11, 0x77,
+ 0x34, 0xdb, 0xad, 0x1b, 0xc4, 0xfc, 0xeb, 0x79,
+ 0x91, 0xda, 0x59, 0x3b, 0x0d, 0xb1, 0x19, 0x3b
+};
+enum { nonce32 = 0x88551bf050059467ULL };
+
+static const u8 input33[] __initconst = {
+ 0x88, 0x94, 0x71, 0x92, 0xe8, 0xd7, 0xf9, 0xbd,
+ 0x55, 0xe3, 0x22, 0xdb, 0x99, 0x51, 0xfb, 0x50,
+ 0xbf, 0x82, 0xb5, 0x70, 0x8b, 0x2b, 0x6a, 0x03,
+ 0x37, 0xa0, 0xc6, 0x19, 0x5d, 0xc9, 0xbc, 0xcc
+};
+static const u8 output33[] __initconst = {
+ 0xb6, 0x17, 0x51, 0xc8, 0xea, 0x8a, 0x14, 0xdc,
+ 0x23, 0x1b, 0xd4, 0xed, 0xbf, 0x50, 0xb9, 0x38,
+ 0x00, 0xc2, 0x3f, 0x78, 0x3d, 0xbf, 0xa0, 0x84,
+ 0xef, 0x45, 0xb2, 0x7d, 0x48, 0x7b, 0x62, 0xa7
+};
+static const u8 key33[] __initconst = {
+ 0xb9, 0x8f, 0x6a, 0xad, 0xb4, 0x6f, 0xb5, 0xdc,
+ 0x48, 0xfa, 0x43, 0x57, 0x62, 0x97, 0xef, 0x89,
+ 0x4c, 0x5a, 0x7b, 0x67, 0xb8, 0x9d, 0xf0, 0x42,
+ 0x2b, 0x8f, 0xf3, 0x18, 0x05, 0x2e, 0x48, 0xd0
+};
+enum { nonce33 = 0x31f16488fe8447f5ULL };
+
+static const u8 input34[] __initconst = {
+ 0xda, 0x2b, 0x3d, 0x63, 0x9e, 0x4f, 0xc2, 0xb8,
+ 0x7f, 0xc2, 0x1a, 0x8b, 0x0d, 0x95, 0x65, 0x55,
+ 0x52, 0xba, 0x51, 0x51, 0xc0, 0x61, 0x9f, 0x0a,
+ 0x5d, 0xb0, 0x59, 0x8c, 0x64, 0x6a, 0xab, 0xf5,
+ 0x57
+};
+static const u8 output34[] __initconst = {
+ 0x5c, 0xf6, 0x62, 0x24, 0x8c, 0x45, 0xa3, 0x26,
+ 0xd0, 0xe4, 0x88, 0x1c, 0xed, 0xc4, 0x26, 0x58,
+ 0xb5, 0x5d, 0x92, 0xc4, 0x17, 0x44, 0x1c, 0xb8,
+ 0x2c, 0xf3, 0x55, 0x7e, 0xd6, 0xe5, 0xb3, 0x65,
+ 0xa8
+};
+static const u8 key34[] __initconst = {
+ 0xde, 0xd1, 0x27, 0xb7, 0x7c, 0xfa, 0xa6, 0x78,
+ 0x39, 0x80, 0xdf, 0xb7, 0x46, 0xac, 0x71, 0x26,
+ 0xd0, 0x2a, 0x56, 0x79, 0x12, 0xeb, 0x26, 0x37,
+ 0x01, 0x0d, 0x30, 0xe0, 0xe3, 0x66, 0xb2, 0xf4
+};
+enum { nonce34 = 0x92d0d9b252c24149ULL };
+
+static const u8 input35[] __initconst = {
+ 0x3a, 0x15, 0x5b, 0x75, 0x6e, 0xd0, 0x52, 0x20,
+ 0x6c, 0x82, 0xfa, 0xce, 0x5b, 0xea, 0xf5, 0x43,
+ 0xc1, 0x81, 0x7c, 0xb2, 0xac, 0x16, 0x3f, 0xd3,
+ 0x5a, 0xaf, 0x55, 0x98, 0xf4, 0xc6, 0xba, 0x71,
+ 0x25, 0x8b
+};
+static const u8 output35[] __initconst = {
+ 0xb3, 0xaf, 0xac, 0x6d, 0x4d, 0xc7, 0x68, 0x56,
+ 0x50, 0x5b, 0x69, 0x2a, 0xe5, 0x90, 0xf9, 0x5f,
+ 0x99, 0x88, 0xff, 0x0c, 0xa6, 0xb1, 0x83, 0xd6,
+ 0x80, 0xa6, 0x1b, 0xde, 0x94, 0xa4, 0x2c, 0xc3,
+ 0x74, 0xfa
+};
+static const u8 key35[] __initconst = {
+ 0xd8, 0x24, 0xe2, 0x06, 0xd7, 0x7a, 0xce, 0x81,
+ 0x52, 0x72, 0x02, 0x69, 0x89, 0xc4, 0xe9, 0x53,
+ 0x3b, 0x08, 0x5f, 0x98, 0x1e, 0x1b, 0x99, 0x6e,
+ 0x28, 0x17, 0x6d, 0xba, 0xc0, 0x96, 0xf9, 0x3c
+};
+enum { nonce35 = 0x7baf968c4c8e3a37ULL };
+
+static const u8 input36[] __initconst = {
+ 0x31, 0x5d, 0x4f, 0xe3, 0xac, 0xad, 0x17, 0xa6,
+ 0xb5, 0x01, 0xe2, 0xc6, 0xd4, 0x7e, 0xc4, 0x80,
+ 0xc0, 0x59, 0x72, 0xbb, 0x4b, 0x74, 0x6a, 0x41,
+ 0x0f, 0x9c, 0xf6, 0xca, 0x20, 0xb3, 0x73, 0x07,
+ 0x6b, 0x02, 0x2a
+};
+static const u8 output36[] __initconst = {
+ 0xf9, 0x09, 0x92, 0x94, 0x7e, 0x31, 0xf7, 0x53,
+ 0xe8, 0x8a, 0x5b, 0x20, 0xef, 0x9b, 0x45, 0x81,
+ 0xba, 0x5e, 0x45, 0x63, 0xc1, 0xc7, 0x9e, 0x06,
+ 0x0e, 0xd9, 0x62, 0x8e, 0x96, 0xf9, 0xfa, 0x43,
+ 0x4d, 0xd4, 0x28
+};
+static const u8 key36[] __initconst = {
+ 0x13, 0x30, 0x4c, 0x06, 0xae, 0x18, 0xde, 0x03,
+ 0x1d, 0x02, 0x40, 0xf5, 0xbb, 0x19, 0xe3, 0x88,
+ 0x41, 0xb1, 0x29, 0x15, 0x97, 0xc2, 0x69, 0x3f,
+ 0x32, 0x2a, 0x0c, 0x8b, 0xcf, 0x83, 0x8b, 0x6c
+};
+enum { nonce36 = 0x226d251d475075a0ULL };
+
+static const u8 input37[] __initconst = {
+ 0x10, 0x18, 0xbe, 0xfd, 0x66, 0xc9, 0x77, 0xcc,
+ 0x43, 0xe5, 0x46, 0x0b, 0x08, 0x8b, 0xae, 0x11,
+ 0x86, 0x15, 0xc2, 0xf6, 0x45, 0xd4, 0x5f, 0xd6,
+ 0xb6, 0x5f, 0x9f, 0x3e, 0x97, 0xb7, 0xd4, 0xad,
+ 0x0b, 0xe8, 0x31, 0x94
+};
+static const u8 output37[] __initconst = {
+ 0x03, 0x2c, 0x1c, 0xee, 0xc6, 0xdd, 0xed, 0x38,
+ 0x80, 0x6d, 0x84, 0x16, 0xc3, 0xc2, 0x04, 0x63,
+ 0xcd, 0xa7, 0x6e, 0x36, 0x8b, 0xed, 0x78, 0x63,
+ 0x95, 0xfc, 0x69, 0x7a, 0x3f, 0x8d, 0x75, 0x6b,
+ 0x6c, 0x26, 0x56, 0x4d
+};
+static const u8 key37[] __initconst = {
+ 0xac, 0x84, 0x4d, 0xa9, 0x29, 0x49, 0x3c, 0x39,
+ 0x7f, 0xd9, 0xa6, 0x01, 0xf3, 0x7e, 0xfa, 0x4a,
+ 0x14, 0x80, 0x22, 0x74, 0xf0, 0x29, 0x30, 0x2d,
+ 0x07, 0x21, 0xda, 0xc0, 0x4d, 0x70, 0x56, 0xa2
+};
+enum { nonce37 = 0x167823ce3b64925aULL };
+
+static const u8 input38[] __initconst = {
+ 0x30, 0x8f, 0xfa, 0x24, 0x29, 0xb1, 0xfb, 0xce,
+ 0x31, 0x62, 0xdc, 0xd0, 0x46, 0xab, 0xe1, 0x31,
+ 0xd9, 0xae, 0x60, 0x0d, 0xca, 0x0a, 0x49, 0x12,
+ 0x3d, 0x92, 0xe9, 0x91, 0x67, 0x12, 0x62, 0x18,
+ 0x89, 0xe2, 0xf9, 0x1c, 0xcc
+};
+static const u8 output38[] __initconst = {
+ 0x56, 0x9c, 0xc8, 0x7a, 0xc5, 0x98, 0xa3, 0x0f,
+ 0xba, 0xd5, 0x3e, 0xe1, 0xc9, 0x33, 0x64, 0x33,
+ 0xf0, 0xd5, 0xf7, 0x43, 0x66, 0x0e, 0x08, 0x9a,
+ 0x6e, 0x09, 0xe4, 0x01, 0x0d, 0x1e, 0x2f, 0x4b,
+ 0xed, 0x9c, 0x08, 0x8c, 0x03
+};
+static const u8 key38[] __initconst = {
+ 0x77, 0x52, 0x2a, 0x23, 0xf1, 0xc5, 0x96, 0x2b,
+ 0x89, 0x4f, 0x3e, 0xf3, 0xff, 0x0e, 0x94, 0xce,
+ 0xf1, 0xbd, 0x53, 0xf5, 0x77, 0xd6, 0x9e, 0x47,
+ 0x49, 0x3d, 0x16, 0x64, 0xff, 0x95, 0x42, 0x42
+};
+enum { nonce38 = 0xff629d7b82cef357ULL };
+
+static const u8 input39[] __initconst = {
+ 0x38, 0x26, 0x27, 0xd0, 0xc2, 0xf5, 0x34, 0xba,
+ 0xda, 0x0f, 0x1c, 0x1c, 0x9a, 0x70, 0xe5, 0x8a,
+ 0x78, 0x2d, 0x8f, 0x9a, 0xbf, 0x89, 0x6a, 0xfd,
+ 0xd4, 0x9c, 0x33, 0xf1, 0xb6, 0x89, 0x16, 0xe3,
+ 0x6a, 0x00, 0xfa, 0x3a, 0x0f, 0x26
+};
+static const u8 output39[] __initconst = {
+ 0x0f, 0xaf, 0x91, 0x6d, 0x9c, 0x99, 0xa4, 0xf7,
+ 0x3b, 0x9d, 0x9a, 0x98, 0xca, 0xbb, 0x50, 0x48,
+ 0xee, 0xcb, 0x5d, 0xa1, 0x37, 0x2d, 0x36, 0x09,
+ 0x2a, 0xe2, 0x1c, 0x3d, 0x98, 0x40, 0x1c, 0x16,
+ 0x56, 0xa7, 0x98, 0xe9, 0x7d, 0x2b
+};
+static const u8 key39[] __initconst = {
+ 0x6e, 0x83, 0x15, 0x4d, 0xf8, 0x78, 0xa8, 0x0e,
+ 0x71, 0x37, 0xd4, 0x6e, 0x28, 0x5c, 0x06, 0xa1,
+ 0x2d, 0x6c, 0x72, 0x7a, 0xfd, 0xf8, 0x65, 0x1a,
+ 0xb8, 0xe6, 0x29, 0x7b, 0xe5, 0xb3, 0x23, 0x79
+};
+enum { nonce39 = 0xa4d8c491cf093e9dULL };
+
+static const u8 input40[] __initconst = {
+ 0x8f, 0x32, 0x7c, 0x40, 0x37, 0x95, 0x08, 0x00,
+ 0x00, 0xfe, 0x2f, 0x95, 0x20, 0x12, 0x40, 0x18,
+ 0x5e, 0x7e, 0x5e, 0x99, 0xee, 0x8d, 0x91, 0x7d,
+ 0x50, 0x7d, 0x21, 0x45, 0x27, 0xe1, 0x7f, 0xd4,
+ 0x73, 0x10, 0xe1, 0x33, 0xbc, 0xf8, 0xdd
+};
+static const u8 output40[] __initconst = {
+ 0x78, 0x7c, 0xdc, 0x55, 0x2b, 0xd9, 0x2b, 0x3a,
+ 0xdd, 0x56, 0x11, 0x52, 0xd3, 0x2e, 0xe0, 0x0d,
+ 0x23, 0x20, 0x8a, 0xf1, 0x4f, 0xee, 0xf1, 0x68,
+ 0xf6, 0xdc, 0x53, 0xcf, 0x17, 0xd4, 0xf0, 0x6c,
+ 0xdc, 0x80, 0x5f, 0x1c, 0xa4, 0x91, 0x05
+};
+static const u8 key40[] __initconst = {
+ 0x0d, 0x86, 0xbf, 0x8a, 0xba, 0x9e, 0x39, 0x91,
+ 0xa8, 0xe7, 0x22, 0xf0, 0x0c, 0x43, 0x18, 0xe4,
+ 0x1f, 0xb0, 0xaf, 0x8a, 0x34, 0x31, 0xf4, 0x41,
+ 0xf0, 0x89, 0x85, 0xca, 0x5d, 0x05, 0x3b, 0x94
+};
+enum { nonce40 = 0xae7acc4f5986439eULL };
+
+static const u8 input41[] __initconst = {
+ 0x20, 0x5f, 0xc1, 0x83, 0x36, 0x02, 0x76, 0x96,
+ 0xf0, 0xbf, 0x8e, 0x0e, 0x1a, 0xd1, 0xc7, 0x88,
+ 0x18, 0xc7, 0x09, 0xc4, 0x15, 0xd9, 0x4f, 0x5e,
+ 0x1f, 0xb3, 0xb4, 0x6d, 0xcb, 0xa0, 0xd6, 0x8a,
+ 0x3b, 0x40, 0x8e, 0x80, 0xf1, 0xe8, 0x8f, 0x5f
+};
+static const u8 output41[] __initconst = {
+ 0x0b, 0xd1, 0x49, 0x9a, 0x9d, 0xe8, 0x97, 0xb8,
+ 0xd1, 0xeb, 0x90, 0x62, 0x37, 0xd2, 0x99, 0x15,
+ 0x67, 0x6d, 0x27, 0x93, 0xce, 0x37, 0x65, 0xa2,
+ 0x94, 0x88, 0xd6, 0x17, 0xbc, 0x1c, 0x6e, 0xa2,
+ 0xcc, 0xfb, 0x81, 0x0e, 0x30, 0x60, 0x5a, 0x6f
+};
+static const u8 key41[] __initconst = {
+ 0x36, 0x27, 0x57, 0x01, 0x21, 0x68, 0x97, 0xc7,
+ 0x00, 0x67, 0x7b, 0xe9, 0x0f, 0x55, 0x49, 0xbb,
+ 0x92, 0x18, 0x98, 0xf5, 0x5e, 0xbc, 0xe7, 0x5a,
+ 0x9d, 0x3d, 0xc7, 0xbd, 0x59, 0xec, 0x82, 0x8e
+};
+enum { nonce41 = 0x5da05e4c8dfab464ULL };
+
+static const u8 input42[] __initconst = {
+ 0xca, 0x30, 0xcd, 0x63, 0xf0, 0x2d, 0xf1, 0x03,
+ 0x4d, 0x0d, 0xf2, 0xf7, 0x6f, 0xae, 0xd6, 0x34,
+ 0xea, 0xf6, 0x13, 0xcf, 0x1c, 0xa0, 0xd0, 0xe8,
+ 0xa4, 0x78, 0x80, 0x3b, 0x1e, 0xa5, 0x32, 0x4c,
+ 0x73, 0x12, 0xd4, 0x6a, 0x94, 0xbc, 0xba, 0x80,
+ 0x5e
+};
+static const u8 output42[] __initconst = {
+ 0xec, 0x3f, 0x18, 0x31, 0xc0, 0x7b, 0xb5, 0xe2,
+ 0xad, 0xf3, 0xec, 0xa0, 0x16, 0x9d, 0xef, 0xce,
+ 0x05, 0x65, 0x59, 0x9d, 0x5a, 0xca, 0x3e, 0x13,
+ 0xb9, 0x5d, 0x5d, 0xb5, 0xeb, 0xae, 0xc0, 0x87,
+ 0xbb, 0xfd, 0xe7, 0xe4, 0x89, 0x5b, 0xd2, 0x6c,
+ 0x56
+};
+static const u8 key42[] __initconst = {
+ 0x7c, 0x6b, 0x7e, 0x77, 0xcc, 0x8c, 0x1b, 0x03,
+ 0x8b, 0x2a, 0xb3, 0x7c, 0x5a, 0x73, 0xcc, 0xac,
+ 0xdd, 0x53, 0x54, 0x0c, 0x85, 0xed, 0xcd, 0x47,
+ 0x24, 0xc1, 0xb8, 0x9b, 0x2e, 0x41, 0x92, 0x36
+};
+enum { nonce42 = 0xe4d7348b09682c9cULL };
+
+static const u8 input43[] __initconst = {
+ 0x52, 0xf2, 0x4b, 0x7c, 0xe5, 0x58, 0xe8, 0xd2,
+ 0xb7, 0xf3, 0xa1, 0x29, 0x68, 0xa2, 0x50, 0x50,
+ 0xae, 0x9c, 0x1b, 0xe2, 0x67, 0x77, 0xe2, 0xdb,
+ 0x85, 0x55, 0x7e, 0x84, 0x8a, 0x12, 0x3c, 0xb6,
+ 0x2e, 0xed, 0xd3, 0xec, 0x47, 0x68, 0xfa, 0x52,
+ 0x46, 0x9d
+};
+static const u8 output43[] __initconst = {
+ 0x1b, 0xf0, 0x05, 0xe4, 0x1c, 0xd8, 0x74, 0x9a,
+ 0xf0, 0xee, 0x00, 0x54, 0xce, 0x02, 0x83, 0x15,
+ 0xfb, 0x23, 0x35, 0x78, 0xc3, 0xda, 0x98, 0xd8,
+ 0x9d, 0x1b, 0xb2, 0x51, 0x82, 0xb0, 0xff, 0xbe,
+ 0x05, 0xa9, 0xa4, 0x04, 0xba, 0xea, 0x4b, 0x73,
+ 0x47, 0x6e
+};
+static const u8 key43[] __initconst = {
+ 0xeb, 0xec, 0x0e, 0xa1, 0x65, 0xe2, 0x99, 0x46,
+ 0xd8, 0x54, 0x8c, 0x4a, 0x93, 0xdf, 0x6d, 0xbf,
+ 0x93, 0x34, 0x94, 0x57, 0xc9, 0x12, 0x9d, 0x68,
+ 0x05, 0xc5, 0x05, 0xad, 0x5a, 0xc9, 0x2a, 0x3b
+};
+enum { nonce43 = 0xe14f6a902b7827fULL };
+
+static const u8 input44[] __initconst = {
+ 0x3e, 0x22, 0x3e, 0x8e, 0xcd, 0x18, 0xe2, 0xa3,
+ 0x8d, 0x8b, 0x38, 0xc3, 0x02, 0xa3, 0x31, 0x48,
+ 0xc6, 0x0e, 0xec, 0x99, 0x51, 0x11, 0x6d, 0x8b,
+ 0x32, 0x35, 0x3b, 0x08, 0x58, 0x76, 0x25, 0x30,
+ 0xe2, 0xfc, 0xa2, 0x46, 0x7d, 0x6e, 0x34, 0x87,
+ 0xac, 0x42, 0xbf
+};
+static const u8 output44[] __initconst = {
+ 0x08, 0x92, 0x58, 0x02, 0x1a, 0xf4, 0x1f, 0x3d,
+ 0x38, 0x7b, 0x6b, 0xf6, 0x84, 0x07, 0xa3, 0x19,
+ 0x17, 0x2a, 0xed, 0x57, 0x1c, 0xf9, 0x55, 0x37,
+ 0x4e, 0xf4, 0x68, 0x68, 0x82, 0x02, 0x4f, 0xca,
+ 0x21, 0x00, 0xc6, 0x66, 0x79, 0x53, 0x19, 0xef,
+ 0x7f, 0xdd, 0x74
+};
+static const u8 key44[] __initconst = {
+ 0x73, 0xb6, 0x3e, 0xf4, 0x57, 0x52, 0xa6, 0x43,
+ 0x51, 0xd8, 0x25, 0x00, 0xdb, 0xb4, 0x52, 0x69,
+ 0xd6, 0x27, 0x49, 0xeb, 0x9b, 0xf1, 0x7b, 0xa0,
+ 0xd6, 0x7c, 0x9c, 0xd8, 0x95, 0x03, 0x69, 0x26
+};
+enum { nonce44 = 0xf5e6dc4f35ce24e5ULL };
+
+static const u8 input45[] __initconst = {
+ 0x55, 0x76, 0xc0, 0xf1, 0x74, 0x03, 0x7a, 0x6d,
+ 0x14, 0xd8, 0x36, 0x2c, 0x9f, 0x9a, 0x59, 0x7a,
+ 0x2a, 0xf5, 0x77, 0x84, 0x70, 0x7c, 0x1d, 0x04,
+ 0x90, 0x45, 0xa4, 0xc1, 0x5e, 0xdd, 0x2e, 0x07,
+ 0x18, 0x34, 0xa6, 0x85, 0x56, 0x4f, 0x09, 0xaf,
+ 0x2f, 0x83, 0xe1, 0xc6
+};
+static const u8 output45[] __initconst = {
+ 0x22, 0x46, 0xe4, 0x0b, 0x3a, 0x55, 0xcc, 0x9b,
+ 0xf0, 0xc0, 0x53, 0xcd, 0x95, 0xc7, 0x57, 0x6c,
+ 0x77, 0x46, 0x41, 0x72, 0x07, 0xbf, 0xa8, 0xe5,
+ 0x68, 0x69, 0xd8, 0x1e, 0x45, 0xc1, 0xa2, 0x50,
+ 0xa5, 0xd1, 0x62, 0xc9, 0x5a, 0x7d, 0x08, 0x14,
+ 0xae, 0x44, 0x16, 0xb9
+};
+static const u8 key45[] __initconst = {
+ 0x41, 0xf3, 0x88, 0xb2, 0x51, 0x25, 0x47, 0x02,
+ 0x39, 0xe8, 0x15, 0x3a, 0x22, 0x78, 0x86, 0x0b,
+ 0xf9, 0x1e, 0x8d, 0x98, 0xb2, 0x22, 0x82, 0xac,
+ 0x42, 0x94, 0xde, 0x64, 0xf0, 0xfd, 0xb3, 0x6c
+};
+enum { nonce45 = 0xf51a582daf4aa01aULL };
+
+static const u8 input46[] __initconst = {
+ 0xf6, 0xff, 0x20, 0xf9, 0x26, 0x7e, 0x0f, 0xa8,
+ 0x6a, 0x45, 0x5a, 0x91, 0x73, 0xc4, 0x4c, 0x63,
+ 0xe5, 0x61, 0x59, 0xca, 0xec, 0xc0, 0x20, 0x35,
+ 0xbc, 0x9f, 0x58, 0x9c, 0x5e, 0xa1, 0x17, 0x46,
+ 0xcc, 0xab, 0x6e, 0xd0, 0x4f, 0x24, 0xeb, 0x05,
+ 0x4d, 0x40, 0x41, 0xe0, 0x9d
+};
+static const u8 output46[] __initconst = {
+ 0x31, 0x6e, 0x63, 0x3f, 0x9c, 0xe6, 0xb1, 0xb7,
+ 0xef, 0x47, 0x46, 0xd7, 0xb1, 0x53, 0x42, 0x2f,
+ 0x2c, 0xc8, 0x01, 0xae, 0x8b, 0xec, 0x42, 0x2c,
+ 0x6b, 0x2c, 0x9c, 0xb2, 0xf0, 0x29, 0x06, 0xa5,
+ 0xcd, 0x7e, 0xc7, 0x3a, 0x38, 0x98, 0x8a, 0xde,
+ 0x03, 0x29, 0x14, 0x8f, 0xf9
+};
+static const u8 key46[] __initconst = {
+ 0xac, 0xa6, 0x44, 0x4a, 0x0d, 0x42, 0x10, 0xbc,
+ 0xd3, 0xc9, 0x8e, 0x9e, 0x71, 0xa3, 0x1c, 0x14,
+ 0x9d, 0x65, 0x0d, 0x49, 0x4d, 0x8c, 0xec, 0x46,
+ 0xe1, 0x41, 0xcd, 0xf5, 0xfc, 0x82, 0x75, 0x34
+};
+enum { nonce46 = 0x25f85182df84dec5ULL };
+
+static const u8 input47[] __initconst = {
+ 0xa1, 0xd2, 0xf2, 0x52, 0x2f, 0x79, 0x50, 0xb2,
+ 0x42, 0x29, 0x5b, 0x44, 0x20, 0xf9, 0xbd, 0x85,
+ 0xb7, 0x65, 0x77, 0x86, 0xce, 0x3e, 0x1c, 0xe4,
+ 0x70, 0x80, 0xdd, 0x72, 0x07, 0x48, 0x0f, 0x84,
+ 0x0d, 0xfd, 0x97, 0xc0, 0xb7, 0x48, 0x9b, 0xb4,
+ 0xec, 0xff, 0x73, 0x14, 0x99, 0xe4
+};
+static const u8 output47[] __initconst = {
+ 0xe5, 0x3c, 0x78, 0x66, 0x31, 0x1e, 0xd6, 0xc4,
+ 0x9e, 0x71, 0xb3, 0xd7, 0xd5, 0xad, 0x84, 0xf2,
+ 0x78, 0x61, 0x77, 0xf8, 0x31, 0xf0, 0x13, 0xad,
+ 0x66, 0xf5, 0x31, 0x7d, 0xeb, 0xdf, 0xaf, 0xcb,
+ 0xac, 0x28, 0x6c, 0xc2, 0x9e, 0xe7, 0x78, 0xa2,
+ 0xa2, 0x58, 0xce, 0x84, 0x76, 0x70
+};
+static const u8 key47[] __initconst = {
+ 0x05, 0x7f, 0xc0, 0x7f, 0x37, 0x20, 0x71, 0x02,
+ 0x3a, 0xe7, 0x20, 0x5a, 0x0a, 0x8f, 0x79, 0x5a,
+ 0xfe, 0xbb, 0x43, 0x4d, 0x2f, 0xcb, 0xf6, 0x9e,
+ 0xa2, 0x97, 0x00, 0xad, 0x0d, 0x51, 0x7e, 0x17
+};
+enum { nonce47 = 0xae707c60f54de32bULL };
+
+static const u8 input48[] __initconst = {
+ 0x80, 0x93, 0x77, 0x2e, 0x8d, 0xe8, 0xe6, 0xc1,
+ 0x27, 0xe6, 0xf2, 0x89, 0x5b, 0x33, 0x62, 0x18,
+ 0x80, 0x6e, 0x17, 0x22, 0x8e, 0x83, 0x31, 0x40,
+ 0x8f, 0xc9, 0x5c, 0x52, 0x6c, 0x0e, 0xa5, 0xe9,
+ 0x6c, 0x7f, 0xd4, 0x6a, 0x27, 0x56, 0x99, 0xce,
+ 0x8d, 0x37, 0x59, 0xaf, 0xc0, 0x0e, 0xe1
+};
+static const u8 output48[] __initconst = {
+ 0x02, 0xa4, 0x2e, 0x33, 0xb7, 0x7c, 0x2b, 0x9a,
+ 0x18, 0x5a, 0xba, 0x53, 0x38, 0xaf, 0x00, 0xeb,
+ 0xd8, 0x3d, 0x02, 0x77, 0x43, 0x45, 0x03, 0x91,
+ 0xe2, 0x5e, 0x4e, 0xeb, 0x50, 0xd5, 0x5b, 0xe0,
+ 0xf3, 0x33, 0xa7, 0xa2, 0xac, 0x07, 0x6f, 0xeb,
+ 0x3f, 0x6c, 0xcd, 0xf2, 0x6c, 0x61, 0x64
+};
+static const u8 key48[] __initconst = {
+ 0xf3, 0x79, 0xe7, 0xf8, 0x0e, 0x02, 0x05, 0x6b,
+ 0x83, 0x1a, 0xe7, 0x86, 0x6b, 0xe6, 0x8f, 0x3f,
+ 0xd3, 0xa3, 0xe4, 0x6e, 0x29, 0x06, 0xad, 0xbc,
+ 0xe8, 0x33, 0x56, 0x39, 0xdf, 0xb0, 0xe2, 0xfe
+};
+enum { nonce48 = 0xd849b938c6569da0ULL };
+
+static const u8 input49[] __initconst = {
+ 0x89, 0x3b, 0x88, 0x9e, 0x7b, 0x38, 0x16, 0x9f,
+ 0xa1, 0x28, 0xf6, 0xf5, 0x23, 0x74, 0x28, 0xb0,
+ 0xdf, 0x6c, 0x9e, 0x8a, 0x71, 0xaf, 0xed, 0x7a,
+ 0x39, 0x21, 0x57, 0x7d, 0x31, 0x6c, 0xee, 0x0d,
+ 0x11, 0x8d, 0x41, 0x9a, 0x5f, 0xb7, 0x27, 0x40,
+ 0x08, 0xad, 0xc6, 0xe0, 0x00, 0x43, 0x9e, 0xae
+};
+static const u8 output49[] __initconst = {
+ 0x4d, 0xfd, 0xdb, 0x4c, 0x77, 0xc1, 0x05, 0x07,
+ 0x4d, 0x6d, 0x32, 0xcb, 0x2e, 0x0e, 0xff, 0x65,
+ 0xc9, 0x27, 0xeb, 0xa9, 0x46, 0x5b, 0xab, 0x06,
+ 0xe6, 0xb6, 0x5a, 0x1e, 0x00, 0xfb, 0xcf, 0xe4,
+ 0xb9, 0x71, 0x40, 0x10, 0xef, 0x12, 0x39, 0xf0,
+ 0xea, 0x40, 0xb8, 0x9a, 0xa2, 0x85, 0x38, 0x48
+};
+static const u8 key49[] __initconst = {
+ 0xe7, 0x10, 0x40, 0xd9, 0x66, 0xc0, 0xa8, 0x6d,
+ 0xa3, 0xcc, 0x8b, 0xdd, 0x93, 0xf2, 0x6e, 0xe0,
+ 0x90, 0x7f, 0xd0, 0xf4, 0x37, 0x0c, 0x8b, 0x9b,
+ 0x4c, 0x4d, 0xe6, 0xf2, 0x1f, 0xe9, 0x95, 0x24
+};
+enum { nonce49 = 0xf269817bdae01bc0ULL };
+
+static const u8 input50[] __initconst = {
+ 0xda, 0x5b, 0x60, 0xcd, 0xed, 0x58, 0x8e, 0x7f,
+ 0xae, 0xdd, 0xc8, 0x2e, 0x16, 0x90, 0xea, 0x4b,
+ 0x0c, 0x74, 0x14, 0x35, 0xeb, 0xee, 0x2c, 0xff,
+ 0x46, 0x99, 0x97, 0x6e, 0xae, 0xa7, 0x8e, 0x6e,
+ 0x38, 0xfe, 0x63, 0xe7, 0x51, 0xd9, 0xaa, 0xce,
+ 0x7b, 0x1e, 0x7e, 0x5d, 0xc0, 0xe8, 0x10, 0x06,
+ 0x14
+};
+static const u8 output50[] __initconst = {
+ 0xe4, 0xe5, 0x86, 0x1b, 0x66, 0x19, 0xac, 0x49,
+ 0x1c, 0xbd, 0xee, 0x03, 0xaf, 0x11, 0xfc, 0x1f,
+ 0x6a, 0xd2, 0x50, 0x5c, 0xea, 0x2c, 0xa5, 0x75,
+ 0xfd, 0xb7, 0x0e, 0x80, 0x8f, 0xed, 0x3f, 0x31,
+ 0x47, 0xac, 0x67, 0x43, 0xb8, 0x2e, 0xb4, 0x81,
+ 0x6d, 0xe4, 0x1e, 0xb7, 0x8b, 0x0c, 0x53, 0xa9,
+ 0x26
+};
+static const u8 key50[] __initconst = {
+ 0xd7, 0xb2, 0x04, 0x76, 0x30, 0xcc, 0x38, 0x45,
+ 0xef, 0xdb, 0xc5, 0x86, 0x08, 0x61, 0xf0, 0xee,
+ 0x6d, 0xd8, 0x22, 0x04, 0x8c, 0xfb, 0xcb, 0x37,
+ 0xa6, 0xfb, 0x95, 0x22, 0xe1, 0x87, 0xb7, 0x6f
+};
+enum { nonce50 = 0x3b44d09c45607d38ULL };
+
+static const u8 input51[] __initconst = {
+ 0xa9, 0x41, 0x02, 0x4b, 0xd7, 0xd5, 0xd1, 0xf1,
+ 0x21, 0x55, 0xb2, 0x75, 0x6d, 0x77, 0x1b, 0x86,
+ 0xa9, 0xc8, 0x90, 0xfd, 0xed, 0x4a, 0x7b, 0x6c,
+ 0xb2, 0x5f, 0x9b, 0x5f, 0x16, 0xa1, 0x54, 0xdb,
+ 0xd6, 0x3f, 0x6a, 0x7f, 0x2e, 0x51, 0x9d, 0x49,
+ 0x5b, 0xa5, 0x0e, 0xf9, 0xfb, 0x2a, 0x38, 0xff,
+ 0x20, 0x8c
+};
+static const u8 output51[] __initconst = {
+ 0x18, 0xf7, 0x88, 0xc1, 0x72, 0xfd, 0x90, 0x4b,
+ 0xa9, 0x2d, 0xdb, 0x47, 0xb0, 0xa5, 0xc4, 0x37,
+ 0x01, 0x95, 0xc4, 0xb1, 0xab, 0xc5, 0x5b, 0xcd,
+ 0xe1, 0x97, 0x78, 0x13, 0xde, 0x6a, 0xff, 0x36,
+ 0xce, 0xa4, 0x67, 0xc5, 0x4a, 0x45, 0x2b, 0xd9,
+ 0xff, 0x8f, 0x06, 0x7c, 0x63, 0xbb, 0x83, 0x17,
+ 0xb4, 0x6b
+};
+static const u8 key51[] __initconst = {
+ 0x82, 0x1a, 0x79, 0xab, 0x9a, 0xb5, 0x49, 0x6a,
+ 0x30, 0x6b, 0x99, 0x19, 0x11, 0xc7, 0xa2, 0xf4,
+ 0xca, 0x55, 0xb9, 0xdd, 0xe7, 0x2f, 0xe7, 0xc1,
+ 0xdd, 0x27, 0xad, 0x80, 0xf2, 0x56, 0xad, 0xf3
+};
+enum { nonce51 = 0xe93aff94ca71a4a6ULL };
+
+static const u8 input52[] __initconst = {
+ 0x89, 0xdd, 0xf3, 0xfa, 0xb6, 0xc1, 0xaa, 0x9a,
+ 0xc8, 0xad, 0x6b, 0x00, 0xa1, 0x65, 0xea, 0x14,
+ 0x55, 0x54, 0x31, 0x8f, 0xf0, 0x03, 0x84, 0x51,
+ 0x17, 0x1e, 0x0a, 0x93, 0x6e, 0x79, 0x96, 0xa3,
+ 0x2a, 0x85, 0x9c, 0x89, 0xf8, 0xd1, 0xe2, 0x15,
+ 0x95, 0x05, 0xf4, 0x43, 0x4d, 0x6b, 0xf0, 0x71,
+ 0x3b, 0x3e, 0xba
+};
+static const u8 output52[] __initconst = {
+ 0x0c, 0x42, 0x6a, 0xb3, 0x66, 0x63, 0x5d, 0x2c,
+ 0x9f, 0x3d, 0xa6, 0x6e, 0xc7, 0x5f, 0x79, 0x2f,
+ 0x50, 0xe3, 0xd6, 0x07, 0x56, 0xa4, 0x2b, 0x2d,
+ 0x8d, 0x10, 0xc0, 0x6c, 0xa2, 0xfc, 0x97, 0xec,
+ 0x3f, 0x5c, 0x8d, 0x59, 0xbe, 0x84, 0xf1, 0x3e,
+ 0x38, 0x47, 0x4f, 0x75, 0x25, 0x66, 0x88, 0x14,
+ 0x03, 0xdd, 0xde
+};
+static const u8 key52[] __initconst = {
+ 0x4f, 0xb0, 0x27, 0xb6, 0xdd, 0x24, 0x0c, 0xdb,
+ 0x6b, 0x71, 0x2e, 0xac, 0xfc, 0x3f, 0xa6, 0x48,
+ 0x5d, 0xd5, 0xff, 0x53, 0xb5, 0x62, 0xf1, 0xe0,
+ 0x93, 0xfe, 0x39, 0x4c, 0x9f, 0x03, 0x11, 0xa7
+};
+enum { nonce52 = 0xed8becec3bdf6f25ULL };
+
+static const u8 input53[] __initconst = {
+ 0x68, 0xd1, 0xc7, 0x74, 0x44, 0x1c, 0x84, 0xde,
+ 0x27, 0x27, 0x35, 0xf0, 0x18, 0x0b, 0x57, 0xaa,
+ 0xd0, 0x1a, 0xd3, 0x3b, 0x5e, 0x5c, 0x62, 0x93,
+ 0xd7, 0x6b, 0x84, 0x3b, 0x71, 0x83, 0x77, 0x01,
+ 0x3e, 0x59, 0x45, 0xf4, 0x77, 0x6c, 0x6b, 0xcb,
+ 0x88, 0x45, 0x09, 0x1d, 0xc6, 0x45, 0x6e, 0xdc,
+ 0x6e, 0x51, 0xb8, 0x28
+};
+static const u8 output53[] __initconst = {
+ 0xc5, 0x90, 0x96, 0x78, 0x02, 0xf5, 0xc4, 0x3c,
+ 0xde, 0xd4, 0xd4, 0xc6, 0xa7, 0xad, 0x12, 0x47,
+ 0x45, 0xce, 0xcd, 0x8c, 0x35, 0xcc, 0xa6, 0x9e,
+ 0x5a, 0xc6, 0x60, 0xbb, 0xe3, 0xed, 0xec, 0x68,
+ 0x3f, 0x64, 0xf7, 0x06, 0x63, 0x9c, 0x8c, 0xc8,
+ 0x05, 0x3a, 0xad, 0x32, 0x79, 0x8b, 0x45, 0x96,
+ 0x93, 0x73, 0x4c, 0xe0
+};
+static const u8 key53[] __initconst = {
+ 0x42, 0x4b, 0x20, 0x81, 0x49, 0x50, 0xe9, 0xc2,
+ 0x43, 0x69, 0x36, 0xe7, 0x68, 0xae, 0xd5, 0x7e,
+ 0x42, 0x1a, 0x1b, 0xb4, 0x06, 0x4d, 0xa7, 0x17,
+ 0xb5, 0x31, 0xd6, 0x0c, 0xb0, 0x5c, 0x41, 0x0b
+};
+enum { nonce53 = 0xf44ce1931fbda3d7ULL };
+
+static const u8 input54[] __initconst = {
+ 0x7b, 0xf6, 0x8b, 0xae, 0xc0, 0xcb, 0x10, 0x8e,
+ 0xe8, 0xd8, 0x2e, 0x3b, 0x14, 0xba, 0xb4, 0xd2,
+ 0x58, 0x6b, 0x2c, 0xec, 0xc1, 0x81, 0x71, 0xb4,
+ 0xc6, 0xea, 0x08, 0xc5, 0xc9, 0x78, 0xdb, 0xa2,
+ 0xfa, 0x44, 0x50, 0x9b, 0xc8, 0x53, 0x8d, 0x45,
+ 0x42, 0xe7, 0x09, 0xc4, 0x29, 0xd8, 0x75, 0x02,
+ 0xbb, 0xb2, 0x78, 0xcf, 0xe7
+};
+static const u8 output54[] __initconst = {
+ 0xaf, 0x2c, 0x83, 0x26, 0x6e, 0x7f, 0xa6, 0xe9,
+ 0x03, 0x75, 0xfe, 0xfe, 0x87, 0x58, 0xcf, 0xb5,
+ 0xbc, 0x3c, 0x9d, 0xa1, 0x6e, 0x13, 0xf1, 0x0f,
+ 0x9e, 0xbc, 0xe0, 0x54, 0x24, 0x32, 0xce, 0x95,
+ 0xe6, 0xa5, 0x59, 0x3d, 0x24, 0x1d, 0x8f, 0xb1,
+ 0x74, 0x6c, 0x56, 0xe7, 0x96, 0xc1, 0x91, 0xc8,
+ 0x2d, 0x0e, 0xb7, 0x51, 0x10
+};
+static const u8 key54[] __initconst = {
+ 0x00, 0x68, 0x74, 0xdc, 0x30, 0x9e, 0xe3, 0x52,
+ 0xa9, 0xae, 0xb6, 0x7c, 0xa1, 0xdc, 0x12, 0x2d,
+ 0x98, 0x32, 0x7a, 0x77, 0xe1, 0xdd, 0xa3, 0x76,
+ 0x72, 0x34, 0x83, 0xd8, 0xb7, 0x69, 0xba, 0x77
+};
+enum { nonce54 = 0xbea57d79b798b63aULL };
+
+static const u8 input55[] __initconst = {
+ 0xb5, 0xf4, 0x2f, 0xc1, 0x5e, 0x10, 0xa7, 0x4e,
+ 0x74, 0x3d, 0xa3, 0x96, 0xc0, 0x4d, 0x7b, 0x92,
+ 0x8f, 0xdb, 0x2d, 0x15, 0x52, 0x6a, 0x95, 0x5e,
+ 0x40, 0x81, 0x4f, 0x70, 0x73, 0xea, 0x84, 0x65,
+ 0x3d, 0x9a, 0x4e, 0x03, 0x95, 0xf8, 0x5d, 0x2f,
+ 0x07, 0x02, 0x13, 0x13, 0xdd, 0x82, 0xe6, 0x3b,
+ 0xe1, 0x5f, 0xb3, 0x37, 0x9b, 0x88
+};
+static const u8 output55[] __initconst = {
+ 0xc1, 0x88, 0xbd, 0x92, 0x77, 0xad, 0x7c, 0x5f,
+ 0xaf, 0xa8, 0x57, 0x0e, 0x40, 0x0a, 0xdc, 0x70,
+ 0xfb, 0xc6, 0x71, 0xfd, 0xc4, 0x74, 0x60, 0xcc,
+ 0xa0, 0x89, 0x8e, 0x99, 0xf0, 0x06, 0xa6, 0x7c,
+ 0x97, 0x42, 0x21, 0x81, 0x6a, 0x07, 0xe7, 0xb3,
+ 0xf7, 0xa5, 0x03, 0x71, 0x50, 0x05, 0x63, 0x17,
+ 0xa9, 0x46, 0x0b, 0xff, 0x30, 0x78
+};
+static const u8 key55[] __initconst = {
+ 0x19, 0x8f, 0xe7, 0xd7, 0x6b, 0x7f, 0x6f, 0x69,
+ 0x86, 0x91, 0x0f, 0xa7, 0x4a, 0x69, 0x8e, 0x34,
+ 0xf3, 0xdb, 0xde, 0xaf, 0xf2, 0x66, 0x1d, 0x64,
+ 0x97, 0x0c, 0xcf, 0xfa, 0x33, 0x84, 0xfd, 0x0c
+};
+enum { nonce55 = 0x80aa3d3e2c51ef06ULL };
+
+static const u8 input56[] __initconst = {
+ 0x6b, 0xe9, 0x73, 0x42, 0x27, 0x5e, 0x12, 0xcd,
+ 0xaa, 0x45, 0x12, 0x8b, 0xb3, 0xe6, 0x54, 0x33,
+ 0x31, 0x7d, 0xe2, 0x25, 0xc6, 0x86, 0x47, 0x67,
+ 0x86, 0x83, 0xe4, 0x46, 0xb5, 0x8f, 0x2c, 0xbb,
+ 0xe4, 0xb8, 0x9f, 0xa2, 0xa4, 0xe8, 0x75, 0x96,
+ 0x92, 0x51, 0x51, 0xac, 0x8e, 0x2e, 0x6f, 0xfc,
+ 0xbd, 0x0d, 0xa3, 0x9f, 0x16, 0x55, 0x3e
+};
+static const u8 output56[] __initconst = {
+ 0x42, 0x99, 0x73, 0x6c, 0xd9, 0x4b, 0x16, 0xe5,
+ 0x18, 0x63, 0x1a, 0xd9, 0x0e, 0xf1, 0x15, 0x2e,
+ 0x0f, 0x4b, 0xe4, 0x5f, 0xa0, 0x4d, 0xde, 0x9f,
+ 0xa7, 0x18, 0xc1, 0x0c, 0x0b, 0xae, 0x55, 0xe4,
+ 0x89, 0x18, 0xa4, 0x78, 0x9d, 0x25, 0x0d, 0xd5,
+ 0x94, 0x0f, 0xf9, 0x78, 0xa3, 0xa6, 0xe9, 0x9e,
+ 0x2c, 0x73, 0xf0, 0xf7, 0x35, 0xf3, 0x2b
+};
+static const u8 key56[] __initconst = {
+ 0x7d, 0x12, 0xad, 0x51, 0xd5, 0x6f, 0x8f, 0x96,
+ 0xc0, 0x5d, 0x9a, 0xd1, 0x7e, 0x20, 0x98, 0x0e,
+ 0x3c, 0x0a, 0x67, 0x6b, 0x1b, 0x88, 0x69, 0xd4,
+ 0x07, 0x8c, 0xaf, 0x0f, 0x3a, 0x28, 0xe4, 0x5d
+};
+enum { nonce56 = 0x70f4c372fb8b5984ULL };
+
+static const u8 input57[] __initconst = {
+ 0x28, 0xa3, 0x06, 0xe8, 0xe7, 0x08, 0xb9, 0xef,
+ 0x0d, 0x63, 0x15, 0x99, 0xb2, 0x78, 0x7e, 0xaf,
+ 0x30, 0x50, 0xcf, 0xea, 0xc9, 0x91, 0x41, 0x2f,
+ 0x3b, 0x38, 0x70, 0xc4, 0x87, 0xb0, 0x3a, 0xee,
+ 0x4a, 0xea, 0xe3, 0x83, 0x68, 0x8b, 0xcf, 0xda,
+ 0x04, 0xa5, 0xbd, 0xb2, 0xde, 0x3c, 0x55, 0x13,
+ 0xfe, 0x96, 0xad, 0xc1, 0x61, 0x1b, 0x98, 0xde
+};
+static const u8 output57[] __initconst = {
+ 0xf4, 0x44, 0xe9, 0xd2, 0x6d, 0xc2, 0x5a, 0xe9,
+ 0xfd, 0x7e, 0x41, 0x54, 0x3f, 0xf4, 0x12, 0xd8,
+ 0x55, 0x0d, 0x12, 0x9b, 0xd5, 0x2e, 0x95, 0xe5,
+ 0x77, 0x42, 0x3f, 0x2c, 0xfb, 0x28, 0x9d, 0x72,
+ 0x6d, 0x89, 0x82, 0x27, 0x64, 0x6f, 0x0d, 0x57,
+ 0xa1, 0x25, 0xa3, 0x6b, 0x88, 0x9a, 0xac, 0x0c,
+ 0x76, 0x19, 0x90, 0xe2, 0x50, 0x5a, 0xf8, 0x12
+};
+static const u8 key57[] __initconst = {
+ 0x08, 0x26, 0xb8, 0xac, 0xf3, 0xa5, 0xc6, 0xa3,
+ 0x7f, 0x09, 0x87, 0xf5, 0x6c, 0x5a, 0x85, 0x6c,
+ 0x3d, 0xbd, 0xde, 0xd5, 0x87, 0xa3, 0x98, 0x7a,
+ 0xaa, 0x40, 0x3e, 0xf7, 0xff, 0x44, 0x5d, 0xee
+};
+enum { nonce57 = 0xc03a6130bf06b089ULL };
+
+static const u8 input58[] __initconst = {
+ 0x82, 0xa5, 0x38, 0x6f, 0xaa, 0xb4, 0xaf, 0xb2,
+ 0x42, 0x01, 0xa8, 0x39, 0x3f, 0x15, 0x51, 0xa8,
+ 0x11, 0x1b, 0x93, 0xca, 0x9c, 0xa0, 0x57, 0x68,
+ 0x8f, 0xdb, 0x68, 0x53, 0x51, 0x6d, 0x13, 0x22,
+ 0x12, 0x9b, 0xbd, 0x33, 0xa8, 0x52, 0x40, 0x57,
+ 0x80, 0x9b, 0x98, 0xef, 0x56, 0x70, 0x11, 0xfa,
+ 0x36, 0x69, 0x7d, 0x15, 0x48, 0xf9, 0x3b, 0xeb,
+ 0x42
+};
+static const u8 output58[] __initconst = {
+ 0xff, 0x3a, 0x74, 0xc3, 0x3e, 0x44, 0x64, 0x4d,
+ 0x0e, 0x5f, 0x9d, 0xa8, 0xdb, 0xbe, 0x12, 0xef,
+ 0xba, 0x56, 0x65, 0x50, 0x76, 0xaf, 0xa4, 0x4e,
+ 0x01, 0xc1, 0xd3, 0x31, 0x14, 0xe2, 0xbe, 0x7b,
+ 0xa5, 0x67, 0xb4, 0xe3, 0x68, 0x40, 0x9c, 0xb0,
+ 0xb1, 0x78, 0xef, 0x49, 0x03, 0x0f, 0x2d, 0x56,
+ 0xb4, 0x37, 0xdb, 0xbc, 0x2d, 0x68, 0x1c, 0x3c,
+ 0xf1
+};
+static const u8 key58[] __initconst = {
+ 0x7e, 0xf1, 0x7c, 0x20, 0x65, 0xed, 0xcd, 0xd7,
+ 0x57, 0xe8, 0xdb, 0x90, 0x87, 0xdb, 0x5f, 0x63,
+ 0x3d, 0xdd, 0xb8, 0x2b, 0x75, 0x8e, 0x04, 0xb5,
+ 0xf4, 0x12, 0x79, 0xa9, 0x4d, 0x42, 0x16, 0x7f
+};
+enum { nonce58 = 0x92838183f80d2f7fULL };
+
+static const u8 input59[] __initconst = {
+ 0x37, 0xf1, 0x9d, 0xdd, 0xd7, 0x08, 0x9f, 0x13,
+ 0xc5, 0x21, 0x82, 0x75, 0x08, 0x9e, 0x25, 0x16,
+ 0xb1, 0xd1, 0x71, 0x42, 0x28, 0x63, 0xac, 0x47,
+ 0x71, 0x54, 0xb1, 0xfc, 0x39, 0xf0, 0x61, 0x4f,
+ 0x7c, 0x6d, 0x4f, 0xc8, 0x33, 0xef, 0x7e, 0xc8,
+ 0xc0, 0x97, 0xfc, 0x1a, 0x61, 0xb4, 0x87, 0x6f,
+ 0xdd, 0x5a, 0x15, 0x7b, 0x1b, 0x95, 0x50, 0x94,
+ 0x1d, 0xba
+};
+static const u8 output59[] __initconst = {
+ 0x73, 0x67, 0xc5, 0x07, 0xbb, 0x57, 0x79, 0xd5,
+ 0xc9, 0x04, 0xdd, 0x88, 0xf3, 0x86, 0xe5, 0x70,
+ 0x49, 0x31, 0xe0, 0xcc, 0x3b, 0x1d, 0xdf, 0xb0,
+ 0xaf, 0xf4, 0x2d, 0xe0, 0x06, 0x10, 0x91, 0x8d,
+ 0x1c, 0xcf, 0x31, 0x0b, 0xf6, 0x73, 0xda, 0x1c,
+ 0xf0, 0x17, 0x52, 0x9e, 0x20, 0x2e, 0x9f, 0x8c,
+ 0xb3, 0x59, 0xce, 0xd4, 0xd3, 0xc1, 0x81, 0xe9,
+ 0x11, 0x36
+};
+static const u8 key59[] __initconst = {
+ 0xbd, 0x07, 0xd0, 0x53, 0x2c, 0xb3, 0xcc, 0x3f,
+ 0xc4, 0x95, 0xfd, 0xe7, 0x81, 0xb3, 0x29, 0x99,
+ 0x05, 0x45, 0xd6, 0x95, 0x25, 0x0b, 0x72, 0xd3,
+ 0xcd, 0xbb, 0x73, 0xf8, 0xfa, 0xc0, 0x9b, 0x7a
+};
+enum { nonce59 = 0x4a0db819b0d519e2ULL };
+
+static const u8 input60[] __initconst = {
+ 0x58, 0x4e, 0xdf, 0x94, 0x3c, 0x76, 0x0a, 0x79,
+ 0x47, 0xf1, 0xbe, 0x88, 0xd3, 0xba, 0x94, 0xd8,
+ 0xe2, 0x8f, 0xe3, 0x2f, 0x2f, 0x74, 0x82, 0x55,
+ 0xc3, 0xda, 0xe2, 0x4e, 0x2c, 0x8c, 0x45, 0x1d,
+ 0x72, 0x8f, 0x54, 0x41, 0xb5, 0xb7, 0x69, 0xe4,
+ 0xdc, 0xd2, 0x36, 0x21, 0x5c, 0x28, 0x52, 0xf7,
+ 0x98, 0x8e, 0x72, 0xa7, 0x6d, 0x57, 0xed, 0xdc,
+ 0x3c, 0xe6, 0x6a
+};
+static const u8 output60[] __initconst = {
+ 0xda, 0xaf, 0xb5, 0xe3, 0x30, 0x65, 0x5c, 0xb1,
+ 0x48, 0x08, 0x43, 0x7b, 0x9e, 0xd2, 0x6a, 0x62,
+ 0x56, 0x7c, 0xad, 0xd9, 0xe5, 0xf6, 0x09, 0x71,
+ 0xcd, 0xe6, 0x05, 0x6b, 0x3f, 0x44, 0x3a, 0x5c,
+ 0xf6, 0xf8, 0xd7, 0xce, 0x7d, 0xd1, 0xe0, 0x4f,
+ 0x88, 0x15, 0x04, 0xd8, 0x20, 0xf0, 0x3e, 0xef,
+ 0xae, 0xa6, 0x27, 0xa3, 0x0e, 0xfc, 0x18, 0x90,
+ 0x33, 0xcd, 0xd3
+};
+static const u8 key60[] __initconst = {
+ 0xbf, 0xfd, 0x25, 0xb5, 0xb2, 0xfc, 0x78, 0x0c,
+ 0x8e, 0xb9, 0x57, 0x2f, 0x26, 0x4a, 0x7e, 0x71,
+ 0xcc, 0xf2, 0xe0, 0xfd, 0x24, 0x11, 0x20, 0x23,
+ 0x57, 0x00, 0xff, 0x80, 0x11, 0x0c, 0x1e, 0xff
+};
+enum { nonce60 = 0xf18df56fdb7954adULL };
+
+static const u8 input61[] __initconst = {
+ 0xb0, 0xf3, 0x06, 0xbc, 0x22, 0xae, 0x49, 0x40,
+ 0xae, 0xff, 0x1b, 0x31, 0xa7, 0x98, 0xab, 0x1d,
+ 0xe7, 0x40, 0x23, 0x18, 0x4f, 0xab, 0x8e, 0x93,
+ 0x82, 0xf4, 0x56, 0x61, 0xfd, 0x2b, 0xcf, 0xa7,
+ 0xc4, 0xb4, 0x0a, 0xf4, 0xcb, 0xc7, 0x8c, 0x40,
+ 0x57, 0xac, 0x0b, 0x3e, 0x2a, 0x0a, 0x67, 0x83,
+ 0x50, 0xbf, 0xec, 0xb0, 0xc7, 0xf1, 0x32, 0x26,
+ 0x98, 0x80, 0x33, 0xb4
+};
+static const u8 output61[] __initconst = {
+ 0x9d, 0x23, 0x0e, 0xff, 0xcc, 0x7c, 0xd5, 0xcf,
+ 0x1a, 0xb8, 0x59, 0x1e, 0x92, 0xfd, 0x7f, 0xca,
+ 0xca, 0x3c, 0x18, 0x81, 0xde, 0xfa, 0x59, 0xc8,
+ 0x6f, 0x9c, 0x24, 0x3f, 0x3a, 0xe6, 0x0b, 0xb4,
+ 0x34, 0x48, 0x69, 0xfc, 0xb6, 0xea, 0xb2, 0xde,
+ 0x9f, 0xfd, 0x92, 0x36, 0x18, 0x98, 0x99, 0xaa,
+ 0x65, 0xe2, 0xea, 0xf4, 0xb1, 0x47, 0x8e, 0xb0,
+ 0xe7, 0xd4, 0x7a, 0x2c
+};
+static const u8 key61[] __initconst = {
+ 0xd7, 0xfd, 0x9b, 0xbd, 0x8f, 0x65, 0x0d, 0x00,
+ 0xca, 0xa1, 0x6c, 0x85, 0x85, 0xa4, 0x6d, 0xf1,
+ 0xb1, 0x68, 0x0c, 0x8b, 0x5d, 0x37, 0x72, 0xd0,
+ 0xd8, 0xd2, 0x25, 0xab, 0x9f, 0x7b, 0x7d, 0x95
+};
+enum { nonce61 = 0xd82caf72a9c4864fULL };
+
+static const u8 input62[] __initconst = {
+ 0x10, 0x77, 0xf3, 0x2f, 0xc2, 0x50, 0xd6, 0x0c,
+ 0xba, 0xa8, 0x8d, 0xce, 0x0d, 0x58, 0x9e, 0x87,
+ 0xb1, 0x59, 0x66, 0x0a, 0x4a, 0xb3, 0xd8, 0xca,
+ 0x0a, 0x6b, 0xf8, 0xc6, 0x2b, 0x3f, 0x8e, 0x09,
+ 0xe0, 0x0a, 0x15, 0x85, 0xfe, 0xaa, 0xc6, 0xbd,
+ 0x30, 0xef, 0xe4, 0x10, 0x78, 0x03, 0xc1, 0xc7,
+ 0x8a, 0xd9, 0xde, 0x0b, 0x51, 0x07, 0xc4, 0x7b,
+ 0xe2, 0x2e, 0x36, 0x3a, 0xc2
+};
+static const u8 output62[] __initconst = {
+ 0xa0, 0x0c, 0xfc, 0xc1, 0xf6, 0xaf, 0xc2, 0xb8,
+ 0x5c, 0xef, 0x6e, 0xf3, 0xce, 0x15, 0x48, 0x05,
+ 0xb5, 0x78, 0x49, 0x51, 0x1f, 0x9d, 0xf4, 0xbf,
+ 0x2f, 0x53, 0xa2, 0xd1, 0x15, 0x20, 0x82, 0x6b,
+ 0xd2, 0x22, 0x6c, 0x4e, 0x14, 0x87, 0xe3, 0xd7,
+ 0x49, 0x45, 0x84, 0xdb, 0x5f, 0x68, 0x60, 0xc4,
+ 0xb3, 0xe6, 0x3f, 0xd1, 0xfc, 0xa5, 0x73, 0xf3,
+ 0xfc, 0xbb, 0xbe, 0xc8, 0x9d
+};
+static const u8 key62[] __initconst = {
+ 0x6e, 0xc9, 0xaf, 0xce, 0x35, 0xb9, 0x86, 0xd1,
+ 0xce, 0x5f, 0xd9, 0xbb, 0xd5, 0x1f, 0x7c, 0xcd,
+ 0xfe, 0x19, 0xaa, 0x3d, 0xea, 0x64, 0xc1, 0x28,
+ 0x40, 0xba, 0xa1, 0x28, 0xcd, 0x40, 0xb6, 0xf2
+};
+enum { nonce62 = 0xa1c0c265f900cde8ULL };
+
+static const u8 input63[] __initconst = {
+ 0x7a, 0x70, 0x21, 0x2c, 0xef, 0xa6, 0x36, 0xd4,
+ 0xe0, 0xab, 0x8c, 0x25, 0x73, 0x34, 0xc8, 0x94,
+ 0x6c, 0x81, 0xcb, 0x19, 0x8d, 0x5a, 0x49, 0xaa,
+ 0x6f, 0xba, 0x83, 0x72, 0x02, 0x5e, 0xf5, 0x89,
+ 0xce, 0x79, 0x7e, 0x13, 0x3d, 0x5b, 0x98, 0x60,
+ 0x5d, 0xd9, 0xfb, 0x15, 0x93, 0x4c, 0xf3, 0x51,
+ 0x49, 0x55, 0xd1, 0x58, 0xdd, 0x7e, 0x6d, 0xfe,
+ 0xdd, 0x84, 0x23, 0x05, 0xba, 0xe9
+};
+static const u8 output63[] __initconst = {
+ 0x20, 0xb3, 0x5c, 0x03, 0x03, 0x78, 0x17, 0xfc,
+ 0x3b, 0x35, 0x30, 0x9a, 0x00, 0x18, 0xf5, 0xc5,
+ 0x06, 0x53, 0xf5, 0x04, 0x24, 0x9d, 0xd1, 0xb2,
+ 0xac, 0x5a, 0xb6, 0x2a, 0xa5, 0xda, 0x50, 0x00,
+ 0xec, 0xff, 0xa0, 0x7a, 0x14, 0x7b, 0xe4, 0x6b,
+ 0x63, 0xe8, 0x66, 0x86, 0x34, 0xfd, 0x74, 0x44,
+ 0xa2, 0x50, 0x97, 0x0d, 0xdc, 0xc3, 0x84, 0xf8,
+ 0x71, 0x02, 0x31, 0x95, 0xed, 0x54
+};
+static const u8 key63[] __initconst = {
+ 0x7d, 0x64, 0xb4, 0x12, 0x81, 0xe4, 0xe6, 0x8f,
+ 0xcc, 0xe7, 0xd1, 0x1f, 0x70, 0x20, 0xfd, 0xb8,
+ 0x3a, 0x7d, 0xa6, 0x53, 0x65, 0x30, 0x5d, 0xe3,
+ 0x1a, 0x44, 0xbe, 0x62, 0xed, 0x90, 0xc4, 0xd1
+};
+enum { nonce63 = 0xe8e849596c942276ULL };
+
+static const u8 input64[] __initconst = {
+ 0x84, 0xf8, 0xda, 0x87, 0x23, 0x39, 0x60, 0xcf,
+ 0xc5, 0x50, 0x7e, 0xc5, 0x47, 0x29, 0x7c, 0x05,
+ 0xc2, 0xb4, 0xf4, 0xb2, 0xec, 0x5d, 0x48, 0x36,
+ 0xbf, 0xfc, 0x06, 0x8c, 0xf2, 0x0e, 0x88, 0xe7,
+ 0xc9, 0xc5, 0xa4, 0xa2, 0x83, 0x20, 0xa1, 0x6f,
+ 0x37, 0xe5, 0x2d, 0xa1, 0x72, 0xa1, 0x19, 0xef,
+ 0x05, 0x42, 0x08, 0xf2, 0x57, 0x47, 0x31, 0x1e,
+ 0x17, 0x76, 0x13, 0xd3, 0xcc, 0x75, 0x2c
+};
+static const u8 output64[] __initconst = {
+ 0xcb, 0xec, 0x90, 0x88, 0xeb, 0x31, 0x69, 0x20,
+ 0xa6, 0xdc, 0xff, 0x76, 0x98, 0xb0, 0x24, 0x49,
+ 0x7b, 0x20, 0xd9, 0xd1, 0x1b, 0xe3, 0x61, 0xdc,
+ 0xcf, 0x51, 0xf6, 0x70, 0x72, 0x33, 0x28, 0x94,
+ 0xac, 0x73, 0x18, 0xcf, 0x93, 0xfd, 0xca, 0x08,
+ 0x0d, 0xa2, 0xb9, 0x57, 0x1e, 0x51, 0xb6, 0x07,
+ 0x5c, 0xc1, 0x13, 0x64, 0x1d, 0x18, 0x6f, 0xe6,
+ 0x0b, 0xb7, 0x14, 0x03, 0x43, 0xb6, 0xaf
+};
+static const u8 key64[] __initconst = {
+ 0xbf, 0x82, 0x65, 0xe4, 0x50, 0xf9, 0x5e, 0xea,
+ 0x28, 0x91, 0xd1, 0xd2, 0x17, 0x7c, 0x13, 0x7e,
+ 0xf5, 0xd5, 0x6b, 0x06, 0x1c, 0x20, 0xc2, 0x82,
+ 0xa1, 0x7a, 0xa2, 0x14, 0xa1, 0xb0, 0x54, 0x58
+};
+enum { nonce64 = 0xe57c5095aa5723c9ULL };
+
+static const u8 input65[] __initconst = {
+ 0x1c, 0xfb, 0xd3, 0x3f, 0x85, 0xd7, 0xba, 0x7b,
+ 0xae, 0xb1, 0xa5, 0xd2, 0xe5, 0x40, 0xce, 0x4d,
+ 0x3e, 0xab, 0x17, 0x9d, 0x7d, 0x9f, 0x03, 0x98,
+ 0x3f, 0x9f, 0xc8, 0xdd, 0x36, 0x17, 0x43, 0x5c,
+ 0x34, 0xd1, 0x23, 0xe0, 0x77, 0xbf, 0x35, 0x5d,
+ 0x8f, 0xb1, 0xcb, 0x82, 0xbb, 0x39, 0x69, 0xd8,
+ 0x90, 0x45, 0x37, 0xfd, 0x98, 0x25, 0xf7, 0x5b,
+ 0xce, 0x06, 0x43, 0xba, 0x61, 0xa8, 0x47, 0xb9
+};
+static const u8 output65[] __initconst = {
+ 0x73, 0xa5, 0x68, 0xab, 0x8b, 0xa5, 0xc3, 0x7e,
+ 0x74, 0xf8, 0x9d, 0xf5, 0x93, 0x6e, 0xf2, 0x71,
+ 0x6d, 0xde, 0x82, 0xc5, 0x40, 0xa0, 0x46, 0xb3,
+ 0x9a, 0x78, 0xa8, 0xf7, 0xdf, 0xb1, 0xc3, 0xdd,
+ 0x8d, 0x90, 0x00, 0x68, 0x21, 0x48, 0xe8, 0xba,
+ 0x56, 0x9f, 0x8f, 0xe7, 0xa4, 0x4d, 0x36, 0x55,
+ 0xd0, 0x34, 0x99, 0xa6, 0x1c, 0x4c, 0xc1, 0xe2,
+ 0x65, 0x98, 0x14, 0x8e, 0x6a, 0x05, 0xb1, 0x2b
+};
+static const u8 key65[] __initconst = {
+ 0xbd, 0x5c, 0x8a, 0xb0, 0x11, 0x29, 0xf3, 0x00,
+ 0x7a, 0x78, 0x32, 0x63, 0x34, 0x00, 0xe6, 0x7d,
+ 0x30, 0x54, 0xde, 0x37, 0xda, 0xc2, 0xc4, 0x3d,
+ 0x92, 0x6b, 0x4c, 0xc2, 0x92, 0xe9, 0x9e, 0x2a
+};
+enum { nonce65 = 0xf654a3031de746f2ULL };
+
+static const u8 input66[] __initconst = {
+ 0x4b, 0x27, 0x30, 0x8f, 0x28, 0xd8, 0x60, 0x46,
+ 0x39, 0x06, 0x49, 0xea, 0x1b, 0x71, 0x26, 0xe0,
+ 0x99, 0x2b, 0xd4, 0x8f, 0x64, 0x64, 0xcd, 0xac,
+ 0x1d, 0x78, 0x88, 0x90, 0xe1, 0x5c, 0x24, 0x4b,
+ 0xdc, 0x2d, 0xb7, 0xee, 0x3a, 0xe6, 0x86, 0x2c,
+ 0x21, 0xe4, 0x2b, 0xfc, 0xe8, 0x19, 0xca, 0x65,
+ 0xe7, 0xdd, 0x6f, 0x52, 0xb3, 0x11, 0xe1, 0xe2,
+ 0xbf, 0xe8, 0x70, 0xe3, 0x0d, 0x45, 0xb8, 0xa5,
+ 0x20, 0xb7, 0xb5, 0xaf, 0xff, 0x08, 0xcf, 0x23,
+ 0x65, 0xdf, 0x8d, 0xc3, 0x31, 0xf3, 0x1e, 0x6a,
+ 0x58, 0x8d, 0xcc, 0x45, 0x16, 0x86, 0x1f, 0x31,
+ 0x5c, 0x27, 0xcd, 0xc8, 0x6b, 0x19, 0x1e, 0xec,
+ 0x44, 0x75, 0x63, 0x97, 0xfd, 0x79, 0xf6, 0x62,
+ 0xc5, 0xba, 0x17, 0xc7, 0xab, 0x8f, 0xbb, 0xed,
+ 0x85, 0x2a, 0x98, 0x79, 0x21, 0xec, 0x6e, 0x4d,
+ 0xdc, 0xfa, 0x72, 0x52, 0xba, 0xc8, 0x4c
+};
+static const u8 output66[] __initconst = {
+ 0x76, 0x5b, 0x2c, 0xa7, 0x62, 0xb9, 0x08, 0x4a,
+ 0xc6, 0x4a, 0x92, 0xc3, 0xbb, 0x10, 0xb3, 0xee,
+ 0xff, 0xb9, 0x07, 0xc7, 0x27, 0xcb, 0x1e, 0xcf,
+ 0x58, 0x6f, 0xa1, 0x64, 0xe8, 0xf1, 0x4e, 0xe1,
+ 0xef, 0x18, 0x96, 0xab, 0x97, 0x28, 0xd1, 0x7c,
+ 0x71, 0x6c, 0xd1, 0xe2, 0xfa, 0xd9, 0x75, 0xcb,
+ 0xeb, 0xea, 0x0c, 0x86, 0x82, 0xd8, 0xf4, 0xcc,
+ 0xea, 0xa3, 0x00, 0xfa, 0x82, 0xd2, 0xcd, 0xcb,
+ 0xdb, 0x63, 0x28, 0xe2, 0x82, 0xe9, 0x01, 0xed,
+ 0x31, 0xe6, 0x71, 0x45, 0x08, 0x89, 0x8a, 0x23,
+ 0xa8, 0xb5, 0xc2, 0xe2, 0x9f, 0xe9, 0xb8, 0x9a,
+ 0xc4, 0x79, 0x6d, 0x71, 0x52, 0x61, 0x74, 0x6c,
+ 0x1b, 0xd7, 0x65, 0x6d, 0x03, 0xc4, 0x1a, 0xc0,
+ 0x50, 0xba, 0xd6, 0xc9, 0x43, 0x50, 0xbe, 0x09,
+ 0x09, 0x8a, 0xdb, 0xaa, 0x76, 0x4e, 0x3b, 0x61,
+ 0x3c, 0x7c, 0x44, 0xe7, 0xdb, 0x10, 0xa7
+};
+static const u8 key66[] __initconst = {
+ 0x88, 0xdf, 0xca, 0x68, 0xaf, 0x4f, 0xb3, 0xfd,
+ 0x6e, 0xa7, 0x95, 0x35, 0x8a, 0xe8, 0x37, 0xe8,
+ 0xc8, 0x55, 0xa2, 0x2a, 0x6d, 0x77, 0xf8, 0x93,
+ 0x7a, 0x41, 0xf3, 0x7b, 0x95, 0xdf, 0x89, 0xf5
+};
+enum { nonce66 = 0x1024b4fdd415cf82ULL };
+
+static const u8 input67[] __initconst = {
+ 0xd4, 0x2e, 0xfa, 0x92, 0xe9, 0x29, 0x68, 0xb7,
+ 0x54, 0x2c, 0xf7, 0xa4, 0x2d, 0xb7, 0x50, 0xb5,
+ 0xc5, 0xb2, 0x9d, 0x17, 0x5e, 0x0a, 0xca, 0x37,
+ 0xbf, 0x60, 0xae, 0xd2, 0x98, 0xe9, 0xfa, 0x59,
+ 0x67, 0x62, 0xe6, 0x43, 0x0c, 0x77, 0x80, 0x82,
+ 0x33, 0x61, 0xa3, 0xff, 0xc1, 0xa0, 0x8f, 0x56,
+ 0xbc, 0xec, 0x65, 0x43, 0x88, 0xa5, 0xff, 0x51,
+ 0x64, 0x30, 0xee, 0x34, 0xb7, 0x5c, 0x28, 0x68,
+ 0xc3, 0x52, 0xd2, 0xac, 0x78, 0x2a, 0xa6, 0x10,
+ 0xb8, 0xb2, 0x4c, 0x80, 0x4f, 0x99, 0xb2, 0x36,
+ 0x94, 0x8f, 0x66, 0xcb, 0xa1, 0x91, 0xed, 0x06,
+ 0x42, 0x6d, 0xc1, 0xae, 0x55, 0x93, 0xdd, 0x93,
+ 0x9e, 0x88, 0x34, 0x7f, 0x98, 0xeb, 0xbe, 0x61,
+ 0xf9, 0xa9, 0x0f, 0xd9, 0xc4, 0x87, 0xd5, 0xef,
+ 0xcc, 0x71, 0x8c, 0x0e, 0xce, 0xad, 0x02, 0xcf,
+ 0xa2, 0x61, 0xdf, 0xb1, 0xfe, 0x3b, 0xdc, 0xc0,
+ 0x58, 0xb5, 0x71, 0xa1, 0x83, 0xc9, 0xb4, 0xaf,
+ 0x9d, 0x54, 0x12, 0xcd, 0xea, 0x06, 0xd6, 0x4e,
+ 0xe5, 0x27, 0x0c, 0xc3, 0xbb, 0xa8, 0x0a, 0x81,
+ 0x75, 0xc3, 0xc9, 0xd4, 0x35, 0x3e, 0x53, 0x9f,
+ 0xaa, 0x20, 0xc0, 0x68, 0x39, 0x2c, 0x96, 0x39,
+ 0x53, 0x81, 0xda, 0x07, 0x0f, 0x44, 0xa5, 0x47,
+ 0x0e, 0xb3, 0x87, 0x0d, 0x1b, 0xc1, 0xe5, 0x41,
+ 0x35, 0x12, 0x58, 0x96, 0x69, 0x8a, 0x1a, 0xa3,
+ 0x9d, 0x3d, 0xd4, 0xb1, 0x8e, 0x1f, 0x96, 0x87,
+ 0xda, 0xd3, 0x19, 0xe2, 0xb1, 0x3a, 0x19, 0x74,
+ 0xa0, 0x00, 0x9f, 0x4d, 0xbc, 0xcb, 0x0c, 0xe9,
+ 0xec, 0x10, 0xdf, 0x2a, 0x88, 0xdc, 0x30, 0x51,
+ 0x46, 0x56, 0x53, 0x98, 0x6a, 0x26, 0x14, 0x05,
+ 0x54, 0x81, 0x55, 0x0b, 0x3c, 0x85, 0xdd, 0x33,
+ 0x81, 0x11, 0x29, 0x82, 0x46, 0x35, 0xe1, 0xdb,
+ 0x59, 0x7b
+};
+static const u8 output67[] __initconst = {
+ 0x64, 0x6c, 0xda, 0x7f, 0xd4, 0xa9, 0x2a, 0x5e,
+ 0x22, 0xae, 0x8d, 0x67, 0xdb, 0xee, 0xfd, 0xd0,
+ 0x44, 0x80, 0x17, 0xb2, 0xe3, 0x87, 0xad, 0x57,
+ 0x15, 0xcb, 0x88, 0x64, 0xc0, 0xf1, 0x49, 0x3d,
+ 0xfa, 0xbe, 0xa8, 0x9f, 0x12, 0xc3, 0x57, 0x56,
+ 0x70, 0xa5, 0xc5, 0x6b, 0xf1, 0xab, 0xd5, 0xde,
+ 0x77, 0x92, 0x6a, 0x56, 0x03, 0xf5, 0x21, 0x0d,
+ 0xb6, 0xc4, 0xcc, 0x62, 0x44, 0x3f, 0xb1, 0xc1,
+ 0x61, 0x41, 0x90, 0xb2, 0xd5, 0xb8, 0xf3, 0x57,
+ 0xfb, 0xc2, 0x6b, 0x25, 0x58, 0xc8, 0x45, 0x20,
+ 0x72, 0x29, 0x6f, 0x9d, 0xb5, 0x81, 0x4d, 0x2b,
+ 0xb2, 0x89, 0x9e, 0x91, 0x53, 0x97, 0x1c, 0xd9,
+ 0x3d, 0x79, 0xdc, 0x14, 0xae, 0x01, 0x73, 0x75,
+ 0xf0, 0xca, 0xd5, 0xab, 0x62, 0x5c, 0x7a, 0x7d,
+ 0x3f, 0xfe, 0x22, 0x7d, 0xee, 0xe2, 0xcb, 0x76,
+ 0x55, 0xec, 0x06, 0xdd, 0x41, 0x47, 0x18, 0x62,
+ 0x1d, 0x57, 0xd0, 0xd6, 0xb6, 0x0f, 0x4b, 0xfc,
+ 0x79, 0x19, 0xf4, 0xd6, 0x37, 0x86, 0x18, 0x1f,
+ 0x98, 0x0d, 0x9e, 0x15, 0x2d, 0xb6, 0x9a, 0x8a,
+ 0x8c, 0x80, 0x22, 0x2f, 0x82, 0xc4, 0xc7, 0x36,
+ 0xfa, 0xfa, 0x07, 0xbd, 0xc2, 0x2a, 0xe2, 0xea,
+ 0x93, 0xc8, 0xb2, 0x90, 0x33, 0xf2, 0xee, 0x4b,
+ 0x1b, 0xf4, 0x37, 0x92, 0x13, 0xbb, 0xe2, 0xce,
+ 0xe3, 0x03, 0xcf, 0x07, 0x94, 0xab, 0x9a, 0xc9,
+ 0xff, 0x83, 0x69, 0x3a, 0xda, 0x2c, 0xd0, 0x47,
+ 0x3d, 0x6c, 0x1a, 0x60, 0x68, 0x47, 0xb9, 0x36,
+ 0x52, 0xdd, 0x16, 0xef, 0x6c, 0xbf, 0x54, 0x11,
+ 0x72, 0x62, 0xce, 0x8c, 0x9d, 0x90, 0xa0, 0x25,
+ 0x06, 0x92, 0x3e, 0x12, 0x7e, 0x1a, 0x1d, 0xe5,
+ 0xa2, 0x71, 0xce, 0x1c, 0x4c, 0x6a, 0x7c, 0xdc,
+ 0x3d, 0xe3, 0x6e, 0x48, 0x9d, 0xb3, 0x64, 0x7d,
+ 0x78, 0x40
+};
+static const u8 key67[] __initconst = {
+ 0xa9, 0x20, 0x75, 0x89, 0x7e, 0x37, 0x85, 0x48,
+ 0xa3, 0xfb, 0x7b, 0xe8, 0x30, 0xa7, 0xe3, 0x6e,
+ 0xa6, 0xc1, 0x71, 0x17, 0xc1, 0x6c, 0x9b, 0xc2,
+ 0xde, 0xf0, 0xa7, 0x19, 0xec, 0xce, 0xc6, 0x53
+};
+enum { nonce67 = 0x4adc4d1f968c8a10ULL };
+
+static const u8 input68[] __initconst = {
+ 0x99, 0xae, 0x72, 0xfb, 0x16, 0xe1, 0xf1, 0x59,
+ 0x43, 0x15, 0x4e, 0x33, 0xa0, 0x95, 0xe7, 0x6c,
+ 0x74, 0x24, 0x31, 0xca, 0x3b, 0x2e, 0xeb, 0xd7,
+ 0x11, 0xd8, 0xe0, 0x56, 0x92, 0x91, 0x61, 0x57,
+ 0xe2, 0x82, 0x9f, 0x8f, 0x37, 0xf5, 0x3d, 0x24,
+ 0x92, 0x9d, 0x87, 0x00, 0x8d, 0x89, 0xe0, 0x25,
+ 0x8b, 0xe4, 0x20, 0x5b, 0x8a, 0x26, 0x2c, 0x61,
+ 0x78, 0xb0, 0xa6, 0x3e, 0x82, 0x18, 0xcf, 0xdc,
+ 0x2d, 0x24, 0xdd, 0x81, 0x42, 0xc4, 0x95, 0xf0,
+ 0x48, 0x60, 0x71, 0xe3, 0xe3, 0xac, 0xec, 0xbe,
+ 0x98, 0x6b, 0x0c, 0xb5, 0x6a, 0xa9, 0xc8, 0x79,
+ 0x23, 0x2e, 0x38, 0x0b, 0x72, 0x88, 0x8c, 0xe7,
+ 0x71, 0x8b, 0x36, 0xe3, 0x58, 0x3d, 0x9c, 0xa0,
+ 0xa2, 0xea, 0xcf, 0x0c, 0x6a, 0x6c, 0x64, 0xdf,
+ 0x97, 0x21, 0x8f, 0x93, 0xfb, 0xba, 0xf3, 0x5a,
+ 0xd7, 0x8f, 0xa6, 0x37, 0x15, 0x50, 0x43, 0x02,
+ 0x46, 0x7f, 0x93, 0x46, 0x86, 0x31, 0xe2, 0xaa,
+ 0x24, 0xa8, 0x26, 0xae, 0xe6, 0xc0, 0x05, 0x73,
+ 0x0b, 0x4f, 0x7e, 0xed, 0x65, 0xeb, 0x56, 0x1e,
+ 0xb6, 0xb3, 0x0b, 0xc3, 0x0e, 0x31, 0x95, 0xa9,
+ 0x18, 0x4d, 0xaf, 0x38, 0xd7, 0xec, 0xc6, 0x44,
+ 0x72, 0x77, 0x4e, 0x25, 0x4b, 0x25, 0xdd, 0x1e,
+ 0x8c, 0xa2, 0xdf, 0xf6, 0x2a, 0x97, 0x1a, 0x88,
+ 0x2c, 0x8a, 0x5d, 0xfe, 0xe8, 0xfb, 0x35, 0xe8,
+ 0x0f, 0x2b, 0x7a, 0x18, 0x69, 0x43, 0x31, 0x1d,
+ 0x38, 0x6a, 0x62, 0x95, 0x0f, 0x20, 0x4b, 0xbb,
+ 0x97, 0x3c, 0xe0, 0x64, 0x2f, 0x52, 0xc9, 0x2d,
+ 0x4d, 0x9d, 0x54, 0x04, 0x3d, 0xc9, 0xea, 0xeb,
+ 0xd0, 0x86, 0x52, 0xff, 0x42, 0xe1, 0x0d, 0x7a,
+ 0xad, 0x88, 0xf9, 0x9b, 0x1e, 0x5e, 0x12, 0x27,
+ 0x95, 0x3e, 0x0c, 0x2c, 0x13, 0x00, 0x6f, 0x8e,
+ 0x93, 0x69, 0x0e, 0x01, 0x8c, 0xc1, 0xfd, 0xb3
+};
+static const u8 output68[] __initconst = {
+ 0x26, 0x3e, 0xf2, 0xb1, 0xf5, 0xef, 0x81, 0xa4,
+ 0xb7, 0x42, 0xd4, 0x26, 0x18, 0x4b, 0xdd, 0x6a,
+ 0x47, 0x15, 0xcb, 0x0e, 0x57, 0xdb, 0xa7, 0x29,
+ 0x7e, 0x7b, 0x3f, 0x47, 0x89, 0x57, 0xab, 0xea,
+ 0x14, 0x7b, 0xcf, 0x37, 0xdb, 0x1c, 0xe1, 0x11,
+ 0x77, 0xae, 0x2e, 0x4c, 0xd2, 0x08, 0x3f, 0xa6,
+ 0x62, 0x86, 0xa6, 0xb2, 0x07, 0xd5, 0x3f, 0x9b,
+ 0xdc, 0xc8, 0x50, 0x4b, 0x7b, 0xb9, 0x06, 0xe6,
+ 0xeb, 0xac, 0x98, 0x8c, 0x36, 0x0c, 0x1e, 0xb2,
+ 0xc8, 0xfb, 0x24, 0x60, 0x2c, 0x08, 0x17, 0x26,
+ 0x5b, 0xc8, 0xc2, 0xdf, 0x9c, 0x73, 0x67, 0x4a,
+ 0xdb, 0xcf, 0xd5, 0x2c, 0x2b, 0xca, 0x24, 0xcc,
+ 0xdb, 0xc9, 0xa8, 0xf2, 0x5d, 0x67, 0xdf, 0x5c,
+ 0x62, 0x0b, 0x58, 0xc0, 0x83, 0xde, 0x8b, 0xf6,
+ 0x15, 0x0a, 0xd6, 0x32, 0xd8, 0xf5, 0xf2, 0x5f,
+ 0x33, 0xce, 0x7e, 0xab, 0x76, 0xcd, 0x14, 0x91,
+ 0xd8, 0x41, 0x90, 0x93, 0xa1, 0xaf, 0xf3, 0x45,
+ 0x6c, 0x1b, 0x25, 0xbd, 0x48, 0x51, 0x6d, 0x15,
+ 0x47, 0xe6, 0x23, 0x50, 0x32, 0x69, 0x1e, 0xb5,
+ 0x94, 0xd3, 0x97, 0xba, 0xd7, 0x37, 0x4a, 0xba,
+ 0xb9, 0xcd, 0xfb, 0x96, 0x9a, 0x90, 0xe0, 0x37,
+ 0xf8, 0xdf, 0x91, 0x6c, 0x62, 0x13, 0x19, 0x21,
+ 0x4b, 0xa9, 0xf1, 0x12, 0x66, 0xe2, 0x74, 0xd7,
+ 0x81, 0xa0, 0x74, 0x8d, 0x7e, 0x7e, 0xc9, 0xb1,
+ 0x69, 0x8f, 0xed, 0xb3, 0xf6, 0x97, 0xcd, 0x72,
+ 0x78, 0x93, 0xd3, 0x54, 0x6b, 0x43, 0xac, 0x29,
+ 0xb4, 0xbc, 0x7d, 0xa4, 0x26, 0x4b, 0x7b, 0xab,
+ 0xd6, 0x67, 0x22, 0xff, 0x03, 0x92, 0xb6, 0xd4,
+ 0x96, 0x94, 0x5a, 0xe5, 0x02, 0x35, 0x77, 0xfa,
+ 0x3f, 0x54, 0x1d, 0xdd, 0x35, 0x39, 0xfe, 0x03,
+ 0xdd, 0x8e, 0x3c, 0x8c, 0xc2, 0x69, 0x2a, 0xb1,
+ 0xb7, 0xb3, 0xa1, 0x89, 0x84, 0xea, 0x16, 0xe2
+};
+static const u8 key68[] __initconst = {
+ 0xd2, 0x49, 0x7f, 0xd7, 0x49, 0x66, 0x0d, 0xb3,
+ 0x5a, 0x7e, 0x3c, 0xfc, 0x37, 0x83, 0x0e, 0xf7,
+ 0x96, 0xd8, 0xd6, 0x33, 0x79, 0x2b, 0x84, 0x53,
+ 0x06, 0xbc, 0x6c, 0x0a, 0x55, 0x84, 0xfe, 0xab
+};
+enum { nonce68 = 0x6a6df7ff0a20de06ULL };
+
+static const u8 input69[] __initconst = {
+ 0xf9, 0x18, 0x4c, 0xd2, 0x3f, 0xf7, 0x22, 0xd9,
+ 0x58, 0xb6, 0x3b, 0x38, 0x69, 0x79, 0xf4, 0x71,
+ 0x5f, 0x38, 0x52, 0x1f, 0x17, 0x6f, 0x6f, 0xd9,
+ 0x09, 0x2b, 0xfb, 0x67, 0xdc, 0xc9, 0xe8, 0x4a,
+ 0x70, 0x9f, 0x2e, 0x3c, 0x06, 0xe5, 0x12, 0x20,
+ 0x25, 0x29, 0xd0, 0xdc, 0x81, 0xc5, 0xc6, 0x0f,
+ 0xd2, 0xa8, 0x81, 0x15, 0x98, 0xb2, 0x71, 0x5a,
+ 0x9a, 0xe9, 0xfb, 0xaf, 0x0e, 0x5f, 0x8a, 0xf3,
+ 0x16, 0x4a, 0x47, 0xf2, 0x5c, 0xbf, 0xda, 0x52,
+ 0x9a, 0xa6, 0x36, 0xfd, 0xc6, 0xf7, 0x66, 0x00,
+ 0xcc, 0x6c, 0xd4, 0xb3, 0x07, 0x6d, 0xeb, 0xfe,
+ 0x92, 0x71, 0x25, 0xd0, 0xcf, 0x9c, 0xe8, 0x65,
+ 0x45, 0x10, 0xcf, 0x62, 0x74, 0x7d, 0xf2, 0x1b,
+ 0x57, 0xa0, 0xf1, 0x6b, 0xa4, 0xd5, 0xfa, 0x12,
+ 0x27, 0x5a, 0xf7, 0x99, 0xfc, 0xca, 0xf3, 0xb8,
+ 0x2c, 0x8b, 0xba, 0x28, 0x74, 0xde, 0x8f, 0x78,
+ 0xa2, 0x8c, 0xaf, 0x89, 0x4b, 0x05, 0xe2, 0xf3,
+ 0xf8, 0xd2, 0xef, 0xac, 0xa4, 0xc4, 0xe2, 0xe2,
+ 0x36, 0xbb, 0x5e, 0xae, 0xe6, 0x87, 0x3d, 0x88,
+ 0x9f, 0xb8, 0x11, 0xbb, 0xcf, 0x57, 0xce, 0xd0,
+ 0xba, 0x62, 0xf4, 0xf8, 0x9b, 0x95, 0x04, 0xc9,
+ 0xcf, 0x01, 0xe9, 0xf1, 0xc8, 0xc6, 0x22, 0xa4,
+ 0xf2, 0x8b, 0x2f, 0x24, 0x0a, 0xf5, 0x6e, 0xb7,
+ 0xd4, 0x2c, 0xb6, 0xf7, 0x5c, 0x97, 0x61, 0x0b,
+ 0xd9, 0xb5, 0x06, 0xcd, 0xed, 0x3e, 0x1f, 0xc5,
+ 0xb2, 0x6c, 0xa3, 0xea, 0xb8, 0xad, 0xa6, 0x42,
+ 0x88, 0x7a, 0x52, 0xd5, 0x64, 0xba, 0xb5, 0x20,
+ 0x10, 0xa0, 0x0f, 0x0d, 0xea, 0xef, 0x5a, 0x9b,
+ 0x27, 0xb8, 0xca, 0x20, 0x19, 0x6d, 0xa8, 0xc4,
+ 0x46, 0x04, 0xb3, 0xe8, 0xf8, 0x66, 0x1b, 0x0a,
+ 0xce, 0x76, 0x5d, 0x59, 0x58, 0x05, 0xee, 0x3e,
+ 0x3c, 0x86, 0x5b, 0x49, 0x1c, 0x72, 0x18, 0x01,
+ 0x62, 0x92, 0x0f, 0x3e, 0xd1, 0x57, 0x5e, 0x20,
+ 0x7b, 0xfb, 0x4d, 0x3c, 0xc5, 0x35, 0x43, 0x2f,
+ 0xb0, 0xc5, 0x7c, 0xe4, 0xa2, 0x84, 0x13, 0x77
+};
+static const u8 output69[] __initconst = {
+ 0xbb, 0x4a, 0x7f, 0x7c, 0xd5, 0x2f, 0x89, 0x06,
+ 0xec, 0x20, 0xf1, 0x9a, 0x11, 0x09, 0x14, 0x2e,
+ 0x17, 0x50, 0xf9, 0xd5, 0xf5, 0x48, 0x7c, 0x7a,
+ 0x55, 0xc0, 0x57, 0x03, 0xe3, 0xc4, 0xb2, 0xb7,
+ 0x18, 0x47, 0x95, 0xde, 0xaf, 0x80, 0x06, 0x3c,
+ 0x5a, 0xf2, 0xc3, 0x53, 0xe3, 0x29, 0x92, 0xf8,
+ 0xff, 0x64, 0x85, 0xb9, 0xf7, 0xd3, 0x80, 0xd2,
+ 0x0c, 0x5d, 0x7b, 0x57, 0x0c, 0x51, 0x79, 0x86,
+ 0xf3, 0x20, 0xd2, 0xb8, 0x6e, 0x0c, 0x5a, 0xce,
+ 0xeb, 0x88, 0x02, 0x8b, 0x82, 0x1b, 0x7f, 0xf5,
+ 0xde, 0x7f, 0x48, 0x48, 0xdf, 0xa0, 0x55, 0xc6,
+ 0x0c, 0x22, 0xa1, 0x80, 0x8d, 0x3b, 0xcb, 0x40,
+ 0x2d, 0x3d, 0x0b, 0xf2, 0xe0, 0x22, 0x13, 0x99,
+ 0xe1, 0xa7, 0x27, 0x68, 0x31, 0xe1, 0x24, 0x5d,
+ 0xd2, 0xee, 0x16, 0xc1, 0xd7, 0xa8, 0x14, 0x19,
+ 0x23, 0x72, 0x67, 0x27, 0xdc, 0x5e, 0xb9, 0xc7,
+ 0xd8, 0xe3, 0x55, 0x50, 0x40, 0x98, 0x7b, 0xe7,
+ 0x34, 0x1c, 0x3b, 0x18, 0x14, 0xd8, 0x62, 0xc1,
+ 0x93, 0x84, 0xf3, 0x5b, 0xdd, 0x9e, 0x1f, 0x3b,
+ 0x0b, 0xbc, 0x4e, 0x5b, 0x79, 0xa3, 0xca, 0x74,
+ 0x2a, 0x98, 0xe8, 0x04, 0x39, 0xef, 0xc6, 0x76,
+ 0x6d, 0xee, 0x9f, 0x67, 0x5b, 0x59, 0x3a, 0xe5,
+ 0xf2, 0x3b, 0xca, 0x89, 0xe8, 0x9b, 0x03, 0x3d,
+ 0x11, 0xd2, 0x4a, 0x70, 0xaf, 0x88, 0xb0, 0x94,
+ 0x96, 0x26, 0xab, 0x3c, 0xc1, 0xb8, 0xe4, 0xe7,
+ 0x14, 0x61, 0x64, 0x3a, 0x61, 0x08, 0x0f, 0xa9,
+ 0xce, 0x64, 0xb2, 0x40, 0xf8, 0x20, 0x3a, 0xa9,
+ 0x31, 0xbd, 0x7e, 0x16, 0xca, 0xf5, 0x62, 0x0f,
+ 0x91, 0x9f, 0x8e, 0x1d, 0xa4, 0x77, 0xf3, 0x87,
+ 0x61, 0xe8, 0x14, 0xde, 0x18, 0x68, 0x4e, 0x9d,
+ 0x73, 0xcd, 0x8a, 0xe4, 0x80, 0x84, 0x23, 0xaa,
+ 0x9d, 0x64, 0x1c, 0x80, 0x41, 0xca, 0x82, 0x40,
+ 0x94, 0x55, 0xe3, 0x28, 0xa1, 0x97, 0x71, 0xba,
+ 0xf2, 0x2c, 0x39, 0x62, 0x29, 0x56, 0xd0, 0xff,
+ 0xb2, 0x82, 0x20, 0x59, 0x1f, 0xc3, 0x64, 0x57
+};
+static const u8 key69[] __initconst = {
+ 0x19, 0x09, 0xe9, 0x7c, 0xd9, 0x02, 0x4a, 0x0c,
+ 0x52, 0x25, 0xad, 0x5c, 0x2e, 0x8d, 0x86, 0x10,
+ 0x85, 0x2b, 0xba, 0xa4, 0x44, 0x5b, 0x39, 0x3e,
+ 0x18, 0xaa, 0xce, 0x0e, 0xe2, 0x69, 0x3c, 0xcf
+};
+enum { nonce69 = 0xdb925a1948f0f060ULL };
+
+static const u8 input70[] __initconst = {
+ 0x10, 0xe7, 0x83, 0xcf, 0x42, 0x9f, 0xf2, 0x41,
+ 0xc7, 0xe4, 0xdb, 0xf9, 0xa3, 0x02, 0x1d, 0x8d,
+ 0x50, 0x81, 0x2c, 0x6b, 0x92, 0xe0, 0x4e, 0xea,
+ 0x26, 0x83, 0x2a, 0xd0, 0x31, 0xf1, 0x23, 0xf3,
+ 0x0e, 0x88, 0x14, 0x31, 0xf9, 0x01, 0x63, 0x59,
+ 0x21, 0xd1, 0x8b, 0xdd, 0x06, 0xd0, 0xc6, 0xab,
+ 0x91, 0x71, 0x82, 0x4d, 0xd4, 0x62, 0x37, 0x17,
+ 0xf9, 0x50, 0xf9, 0xb5, 0x74, 0xce, 0x39, 0x80,
+ 0x80, 0x78, 0xf8, 0xdc, 0x1c, 0xdb, 0x7c, 0x3d,
+ 0xd4, 0x86, 0x31, 0x00, 0x75, 0x7b, 0xd1, 0x42,
+ 0x9f, 0x1b, 0x97, 0x88, 0x0e, 0x14, 0x0e, 0x1e,
+ 0x7d, 0x7b, 0xc4, 0xd2, 0xf3, 0xc1, 0x6d, 0x17,
+ 0x5d, 0xc4, 0x75, 0x54, 0x0f, 0x38, 0x65, 0x89,
+ 0xd8, 0x7d, 0xab, 0xc9, 0xa7, 0x0a, 0x21, 0x0b,
+ 0x37, 0x12, 0x05, 0x07, 0xb5, 0x68, 0x32, 0x32,
+ 0xb9, 0xf8, 0x97, 0x17, 0x03, 0xed, 0x51, 0x8f,
+ 0x3d, 0x5a, 0xd0, 0x12, 0x01, 0x6e, 0x2e, 0x91,
+ 0x1c, 0xbe, 0x6b, 0xa3, 0xcc, 0x75, 0x62, 0x06,
+ 0x8e, 0x65, 0xbb, 0xe2, 0x29, 0x71, 0x4b, 0x89,
+ 0x6a, 0x9d, 0x85, 0x8c, 0x8c, 0xdf, 0x94, 0x95,
+ 0x23, 0x66, 0xf8, 0x92, 0xee, 0x56, 0xeb, 0xb3,
+ 0xeb, 0xd2, 0x4a, 0x3b, 0x77, 0x8a, 0x6e, 0xf6,
+ 0xca, 0xd2, 0x34, 0x00, 0xde, 0xbe, 0x1d, 0x7a,
+ 0x73, 0xef, 0x2b, 0x80, 0x56, 0x16, 0x29, 0xbf,
+ 0x6e, 0x33, 0xed, 0x0d, 0xe2, 0x02, 0x60, 0x74,
+ 0xe9, 0x0a, 0xbc, 0xd1, 0xc5, 0xe8, 0x53, 0x02,
+ 0x79, 0x0f, 0x25, 0x0c, 0xef, 0xab, 0xd3, 0xbc,
+ 0xb7, 0xfc, 0xf3, 0xb0, 0x34, 0xd1, 0x07, 0xd2,
+ 0x5a, 0x31, 0x1f, 0xec, 0x1f, 0x87, 0xed, 0xdd,
+ 0x6a, 0xc1, 0xe8, 0xb3, 0x25, 0x4c, 0xc6, 0x9b,
+ 0x91, 0x73, 0xec, 0x06, 0x73, 0x9e, 0x57, 0x65,
+ 0x32, 0x75, 0x11, 0x74, 0x6e, 0xa4, 0x7d, 0x0d,
+ 0x74, 0x9f, 0x51, 0x10, 0x10, 0x47, 0xc9, 0x71,
+ 0x6e, 0x97, 0xae, 0x44, 0x41, 0xef, 0x98, 0x78,
+ 0xf4, 0xc5, 0xbd, 0x5e, 0x00, 0xe5, 0xfd, 0xe2,
+ 0xbe, 0x8c, 0xc2, 0xae, 0xc2, 0xee, 0x59, 0xf6,
+ 0xcb, 0x20, 0x54, 0x84, 0xc3, 0x31, 0x7e, 0x67,
+ 0x71, 0xb6, 0x76, 0xbe, 0x81, 0x8f, 0x82, 0xad,
+ 0x01, 0x8f, 0xc4, 0x00, 0x04, 0x3d, 0x8d, 0x34,
+ 0xaa, 0xea, 0xc0, 0xea, 0x91, 0x42, 0xb6, 0xb8,
+ 0x43, 0xf3, 0x17, 0xb2, 0x73, 0x64, 0x82, 0x97,
+ 0xd5, 0xc9, 0x07, 0x77, 0xb1, 0x26, 0xe2, 0x00,
+ 0x6a, 0xae, 0x70, 0x0b, 0xbe, 0xe6, 0xb8, 0x42,
+ 0x81, 0x55, 0xf7, 0xb8, 0x96, 0x41, 0x9d, 0xd4,
+ 0x2c, 0x27, 0x00, 0xcc, 0x91, 0x28, 0x22, 0xa4,
+ 0x7b, 0x42, 0x51, 0x9e, 0xd6, 0xec, 0xf3, 0x6b,
+ 0x00, 0xff, 0x5c, 0xa2, 0xac, 0x47, 0x33, 0x2d,
+ 0xf8, 0x11, 0x65, 0x5f, 0x4d, 0x79, 0x8b, 0x4f,
+ 0xad, 0xf0, 0x9d, 0xcd, 0xb9, 0x7b, 0x08, 0xf7,
+ 0x32, 0x51, 0xfa, 0x39, 0xaa, 0x78, 0x05, 0xb1,
+ 0xf3, 0x5d, 0xe8, 0x7c, 0x8e, 0x4f, 0xa2, 0xe0,
+ 0x98, 0x0c, 0xb2, 0xa7, 0xf0, 0x35, 0x8e, 0x70,
+ 0x7c, 0x82, 0xf3, 0x1b, 0x26, 0x28, 0x12, 0xe5,
+ 0x23, 0x57, 0xe4, 0xb4, 0x9b, 0x00, 0x39, 0x97,
+ 0xef, 0x7c, 0x46, 0x9b, 0x34, 0x6b, 0xe7, 0x0e,
+ 0xa3, 0x2a, 0x18, 0x11, 0x64, 0xc6, 0x7c, 0x8b,
+ 0x06, 0x02, 0xf5, 0x69, 0x76, 0xf9, 0xaa, 0x09,
+ 0x5f, 0x68, 0xf8, 0x4a, 0x79, 0x58, 0xec, 0x37,
+ 0xcf, 0x3a, 0xcc, 0x97, 0x70, 0x1d, 0x3e, 0x52,
+ 0x18, 0x0a, 0xad, 0x28, 0x5b, 0x3b, 0xe9, 0x03,
+ 0x84, 0xe9, 0x68, 0x50, 0xce, 0xc4, 0xbc, 0x3e,
+ 0x21, 0xad, 0x63, 0xfe, 0xc6, 0xfd, 0x6e, 0x69,
+ 0x84, 0xa9, 0x30, 0xb1, 0x7a, 0xc4, 0x31, 0x10,
+ 0xc1, 0x1f, 0x6e, 0xeb, 0xa5, 0xa6, 0x01
+};
+static const u8 output70[] __initconst = {
+ 0x0f, 0x93, 0x2a, 0x20, 0xb3, 0x87, 0x2d, 0xce,
+ 0xd1, 0x3b, 0x30, 0xfd, 0x06, 0x6d, 0x0a, 0xaa,
+ 0x3e, 0xc4, 0x29, 0x02, 0x8a, 0xde, 0xa6, 0x4b,
+ 0x45, 0x1b, 0x4f, 0x25, 0x59, 0xd5, 0x56, 0x6a,
+ 0x3b, 0x37, 0xbd, 0x3e, 0x47, 0x12, 0x2c, 0x4e,
+ 0x60, 0x5f, 0x05, 0x75, 0x61, 0x23, 0x05, 0x74,
+ 0xcb, 0xfc, 0x5a, 0xb3, 0xac, 0x5c, 0x3d, 0xab,
+ 0x52, 0x5f, 0x05, 0xbc, 0x57, 0xc0, 0x7e, 0xcf,
+ 0x34, 0x5d, 0x7f, 0x41, 0xa3, 0x17, 0x78, 0xd5,
+ 0x9f, 0xec, 0x0f, 0x1e, 0xf9, 0xfe, 0xa3, 0xbd,
+ 0x28, 0xb0, 0xba, 0x4d, 0x84, 0xdb, 0xae, 0x8f,
+ 0x1d, 0x98, 0xb7, 0xdc, 0xf9, 0xad, 0x55, 0x9c,
+ 0x89, 0xfe, 0x9b, 0x9c, 0xa9, 0x89, 0xf6, 0x97,
+ 0x9c, 0x3f, 0x09, 0x3e, 0xc6, 0x02, 0xc2, 0x55,
+ 0x58, 0x09, 0x54, 0x66, 0xe4, 0x36, 0x81, 0x35,
+ 0xca, 0x88, 0x17, 0x89, 0x80, 0x24, 0x2b, 0x21,
+ 0x89, 0xee, 0x45, 0x5a, 0xe7, 0x1f, 0xd5, 0xa5,
+ 0x16, 0xa4, 0xda, 0x70, 0x7e, 0xe9, 0x4f, 0x24,
+ 0x61, 0x97, 0xab, 0xa0, 0xe0, 0xe7, 0xb8, 0x5c,
+ 0x0f, 0x25, 0x17, 0x37, 0x75, 0x12, 0xb5, 0x40,
+ 0xde, 0x1c, 0x0d, 0x8a, 0x77, 0x62, 0x3c, 0x86,
+ 0xd9, 0x70, 0x2e, 0x96, 0x30, 0xd2, 0x55, 0xb3,
+ 0x6b, 0xc3, 0xf2, 0x9c, 0x47, 0xf3, 0x3a, 0x24,
+ 0x52, 0xc6, 0x38, 0xd8, 0x22, 0xb3, 0x0c, 0xfd,
+ 0x2f, 0xa3, 0x3c, 0xb5, 0xe8, 0x26, 0xe1, 0xa3,
+ 0xad, 0xb0, 0x82, 0x17, 0xc1, 0x53, 0xb8, 0x34,
+ 0x48, 0xee, 0x39, 0xae, 0x51, 0x43, 0xec, 0x82,
+ 0xce, 0x87, 0xc6, 0x76, 0xb9, 0x76, 0xd3, 0x53,
+ 0xfe, 0x49, 0x24, 0x7d, 0x02, 0x42, 0x2b, 0x72,
+ 0xfb, 0xcb, 0xd8, 0x96, 0x02, 0xc6, 0x9a, 0x20,
+ 0xf3, 0x5a, 0x67, 0xe8, 0x13, 0xf8, 0xb2, 0xcb,
+ 0xa2, 0xec, 0x18, 0x20, 0x4a, 0xb0, 0x73, 0x53,
+ 0x21, 0xb0, 0x77, 0x53, 0xd8, 0x76, 0xa1, 0x30,
+ 0x17, 0x72, 0x2e, 0x33, 0x5f, 0x33, 0x6b, 0x28,
+ 0xfb, 0xb0, 0xf4, 0xec, 0x8e, 0xed, 0x20, 0x7d,
+ 0x57, 0x8c, 0x74, 0x28, 0x64, 0x8b, 0xeb, 0x59,
+ 0x38, 0x3f, 0xe7, 0x83, 0x2e, 0xe5, 0x64, 0x4d,
+ 0x5c, 0x1f, 0xe1, 0x3b, 0xd9, 0x84, 0xdb, 0xc9,
+ 0xec, 0xd8, 0xc1, 0x7c, 0x1f, 0x1b, 0x68, 0x35,
+ 0xc6, 0x34, 0x10, 0xef, 0x19, 0xc9, 0x0a, 0xd6,
+ 0x43, 0x7f, 0xa6, 0xcb, 0x9d, 0xf4, 0xf0, 0x16,
+ 0xb1, 0xb1, 0x96, 0x64, 0xec, 0x8d, 0x22, 0x4c,
+ 0x4b, 0xe8, 0x1a, 0xba, 0x6f, 0xb7, 0xfc, 0xa5,
+ 0x69, 0x3e, 0xad, 0x78, 0x79, 0x19, 0xb5, 0x04,
+ 0x69, 0xe5, 0x3f, 0xff, 0x60, 0x8c, 0xda, 0x0b,
+ 0x7b, 0xf7, 0xe7, 0xe6, 0x29, 0x3a, 0x85, 0xba,
+ 0xb5, 0xb0, 0x35, 0xbd, 0x38, 0xce, 0x34, 0x5e,
+ 0xf2, 0xdc, 0xd1, 0x8f, 0xc3, 0x03, 0x24, 0xa2,
+ 0x03, 0xf7, 0x4e, 0x49, 0x5b, 0xcf, 0x6d, 0xb0,
+ 0xeb, 0xe3, 0x30, 0x28, 0xd5, 0x5b, 0x82, 0x5f,
+ 0xe4, 0x7c, 0x1e, 0xec, 0xd2, 0x39, 0xf9, 0x6f,
+ 0x2e, 0xb3, 0xcd, 0x01, 0xb1, 0x67, 0xaa, 0xea,
+ 0xaa, 0xb3, 0x63, 0xaf, 0xd9, 0xb2, 0x1f, 0xba,
+ 0x05, 0x20, 0xeb, 0x19, 0x32, 0xf0, 0x6c, 0x3f,
+ 0x40, 0xcc, 0x93, 0xb3, 0xd8, 0x25, 0xa6, 0xe4,
+ 0xce, 0xd7, 0x7e, 0x48, 0x99, 0x65, 0x7f, 0x86,
+ 0xc5, 0xd4, 0x79, 0x6b, 0xab, 0x43, 0xb8, 0x6b,
+ 0xf1, 0x2f, 0xea, 0x4c, 0x5e, 0xf0, 0x3b, 0xb4,
+ 0xb8, 0xb0, 0x94, 0x0c, 0x6b, 0xe7, 0x22, 0x93,
+ 0xaa, 0x01, 0xcb, 0xf1, 0x11, 0x60, 0xf6, 0x69,
+ 0xcf, 0x14, 0xde, 0xfb, 0x90, 0x05, 0x27, 0x0c,
+ 0x1a, 0x9e, 0xf0, 0xb4, 0xc6, 0xa1, 0xe8, 0xdd,
+ 0xd0, 0x4c, 0x25, 0x4f, 0x9c, 0xb7, 0xb1, 0xb0,
+ 0x21, 0xdb, 0x87, 0x09, 0x03, 0xf2, 0xb3
+};
+static const u8 key70[] __initconst = {
+ 0x3b, 0x5b, 0x59, 0x36, 0x44, 0xd1, 0xba, 0x71,
+ 0x55, 0x87, 0x4d, 0x62, 0x3d, 0xc2, 0xfc, 0xaa,
+ 0x3f, 0x4e, 0x1a, 0xe4, 0xca, 0x09, 0xfc, 0x6a,
+ 0xb2, 0xd6, 0x5d, 0x79, 0xf9, 0x1a, 0x91, 0xa7
+};
+enum { nonce70 = 0x3fd6786dd147a85ULL };
+
+static const u8 input71[] __initconst = {
+ 0x18, 0x78, 0xd6, 0x79, 0xe4, 0x9a, 0x6c, 0x73,
+ 0x17, 0xd4, 0x05, 0x0f, 0x1e, 0x9f, 0xd9, 0x2b,
+ 0x86, 0x48, 0x7d, 0xf4, 0xd9, 0x1c, 0x76, 0xfc,
+ 0x8e, 0x22, 0x34, 0xe1, 0x48, 0x4a, 0x8d, 0x79,
+ 0xb7, 0xbb, 0x88, 0xab, 0x90, 0xde, 0xc5, 0xb4,
+ 0xb4, 0xe7, 0x85, 0x49, 0xda, 0x57, 0xeb, 0xc9,
+ 0xcd, 0x21, 0xfc, 0x45, 0x6e, 0x32, 0x67, 0xf2,
+ 0x4f, 0xa6, 0x54, 0xe5, 0x20, 0xed, 0xcf, 0xc6,
+ 0x62, 0x25, 0x8e, 0x00, 0xf8, 0x6b, 0xa2, 0x80,
+ 0xac, 0x88, 0xa6, 0x59, 0x27, 0x83, 0x95, 0x11,
+ 0x3f, 0x70, 0x5e, 0x3f, 0x11, 0xfb, 0x26, 0xbf,
+ 0xe1, 0x48, 0x75, 0xf9, 0x86, 0xbf, 0xa6, 0x5d,
+ 0x15, 0x61, 0x66, 0xbf, 0x78, 0x8f, 0x6b, 0x9b,
+ 0xda, 0x98, 0xb7, 0x19, 0xe2, 0xf2, 0xa3, 0x9c,
+ 0x7c, 0x6a, 0x9a, 0xd8, 0x3d, 0x4c, 0x2c, 0xe1,
+ 0x09, 0xb4, 0x28, 0x82, 0x4e, 0xab, 0x0c, 0x75,
+ 0x63, 0xeb, 0xbc, 0xd0, 0x71, 0xa2, 0x73, 0x85,
+ 0xed, 0x53, 0x7a, 0x3f, 0x68, 0x9f, 0xd0, 0xa9,
+ 0x00, 0x5a, 0x9e, 0x80, 0x55, 0x00, 0xe6, 0xae,
+ 0x0c, 0x03, 0x40, 0xed, 0xfc, 0x68, 0x4a, 0xb7,
+ 0x1e, 0x09, 0x65, 0x30, 0x5a, 0x3d, 0x97, 0x4d,
+ 0x5e, 0x51, 0x8e, 0xda, 0xc3, 0x55, 0x8c, 0xfb,
+ 0xcf, 0x83, 0x05, 0x35, 0x0d, 0x08, 0x1b, 0xf3,
+ 0x3a, 0x57, 0x96, 0xac, 0x58, 0x8b, 0xfa, 0x00,
+ 0x49, 0x15, 0x78, 0xd2, 0x4b, 0xed, 0xb8, 0x59,
+ 0x78, 0x9b, 0x7f, 0xaa, 0xfc, 0xe7, 0x46, 0xdc,
+ 0x7b, 0x34, 0xd0, 0x34, 0xe5, 0x10, 0xff, 0x4d,
+ 0x5a, 0x4d, 0x60, 0xa7, 0x16, 0x54, 0xc4, 0xfd,
+ 0xca, 0x5d, 0x68, 0xc7, 0x4a, 0x01, 0x8d, 0x7f,
+ 0x74, 0x5d, 0xff, 0xb8, 0x37, 0x15, 0x62, 0xfa,
+ 0x44, 0x45, 0xcf, 0x77, 0x3b, 0x1d, 0xb2, 0xd2,
+ 0x0d, 0x42, 0x00, 0x39, 0x68, 0x1f, 0xcc, 0x89,
+ 0x73, 0x5d, 0xa9, 0x2e, 0xfd, 0x58, 0x62, 0xca,
+ 0x35, 0x8e, 0x70, 0x70, 0xaa, 0x6e, 0x14, 0xe9,
+ 0xa4, 0xe2, 0x10, 0x66, 0x71, 0xdc, 0x4c, 0xfc,
+ 0xa9, 0xdc, 0x8f, 0x57, 0x4d, 0xc5, 0xac, 0xd7,
+ 0xa9, 0xf3, 0xf3, 0xa1, 0xff, 0x62, 0xa0, 0x8f,
+ 0xe4, 0x96, 0x3e, 0xcb, 0x9f, 0x76, 0x42, 0x39,
+ 0x1f, 0x24, 0xfd, 0xfd, 0x79, 0xe8, 0x27, 0xdf,
+ 0xa8, 0xf6, 0x33, 0x8b, 0x31, 0x59, 0x69, 0xcf,
+ 0x6a, 0xef, 0x89, 0x4d, 0xa7, 0xf6, 0x7e, 0x97,
+ 0x14, 0xbd, 0xda, 0xdd, 0xb4, 0x84, 0x04, 0x24,
+ 0xe0, 0x17, 0xe1, 0x0f, 0x1f, 0x8a, 0x6a, 0x71,
+ 0x74, 0x41, 0xdc, 0x59, 0x5c, 0x8f, 0x01, 0x25,
+ 0x92, 0xf0, 0x2e, 0x15, 0x62, 0x71, 0x9a, 0x9f,
+ 0x87, 0xdf, 0x62, 0x49, 0x7f, 0x86, 0x62, 0xfc,
+ 0x20, 0x84, 0xd7, 0xe3, 0x3a, 0xd9, 0x37, 0x85,
+ 0xb7, 0x84, 0x5a, 0xf9, 0xed, 0x21, 0x32, 0x94,
+ 0x3e, 0x04, 0xe7, 0x8c, 0x46, 0x76, 0x21, 0x67,
+ 0xf6, 0x95, 0x64, 0x92, 0xb7, 0x15, 0xf6, 0xe3,
+ 0x41, 0x27, 0x9d, 0xd7, 0xe3, 0x79, 0x75, 0x92,
+ 0xd0, 0xc1, 0xf3, 0x40, 0x92, 0x08, 0xde, 0x90,
+ 0x22, 0x82, 0xb2, 0x69, 0xae, 0x1a, 0x35, 0x11,
+ 0x89, 0xc8, 0x06, 0x82, 0x95, 0x23, 0x44, 0x08,
+ 0x22, 0xf2, 0x71, 0x73, 0x1b, 0x88, 0x11, 0xcf,
+ 0x1c, 0x7e, 0x8a, 0x2e, 0xdc, 0x79, 0x57, 0xce,
+ 0x1f, 0xe7, 0x6c, 0x07, 0xd8, 0x06, 0xbe, 0xec,
+ 0xa3, 0xcf, 0xf9, 0x68, 0xa5, 0xb8, 0xf0, 0xe3,
+ 0x3f, 0x01, 0x92, 0xda, 0xf1, 0xa0, 0x2d, 0x7b,
+ 0xab, 0x57, 0x58, 0x2a, 0xaf, 0xab, 0xbd, 0xf2,
+ 0xe5, 0xaf, 0x7e, 0x1f, 0x46, 0x24, 0x9e, 0x20,
+ 0x22, 0x0f, 0x84, 0x4c, 0xb7, 0xd8, 0x03, 0xe8,
+ 0x09, 0x73, 0x6c, 0xc6, 0x9b, 0x90, 0xe0, 0xdb,
+ 0xf2, 0x71, 0xba, 0xad, 0xb3, 0xec, 0xda, 0x7a
+};
+static const u8 output71[] __initconst = {
+ 0x28, 0xc5, 0x9b, 0x92, 0xf9, 0x21, 0x4f, 0xbb,
+ 0xef, 0x3b, 0xf0, 0xf5, 0x3a, 0x6d, 0x7f, 0xd6,
+ 0x6a, 0x8d, 0xa1, 0x01, 0x5c, 0x62, 0x20, 0x8b,
+ 0x5b, 0x39, 0xd5, 0xd3, 0xc2, 0xf6, 0x9d, 0x5e,
+ 0xcc, 0xe1, 0xa2, 0x61, 0x16, 0xe2, 0xce, 0xe9,
+ 0x86, 0xd0, 0xfc, 0xce, 0x9a, 0x28, 0x27, 0xc4,
+ 0x0c, 0xb9, 0xaa, 0x8d, 0x48, 0xdb, 0xbf, 0x82,
+ 0x7d, 0xd0, 0x35, 0xc4, 0x06, 0x34, 0xb4, 0x19,
+ 0x51, 0x73, 0xf4, 0x7a, 0xf4, 0xfd, 0xe9, 0x1d,
+ 0xdc, 0x0f, 0x7e, 0xf7, 0x96, 0x03, 0xe3, 0xb1,
+ 0x2e, 0x22, 0x59, 0xb7, 0x6d, 0x1c, 0x97, 0x8c,
+ 0xd7, 0x31, 0x08, 0x26, 0x4c, 0x6d, 0xc6, 0x14,
+ 0xa5, 0xeb, 0x45, 0x6a, 0x88, 0xa3, 0xa2, 0x36,
+ 0xc4, 0x35, 0xb1, 0x5a, 0xa0, 0xad, 0xf7, 0x06,
+ 0x9b, 0x5d, 0xc1, 0x15, 0xc1, 0xce, 0x0a, 0xb0,
+ 0x57, 0x2e, 0x3f, 0x6f, 0x0d, 0x10, 0xd9, 0x11,
+ 0x2c, 0x9c, 0xad, 0x2d, 0xa5, 0x81, 0xfb, 0x4e,
+ 0x8f, 0xd5, 0x32, 0x4e, 0xaf, 0x5c, 0xc1, 0x86,
+ 0xde, 0x56, 0x5a, 0x33, 0x29, 0xf7, 0x67, 0xc6,
+ 0x37, 0x6f, 0xb2, 0x37, 0x4e, 0xd4, 0x69, 0x79,
+ 0xaf, 0xd5, 0x17, 0x79, 0xe0, 0xba, 0x62, 0xa3,
+ 0x68, 0xa4, 0x87, 0x93, 0x8d, 0x7e, 0x8f, 0xa3,
+ 0x9c, 0xef, 0xda, 0xe3, 0xa5, 0x1f, 0xcd, 0x30,
+ 0xa6, 0x55, 0xac, 0x4c, 0x69, 0x74, 0x02, 0xc7,
+ 0x5d, 0x95, 0x81, 0x4a, 0x68, 0x11, 0xd3, 0xa9,
+ 0x98, 0xb1, 0x0b, 0x0d, 0xae, 0x40, 0x86, 0x65,
+ 0xbf, 0xcc, 0x2d, 0xef, 0x57, 0xca, 0x1f, 0xe4,
+ 0x34, 0x4e, 0xa6, 0x5e, 0x82, 0x6e, 0x61, 0xad,
+ 0x0b, 0x3c, 0xf8, 0xeb, 0x01, 0x43, 0x7f, 0x87,
+ 0xa2, 0xa7, 0x6a, 0xe9, 0x62, 0x23, 0x24, 0x61,
+ 0xf1, 0xf7, 0x36, 0xdb, 0x10, 0xe5, 0x57, 0x72,
+ 0x3a, 0xc2, 0xae, 0xcc, 0x75, 0xc7, 0x80, 0x05,
+ 0x0a, 0x5c, 0x4c, 0x95, 0xda, 0x02, 0x01, 0x14,
+ 0x06, 0x6b, 0x5c, 0x65, 0xc2, 0xb8, 0x4a, 0xd6,
+ 0xd3, 0xb4, 0xd8, 0x12, 0x52, 0xb5, 0x60, 0xd3,
+ 0x8e, 0x5f, 0x5c, 0x76, 0x33, 0x7a, 0x05, 0xe5,
+ 0xcb, 0xef, 0x4f, 0x89, 0xf1, 0xba, 0x32, 0x6f,
+ 0x33, 0xcd, 0x15, 0x8d, 0xa3, 0x0c, 0x3f, 0x63,
+ 0x11, 0xe7, 0x0e, 0xe0, 0x00, 0x01, 0xe9, 0xe8,
+ 0x8e, 0x36, 0x34, 0x8d, 0x96, 0xb5, 0x03, 0xcf,
+ 0x55, 0x62, 0x49, 0x7a, 0x34, 0x44, 0xa5, 0xee,
+ 0x8c, 0x46, 0x06, 0x22, 0xab, 0x1d, 0x53, 0x9c,
+ 0xa1, 0xf9, 0x67, 0x18, 0x57, 0x89, 0xf9, 0xc2,
+ 0xd1, 0x7e, 0xbe, 0x36, 0x40, 0xcb, 0xe9, 0x04,
+ 0xde, 0xb1, 0x3b, 0x29, 0x52, 0xc5, 0x9a, 0xb5,
+ 0xa2, 0x7c, 0x7b, 0xfe, 0xe5, 0x92, 0x73, 0xea,
+ 0xea, 0x7b, 0xba, 0x0a, 0x8c, 0x88, 0x15, 0xe6,
+ 0x53, 0xbf, 0x1c, 0x33, 0xf4, 0x9b, 0x9a, 0x5e,
+ 0x8d, 0xae, 0x60, 0xdc, 0xcb, 0x5d, 0xfa, 0xbe,
+ 0x06, 0xc3, 0x3f, 0x06, 0xe7, 0x00, 0x40, 0x7b,
+ 0xaa, 0x94, 0xfa, 0x6d, 0x1f, 0xe4, 0xc5, 0xa9,
+ 0x1b, 0x5f, 0x36, 0xea, 0x5a, 0xdd, 0xa5, 0x48,
+ 0x6a, 0x55, 0xd2, 0x47, 0x28, 0xbf, 0x96, 0xf1,
+ 0x9f, 0xb6, 0x11, 0x4b, 0xd3, 0x44, 0x7d, 0x48,
+ 0x41, 0x61, 0xdb, 0x12, 0xd4, 0xc2, 0x59, 0x82,
+ 0x4c, 0x47, 0x5c, 0x04, 0xf6, 0x7b, 0xd3, 0x92,
+ 0x2e, 0xe8, 0x40, 0xef, 0x15, 0x32, 0x97, 0xdc,
+ 0x35, 0x4c, 0x6e, 0xa4, 0x97, 0xe9, 0x24, 0xde,
+ 0x63, 0x8b, 0xb1, 0x6b, 0x48, 0xbb, 0x46, 0x1f,
+ 0x84, 0xd6, 0x17, 0xb0, 0x5a, 0x4a, 0x4e, 0xd5,
+ 0x31, 0xd7, 0xcf, 0xa0, 0x39, 0xc6, 0x2e, 0xfc,
+ 0xa6, 0xa3, 0xd3, 0x0f, 0xa4, 0x28, 0xac, 0xb2,
+ 0xf4, 0x48, 0x8d, 0x50, 0xa5, 0x1c, 0x44, 0x5d,
+ 0x6e, 0x38, 0xb7, 0x2b, 0x8a, 0x45, 0xa7, 0x3d
+};
+static const u8 key71[] __initconst = {
+ 0x8b, 0x68, 0xc4, 0xb7, 0x0d, 0x81, 0xef, 0x52,
+ 0x1e, 0x05, 0x96, 0x72, 0x62, 0x89, 0x27, 0x83,
+ 0xd0, 0xc7, 0x33, 0x6d, 0xf2, 0xcc, 0x69, 0xf9,
+ 0x23, 0xae, 0x99, 0xb1, 0xd1, 0x05, 0x4e, 0x54
+};
+enum { nonce71 = 0x983f03656d64b5f6ULL };
+
+static const u8 input72[] __initconst = {
+ 0x6b, 0x09, 0xc9, 0x57, 0x3d, 0x79, 0x04, 0x8c,
+ 0x65, 0xad, 0x4a, 0x0f, 0xa1, 0x31, 0x3a, 0xdd,
+ 0x14, 0x8e, 0xe8, 0xfe, 0xbf, 0x42, 0x87, 0x98,
+ 0x2e, 0x8d, 0x83, 0xa3, 0xf8, 0x55, 0x3d, 0x84,
+ 0x1e, 0x0e, 0x05, 0x4a, 0x38, 0x9e, 0xe7, 0xfe,
+ 0xd0, 0x4d, 0x79, 0x74, 0x3a, 0x0b, 0x9b, 0xe1,
+ 0xfd, 0x51, 0x84, 0x4e, 0xb2, 0x25, 0xe4, 0x64,
+ 0x4c, 0xda, 0xcf, 0x46, 0xec, 0xba, 0x12, 0xeb,
+ 0x5a, 0x33, 0x09, 0x6e, 0x78, 0x77, 0x8f, 0x30,
+ 0xb1, 0x7d, 0x3f, 0x60, 0x8c, 0xf2, 0x1d, 0x8e,
+ 0xb4, 0x70, 0xa2, 0x90, 0x7c, 0x79, 0x1a, 0x2c,
+ 0xf6, 0x28, 0x79, 0x7c, 0x53, 0xc5, 0xfa, 0xcc,
+ 0x65, 0x9b, 0xe1, 0x51, 0xd1, 0x7f, 0x1d, 0xc4,
+ 0xdb, 0xd4, 0xd9, 0x04, 0x61, 0x7d, 0xbe, 0x12,
+ 0xfc, 0xcd, 0xaf, 0xe4, 0x0f, 0x9c, 0x20, 0xb5,
+ 0x22, 0x40, 0x18, 0xda, 0xe4, 0xda, 0x8c, 0x2d,
+ 0x84, 0xe3, 0x5f, 0x53, 0x17, 0xed, 0x78, 0xdc,
+ 0x2f, 0xe8, 0x31, 0xc7, 0xe6, 0x39, 0x71, 0x40,
+ 0xb4, 0x0f, 0xc9, 0xa9, 0x7e, 0x78, 0x87, 0xc1,
+ 0x05, 0x78, 0xbb, 0x01, 0xf2, 0x8f, 0x33, 0xb0,
+ 0x6e, 0x84, 0xcd, 0x36, 0x33, 0x5c, 0x5b, 0x8e,
+ 0xf1, 0xac, 0x30, 0xfe, 0x33, 0xec, 0x08, 0xf3,
+ 0x7e, 0xf2, 0xf0, 0x4c, 0xf2, 0xad, 0xd8, 0xc1,
+ 0xd4, 0x4e, 0x87, 0x06, 0xd4, 0x75, 0xe7, 0xe3,
+ 0x09, 0xd3, 0x4d, 0xe3, 0x21, 0x32, 0xba, 0xb4,
+ 0x68, 0x68, 0xcb, 0x4c, 0xa3, 0x1e, 0xb3, 0x87,
+ 0x7b, 0xd3, 0x0c, 0x63, 0x37, 0x71, 0x79, 0xfb,
+ 0x58, 0x36, 0x57, 0x0f, 0x34, 0x1d, 0xc1, 0x42,
+ 0x02, 0x17, 0xe7, 0xed, 0xe8, 0xe7, 0x76, 0xcb,
+ 0x42, 0xc4, 0x4b, 0xe2, 0xb2, 0x5e, 0x42, 0xd5,
+ 0xec, 0x9d, 0xc1, 0x32, 0x71, 0xe4, 0xeb, 0x10,
+ 0x68, 0x1a, 0x6e, 0x99, 0x8e, 0x73, 0x12, 0x1f,
+ 0x97, 0x0c, 0x9e, 0xcd, 0x02, 0x3e, 0x4c, 0xa0,
+ 0xf2, 0x8d, 0xe5, 0x44, 0xca, 0x6d, 0xfe, 0x07,
+ 0xe3, 0xe8, 0x9b, 0x76, 0xc1, 0x6d, 0xb7, 0x6e,
+ 0x0d, 0x14, 0x00, 0x6f, 0x8a, 0xfd, 0x43, 0xc6,
+ 0x43, 0xa5, 0x9c, 0x02, 0x47, 0x10, 0xd4, 0xb4,
+ 0x9b, 0x55, 0x67, 0xc8, 0x7f, 0xc1, 0x8a, 0x1f,
+ 0x1e, 0xd1, 0xbc, 0x99, 0x5d, 0x50, 0x4f, 0x89,
+ 0xf1, 0xe6, 0x5d, 0x91, 0x40, 0xdc, 0x20, 0x67,
+ 0x56, 0xc2, 0xef, 0xbd, 0x2c, 0xa2, 0x99, 0x38,
+ 0xe0, 0x45, 0xec, 0x44, 0x05, 0x52, 0x65, 0x11,
+ 0xfc, 0x3b, 0x19, 0xcb, 0x71, 0xc2, 0x8e, 0x0e,
+ 0x03, 0x2a, 0x03, 0x3b, 0x63, 0x06, 0x31, 0x9a,
+ 0xac, 0x53, 0x04, 0x14, 0xd4, 0x80, 0x9d, 0x6b,
+ 0x42, 0x7e, 0x7e, 0x4e, 0xdc, 0xc7, 0x01, 0x49,
+ 0x9f, 0xf5, 0x19, 0x86, 0x13, 0x28, 0x2b, 0xa6,
+ 0xa6, 0xbe, 0xa1, 0x7e, 0x71, 0x05, 0x00, 0xff,
+ 0x59, 0x2d, 0xb6, 0x63, 0xf0, 0x1e, 0x2e, 0x69,
+ 0x9b, 0x85, 0xf1, 0x1e, 0x8a, 0x64, 0x39, 0xab,
+ 0x00, 0x12, 0xe4, 0x33, 0x4b, 0xb5, 0xd8, 0xb3,
+ 0x6b, 0x5b, 0x8b, 0x5c, 0xd7, 0x6f, 0x23, 0xcf,
+ 0x3f, 0x2e, 0x5e, 0x47, 0xb9, 0xb8, 0x1f, 0xf0,
+ 0x1d, 0xda, 0xe7, 0x4f, 0x6e, 0xab, 0xc3, 0x36,
+ 0xb4, 0x74, 0x6b, 0xeb, 0xc7, 0x5d, 0x91, 0xe5,
+ 0xda, 0xf2, 0xc2, 0x11, 0x17, 0x48, 0xf8, 0x9c,
+ 0xc9, 0x8b, 0xc1, 0xa2, 0xf4, 0xcd, 0x16, 0xf8,
+ 0x27, 0xd9, 0x6c, 0x6f, 0xb5, 0x8f, 0x77, 0xca,
+ 0x1b, 0xd8, 0xef, 0x84, 0x68, 0x71, 0x53, 0xc1,
+ 0x43, 0x0f, 0x9f, 0x98, 0xae, 0x7e, 0x31, 0xd2,
+ 0x98, 0xfb, 0x20, 0xa2, 0xad, 0x00, 0x10, 0x83,
+ 0x00, 0x8b, 0xeb, 0x56, 0xd2, 0xc4, 0xcc, 0x7f,
+ 0x2f, 0x4e, 0xfa, 0x88, 0x13, 0xa4, 0x2c, 0xde,
+ 0x6b, 0x77, 0x86, 0x10, 0x6a, 0xab, 0x43, 0x0a,
+ 0x02
+};
+static const u8 output72[] __initconst = {
+ 0x42, 0x89, 0xa4, 0x80, 0xd2, 0xcb, 0x5f, 0x7f,
+ 0x2a, 0x1a, 0x23, 0x00, 0xa5, 0x6a, 0x95, 0xa3,
+ 0x9a, 0x41, 0xa1, 0xd0, 0x2d, 0x1e, 0xd6, 0x13,
+ 0x34, 0x40, 0x4e, 0x7f, 0x1a, 0xbe, 0xa0, 0x3d,
+ 0x33, 0x9c, 0x56, 0x2e, 0x89, 0x25, 0x45, 0xf9,
+ 0xf0, 0xba, 0x9c, 0x6d, 0xd1, 0xd1, 0xde, 0x51,
+ 0x47, 0x63, 0xc9, 0xbd, 0xfa, 0xa2, 0x9e, 0xad,
+ 0x6a, 0x7b, 0x21, 0x1a, 0x6c, 0x3e, 0xff, 0x46,
+ 0xbe, 0xf3, 0x35, 0x7a, 0x6e, 0xb3, 0xb9, 0xf7,
+ 0xda, 0x5e, 0xf0, 0x14, 0xb5, 0x70, 0xa4, 0x2b,
+ 0xdb, 0xbb, 0xc7, 0x31, 0x4b, 0x69, 0x5a, 0x83,
+ 0x70, 0xd9, 0x58, 0xd4, 0x33, 0x84, 0x23, 0xf0,
+ 0xae, 0xbb, 0x6d, 0x26, 0x7c, 0xc8, 0x30, 0xf7,
+ 0x24, 0xad, 0xbd, 0xe4, 0x2c, 0x38, 0x38, 0xac,
+ 0xe1, 0x4a, 0x9b, 0xac, 0x33, 0x0e, 0x4a, 0xf4,
+ 0x93, 0xed, 0x07, 0x82, 0x81, 0x4f, 0x8f, 0xb1,
+ 0xdd, 0x73, 0xd5, 0x50, 0x6d, 0x44, 0x1e, 0xbe,
+ 0xa7, 0xcd, 0x17, 0x57, 0xd5, 0x3b, 0x62, 0x36,
+ 0xcf, 0x7d, 0xc8, 0xd8, 0xd1, 0x78, 0xd7, 0x85,
+ 0x46, 0x76, 0x5d, 0xcc, 0xfe, 0xe8, 0x94, 0xc5,
+ 0xad, 0xbc, 0x5e, 0xbc, 0x8d, 0x1d, 0xdf, 0x03,
+ 0xc9, 0x6b, 0x1b, 0x81, 0xd1, 0xb6, 0x5a, 0x24,
+ 0xe3, 0xdc, 0x3f, 0x20, 0xc9, 0x07, 0x73, 0x4c,
+ 0x43, 0x13, 0x87, 0x58, 0x34, 0x0d, 0x14, 0x63,
+ 0x0f, 0x6f, 0xad, 0x8d, 0xac, 0x7c, 0x67, 0x68,
+ 0xa3, 0x9d, 0x7f, 0x00, 0xdf, 0x28, 0xee, 0x67,
+ 0xf4, 0x5c, 0x26, 0xcb, 0xef, 0x56, 0x71, 0xc8,
+ 0xc6, 0x67, 0x5f, 0x38, 0xbb, 0xa0, 0xb1, 0x5c,
+ 0x1f, 0xb3, 0x08, 0xd9, 0x38, 0xcf, 0x74, 0x54,
+ 0xc6, 0xa4, 0xc4, 0xc0, 0x9f, 0xb3, 0xd0, 0xda,
+ 0x62, 0x67, 0x8b, 0x81, 0x33, 0xf0, 0xa9, 0x73,
+ 0xa4, 0xd1, 0x46, 0x88, 0x8d, 0x85, 0x12, 0x40,
+ 0xba, 0x1a, 0xcd, 0x82, 0xd8, 0x8d, 0xc4, 0x52,
+ 0xe7, 0x01, 0x94, 0x2e, 0x0e, 0xd0, 0xaf, 0xe7,
+ 0x2d, 0x3f, 0x3c, 0xaa, 0xf4, 0xf5, 0xa7, 0x01,
+ 0x4c, 0x14, 0xe2, 0xc2, 0x96, 0x76, 0xbe, 0x05,
+ 0xaa, 0x19, 0xb1, 0xbd, 0x95, 0xbb, 0x5a, 0xf9,
+ 0xa5, 0xa7, 0xe6, 0x16, 0x38, 0x34, 0xf7, 0x9d,
+ 0x19, 0x66, 0x16, 0x8e, 0x7f, 0x2b, 0x5a, 0xfb,
+ 0xb5, 0x29, 0x79, 0xbf, 0x52, 0xae, 0x30, 0x95,
+ 0x3f, 0x31, 0x33, 0x28, 0xde, 0xc5, 0x0d, 0x55,
+ 0x89, 0xec, 0x21, 0x11, 0x0f, 0x8b, 0xfe, 0x63,
+ 0x3a, 0xf1, 0x95, 0x5c, 0xcd, 0x50, 0xe4, 0x5d,
+ 0x8f, 0xa7, 0xc8, 0xca, 0x93, 0xa0, 0x67, 0x82,
+ 0x63, 0x5c, 0xd0, 0xed, 0xe7, 0x08, 0xc5, 0x60,
+ 0xf8, 0xb4, 0x47, 0xf0, 0x1a, 0x65, 0x4e, 0xa3,
+ 0x51, 0x68, 0xc7, 0x14, 0xa1, 0xd9, 0x39, 0x72,
+ 0xa8, 0x6f, 0x7c, 0x7e, 0xf6, 0x03, 0x0b, 0x25,
+ 0x9b, 0xf2, 0xca, 0x49, 0xae, 0x5b, 0xf8, 0x0f,
+ 0x71, 0x51, 0x01, 0xa6, 0x23, 0xa9, 0xdf, 0xd0,
+ 0x7a, 0x39, 0x19, 0xf5, 0xc5, 0x26, 0x44, 0x7b,
+ 0x0a, 0x4a, 0x41, 0xbf, 0xf2, 0x8e, 0x83, 0x50,
+ 0x91, 0x96, 0x72, 0x02, 0xf6, 0x80, 0xbf, 0x95,
+ 0x41, 0xac, 0xda, 0xb0, 0xba, 0xe3, 0x76, 0xb1,
+ 0x9d, 0xff, 0x1f, 0x33, 0x02, 0x85, 0xfc, 0x2a,
+ 0x29, 0xe6, 0xe3, 0x9d, 0xd0, 0xef, 0xc2, 0xd6,
+ 0x9c, 0x4a, 0x62, 0xac, 0xcb, 0xea, 0x8b, 0xc3,
+ 0x08, 0x6e, 0x49, 0x09, 0x26, 0x19, 0xc1, 0x30,
+ 0xcc, 0x27, 0xaa, 0xc6, 0x45, 0x88, 0xbd, 0xae,
+ 0xd6, 0x79, 0xff, 0x4e, 0xfc, 0x66, 0x4d, 0x02,
+ 0xa5, 0xee, 0x8e, 0xa5, 0xb6, 0x15, 0x72, 0x24,
+ 0xb1, 0xbf, 0xbf, 0x64, 0xcf, 0xcc, 0x93, 0xe9,
+ 0xb6, 0xfd, 0xb4, 0xb6, 0x21, 0xb5, 0x48, 0x08,
+ 0x0f, 0x11, 0x65, 0xe1, 0x47, 0xee, 0x93, 0x29,
+ 0xad
+};
+static const u8 key72[] __initconst = {
+ 0xb9, 0xa2, 0xfc, 0x59, 0x06, 0x3f, 0x77, 0xa5,
+ 0x66, 0xd0, 0x2b, 0x22, 0x74, 0x22, 0x4c, 0x1e,
+ 0x6a, 0x39, 0xdf, 0xe1, 0x0d, 0x4c, 0x64, 0x99,
+ 0x54, 0x8a, 0xba, 0x1d, 0x2c, 0x21, 0x5f, 0xc3
+};
+enum { nonce72 = 0x3d069308fa3db04bULL };
+
+static const u8 input73[] __initconst = {
+ 0xe4, 0xdd, 0x36, 0xd4, 0xf5, 0x70, 0x51, 0x73,
+ 0x97, 0x1d, 0x45, 0x05, 0x92, 0xe7, 0xeb, 0xb7,
+ 0x09, 0x82, 0x6e, 0x25, 0x6c, 0x50, 0xf5, 0x40,
+ 0x19, 0xba, 0xbc, 0xf4, 0x39, 0x14, 0xc5, 0x15,
+ 0x83, 0x40, 0xbd, 0x26, 0xe0, 0xff, 0x3b, 0x22,
+ 0x7c, 0x7c, 0xd7, 0x0b, 0xe9, 0x25, 0x0c, 0x3d,
+ 0x92, 0x38, 0xbe, 0xe4, 0x22, 0x75, 0x65, 0xf1,
+ 0x03, 0x85, 0x34, 0x09, 0xb8, 0x77, 0xfb, 0x48,
+ 0xb1, 0x2e, 0x21, 0x67, 0x9b, 0x9d, 0xad, 0x18,
+ 0x82, 0x0d, 0x6b, 0xc3, 0xcf, 0x00, 0x61, 0x6e,
+ 0xda, 0xdc, 0xa7, 0x0b, 0x5c, 0x02, 0x1d, 0xa6,
+ 0x4e, 0x0d, 0x7f, 0x37, 0x01, 0x5a, 0x37, 0xf3,
+ 0x2b, 0xbf, 0xba, 0xe2, 0x1c, 0xb3, 0xa3, 0xbc,
+ 0x1c, 0x93, 0x1a, 0xb1, 0x71, 0xaf, 0xe2, 0xdd,
+ 0x17, 0xee, 0x53, 0xfa, 0xfb, 0x02, 0x40, 0x3e,
+ 0x03, 0xca, 0xe7, 0xc3, 0x51, 0x81, 0xcc, 0x8c,
+ 0xca, 0xcf, 0x4e, 0xc5, 0x78, 0x99, 0xfd, 0xbf,
+ 0xea, 0xab, 0x38, 0x81, 0xfc, 0xd1, 0x9e, 0x41,
+ 0x0b, 0x84, 0x25, 0xf1, 0x6b, 0x3c, 0xf5, 0x40,
+ 0x0d, 0xc4, 0x3e, 0xb3, 0x6a, 0xec, 0x6e, 0x75,
+ 0xdc, 0x9b, 0xdf, 0x08, 0x21, 0x16, 0xfb, 0x7a,
+ 0x8e, 0x19, 0x13, 0x02, 0xa7, 0xfc, 0x58, 0x21,
+ 0xc3, 0xb3, 0x59, 0x5a, 0x9c, 0xef, 0x38, 0xbd,
+ 0x87, 0x55, 0xd7, 0x0d, 0x1f, 0x84, 0xdc, 0x98,
+ 0x22, 0xca, 0x87, 0x96, 0x71, 0x6d, 0x68, 0x00,
+ 0xcb, 0x4f, 0x2f, 0xc4, 0x64, 0x0c, 0xc1, 0x53,
+ 0x0c, 0x90, 0xe7, 0x3c, 0x88, 0xca, 0xc5, 0x85,
+ 0xa3, 0x2a, 0x96, 0x7c, 0x82, 0x6d, 0x45, 0xf5,
+ 0xb7, 0x8d, 0x17, 0x69, 0xd6, 0xcd, 0x3c, 0xd3,
+ 0xe7, 0x1c, 0xce, 0x93, 0x50, 0xd4, 0x59, 0xa2,
+ 0xd8, 0x8b, 0x72, 0x60, 0x5b, 0x25, 0x14, 0xcd,
+ 0x5a, 0xe8, 0x8c, 0xdb, 0x23, 0x8d, 0x2b, 0x59,
+ 0x12, 0x13, 0x10, 0x47, 0xa4, 0xc8, 0x3c, 0xc1,
+ 0x81, 0x89, 0x6c, 0x98, 0xec, 0x8f, 0x7b, 0x32,
+ 0xf2, 0x87, 0xd9, 0xa2, 0x0d, 0xc2, 0x08, 0xf9,
+ 0xd5, 0xf3, 0x91, 0xe7, 0xb3, 0x87, 0xa7, 0x0b,
+ 0x64, 0x8f, 0xb9, 0x55, 0x1c, 0x81, 0x96, 0x6c,
+ 0xa1, 0xc9, 0x6e, 0x3b, 0xcd, 0x17, 0x1b, 0xfc,
+ 0xa6, 0x05, 0xba, 0x4a, 0x7d, 0x03, 0x3c, 0x59,
+ 0xc8, 0xee, 0x50, 0xb2, 0x5b, 0xe1, 0x4d, 0x6a,
+ 0x1f, 0x09, 0xdc, 0xa2, 0x51, 0xd1, 0x93, 0x3a,
+ 0x5f, 0x72, 0x1d, 0x26, 0x14, 0x62, 0xa2, 0x41,
+ 0x3d, 0x08, 0x70, 0x7b, 0x27, 0x3d, 0xbc, 0xdf,
+ 0x15, 0xfa, 0xb9, 0x5f, 0xb5, 0x38, 0x84, 0x0b,
+ 0x58, 0x3d, 0xee, 0x3f, 0x32, 0x65, 0x6d, 0xd7,
+ 0xce, 0x97, 0x3c, 0x8d, 0xfb, 0x63, 0xb9, 0xb0,
+ 0xa8, 0x4a, 0x72, 0x99, 0x97, 0x58, 0xc8, 0xa7,
+ 0xf9, 0x4c, 0xae, 0xc1, 0x63, 0xb9, 0x57, 0x18,
+ 0x8a, 0xfa, 0xab, 0xe9, 0xf3, 0x67, 0xe6, 0xfd,
+ 0xd2, 0x9d, 0x5c, 0xa9, 0x8e, 0x11, 0x0a, 0xf4,
+ 0x4b, 0xf1, 0xec, 0x1a, 0xaf, 0x50, 0x5d, 0x16,
+ 0x13, 0x69, 0x2e, 0xbd, 0x0d, 0xe6, 0xf0, 0xb2,
+ 0xed, 0xb4, 0x4c, 0x59, 0x77, 0x37, 0x00, 0x0b,
+ 0xc7, 0xa7, 0x9e, 0x37, 0xf3, 0x60, 0x70, 0xef,
+ 0xf3, 0xc1, 0x74, 0x52, 0x87, 0xc6, 0xa1, 0x81,
+ 0xbd, 0x0a, 0x2c, 0x5d, 0x2c, 0x0c, 0x6a, 0x81,
+ 0xa1, 0xfe, 0x26, 0x78, 0x6c, 0x03, 0x06, 0x07,
+ 0x34, 0xaa, 0xd1, 0x1b, 0x40, 0x03, 0x39, 0x56,
+ 0xcf, 0x2a, 0x92, 0xc1, 0x4e, 0xdf, 0x29, 0x24,
+ 0x83, 0x22, 0x7a, 0xea, 0x67, 0x1e, 0xe7, 0x54,
+ 0x64, 0xd3, 0xbd, 0x3a, 0x5d, 0xae, 0xca, 0xf0,
+ 0x9c, 0xd6, 0x5a, 0x9a, 0x62, 0xc8, 0xc7, 0x83,
+ 0xf9, 0x89, 0xde, 0x2d, 0x53, 0x64, 0x61, 0xf7,
+ 0xa3, 0xa7, 0x31, 0x38, 0xc6, 0x22, 0x9c, 0xb4,
+ 0x87, 0xe0
+};
+static const u8 output73[] __initconst = {
+ 0x34, 0xed, 0x05, 0xb0, 0x14, 0xbc, 0x8c, 0xcc,
+ 0x95, 0xbd, 0x99, 0x0f, 0xb1, 0x98, 0x17, 0x10,
+ 0xae, 0xe0, 0x08, 0x53, 0xa3, 0x69, 0xd2, 0xed,
+ 0x66, 0xdb, 0x2a, 0x34, 0x8d, 0x0c, 0x6e, 0xce,
+ 0x63, 0x69, 0xc9, 0xe4, 0x57, 0xc3, 0x0c, 0x8b,
+ 0xa6, 0x2c, 0xa7, 0xd2, 0x08, 0xff, 0x4f, 0xec,
+ 0x61, 0x8c, 0xee, 0x0d, 0xfa, 0x6b, 0xe0, 0xe8,
+ 0x71, 0xbc, 0x41, 0x46, 0xd7, 0x33, 0x1d, 0xc0,
+ 0xfd, 0xad, 0xca, 0x8b, 0x34, 0x56, 0xa4, 0x86,
+ 0x71, 0x62, 0xae, 0x5e, 0x3d, 0x2b, 0x66, 0x3e,
+ 0xae, 0xd8, 0xc0, 0xe1, 0x21, 0x3b, 0xca, 0xd2,
+ 0x6b, 0xa2, 0xb8, 0xc7, 0x98, 0x4a, 0xf3, 0xcf,
+ 0xb8, 0x62, 0xd8, 0x33, 0xe6, 0x80, 0xdb, 0x2f,
+ 0x0a, 0xaf, 0x90, 0x3c, 0xe1, 0xec, 0xe9, 0x21,
+ 0x29, 0x42, 0x9e, 0xa5, 0x50, 0xe9, 0x93, 0xd3,
+ 0x53, 0x1f, 0xac, 0x2a, 0x24, 0x07, 0xb8, 0xed,
+ 0xed, 0x38, 0x2c, 0xc4, 0xa1, 0x2b, 0x31, 0x5d,
+ 0x9c, 0x24, 0x7b, 0xbf, 0xd9, 0xbb, 0x4e, 0x87,
+ 0x8f, 0x32, 0x30, 0xf1, 0x11, 0x29, 0x54, 0x94,
+ 0x00, 0x95, 0x1d, 0x1d, 0x24, 0xc0, 0xd4, 0x34,
+ 0x49, 0x1d, 0xd5, 0xe3, 0xa6, 0xde, 0x8b, 0xbf,
+ 0x5a, 0x9f, 0x58, 0x5a, 0x9b, 0x70, 0xe5, 0x9b,
+ 0xb3, 0xdb, 0xe8, 0xb8, 0xca, 0x1b, 0x43, 0xe3,
+ 0xc6, 0x6f, 0x0a, 0xd6, 0x32, 0x11, 0xd4, 0x04,
+ 0xef, 0xa3, 0xe4, 0x3f, 0x12, 0xd8, 0xc1, 0x73,
+ 0x51, 0x87, 0x03, 0xbd, 0xba, 0x60, 0x79, 0xee,
+ 0x08, 0xcc, 0xf7, 0xc0, 0xaa, 0x4c, 0x33, 0xc4,
+ 0xc7, 0x09, 0xf5, 0x91, 0xcb, 0x74, 0x57, 0x08,
+ 0x1b, 0x90, 0xa9, 0x1b, 0x60, 0x02, 0xd2, 0x3f,
+ 0x7a, 0xbb, 0xfd, 0x78, 0xf0, 0x15, 0xf9, 0x29,
+ 0x82, 0x8f, 0xc4, 0xb2, 0x88, 0x1f, 0xbc, 0xcc,
+ 0x53, 0x27, 0x8b, 0x07, 0x5f, 0xfc, 0x91, 0x29,
+ 0x82, 0x80, 0x59, 0x0a, 0x3c, 0xea, 0xc4, 0x7e,
+ 0xad, 0xd2, 0x70, 0x46, 0xbd, 0x9e, 0x3b, 0x1c,
+ 0x8a, 0x62, 0xea, 0x69, 0xbd, 0xf6, 0x96, 0x15,
+ 0xb5, 0x57, 0xe8, 0x63, 0x5f, 0x65, 0x46, 0x84,
+ 0x58, 0x50, 0x87, 0x4b, 0x0e, 0x5b, 0x52, 0x90,
+ 0xb0, 0xae, 0x37, 0x0f, 0xdd, 0x7e, 0xa2, 0xa0,
+ 0x8b, 0x78, 0xc8, 0x5a, 0x1f, 0x53, 0xdb, 0xc5,
+ 0xbf, 0x73, 0x20, 0xa9, 0x44, 0xfb, 0x1e, 0xc7,
+ 0x97, 0xb2, 0x3a, 0x5a, 0x17, 0xe6, 0x8b, 0x9b,
+ 0xe8, 0xf8, 0x2a, 0x01, 0x27, 0xa3, 0x71, 0x28,
+ 0xe3, 0x19, 0xc6, 0xaf, 0xf5, 0x3a, 0x26, 0xc0,
+ 0x5c, 0x69, 0x30, 0x78, 0x75, 0x27, 0xf2, 0x0c,
+ 0x22, 0x71, 0x65, 0xc6, 0x8e, 0x7b, 0x47, 0xe3,
+ 0x31, 0xaf, 0x7b, 0xc6, 0xc2, 0x55, 0x68, 0x81,
+ 0xaa, 0x1b, 0x21, 0x65, 0xfb, 0x18, 0x35, 0x45,
+ 0x36, 0x9a, 0x44, 0xba, 0x5c, 0xff, 0x06, 0xde,
+ 0x3a, 0xc8, 0x44, 0x0b, 0xaa, 0x8e, 0x34, 0xe2,
+ 0x84, 0xac, 0x18, 0xfe, 0x9b, 0xe1, 0x4f, 0xaa,
+ 0xb6, 0x90, 0x0b, 0x1c, 0x2c, 0xd9, 0x9a, 0x10,
+ 0x18, 0xf9, 0x49, 0x41, 0x42, 0x1b, 0xb5, 0xe1,
+ 0x26, 0xac, 0x2d, 0x38, 0x00, 0x00, 0xe4, 0xb4,
+ 0x50, 0x6f, 0x14, 0x18, 0xd6, 0x3d, 0x00, 0x59,
+ 0x3c, 0x45, 0xf3, 0x42, 0x13, 0x44, 0xb8, 0x57,
+ 0xd4, 0x43, 0x5c, 0x8a, 0x2a, 0xb4, 0xfc, 0x0a,
+ 0x25, 0x5a, 0xdc, 0x8f, 0x11, 0x0b, 0x11, 0x44,
+ 0xc7, 0x0e, 0x54, 0x8b, 0x22, 0x01, 0x7e, 0x67,
+ 0x2e, 0x15, 0x3a, 0xb9, 0xee, 0x84, 0x10, 0xd4,
+ 0x80, 0x57, 0xd7, 0x75, 0xcf, 0x8b, 0xcb, 0x03,
+ 0xc9, 0x92, 0x2b, 0x69, 0xd8, 0x5a, 0x9b, 0x06,
+ 0x85, 0x47, 0xaa, 0x4c, 0x28, 0xde, 0x49, 0x58,
+ 0xe6, 0x11, 0x1e, 0x5e, 0x64, 0x8e, 0x3b, 0xe0,
+ 0x40, 0x2e, 0xac, 0x96, 0x97, 0x15, 0x37, 0x1e,
+ 0x30, 0xdd
+};
+static const u8 key73[] __initconst = {
+ 0x96, 0x06, 0x1e, 0xc1, 0x6d, 0xba, 0x49, 0x5b,
+ 0x65, 0x80, 0x79, 0xdd, 0xf3, 0x67, 0xa8, 0x6e,
+ 0x2d, 0x9c, 0x54, 0x46, 0xd8, 0x4a, 0xeb, 0x7e,
+ 0x23, 0x86, 0x51, 0xd8, 0x49, 0x49, 0x56, 0xe0
+};
+enum { nonce73 = 0xbefb83cb67e11ffdULL };
+
+static const u8 input74[] __initconst = {
+ 0x47, 0x22, 0x70, 0xe5, 0x2f, 0x41, 0x18, 0x45,
+ 0x07, 0xd3, 0x6d, 0x32, 0x0d, 0x43, 0x92, 0x2b,
+ 0x9b, 0x65, 0x73, 0x13, 0x1a, 0x4f, 0x49, 0x8f,
+ 0xff, 0xf8, 0xcc, 0xae, 0x15, 0xab, 0x9d, 0x7d,
+ 0xee, 0x22, 0x5d, 0x8b, 0xde, 0x81, 0x5b, 0x81,
+ 0x83, 0x49, 0x35, 0x9b, 0xb4, 0xbc, 0x4e, 0x01,
+ 0xc2, 0x29, 0xa7, 0xf1, 0xca, 0x3a, 0xce, 0x3f,
+ 0xf5, 0x31, 0x93, 0xa8, 0xe2, 0xc9, 0x7d, 0x03,
+ 0x26, 0xa4, 0xbc, 0xa8, 0x9c, 0xb9, 0x68, 0xf3,
+ 0xb3, 0x91, 0xe8, 0xe6, 0xc7, 0x2b, 0x1a, 0xce,
+ 0xd2, 0x41, 0x53, 0xbd, 0xa3, 0x2c, 0x54, 0x94,
+ 0x21, 0xa1, 0x40, 0xae, 0xc9, 0x0c, 0x11, 0x92,
+ 0xfd, 0x91, 0xa9, 0x40, 0xca, 0xde, 0x21, 0x4e,
+ 0x1e, 0x3d, 0xcc, 0x2c, 0x87, 0x11, 0xef, 0x46,
+ 0xed, 0x52, 0x03, 0x11, 0x19, 0x43, 0x25, 0xc7,
+ 0x0d, 0xc3, 0x37, 0x5f, 0xd3, 0x6f, 0x0c, 0x6a,
+ 0x45, 0x30, 0x88, 0xec, 0xf0, 0x21, 0xef, 0x1d,
+ 0x7b, 0x38, 0x63, 0x4b, 0x49, 0x0c, 0x72, 0xf6,
+ 0x4c, 0x40, 0xc3, 0xcc, 0x03, 0xa7, 0xae, 0xa8,
+ 0x8c, 0x37, 0x03, 0x1c, 0x11, 0xae, 0x0d, 0x1b,
+ 0x62, 0x97, 0x27, 0xfc, 0x56, 0x4b, 0xb7, 0xfd,
+ 0xbc, 0xfb, 0x0e, 0xfc, 0x61, 0xad, 0xc6, 0xb5,
+ 0x9c, 0x8c, 0xc6, 0x38, 0x27, 0x91, 0x29, 0x3d,
+ 0x29, 0xc8, 0x37, 0xc9, 0x96, 0x69, 0xe3, 0xdc,
+ 0x3e, 0x61, 0x35, 0x9b, 0x99, 0x4f, 0xb9, 0x4e,
+ 0x5a, 0x29, 0x1c, 0x2e, 0xcf, 0x16, 0xcb, 0x69,
+ 0x87, 0xe4, 0x1a, 0xc4, 0x6e, 0x78, 0x43, 0x00,
+ 0x03, 0xb2, 0x8b, 0x03, 0xd0, 0xb4, 0xf1, 0xd2,
+ 0x7d, 0x2d, 0x7e, 0xfc, 0x19, 0x66, 0x5b, 0xa3,
+ 0x60, 0x3f, 0x9d, 0xbd, 0xfa, 0x3e, 0xca, 0x7b,
+ 0x26, 0x08, 0x19, 0x16, 0x93, 0x5d, 0x83, 0xfd,
+ 0xf9, 0x21, 0xc6, 0x31, 0x34, 0x6f, 0x0c, 0xaa,
+ 0x28, 0xf9, 0x18, 0xa2, 0xc4, 0x78, 0x3b, 0x56,
+ 0xc0, 0x88, 0x16, 0xba, 0x22, 0x2c, 0x07, 0x2f,
+ 0x70, 0xd0, 0xb0, 0x46, 0x35, 0xc7, 0x14, 0xdc,
+ 0xbb, 0x56, 0x23, 0x1e, 0x36, 0x36, 0x2d, 0x73,
+ 0x78, 0xc7, 0xce, 0xf3, 0x58, 0xf7, 0x58, 0xb5,
+ 0x51, 0xff, 0x33, 0x86, 0x0e, 0x3b, 0x39, 0xfb,
+ 0x1a, 0xfd, 0xf8, 0x8b, 0x09, 0x33, 0x1b, 0x83,
+ 0xf2, 0xe6, 0x38, 0x37, 0xef, 0x47, 0x84, 0xd9,
+ 0x82, 0x77, 0x2b, 0x82, 0xcc, 0xf9, 0xee, 0x94,
+ 0x71, 0x78, 0x81, 0xc8, 0x4d, 0x91, 0xd7, 0x35,
+ 0x29, 0x31, 0x30, 0x5c, 0x4a, 0x23, 0x23, 0xb1,
+ 0x38, 0x6b, 0xac, 0x22, 0x3f, 0x80, 0xc7, 0xe0,
+ 0x7d, 0xfa, 0x76, 0x47, 0xd4, 0x6f, 0x93, 0xa0,
+ 0xa0, 0x93, 0x5d, 0x68, 0xf7, 0x43, 0x25, 0x8f,
+ 0x1b, 0xc7, 0x87, 0xea, 0x59, 0x0c, 0xa2, 0xfa,
+ 0xdb, 0x2f, 0x72, 0x43, 0xcf, 0x90, 0xf1, 0xd6,
+ 0x58, 0xf3, 0x17, 0x6a, 0xdf, 0xb3, 0x4e, 0x0e,
+ 0x38, 0x24, 0x48, 0x1f, 0xb7, 0x01, 0xec, 0x81,
+ 0xb1, 0x87, 0x5b, 0xec, 0x9c, 0x11, 0x1a, 0xff,
+ 0xa5, 0xca, 0x5a, 0x63, 0x31, 0xb2, 0xe4, 0xc6,
+ 0x3c, 0x1d, 0xaf, 0x27, 0xb2, 0xd4, 0x19, 0xa2,
+ 0xcc, 0x04, 0x92, 0x42, 0xd2, 0xc1, 0x8c, 0x3b,
+ 0xce, 0xf5, 0x74, 0xc1, 0x81, 0xf8, 0x20, 0x23,
+ 0x6f, 0x20, 0x6d, 0x78, 0x36, 0x72, 0x2c, 0x52,
+ 0xdf, 0x5e, 0xe8, 0x75, 0xce, 0x1c, 0x49, 0x9d,
+ 0x93, 0x6f, 0x65, 0xeb, 0xb1, 0xbd, 0x8e, 0x5e,
+ 0xe5, 0x89, 0xc4, 0x8a, 0x81, 0x3d, 0x9a, 0xa7,
+ 0x11, 0x82, 0x8e, 0x38, 0x5b, 0x5b, 0xca, 0x7d,
+ 0x4b, 0x72, 0xc2, 0x9c, 0x30, 0x5e, 0x7f, 0xc0,
+ 0x6f, 0x91, 0xd5, 0x67, 0x8c, 0x3e, 0xae, 0xda,
+ 0x2b, 0x3c, 0x53, 0xcc, 0x50, 0x97, 0x36, 0x0b,
+ 0x79, 0xd6, 0x73, 0x6e, 0x7d, 0x42, 0x56, 0xe1,
+ 0xaa, 0xfc, 0xb3, 0xa7, 0xc8, 0x01, 0xaa, 0xc1,
+ 0xfc, 0x5c, 0x72, 0x8e, 0x63, 0xa8, 0x46, 0x18,
+ 0xee, 0x11, 0xe7, 0x30, 0x09, 0x83, 0x6c, 0xd9,
+ 0xf4, 0x7a, 0x7b, 0xb5, 0x1f, 0x6d, 0xc7, 0xbc,
+ 0xcb, 0x55, 0xea, 0x40, 0x58, 0x7a, 0x00, 0x00,
+ 0x90, 0x60, 0xc5, 0x64, 0x69, 0x05, 0x99, 0xd2,
+ 0x49, 0x62, 0x4f, 0xcb, 0x97, 0xdf, 0xdd, 0x6b,
+ 0x60, 0x75, 0xe2, 0xe0, 0x6f, 0x76, 0xd0, 0x37,
+ 0x67, 0x0a, 0xcf, 0xff, 0xc8, 0x61, 0x84, 0x14,
+ 0x80, 0x7c, 0x1d, 0x31, 0x8d, 0x90, 0xde, 0x0b,
+ 0x1c, 0x74, 0x9f, 0x82, 0x96, 0x80, 0xda, 0xaf,
+ 0x8d, 0x99, 0x86, 0x9f, 0x24, 0x99, 0x28, 0x3e,
+ 0xe0, 0xa3, 0xc3, 0x90, 0x2d, 0x14, 0x65, 0x1e,
+ 0x3b, 0xb9, 0xba, 0x13, 0xa5, 0x77, 0x73, 0x63,
+ 0x9a, 0x06, 0x3d, 0xa9, 0x28, 0x9b, 0xba, 0x25,
+ 0x61, 0xc9, 0xcd, 0xcf, 0x7a, 0x4d, 0x96, 0x09,
+ 0xcb, 0xca, 0x03, 0x9c, 0x54, 0x34, 0x31, 0x85,
+ 0xa0, 0x3d, 0xe5, 0xbc, 0xa5, 0x5f, 0x1b, 0xd3,
+ 0x10, 0x63, 0x74, 0x9d, 0x01, 0x92, 0x88, 0xf0,
+ 0x27, 0x9c, 0x28, 0xd9, 0xfd, 0xe2, 0x4e, 0x01,
+ 0x8d, 0x61, 0x79, 0x60, 0x61, 0x5b, 0x76, 0xab,
+ 0x06, 0xd3, 0x44, 0x87, 0x43, 0x52, 0xcd, 0x06,
+ 0x68, 0x1e, 0x2d, 0xc5, 0xb0, 0x07, 0x25, 0xdf,
+ 0x0a, 0x50, 0xd7, 0xd9, 0x08, 0x53, 0x65, 0xf1,
+ 0x0c, 0x2c, 0xde, 0x3f, 0x9d, 0x03, 0x1f, 0xe1,
+ 0x49, 0x43, 0x3c, 0x83, 0x81, 0x37, 0xf8, 0xa2,
+ 0x0b, 0xf9, 0x61, 0x1c, 0xc1, 0xdb, 0x79, 0xbc,
+ 0x64, 0xce, 0x06, 0x4e, 0x87, 0x89, 0x62, 0x73,
+ 0x51, 0xbc, 0xa4, 0x32, 0xd4, 0x18, 0x62, 0xab,
+ 0x65, 0x7e, 0xad, 0x1e, 0x91, 0xa3, 0xfa, 0x2d,
+ 0x58, 0x9e, 0x2a, 0xe9, 0x74, 0x44, 0x64, 0x11,
+ 0xe6, 0xb6, 0xb3, 0x00, 0x7e, 0xa3, 0x16, 0xef,
+ 0x72
+};
+static const u8 output74[] __initconst = {
+ 0xf5, 0xca, 0x45, 0x65, 0x50, 0x35, 0x47, 0x67,
+ 0x6f, 0x4f, 0x67, 0xff, 0x34, 0xd9, 0xc3, 0x37,
+ 0x2a, 0x26, 0xb0, 0x4f, 0x08, 0x1e, 0x45, 0x13,
+ 0xc7, 0x2c, 0x14, 0x75, 0x33, 0xd8, 0x8e, 0x1e,
+ 0x1b, 0x11, 0x0d, 0x97, 0x04, 0x33, 0x8a, 0xe4,
+ 0xd8, 0x8d, 0x0e, 0x12, 0x8d, 0xdb, 0x6e, 0x02,
+ 0xfa, 0xe5, 0xbd, 0x3a, 0xb5, 0x28, 0x07, 0x7d,
+ 0x20, 0xf0, 0x12, 0x64, 0x83, 0x2f, 0x59, 0x79,
+ 0x17, 0x88, 0x3c, 0x2d, 0x08, 0x2f, 0x55, 0xda,
+ 0xcc, 0x02, 0x3a, 0x82, 0xcd, 0x03, 0x94, 0xdf,
+ 0xdf, 0xab, 0x8a, 0x13, 0xf5, 0xe6, 0x74, 0xdf,
+ 0x7b, 0xe2, 0xab, 0x34, 0xbc, 0x00, 0x85, 0xbf,
+ 0x5a, 0x48, 0xc8, 0xff, 0x8d, 0x6c, 0x27, 0x48,
+ 0x19, 0x2d, 0x08, 0xfa, 0x82, 0x62, 0x39, 0x55,
+ 0x32, 0x11, 0xa8, 0xd7, 0xb9, 0x08, 0x2c, 0xd6,
+ 0x7a, 0xd9, 0x83, 0x9f, 0x9b, 0xfb, 0xec, 0x3a,
+ 0xd1, 0x08, 0xc7, 0xad, 0xdc, 0x98, 0x4c, 0xbc,
+ 0x98, 0xeb, 0x36, 0xb0, 0x39, 0xf4, 0x3a, 0xd6,
+ 0x53, 0x02, 0xa0, 0xa9, 0x73, 0xa1, 0xca, 0xef,
+ 0xd8, 0xd2, 0xec, 0x0e, 0xf8, 0xf5, 0xac, 0x8d,
+ 0x34, 0x41, 0x06, 0xa8, 0xc6, 0xc3, 0x31, 0xbc,
+ 0xe5, 0xcc, 0x7e, 0x72, 0x63, 0x59, 0x3e, 0x63,
+ 0xc2, 0x8d, 0x2b, 0xd5, 0xb9, 0xfd, 0x1e, 0x31,
+ 0x69, 0x32, 0x05, 0xd6, 0xde, 0xc9, 0xe6, 0x4c,
+ 0xac, 0x68, 0xf7, 0x1f, 0x9d, 0xcd, 0x0e, 0xa2,
+ 0x15, 0x3d, 0xd6, 0x47, 0x99, 0xab, 0x08, 0x5f,
+ 0x28, 0xc3, 0x4c, 0xc2, 0xd5, 0xdd, 0x10, 0xb7,
+ 0xbd, 0xdb, 0x9b, 0xcf, 0x85, 0x27, 0x29, 0x76,
+ 0x98, 0xeb, 0xad, 0x31, 0x64, 0xe7, 0xfb, 0x61,
+ 0xe0, 0xd8, 0x1a, 0xa6, 0xe2, 0xe7, 0x43, 0x42,
+ 0x77, 0xc9, 0x82, 0x00, 0xac, 0x85, 0xe0, 0xa2,
+ 0xd4, 0x62, 0xe3, 0xb7, 0x17, 0x6e, 0xb2, 0x9e,
+ 0x21, 0x58, 0x73, 0xa9, 0x53, 0x2d, 0x3c, 0xe1,
+ 0xdd, 0xd6, 0x6e, 0x92, 0xf2, 0x1d, 0xc2, 0x22,
+ 0x5f, 0x9a, 0x7e, 0xd0, 0x52, 0xbf, 0x54, 0x19,
+ 0xd7, 0x80, 0x63, 0x3e, 0xd0, 0x08, 0x2d, 0x37,
+ 0x0c, 0x15, 0xf7, 0xde, 0xab, 0x2b, 0xe3, 0x16,
+ 0x21, 0x3a, 0xee, 0xa5, 0xdc, 0xdf, 0xde, 0xa3,
+ 0x69, 0xcb, 0xfd, 0x92, 0x89, 0x75, 0xcf, 0xc9,
+ 0x8a, 0xa4, 0xc8, 0xdd, 0xcc, 0x21, 0xe6, 0xfe,
+ 0x9e, 0x43, 0x76, 0xb2, 0x45, 0x22, 0xb9, 0xb5,
+ 0xac, 0x7e, 0x3d, 0x26, 0xb0, 0x53, 0xc8, 0xab,
+ 0xfd, 0xea, 0x2c, 0xd1, 0x44, 0xc5, 0x60, 0x1b,
+ 0x8a, 0x99, 0x0d, 0xa5, 0x0e, 0x67, 0x6e, 0x3a,
+ 0x96, 0x55, 0xec, 0xe8, 0xcc, 0xbe, 0x49, 0xd9,
+ 0xf2, 0x72, 0x9f, 0x30, 0x21, 0x97, 0x57, 0x19,
+ 0xbe, 0x5e, 0x33, 0x0c, 0xee, 0xc0, 0x72, 0x0d,
+ 0x2e, 0xd1, 0xe1, 0x52, 0xc2, 0xea, 0x41, 0xbb,
+ 0xe1, 0x6d, 0xd4, 0x17, 0xa9, 0x8d, 0x89, 0xa9,
+ 0xd6, 0x4b, 0xc6, 0x4c, 0xf2, 0x88, 0x97, 0x54,
+ 0x3f, 0x4f, 0x57, 0xb7, 0x37, 0xf0, 0x2c, 0x11,
+ 0x15, 0x56, 0xdb, 0x28, 0xb5, 0x16, 0x84, 0x66,
+ 0xce, 0x45, 0x3f, 0x61, 0x75, 0xb6, 0xbe, 0x00,
+ 0xd1, 0xe4, 0xf5, 0x27, 0x54, 0x7f, 0xc2, 0xf1,
+ 0xb3, 0x32, 0x9a, 0xe8, 0x07, 0x02, 0xf3, 0xdb,
+ 0xa9, 0xd1, 0xc2, 0xdf, 0xee, 0xad, 0xe5, 0x8a,
+ 0x3c, 0xfa, 0x67, 0xec, 0x6b, 0xa4, 0x08, 0xfe,
+ 0xba, 0x5a, 0x58, 0x0b, 0x78, 0x11, 0x91, 0x76,
+ 0xe3, 0x1a, 0x28, 0x54, 0x5e, 0xbd, 0x71, 0x1b,
+ 0x8b, 0xdc, 0x6c, 0xf4, 0x6f, 0xd7, 0xf4, 0xf3,
+ 0xe1, 0x03, 0xa4, 0x3c, 0x8d, 0x91, 0x2e, 0xba,
+ 0x5f, 0x7f, 0x8c, 0xaf, 0x69, 0x89, 0x29, 0x0a,
+ 0x5b, 0x25, 0x13, 0xc4, 0x2e, 0x16, 0xc2, 0x15,
+ 0x07, 0x5d, 0x58, 0x33, 0x7c, 0xe0, 0xf0, 0x55,
+ 0x5f, 0xbf, 0x5e, 0xf0, 0x71, 0x48, 0x8f, 0xf7,
+ 0x48, 0xb3, 0xf7, 0x0d, 0xa1, 0xd0, 0x63, 0xb1,
+ 0xad, 0xae, 0xb5, 0xb0, 0x5f, 0x71, 0xaf, 0x24,
+ 0x8b, 0xb9, 0x1c, 0x44, 0xd2, 0x1a, 0x53, 0xd1,
+ 0xd5, 0xb4, 0xa9, 0xff, 0x88, 0x73, 0xb5, 0xaa,
+ 0x15, 0x32, 0x5f, 0x59, 0x9d, 0x2e, 0xb5, 0xcb,
+ 0xde, 0x21, 0x2e, 0xe9, 0x35, 0xed, 0xfd, 0x0f,
+ 0xb6, 0xbb, 0xe6, 0x4b, 0x16, 0xf1, 0x45, 0x1e,
+ 0xb4, 0x84, 0xe9, 0x58, 0x1c, 0x0c, 0x95, 0xc0,
+ 0xcf, 0x49, 0x8b, 0x59, 0xa1, 0x78, 0xe6, 0x80,
+ 0x12, 0x49, 0x7a, 0xd4, 0x66, 0x62, 0xdf, 0x9c,
+ 0x18, 0xc8, 0x8c, 0xda, 0xc1, 0xa6, 0xbc, 0x65,
+ 0x28, 0xd2, 0xa4, 0xe8, 0xf1, 0x35, 0xdb, 0x5a,
+ 0x75, 0x1f, 0x73, 0x60, 0xec, 0xa8, 0xda, 0x5a,
+ 0x43, 0x15, 0x83, 0x9b, 0xe7, 0xb1, 0xa6, 0x81,
+ 0xbb, 0xef, 0xf3, 0x8f, 0x0f, 0xd3, 0x79, 0xa2,
+ 0xe5, 0xaa, 0x42, 0xef, 0xa0, 0x13, 0x4e, 0x91,
+ 0x2d, 0xcb, 0x61, 0x7a, 0x9a, 0x33, 0x14, 0x50,
+ 0x77, 0x4a, 0xd0, 0x91, 0x48, 0xe0, 0x0c, 0xe0,
+ 0x11, 0xcb, 0xdf, 0xb0, 0xce, 0x06, 0xd2, 0x79,
+ 0x4d, 0x69, 0xb9, 0xc9, 0x36, 0x74, 0x8f, 0x81,
+ 0x72, 0x73, 0xf3, 0x17, 0xb7, 0x13, 0xcb, 0x5b,
+ 0xd2, 0x5c, 0x33, 0x61, 0xb7, 0x61, 0x79, 0xb0,
+ 0xc0, 0x4d, 0xa1, 0xc7, 0x5d, 0x98, 0xc9, 0xe1,
+ 0x98, 0xbd, 0x78, 0x5a, 0x2c, 0x64, 0x53, 0xaf,
+ 0xaf, 0x66, 0x51, 0x47, 0xe4, 0x48, 0x66, 0x8b,
+ 0x07, 0x52, 0xa3, 0x03, 0x93, 0x28, 0xad, 0xcc,
+ 0xa3, 0x86, 0xad, 0x63, 0x04, 0x35, 0x6c, 0x49,
+ 0xd5, 0x28, 0x0e, 0x00, 0x47, 0xf4, 0xd4, 0x32,
+ 0x27, 0x19, 0xb3, 0x29, 0xe7, 0xbc, 0xbb, 0xce,
+ 0x3e, 0x3e, 0xd5, 0x67, 0x20, 0xe4, 0x0b, 0x75,
+ 0x95, 0x24, 0xe0, 0x6c, 0xb6, 0x29, 0x0c, 0x14,
+ 0xfd
+};
+static const u8 key74[] __initconst = {
+ 0xf0, 0x41, 0x5b, 0x00, 0x56, 0xc4, 0xac, 0xf6,
+ 0xa2, 0x4c, 0x33, 0x41, 0x16, 0x09, 0x1b, 0x8e,
+ 0x4d, 0xe8, 0x8c, 0xd9, 0x48, 0xab, 0x3e, 0x60,
+ 0xcb, 0x49, 0x3e, 0xaf, 0x2b, 0x8b, 0xc8, 0xf0
+};
+enum { nonce74 = 0xcbdb0ffd0e923384ULL };
+
+static const struct chacha20_testvec chacha20_testvecs[] __initconst = {
+ { input01, output01, key01, nonce01, sizeof(input01) },
+ { input02, output02, key02, nonce02, sizeof(input02) },
+ { input03, output03, key03, nonce03, sizeof(input03) },
+ { input04, output04, key04, nonce04, sizeof(input04) },
+ { input05, output05, key05, nonce05, sizeof(input05) },
+ { input06, output06, key06, nonce06, sizeof(input06) },
+ { input07, output07, key07, nonce07, sizeof(input07) },
+ { input08, output08, key08, nonce08, sizeof(input08) },
+ { input09, output09, key09, nonce09, sizeof(input09) },
+ { input10, output10, key10, nonce10, sizeof(input10) },
+ { input11, output11, key11, nonce11, sizeof(input11) },
+ { input12, output12, key12, nonce12, sizeof(input12) },
+ { input13, output13, key13, nonce13, sizeof(input13) },
+ { input14, output14, key14, nonce14, sizeof(input14) },
+ { input15, output15, key15, nonce15, sizeof(input15) },
+ { input16, output16, key16, nonce16, sizeof(input16) },
+ { input17, output17, key17, nonce17, sizeof(input17) },
+ { input18, output18, key18, nonce18, sizeof(input18) },
+ { input19, output19, key19, nonce19, sizeof(input19) },
+ { input20, output20, key20, nonce20, sizeof(input20) },
+ { input21, output21, key21, nonce21, sizeof(input21) },
+ { input22, output22, key22, nonce22, sizeof(input22) },
+ { input23, output23, key23, nonce23, sizeof(input23) },
+ { input24, output24, key24, nonce24, sizeof(input24) },
+ { input25, output25, key25, nonce25, sizeof(input25) },
+ { input26, output26, key26, nonce26, sizeof(input26) },
+ { input27, output27, key27, nonce27, sizeof(input27) },
+ { input28, output28, key28, nonce28, sizeof(input28) },
+ { input29, output29, key29, nonce29, sizeof(input29) },
+ { input30, output30, key30, nonce30, sizeof(input30) },
+ { input31, output31, key31, nonce31, sizeof(input31) },
+ { input32, output32, key32, nonce32, sizeof(input32) },
+ { input33, output33, key33, nonce33, sizeof(input33) },
+ { input34, output34, key34, nonce34, sizeof(input34) },
+ { input35, output35, key35, nonce35, sizeof(input35) },
+ { input36, output36, key36, nonce36, sizeof(input36) },
+ { input37, output37, key37, nonce37, sizeof(input37) },
+ { input38, output38, key38, nonce38, sizeof(input38) },
+ { input39, output39, key39, nonce39, sizeof(input39) },
+ { input40, output40, key40, nonce40, sizeof(input40) },
+ { input41, output41, key41, nonce41, sizeof(input41) },
+ { input42, output42, key42, nonce42, sizeof(input42) },
+ { input43, output43, key43, nonce43, sizeof(input43) },
+ { input44, output44, key44, nonce44, sizeof(input44) },
+ { input45, output45, key45, nonce45, sizeof(input45) },
+ { input46, output46, key46, nonce46, sizeof(input46) },
+ { input47, output47, key47, nonce47, sizeof(input47) },
+ { input48, output48, key48, nonce48, sizeof(input48) },
+ { input49, output49, key49, nonce49, sizeof(input49) },
+ { input50, output50, key50, nonce50, sizeof(input50) },
+ { input51, output51, key51, nonce51, sizeof(input51) },
+ { input52, output52, key52, nonce52, sizeof(input52) },
+ { input53, output53, key53, nonce53, sizeof(input53) },
+ { input54, output54, key54, nonce54, sizeof(input54) },
+ { input55, output55, key55, nonce55, sizeof(input55) },
+ { input56, output56, key56, nonce56, sizeof(input56) },
+ { input57, output57, key57, nonce57, sizeof(input57) },
+ { input58, output58, key58, nonce58, sizeof(input58) },
+ { input59, output59, key59, nonce59, sizeof(input59) },
+ { input60, output60, key60, nonce60, sizeof(input60) },
+ { input61, output61, key61, nonce61, sizeof(input61) },
+ { input62, output62, key62, nonce62, sizeof(input62) },
+ { input63, output63, key63, nonce63, sizeof(input63) },
+ { input64, output64, key64, nonce64, sizeof(input64) },
+ { input65, output65, key65, nonce65, sizeof(input65) },
+ { input66, output66, key66, nonce66, sizeof(input66) },
+ { input67, output67, key67, nonce67, sizeof(input67) },
+ { input68, output68, key68, nonce68, sizeof(input68) },
+ { input69, output69, key69, nonce69, sizeof(input69) },
+ { input70, output70, key70, nonce70, sizeof(input70) },
+ { input71, output71, key71, nonce71, sizeof(input71) },
+ { input72, output72, key72, nonce72, sizeof(input72) },
+ { input73, output73, key73, nonce73, sizeof(input73) },
+ { input74, output74, key74, nonce74, sizeof(input74) }
+};
+
+static const struct hchacha20_testvec hchacha20_testvecs[] __initconst = {{
+ .key = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f },
+ .nonce = { 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x4a,
+ 0x00, 0x00, 0x00, 0x00, 0x31, 0x41, 0x59, 0x27 },
+ .output = { 0x82, 0x41, 0x3b, 0x42, 0x27, 0xb2, 0x7b, 0xfe,
+ 0xd3, 0x0e, 0x42, 0x50, 0x8a, 0x87, 0x7d, 0x73,
+ 0xa0, 0xf9, 0xe4, 0xd5, 0x8a, 0x74, 0xa8, 0x53,
+ 0xc1, 0x2e, 0xc4, 0x13, 0x26, 0xd3, 0xec, 0xdc }
+}};
+
+static bool __init chacha20_selftest(void)
+{
+ enum { MAXIMUM_TEST_BUFFER_LEN = 1UL << 10 };
+ size_t i, j, k;
+ u32 derived_key[CHACHA20_KEY_WORDS];
+ u8 *offset_input = NULL, *computed_output = NULL;
+ u8 offset_key[CHACHA20_KEY_SIZE + 1]
+ __aligned(__alignof__(unsigned long));
+ struct chacha20_ctx state;
+ bool success = true;
+ simd_context_t simd_context;
+
+ offset_input = kmalloc(MAXIMUM_TEST_BUFFER_LEN + 1, GFP_KERNEL);
+ computed_output = kmalloc(MAXIMUM_TEST_BUFFER_LEN + 1, GFP_KERNEL);
+ if (!computed_output || !offset_input) {
+ pr_info("chacha20 self-test malloc: FAIL\n");
+ success = false;
+ goto out;
+ }
+
+ simd_get(&simd_context);
+ for (i = 0; i < ARRAY_SIZE(chacha20_testvecs); ++i) {
+ /* Boring case */
+ memset(computed_output, 0, MAXIMUM_TEST_BUFFER_LEN + 1);
+ memset(&state, 0, sizeof(state));
+ chacha20_init(&state, chacha20_testvecs[i].key,
+ chacha20_testvecs[i].nonce);
+ chacha20(&state, computed_output, chacha20_testvecs[i].input,
+ chacha20_testvecs[i].ilen, &simd_context);
+ if (memcmp(computed_output, chacha20_testvecs[i].output,
+ chacha20_testvecs[i].ilen)) {
+ pr_info("chacha20 self-test %zu: FAIL\n", i + 1);
+ success = false;
+ }
+ for (k = chacha20_testvecs[i].ilen;
+ k < MAXIMUM_TEST_BUFFER_LEN + 1; ++k) {
+ if (computed_output[k]) {
+ pr_info("chacha20 self-test %zu (zero check): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ }
+
+ /* Unaligned case */
+ memset(computed_output, 0, MAXIMUM_TEST_BUFFER_LEN + 1);
+ memset(&state, 0, sizeof(state));
+ memcpy(offset_input + 1, chacha20_testvecs[i].input,
+ chacha20_testvecs[i].ilen);
+ memcpy(offset_key + 1, chacha20_testvecs[i].key,
+ CHACHA20_KEY_SIZE);
+ chacha20_init(&state, offset_key + 1, chacha20_testvecs[i].nonce);
+ chacha20(&state, computed_output + 1, offset_input + 1,
+ chacha20_testvecs[i].ilen, &simd_context);
+ if (memcmp(computed_output + 1, chacha20_testvecs[i].output,
+ chacha20_testvecs[i].ilen)) {
+ pr_info("chacha20 self-test %zu (unaligned): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ if (computed_output[0]) {
+ pr_info("chacha20 self-test %zu (unaligned, zero check): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ for (k = chacha20_testvecs[i].ilen + 1;
+ k < MAXIMUM_TEST_BUFFER_LEN + 1; ++k) {
+ if (computed_output[k]) {
+ pr_info("chacha20 self-test %zu (unaligned, zero check): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ }
+
+ /* Chunked case */
+ if (chacha20_testvecs[i].ilen <= CHACHA20_BLOCK_SIZE)
+ goto next_test;
+ memset(computed_output, 0, MAXIMUM_TEST_BUFFER_LEN + 1);
+ memset(&state, 0, sizeof(state));
+ chacha20_init(&state, chacha20_testvecs[i].key,
+ chacha20_testvecs[i].nonce);
+ chacha20(&state, computed_output, chacha20_testvecs[i].input,
+ CHACHA20_BLOCK_SIZE, &simd_context);
+ chacha20(&state, computed_output + CHACHA20_BLOCK_SIZE,
+ chacha20_testvecs[i].input + CHACHA20_BLOCK_SIZE,
+ chacha20_testvecs[i].ilen - CHACHA20_BLOCK_SIZE,
+ &simd_context);
+ if (memcmp(computed_output, chacha20_testvecs[i].output,
+ chacha20_testvecs[i].ilen)) {
+ pr_info("chacha20 self-test %zu (chunked): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ for (k = chacha20_testvecs[i].ilen;
+ k < MAXIMUM_TEST_BUFFER_LEN + 1; ++k) {
+ if (computed_output[k]) {
+ pr_info("chacha20 self-test %zu (chunked, zero check): FAIL\n",
+ i + 1);
+ success = false;
+ }
+ }
+
+next_test:
+ /* Sliding unaligned case */
+ if (chacha20_testvecs[i].ilen > CHACHA20_BLOCK_SIZE + 1 ||
+ !chacha20_testvecs[i].ilen)
+ continue;
+ for (j = 1; j < CHACHA20_BLOCK_SIZE; ++j) {
+ memset(computed_output, 0, MAXIMUM_TEST_BUFFER_LEN + 1);
+ memset(&state, 0, sizeof(state));
+ memcpy(offset_input + j, chacha20_testvecs[i].input,
+ chacha20_testvecs[i].ilen);
+ chacha20_init(&state, chacha20_testvecs[i].key,
+ chacha20_testvecs[i].nonce);
+ chacha20(&state, computed_output + j, offset_input + j,
+ chacha20_testvecs[i].ilen, &simd_context);
+ if (memcmp(computed_output + j,
+ chacha20_testvecs[i].output,
+ chacha20_testvecs[i].ilen)) {
+ pr_info("chacha20 self-test %zu (unaligned, slide %zu): FAIL\n",
+ i + 1, j);
+ success = false;
+ }
+ for (k = 0; k < j; ++k) {
+ if (computed_output[k]) {
+ pr_info("chacha20 self-test %zu (unaligned, slide %zu, zero check): FAIL\n",
+ i + 1, j);
+ success = false;
+ }
+ }
+ for (k = chacha20_testvecs[i].ilen + j;
+ k < MAXIMUM_TEST_BUFFER_LEN + 1; ++k) {
+ if (computed_output[k]) {
+ pr_info("chacha20 self-test %zu (unaligned, slide %zu, zero check): FAIL\n",
+ i + 1, j);
+ success = false;
+ }
+ }
+ }
+ }
+ for (i = 0; i < ARRAY_SIZE(hchacha20_testvecs); ++i) {
+ memset(&derived_key, 0, sizeof(derived_key));
+ hchacha20(derived_key, hchacha20_testvecs[i].nonce,
+ hchacha20_testvecs[i].key, &simd_context);
+ cpu_to_le32_array(derived_key, ARRAY_SIZE(derived_key));
+ if (memcmp(derived_key, hchacha20_testvecs[i].output,
+ CHACHA20_KEY_SIZE)) {
+ pr_info("hchacha20 self-test %zu: FAIL\n", i + 1);
+ success = false;
+ }
+ }
+ simd_put(&simd_context);
+ if (success)
+ pr_info("chacha20 self-tests: pass\n");
+
+out:
+ kfree(offset_input);
+ kfree(computed_output);
+ return success;
+}
+#endif
--
2.19.0
^ permalink raw reply related [flat|nested] 213+ messages in thread
* [PATCH net-next v6 04/23] zinc: ChaCha20 x86_64 implementation
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (2 preceding siblings ...)
2018-09-25 14:56 ` [PATCH net-next v6 03/23] zinc: ChaCha20 generic C implementation and selftest Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-28 15:47 ` Ard Biesheuvel
2018-10-02 3:18 ` Herbert Xu
2018-09-25 14:56 ` Jason A. Donenfeld
` (19 subsequent siblings)
23 siblings, 2 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Thomas Gleixner,
Ingo Molnar, x86
This provides SSSE3, AVX-2, AVX-512F, and AVX-512VL implementations for
ChaCha20. The AVX-512F implementation is disabled on Skylake, due to
throttling, and the VL ymm implementation is used instead. These come
from Andy Polyakov's implementation, with the following modifications
from Samuel Neves:
- Some cosmetic changes, like renaming labels and constants to the
  .Lname style, along with other Linux conventions.
- CPU feature checking is done in C by the glue code, so that has been
removed from the assembly.
- Eliminate translating certain instructions, such as pshufb, palignr,
vprotd, etc., to .byte directives. This is meant for compatibility
with ancient toolchains, but presumably it is unnecessary here,
since the build system already does checks on what GNU as can
assemble.
- When aligning the stack, the original code was saving %rsp to %r9.
To keep objtool happy, we use instead the DRAP idiom to save %rsp
to %r10:
leaq 8(%rsp),%r10
... code here ...
leaq -8(%r10),%rsp
- The original code assumes the stack comes aligned to 16 bytes. This
  is not necessarily the case, and to avoid crashes,
  `andq $-alignment, %rsp` was added in the prolog of a few functions
  (a combined sketch of the resulting prologue follows this list).
- The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret".
We replace this by "ret". "rep ret" was meant to help with AMD K8
chips, cf. http://repzret.org/p/repzret. It makes no sense to
continue to use this kludge for code that won't even run on ancient
AMD chips.
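Putting the last two stack-handling points together, the prologue and
epilogue of the converted functions end up looking roughly like the
following (the scratch size and alignment differ per function; the
values here simply mirror the SSSE3 path):
	leaq	8(%rsp),%r10		# DRAP: stash the incoming stack pointer
	subq	$64+8,%rsp		# reserve scratch space for the state
	andq	$-32,%rsp		# enforce the alignment the body relies on
	... code here ...
	leaq	-8(%r10),%rsp		# restore the caller's stack pointer
	ret				# plain "ret" instead of "rep ret"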
While this is CRYPTOGAMS code, the originating code for this happens to
be the same as OpenSSL's commit cded951378069a478391843f5f8653c1eb5128da
Cycle counts on a Core i7 6700HQ using the AVX-2 codepath:
size old new
---- ---- ----
0 62 52
16 414 376
32 410 400
48 414 422
64 362 356
80 714 666
96 714 700
112 712 718
128 692 646
144 1042 674
160 1042 694
176 1042 726
192 1018 650
208 1366 686
224 1366 696
240 1366 722
256 640 656
272 988 1246
288 988 1276
304 992 1296
320 972 1222
336 1318 1256
352 1318 1276
368 1316 1294
384 1294 1218
400 1642 1258
416 1642 1282
432 1642 1302
448 1628 1224
464 1970 1258
480 1970 1280
496 1970 1300
512 656 676
528 1010 1290
544 1010 1306
560 1010 1332
576 986 1254
592 1340 1284
608 1334 1310
624 1340 1334
640 1314 1254
656 1664 1282
672 1674 1306
688 1662 1336
704 1638 1250
720 1992 1292
736 1994 1308
752 1988 1334
768 1252 1254
784 1596 1290
800 1596 1314
816 1596 1330
832 1576 1256
848 1922 1286
864 1922 1314
880 1926 1338
896 1898 1258
912 2248 1288
928 2248 1320
944 2248 1338
960 2226 1268
976 2574 1288
992 2576 1312
1008 2574 1340
Cycle counts on a Xeon Gold 5120 using the AVX-512 codepath:
size old new
---- ---- ----
0 64 54
16 386 372
32 388 396
48 388 420
64 366 350
80 708 666
96 708 692
112 706 736
128 692 648
144 1036 682
160 1036 708
176 1036 730
192 1016 658
208 1360 684
224 1362 708
240 1360 732
256 644 500
272 990 526
288 988 556
304 988 576
320 972 500
336 1314 532
352 1316 558
368 1318 578
384 1308 506
400 1644 532
416 1644 556
432 1644 594
448 1624 508
464 1970 534
480 1970 556
496 1968 582
512 660 624
528 1016 682
544 1016 702
560 1018 728
576 998 654
592 1344 680
608 1344 708
624 1344 730
640 1326 654
656 1670 686
672 1670 708
688 1670 732
704 1652 658
720 1998 682
736 1998 710
752 1996 734
768 1256 662
784 1606 688
800 1606 714
816 1606 736
832 1584 660
848 1948 688
864 1950 714
880 1948 736
896 1912 688
912 2258 718
928 2258 744
944 2256 768
960 2238 692
976 2584 718
992 2584 744
1008 2584 770
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
lib/zinc/Makefile | 1 +
lib/zinc/chacha20/chacha20-x86_64-glue.h | 105 +
lib/zinc/chacha20/chacha20-x86_64.S | 2632 ++++++++++++++++++++++
lib/zinc/chacha20/chacha20.c | 4 +-
4 files changed, 2741 insertions(+), 1 deletion(-)
create mode 100644 lib/zinc/chacha20/chacha20-x86_64-glue.h
create mode 100644 lib/zinc/chacha20/chacha20-x86_64.S
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 3d80144d55a6..223a0816c918 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -3,4 +3,5 @@ ccflags-y += -D'pr_fmt(fmt)="zinc: " fmt'
ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
zinc_chacha20-y := chacha20/chacha20.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
diff --git a/lib/zinc/chacha20/chacha20-x86_64-glue.h b/lib/zinc/chacha20/chacha20-x86_64-glue.h
new file mode 100644
index 000000000000..9b47001661a6
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-x86_64-glue.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/fpu/api.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <asm/intel-family.h>
+
+#ifdef CONFIG_AS_SSSE3
+asmlinkage void hchacha20_ssse3(u32 *derived_key, const u8 *nonce,
+ const u8 *key);
+asmlinkage void chacha20_ssse3(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#endif
+#ifdef CONFIG_AS_AVX2
+asmlinkage void chacha20_avx2(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#endif
+#ifdef CONFIG_AS_AVX512
+asmlinkage void chacha20_avx512(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+asmlinkage void chacha20_avx512vl(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#endif
+
+static bool chacha20_use_ssse3 __ro_after_init;
+static bool chacha20_use_avx2 __ro_after_init;
+static bool chacha20_use_avx512 __ro_after_init;
+static bool chacha20_use_avx512vl __ro_after_init;
+
+static void __init chacha20_fpu_init(void)
+{
+ chacha20_use_ssse3 = boot_cpu_has(X86_FEATURE_SSSE3);
+ chacha20_use_avx2 =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ boot_cpu_has(X86_FEATURE_AVX2) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+ chacha20_use_avx512 =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ boot_cpu_has(X86_FEATURE_AVX2) &&
+ boot_cpu_has(X86_FEATURE_AVX512F) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+ XFEATURE_MASK_AVX512, NULL) &&
+ /* Skylake downclocks unacceptably much when using zmm. */
+ boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X;
+ chacha20_use_avx512vl =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ boot_cpu_has(X86_FEATURE_AVX2) &&
+ boot_cpu_has(X86_FEATURE_AVX512F) &&
+ boot_cpu_has(X86_FEATURE_AVX512VL) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+ XFEATURE_MASK_AVX512, NULL);
+}
+
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
+ const u8 *src, const size_t len,
+ simd_context_t *simd_context)
+{
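+ /* Short inputs and non-SIMD contexts fall back to the generic C code. */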
+ if (!chacha20_use_ssse3 || len <= CHACHA20_BLOCK_SIZE ||
+ !simd_use(simd_context))
+ return false;
+
+#ifdef CONFIG_AS_AVX512
+ if (chacha20_use_avx512 && len >= CHACHA20_BLOCK_SIZE * 8) {
+ chacha20_avx512(dst, src, len, state->key, state->counter);
+ goto success;
+ }
+ if (chacha20_use_avx512vl && len >= CHACHA20_BLOCK_SIZE * 4) {
+ chacha20_avx512vl(dst, src, len, state->key, state->counter);
+ goto success;
+ }
+#endif
+#ifdef CONFIG_AS_AVX2
+ if (chacha20_use_avx2 && len >= CHACHA20_BLOCK_SIZE * 4) {
+ chacha20_avx2(dst, src, len, state->key, state->counter);
+ goto success;
+ }
+#endif
+#ifdef CONFIG_AS_SSSE3
+ if (chacha20_use_ssse3) {
+ chacha20_ssse3(dst, src, len, state->key, state->counter);
+ goto success;
+ }
+#endif
+ return false;
+success:
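+ /* Advance the counter by the number of 64-byte blocks just processed. */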
+ state->counter[0] += (len + 63) / 64;
+ return true;
+}
+
+static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE],
+ simd_context_t *simd_context)
+{
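+ /* Only an SSSE3 HChaCha20 is provided; otherwise fall back to generic C. */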
+#if defined(CONFIG_AS_SSSE3)
+ if (chacha20_use_ssse3 && simd_use(simd_context)) {
+ hchacha20_ssse3(derived_key, nonce, key);
+ return true;
+ }
+#endif
+ return false;
+}
diff --git a/lib/zinc/chacha20/chacha20-x86_64.S b/lib/zinc/chacha20/chacha20-x86_64.S
new file mode 100644
index 000000000000..3d10c7f21642
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-x86_64.S
@@ -0,0 +1,2632 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2017 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.section .rodata.cst16.Lzero, "aM", @progbits, 16
+.align 16
+.Lzero:
+.long 0,0,0,0
+.section .rodata.cst16.Lone, "aM", @progbits, 16
+.align 16
+.Lone:
+.long 1,0,0,0
+.section .rodata.cst16.Linc, "aM", @progbits, 16
+.align 16
+.Linc:
+.long 0,1,2,3
+.section .rodata.cst16.Lfour, "aM", @progbits, 16
+.align 16
+.Lfour:
+.long 4,4,4,4
+.section .rodata.cst32.Lincy, "aM", @progbits, 32
+.align 32
+.Lincy:
+.long 0,2,4,6,1,3,5,7
+.section .rodata.cst32.Leight, "aM", @progbits, 32
+.align 32
+.Leight:
+.long 8,8,8,8,8,8,8,8
+.section .rodata.cst16.Lrot16, "aM", @progbits, 16
+.align 16
+.Lrot16:
+.byte 0x2,0x3,0x0,0x1, 0x6,0x7,0x4,0x5, 0xa,0xb,0x8,0x9, 0xe,0xf,0xc,0xd
+.section .rodata.cst16.Lrot24, "aM", @progbits, 16
+.align 16
+.Lrot24:
+.byte 0x3,0x0,0x1,0x2, 0x7,0x4,0x5,0x6, 0xb,0x8,0x9,0xa, 0xf,0xc,0xd,0xe
+.section .rodata.cst16.Lsigma, "aM", @progbits, 16
+.align 16
+.Lsigma:
+.byte 101,120,112,97,110,100,32,51,50,45,98,121,116,101,32,107,0
+.section .rodata.cst64.Lzeroz, "aM", @progbits, 64
+.align 64
+.Lzeroz:
+.long 0,0,0,0, 1,0,0,0, 2,0,0,0, 3,0,0,0
+.section .rodata.cst64.Lfourz, "aM", @progbits, 64
+.align 64
+.Lfourz:
+.long 4,0,0,0, 4,0,0,0, 4,0,0,0, 4,0,0,0
+.section .rodata.cst64.Lincz, "aM", @progbits, 64
+.align 64
+.Lincz:
+.long 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+.section .rodata.cst64.Lsixteen, "aM", @progbits, 64
+.align 64
+.Lsixteen:
+.long 16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16
+.section .rodata.cst32.Ltwoy, "aM", @progbits, 32
+.align 64
+.Ltwoy:
+.long 2,0,0,0, 2,0,0,0
+
+.text
+
+#ifdef CONFIG_AS_SSSE3
+.align 32
+ENTRY(hchacha20_ssse3)
+ movdqa .Lsigma(%rip),%xmm0
+ movdqu (%rdx),%xmm1
+ movdqu 16(%rdx),%xmm2
+ movdqu (%rsi),%xmm3
+ movdqa .Lrot16(%rip),%xmm6
+ movdqa .Lrot24(%rip),%xmm7
+ movq $10,%r8
+ .align 32
+.Loop_hssse3:
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm6,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $20,%xmm1
+ pslld $12,%xmm4
+ por %xmm4,%xmm1
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm7,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $25,%xmm1
+ pslld $7,%xmm4
+ por %xmm4,%xmm1
+ pshufd $78,%xmm2,%xmm2
+ pshufd $57,%xmm1,%xmm1
+ pshufd $147,%xmm3,%xmm3
+ nop
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm6,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $20,%xmm1
+ pslld $12,%xmm4
+ por %xmm4,%xmm1
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm7,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $25,%xmm1
+ pslld $7,%xmm4
+ por %xmm4,%xmm1
+ pshufd $78,%xmm2,%xmm2
+ pshufd $147,%xmm1,%xmm1
+ pshufd $57,%xmm3,%xmm3
+ decq %r8
+ jnz .Loop_hssse3
+ movdqu %xmm0,0(%rdi)
+ movdqu %xmm3,16(%rdi)
+ ret
+ENDPROC(hchacha20_ssse3)
+
+.align 32
+ENTRY(chacha20_ssse3)
+.Lchacha20_ssse3:
+ cmpq $0,%rdx
+ je .Lssse3_epilogue
+ leaq 8(%rsp),%r10
+
+ cmpq $128,%rdx
+ ja .Lchacha20_4x
+
+.Ldo_sse3_after_all:
+ subq $64+8,%rsp
+ andq $-32,%rsp
+ movdqa .Lsigma(%rip),%xmm0
+ movdqu (%rcx),%xmm1
+ movdqu 16(%rcx),%xmm2
+ movdqu (%r8),%xmm3
+ movdqa .Lrot16(%rip),%xmm6
+ movdqa .Lrot24(%rip),%xmm7
+
+ movdqa %xmm0,0(%rsp)
+ movdqa %xmm1,16(%rsp)
+ movdqa %xmm2,32(%rsp)
+ movdqa %xmm3,48(%rsp)
+ movq $10,%r8
+ jmp .Loop_ssse3
+
+.align 32
+.Loop_outer_ssse3:
+ movdqa .Lone(%rip),%xmm3
+ movdqa 0(%rsp),%xmm0
+ movdqa 16(%rsp),%xmm1
+ movdqa 32(%rsp),%xmm2
+ paddd 48(%rsp),%xmm3
+ movq $10,%r8
+ movdqa %xmm3,48(%rsp)
+ jmp .Loop_ssse3
+
+.align 32
+.Loop_ssse3:
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm6,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $20,%xmm1
+ pslld $12,%xmm4
+ por %xmm4,%xmm1
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm7,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $25,%xmm1
+ pslld $7,%xmm4
+ por %xmm4,%xmm1
+ pshufd $78,%xmm2,%xmm2
+ pshufd $57,%xmm1,%xmm1
+ pshufd $147,%xmm3,%xmm3
+ nop
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm6,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $20,%xmm1
+ pslld $12,%xmm4
+ por %xmm4,%xmm1
+ paddd %xmm1,%xmm0
+ pxor %xmm0,%xmm3
+ pshufb %xmm7,%xmm3
+ paddd %xmm3,%xmm2
+ pxor %xmm2,%xmm1
+ movdqa %xmm1,%xmm4
+ psrld $25,%xmm1
+ pslld $7,%xmm4
+ por %xmm4,%xmm1
+ pshufd $78,%xmm2,%xmm2
+ pshufd $147,%xmm1,%xmm1
+ pshufd $57,%xmm3,%xmm3
+ decq %r8
+ jnz .Loop_ssse3
+ paddd 0(%rsp),%xmm0
+ paddd 16(%rsp),%xmm1
+ paddd 32(%rsp),%xmm2
+ paddd 48(%rsp),%xmm3
+
+ cmpq $64,%rdx
+ jb .Ltail_ssse3
+
+ movdqu 0(%rsi),%xmm4
+ movdqu 16(%rsi),%xmm5
+ pxor %xmm4,%xmm0
+ movdqu 32(%rsi),%xmm4
+ pxor %xmm5,%xmm1
+ movdqu 48(%rsi),%xmm5
+ leaq 64(%rsi),%rsi
+ pxor %xmm4,%xmm2
+ pxor %xmm5,%xmm3
+
+ movdqu %xmm0,0(%rdi)
+ movdqu %xmm1,16(%rdi)
+ movdqu %xmm2,32(%rdi)
+ movdqu %xmm3,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ subq $64,%rdx
+ jnz .Loop_outer_ssse3
+
+ jmp .Ldone_ssse3
+
+.align 16
+.Ltail_ssse3:
+ movdqa %xmm0,0(%rsp)
+ movdqa %xmm1,16(%rsp)
+ movdqa %xmm2,32(%rsp)
+ movdqa %xmm3,48(%rsp)
+ xorq %r8,%r8
+
+.Loop_tail_ssse3:
+ movzbl (%rsi,%r8,1),%eax
+ movzbl (%rsp,%r8,1),%ecx
+ leaq 1(%r8),%r8
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r8,1)
+ decq %rdx
+ jnz .Loop_tail_ssse3
+
+.Ldone_ssse3:
+ leaq -8(%r10),%rsp
+
+.Lssse3_epilogue:
+ ret
+
+.align 32
+.Lchacha20_4x:
+ leaq 8(%rsp),%r10
+
+.Lproceed4x:
+ subq $0x140+8,%rsp
+ andq $-32,%rsp
+ movdqa .Lsigma(%rip),%xmm11
+ movdqu (%rcx),%xmm15
+ movdqu 16(%rcx),%xmm7
+ movdqu (%r8),%xmm3
+ leaq 256(%rsp),%rcx
+ leaq .Lrot16(%rip),%r9
+ leaq .Lrot24(%rip),%r11
+
+ pshufd $0x00,%xmm11,%xmm8
+ pshufd $0x55,%xmm11,%xmm9
+ movdqa %xmm8,64(%rsp)
+ pshufd $0xaa,%xmm11,%xmm10
+ movdqa %xmm9,80(%rsp)
+ pshufd $0xff,%xmm11,%xmm11
+ movdqa %xmm10,96(%rsp)
+ movdqa %xmm11,112(%rsp)
+
+ pshufd $0x00,%xmm15,%xmm12
+ pshufd $0x55,%xmm15,%xmm13
+ movdqa %xmm12,128-256(%rcx)
+ pshufd $0xaa,%xmm15,%xmm14
+ movdqa %xmm13,144-256(%rcx)
+ pshufd $0xff,%xmm15,%xmm15
+ movdqa %xmm14,160-256(%rcx)
+ movdqa %xmm15,176-256(%rcx)
+
+ pshufd $0x00,%xmm7,%xmm4
+ pshufd $0x55,%xmm7,%xmm5
+ movdqa %xmm4,192-256(%rcx)
+ pshufd $0xaa,%xmm7,%xmm6
+ movdqa %xmm5,208-256(%rcx)
+ pshufd $0xff,%xmm7,%xmm7
+ movdqa %xmm6,224-256(%rcx)
+ movdqa %xmm7,240-256(%rcx)
+
+ pshufd $0x00,%xmm3,%xmm0
+ pshufd $0x55,%xmm3,%xmm1
+ paddd .Linc(%rip),%xmm0
+ pshufd $0xaa,%xmm3,%xmm2
+ movdqa %xmm1,272-256(%rcx)
+ pshufd $0xff,%xmm3,%xmm3
+ movdqa %xmm2,288-256(%rcx)
+ movdqa %xmm3,304-256(%rcx)
+
+ jmp .Loop_enter4x
+
+.align 32
+.Loop_outer4x:
+ movdqa 64(%rsp),%xmm8
+ movdqa 80(%rsp),%xmm9
+ movdqa 96(%rsp),%xmm10
+ movdqa 112(%rsp),%xmm11
+ movdqa 128-256(%rcx),%xmm12
+ movdqa 144-256(%rcx),%xmm13
+ movdqa 160-256(%rcx),%xmm14
+ movdqa 176-256(%rcx),%xmm15
+ movdqa 192-256(%rcx),%xmm4
+ movdqa 208-256(%rcx),%xmm5
+ movdqa 224-256(%rcx),%xmm6
+ movdqa 240-256(%rcx),%xmm7
+ movdqa 256-256(%rcx),%xmm0
+ movdqa 272-256(%rcx),%xmm1
+ movdqa 288-256(%rcx),%xmm2
+ movdqa 304-256(%rcx),%xmm3
+ paddd .Lfour(%rip),%xmm0
+
+.Loop_enter4x:
+ movdqa %xmm6,32(%rsp)
+ movdqa %xmm7,48(%rsp)
+ movdqa (%r9),%xmm7
+ movl $10,%eax
+ movdqa %xmm0,256-256(%rcx)
+ jmp .Loop4x
+
+.align 32
+.Loop4x:
+ paddd %xmm12,%xmm8
+ paddd %xmm13,%xmm9
+ pxor %xmm8,%xmm0
+ pxor %xmm9,%xmm1
+ pshufb %xmm7,%xmm0
+ pshufb %xmm7,%xmm1
+ paddd %xmm0,%xmm4
+ paddd %xmm1,%xmm5
+ pxor %xmm4,%xmm12
+ pxor %xmm5,%xmm13
+ movdqa %xmm12,%xmm6
+ pslld $12,%xmm12
+ psrld $20,%xmm6
+ movdqa %xmm13,%xmm7
+ pslld $12,%xmm13
+ por %xmm6,%xmm12
+ psrld $20,%xmm7
+ movdqa (%r11),%xmm6
+ por %xmm7,%xmm13
+ paddd %xmm12,%xmm8
+ paddd %xmm13,%xmm9
+ pxor %xmm8,%xmm0
+ pxor %xmm9,%xmm1
+ pshufb %xmm6,%xmm0
+ pshufb %xmm6,%xmm1
+ paddd %xmm0,%xmm4
+ paddd %xmm1,%xmm5
+ pxor %xmm4,%xmm12
+ pxor %xmm5,%xmm13
+ movdqa %xmm12,%xmm7
+ pslld $7,%xmm12
+ psrld $25,%xmm7
+ movdqa %xmm13,%xmm6
+ pslld $7,%xmm13
+ por %xmm7,%xmm12
+ psrld $25,%xmm6
+ movdqa (%r9),%xmm7
+ por %xmm6,%xmm13
+ movdqa %xmm4,0(%rsp)
+ movdqa %xmm5,16(%rsp)
+ movdqa 32(%rsp),%xmm4
+ movdqa 48(%rsp),%xmm5
+ paddd %xmm14,%xmm10
+ paddd %xmm15,%xmm11
+ pxor %xmm10,%xmm2
+ pxor %xmm11,%xmm3
+ pshufb %xmm7,%xmm2
+ pshufb %xmm7,%xmm3
+ paddd %xmm2,%xmm4
+ paddd %xmm3,%xmm5
+ pxor %xmm4,%xmm14
+ pxor %xmm5,%xmm15
+ movdqa %xmm14,%xmm6
+ pslld $12,%xmm14
+ psrld $20,%xmm6
+ movdqa %xmm15,%xmm7
+ pslld $12,%xmm15
+ por %xmm6,%xmm14
+ psrld $20,%xmm7
+ movdqa (%r11),%xmm6
+ por %xmm7,%xmm15
+ paddd %xmm14,%xmm10
+ paddd %xmm15,%xmm11
+ pxor %xmm10,%xmm2
+ pxor %xmm11,%xmm3
+ pshufb %xmm6,%xmm2
+ pshufb %xmm6,%xmm3
+ paddd %xmm2,%xmm4
+ paddd %xmm3,%xmm5
+ pxor %xmm4,%xmm14
+ pxor %xmm5,%xmm15
+ movdqa %xmm14,%xmm7
+ pslld $7,%xmm14
+ psrld $25,%xmm7
+ movdqa %xmm15,%xmm6
+ pslld $7,%xmm15
+ por %xmm7,%xmm14
+ psrld $25,%xmm6
+ movdqa (%r9),%xmm7
+ por %xmm6,%xmm15
+ paddd %xmm13,%xmm8
+ paddd %xmm14,%xmm9
+ pxor %xmm8,%xmm3
+ pxor %xmm9,%xmm0
+ pshufb %xmm7,%xmm3
+ pshufb %xmm7,%xmm0
+ paddd %xmm3,%xmm4
+ paddd %xmm0,%xmm5
+ pxor %xmm4,%xmm13
+ pxor %xmm5,%xmm14
+ movdqa %xmm13,%xmm6
+ pslld $12,%xmm13
+ psrld $20,%xmm6
+ movdqa %xmm14,%xmm7
+ pslld $12,%xmm14
+ por %xmm6,%xmm13
+ psrld $20,%xmm7
+ movdqa (%r11),%xmm6
+ por %xmm7,%xmm14
+ paddd %xmm13,%xmm8
+ paddd %xmm14,%xmm9
+ pxor %xmm8,%xmm3
+ pxor %xmm9,%xmm0
+ pshufb %xmm6,%xmm3
+ pshufb %xmm6,%xmm0
+ paddd %xmm3,%xmm4
+ paddd %xmm0,%xmm5
+ pxor %xmm4,%xmm13
+ pxor %xmm5,%xmm14
+ movdqa %xmm13,%xmm7
+ pslld $7,%xmm13
+ psrld $25,%xmm7
+ movdqa %xmm14,%xmm6
+ pslld $7,%xmm14
+ por %xmm7,%xmm13
+ psrld $25,%xmm6
+ movdqa (%r9),%xmm7
+ por %xmm6,%xmm14
+ movdqa %xmm4,32(%rsp)
+ movdqa %xmm5,48(%rsp)
+ movdqa 0(%rsp),%xmm4
+ movdqa 16(%rsp),%xmm5
+ paddd %xmm15,%xmm10
+ paddd %xmm12,%xmm11
+ pxor %xmm10,%xmm1
+ pxor %xmm11,%xmm2
+ pshufb %xmm7,%xmm1
+ pshufb %xmm7,%xmm2
+ paddd %xmm1,%xmm4
+ paddd %xmm2,%xmm5
+ pxor %xmm4,%xmm15
+ pxor %xmm5,%xmm12
+ movdqa %xmm15,%xmm6
+ pslld $12,%xmm15
+ psrld $20,%xmm6
+ movdqa %xmm12,%xmm7
+ pslld $12,%xmm12
+ por %xmm6,%xmm15
+ psrld $20,%xmm7
+ movdqa (%r11),%xmm6
+ por %xmm7,%xmm12
+ paddd %xmm15,%xmm10
+ paddd %xmm12,%xmm11
+ pxor %xmm10,%xmm1
+ pxor %xmm11,%xmm2
+ pshufb %xmm6,%xmm1
+ pshufb %xmm6,%xmm2
+ paddd %xmm1,%xmm4
+ paddd %xmm2,%xmm5
+ pxor %xmm4,%xmm15
+ pxor %xmm5,%xmm12
+ movdqa %xmm15,%xmm7
+ pslld $7,%xmm15
+ psrld $25,%xmm7
+ movdqa %xmm12,%xmm6
+ pslld $7,%xmm12
+ por %xmm7,%xmm15
+ psrld $25,%xmm6
+ movdqa (%r9),%xmm7
+ por %xmm6,%xmm12
+ decl %eax
+ jnz .Loop4x
+
+ paddd 64(%rsp),%xmm8
+ paddd 80(%rsp),%xmm9
+ paddd 96(%rsp),%xmm10
+ paddd 112(%rsp),%xmm11
+
+ movdqa %xmm8,%xmm6
+ punpckldq %xmm9,%xmm8
+ movdqa %xmm10,%xmm7
+ punpckldq %xmm11,%xmm10
+ punpckhdq %xmm9,%xmm6
+ punpckhdq %xmm11,%xmm7
+ movdqa %xmm8,%xmm9
+ punpcklqdq %xmm10,%xmm8
+ movdqa %xmm6,%xmm11
+ punpcklqdq %xmm7,%xmm6
+ punpckhqdq %xmm10,%xmm9
+ punpckhqdq %xmm7,%xmm11
+ paddd 128-256(%rcx),%xmm12
+ paddd 144-256(%rcx),%xmm13
+ paddd 160-256(%rcx),%xmm14
+ paddd 176-256(%rcx),%xmm15
+
+ movdqa %xmm8,0(%rsp)
+ movdqa %xmm9,16(%rsp)
+ movdqa 32(%rsp),%xmm8
+ movdqa 48(%rsp),%xmm9
+
+ movdqa %xmm12,%xmm10
+ punpckldq %xmm13,%xmm12
+ movdqa %xmm14,%xmm7
+ punpckldq %xmm15,%xmm14
+ punpckhdq %xmm13,%xmm10
+ punpckhdq %xmm15,%xmm7
+ movdqa %xmm12,%xmm13
+ punpcklqdq %xmm14,%xmm12
+ movdqa %xmm10,%xmm15
+ punpcklqdq %xmm7,%xmm10
+ punpckhqdq %xmm14,%xmm13
+ punpckhqdq %xmm7,%xmm15
+ paddd 192-256(%rcx),%xmm4
+ paddd 208-256(%rcx),%xmm5
+ paddd 224-256(%rcx),%xmm8
+ paddd 240-256(%rcx),%xmm9
+
+ movdqa %xmm6,32(%rsp)
+ movdqa %xmm11,48(%rsp)
+
+ movdqa %xmm4,%xmm14
+ punpckldq %xmm5,%xmm4
+ movdqa %xmm8,%xmm7
+ punpckldq %xmm9,%xmm8
+ punpckhdq %xmm5,%xmm14
+ punpckhdq %xmm9,%xmm7
+ movdqa %xmm4,%xmm5
+ punpcklqdq %xmm8,%xmm4
+ movdqa %xmm14,%xmm9
+ punpcklqdq %xmm7,%xmm14
+ punpckhqdq %xmm8,%xmm5
+ punpckhqdq %xmm7,%xmm9
+ paddd 256-256(%rcx),%xmm0
+ paddd 272-256(%rcx),%xmm1
+ paddd 288-256(%rcx),%xmm2
+ paddd 304-256(%rcx),%xmm3
+
+ movdqa %xmm0,%xmm8
+ punpckldq %xmm1,%xmm0
+ movdqa %xmm2,%xmm7
+ punpckldq %xmm3,%xmm2
+ punpckhdq %xmm1,%xmm8
+ punpckhdq %xmm3,%xmm7
+ movdqa %xmm0,%xmm1
+ punpcklqdq %xmm2,%xmm0
+ movdqa %xmm8,%xmm3
+ punpcklqdq %xmm7,%xmm8
+ punpckhqdq %xmm2,%xmm1
+ punpckhqdq %xmm7,%xmm3
+ cmpq $256,%rdx
+ jb .Ltail4x
+
+ movdqu 0(%rsi),%xmm6
+ movdqu 16(%rsi),%xmm11
+ movdqu 32(%rsi),%xmm2
+ movdqu 48(%rsi),%xmm7
+ pxor 0(%rsp),%xmm6
+ pxor %xmm12,%xmm11
+ pxor %xmm4,%xmm2
+ pxor %xmm0,%xmm7
+
+ movdqu %xmm6,0(%rdi)
+ movdqu 64(%rsi),%xmm6
+ movdqu %xmm11,16(%rdi)
+ movdqu 80(%rsi),%xmm11
+ movdqu %xmm2,32(%rdi)
+ movdqu 96(%rsi),%xmm2
+ movdqu %xmm7,48(%rdi)
+ movdqu 112(%rsi),%xmm7
+ leaq 128(%rsi),%rsi
+ pxor 16(%rsp),%xmm6
+ pxor %xmm13,%xmm11
+ pxor %xmm5,%xmm2
+ pxor %xmm1,%xmm7
+
+ movdqu %xmm6,64(%rdi)
+ movdqu 0(%rsi),%xmm6
+ movdqu %xmm11,80(%rdi)
+ movdqu 16(%rsi),%xmm11
+ movdqu %xmm2,96(%rdi)
+ movdqu 32(%rsi),%xmm2
+ movdqu %xmm7,112(%rdi)
+ leaq 128(%rdi),%rdi
+ movdqu 48(%rsi),%xmm7
+ pxor 32(%rsp),%xmm6
+ pxor %xmm10,%xmm11
+ pxor %xmm14,%xmm2
+ pxor %xmm8,%xmm7
+
+ movdqu %xmm6,0(%rdi)
+ movdqu 64(%rsi),%xmm6
+ movdqu %xmm11,16(%rdi)
+ movdqu 80(%rsi),%xmm11
+ movdqu %xmm2,32(%rdi)
+ movdqu 96(%rsi),%xmm2
+ movdqu %xmm7,48(%rdi)
+ movdqu 112(%rsi),%xmm7
+ leaq 128(%rsi),%rsi
+ pxor 48(%rsp),%xmm6
+ pxor %xmm15,%xmm11
+ pxor %xmm9,%xmm2
+ pxor %xmm3,%xmm7
+ movdqu %xmm6,64(%rdi)
+ movdqu %xmm11,80(%rdi)
+ movdqu %xmm2,96(%rdi)
+ movdqu %xmm7,112(%rdi)
+ leaq 128(%rdi),%rdi
+
+ subq $256,%rdx
+ jnz .Loop_outer4x
+
+ jmp .Ldone4x
+
+.Ltail4x:
+ cmpq $192,%rdx
+ jae .L192_or_more4x
+ cmpq $128,%rdx
+ jae .L128_or_more4x
+ cmpq $64,%rdx
+ jae .L64_or_more4x
+
+
+ xorq %r9,%r9
+
+ movdqa %xmm12,16(%rsp)
+ movdqa %xmm4,32(%rsp)
+ movdqa %xmm0,48(%rsp)
+ jmp .Loop_tail4x
+
+.align 32
+.L64_or_more4x:
+ movdqu 0(%rsi),%xmm6
+ movdqu 16(%rsi),%xmm11
+ movdqu 32(%rsi),%xmm2
+ movdqu 48(%rsi),%xmm7
+ pxor 0(%rsp),%xmm6
+ pxor %xmm12,%xmm11
+ pxor %xmm4,%xmm2
+ pxor %xmm0,%xmm7
+ movdqu %xmm6,0(%rdi)
+ movdqu %xmm11,16(%rdi)
+ movdqu %xmm2,32(%rdi)
+ movdqu %xmm7,48(%rdi)
+ je .Ldone4x
+
+ movdqa 16(%rsp),%xmm6
+ leaq 64(%rsi),%rsi
+ xorq %r9,%r9
+ movdqa %xmm6,0(%rsp)
+ movdqa %xmm13,16(%rsp)
+ leaq 64(%rdi),%rdi
+ movdqa %xmm5,32(%rsp)
+ subq $64,%rdx
+ movdqa %xmm1,48(%rsp)
+ jmp .Loop_tail4x
+
+.align 32
+.L128_or_more4x:
+ movdqu 0(%rsi),%xmm6
+ movdqu 16(%rsi),%xmm11
+ movdqu 32(%rsi),%xmm2
+ movdqu 48(%rsi),%xmm7
+ pxor 0(%rsp),%xmm6
+ pxor %xmm12,%xmm11
+ pxor %xmm4,%xmm2
+ pxor %xmm0,%xmm7
+
+ movdqu %xmm6,0(%rdi)
+ movdqu 64(%rsi),%xmm6
+ movdqu %xmm11,16(%rdi)
+ movdqu 80(%rsi),%xmm11
+ movdqu %xmm2,32(%rdi)
+ movdqu 96(%rsi),%xmm2
+ movdqu %xmm7,48(%rdi)
+ movdqu 112(%rsi),%xmm7
+ pxor 16(%rsp),%xmm6
+ pxor %xmm13,%xmm11
+ pxor %xmm5,%xmm2
+ pxor %xmm1,%xmm7
+ movdqu %xmm6,64(%rdi)
+ movdqu %xmm11,80(%rdi)
+ movdqu %xmm2,96(%rdi)
+ movdqu %xmm7,112(%rdi)
+ je .Ldone4x
+
+ movdqa 32(%rsp),%xmm6
+ leaq 128(%rsi),%rsi
+ xorq %r9,%r9
+ movdqa %xmm6,0(%rsp)
+ movdqa %xmm10,16(%rsp)
+ leaq 128(%rdi),%rdi
+ movdqa %xmm14,32(%rsp)
+ subq $128,%rdx
+ movdqa %xmm8,48(%rsp)
+ jmp .Loop_tail4x
+
+.align 32
+.L192_or_more4x:
+ movdqu 0(%rsi),%xmm6
+ movdqu 16(%rsi),%xmm11
+ movdqu 32(%rsi),%xmm2
+ movdqu 48(%rsi),%xmm7
+ pxor 0(%rsp),%xmm6
+ pxor %xmm12,%xmm11
+ pxor %xmm4,%xmm2
+ pxor %xmm0,%xmm7
+
+ movdqu %xmm6,0(%rdi)
+ movdqu 64(%rsi),%xmm6
+ movdqu %xmm11,16(%rdi)
+ movdqu 80(%rsi),%xmm11
+ movdqu %xmm2,32(%rdi)
+ movdqu 96(%rsi),%xmm2
+ movdqu %xmm7,48(%rdi)
+ movdqu 112(%rsi),%xmm7
+ leaq 128(%rsi),%rsi
+ pxor 16(%rsp),%xmm6
+ pxor %xmm13,%xmm11
+ pxor %xmm5,%xmm2
+ pxor %xmm1,%xmm7
+
+ movdqu %xmm6,64(%rdi)
+ movdqu 0(%rsi),%xmm6
+ movdqu %xmm11,80(%rdi)
+ movdqu 16(%rsi),%xmm11
+ movdqu %xmm2,96(%rdi)
+ movdqu 32(%rsi),%xmm2
+ movdqu %xmm7,112(%rdi)
+ leaq 128(%rdi),%rdi
+ movdqu 48(%rsi),%xmm7
+ pxor 32(%rsp),%xmm6
+ pxor %xmm10,%xmm11
+ pxor %xmm14,%xmm2
+ pxor %xmm8,%xmm7
+ movdqu %xmm6,0(%rdi)
+ movdqu %xmm11,16(%rdi)
+ movdqu %xmm2,32(%rdi)
+ movdqu %xmm7,48(%rdi)
+ je .Ldone4x
+
+ movdqa 48(%rsp),%xmm6
+ leaq 64(%rsi),%rsi
+ xorq %r9,%r9
+ movdqa %xmm6,0(%rsp)
+ movdqa %xmm15,16(%rsp)
+ leaq 64(%rdi),%rdi
+ movdqa %xmm9,32(%rsp)
+ subq $192,%rdx
+ movdqa %xmm3,48(%rsp)
+
+.Loop_tail4x:
+ movzbl (%rsi,%r9,1),%eax
+ movzbl (%rsp,%r9,1),%ecx
+ leaq 1(%r9),%r9
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r9,1)
+ decq %rdx
+ jnz .Loop_tail4x
+
+.Ldone4x:
+ leaq -8(%r10),%rsp
+
+.L4x_epilogue:
+ ret
+ENDPROC(chacha20_ssse3)
+#endif /* CONFIG_AS_SSSE3 */
+
+#ifdef CONFIG_AS_AVX2
+.align 32
+ENTRY(chacha20_avx2)
+.Lchacha20_avx2:
+ cmpq $0,%rdx
+ je .L8x_epilogue
+ leaq 8(%rsp),%r10
+
+ subq $0x280+8,%rsp
+ andq $-32,%rsp
+ vzeroupper
+
+ vbroadcasti128 .Lsigma(%rip),%ymm11
+ vbroadcasti128 (%rcx),%ymm3
+ vbroadcasti128 16(%rcx),%ymm15
+ vbroadcasti128 (%r8),%ymm7
+ leaq 256(%rsp),%rcx
+ leaq 512(%rsp),%rax
+ leaq .Lrot16(%rip),%r9
+ leaq .Lrot24(%rip),%r11
+
+ vpshufd $0x00,%ymm11,%ymm8
+ vpshufd $0x55,%ymm11,%ymm9
+ vmovdqa %ymm8,128-256(%rcx)
+ vpshufd $0xaa,%ymm11,%ymm10
+ vmovdqa %ymm9,160-256(%rcx)
+ vpshufd $0xff,%ymm11,%ymm11
+ vmovdqa %ymm10,192-256(%rcx)
+ vmovdqa %ymm11,224-256(%rcx)
+
+ vpshufd $0x00,%ymm3,%ymm0
+ vpshufd $0x55,%ymm3,%ymm1
+ vmovdqa %ymm0,256-256(%rcx)
+ vpshufd $0xaa,%ymm3,%ymm2
+ vmovdqa %ymm1,288-256(%rcx)
+ vpshufd $0xff,%ymm3,%ymm3
+ vmovdqa %ymm2,320-256(%rcx)
+ vmovdqa %ymm3,352-256(%rcx)
+
+ vpshufd $0x00,%ymm15,%ymm12
+ vpshufd $0x55,%ymm15,%ymm13
+ vmovdqa %ymm12,384-512(%rax)
+ vpshufd $0xaa,%ymm15,%ymm14
+ vmovdqa %ymm13,416-512(%rax)
+ vpshufd $0xff,%ymm15,%ymm15
+ vmovdqa %ymm14,448-512(%rax)
+ vmovdqa %ymm15,480-512(%rax)
+
+ vpshufd $0x00,%ymm7,%ymm4
+ vpshufd $0x55,%ymm7,%ymm5
+ vpaddd .Lincy(%rip),%ymm4,%ymm4
+ vpshufd $0xaa,%ymm7,%ymm6
+ vmovdqa %ymm5,544-512(%rax)
+ vpshufd $0xff,%ymm7,%ymm7
+ vmovdqa %ymm6,576-512(%rax)
+ vmovdqa %ymm7,608-512(%rax)
+
+ jmp .Loop_enter8x
+
+.align 32
+.Loop_outer8x:
+ vmovdqa 128-256(%rcx),%ymm8
+ vmovdqa 160-256(%rcx),%ymm9
+ vmovdqa 192-256(%rcx),%ymm10
+ vmovdqa 224-256(%rcx),%ymm11
+ vmovdqa 256-256(%rcx),%ymm0
+ vmovdqa 288-256(%rcx),%ymm1
+ vmovdqa 320-256(%rcx),%ymm2
+ vmovdqa 352-256(%rcx),%ymm3
+ vmovdqa 384-512(%rax),%ymm12
+ vmovdqa 416-512(%rax),%ymm13
+ vmovdqa 448-512(%rax),%ymm14
+ vmovdqa 480-512(%rax),%ymm15
+ vmovdqa 512-512(%rax),%ymm4
+ vmovdqa 544-512(%rax),%ymm5
+ vmovdqa 576-512(%rax),%ymm6
+ vmovdqa 608-512(%rax),%ymm7
+ vpaddd .Leight(%rip),%ymm4,%ymm4
+
+.Loop_enter8x:
+ vmovdqa %ymm14,64(%rsp)
+ vmovdqa %ymm15,96(%rsp)
+ vbroadcasti128 (%r9),%ymm15
+ vmovdqa %ymm4,512-512(%rax)
+ movl $10,%eax
+ jmp .Loop8x
+
+.align 32
+.Loop8x:
+ vpaddd %ymm0,%ymm8,%ymm8
+ vpxor %ymm4,%ymm8,%ymm4
+ vpshufb %ymm15,%ymm4,%ymm4
+ vpaddd %ymm1,%ymm9,%ymm9
+ vpxor %ymm5,%ymm9,%ymm5
+ vpshufb %ymm15,%ymm5,%ymm5
+ vpaddd %ymm4,%ymm12,%ymm12
+ vpxor %ymm0,%ymm12,%ymm0
+ vpslld $12,%ymm0,%ymm14
+ vpsrld $20,%ymm0,%ymm0
+ vpor %ymm0,%ymm14,%ymm0
+ vbroadcasti128 (%r11),%ymm14
+ vpaddd %ymm5,%ymm13,%ymm13
+ vpxor %ymm1,%ymm13,%ymm1
+ vpslld $12,%ymm1,%ymm15
+ vpsrld $20,%ymm1,%ymm1
+ vpor %ymm1,%ymm15,%ymm1
+ vpaddd %ymm0,%ymm8,%ymm8
+ vpxor %ymm4,%ymm8,%ymm4
+ vpshufb %ymm14,%ymm4,%ymm4
+ vpaddd %ymm1,%ymm9,%ymm9
+ vpxor %ymm5,%ymm9,%ymm5
+ vpshufb %ymm14,%ymm5,%ymm5
+ vpaddd %ymm4,%ymm12,%ymm12
+ vpxor %ymm0,%ymm12,%ymm0
+ vpslld $7,%ymm0,%ymm15
+ vpsrld $25,%ymm0,%ymm0
+ vpor %ymm0,%ymm15,%ymm0
+ vbroadcasti128 (%r9),%ymm15
+ vpaddd %ymm5,%ymm13,%ymm13
+ vpxor %ymm1,%ymm13,%ymm1
+ vpslld $7,%ymm1,%ymm14
+ vpsrld $25,%ymm1,%ymm1
+ vpor %ymm1,%ymm14,%ymm1
+ vmovdqa %ymm12,0(%rsp)
+ vmovdqa %ymm13,32(%rsp)
+ vmovdqa 64(%rsp),%ymm12
+ vmovdqa 96(%rsp),%ymm13
+ vpaddd %ymm2,%ymm10,%ymm10
+ vpxor %ymm6,%ymm10,%ymm6
+ vpshufb %ymm15,%ymm6,%ymm6
+ vpaddd %ymm3,%ymm11,%ymm11
+ vpxor %ymm7,%ymm11,%ymm7
+ vpshufb %ymm15,%ymm7,%ymm7
+ vpaddd %ymm6,%ymm12,%ymm12
+ vpxor %ymm2,%ymm12,%ymm2
+ vpslld $12,%ymm2,%ymm14
+ vpsrld $20,%ymm2,%ymm2
+ vpor %ymm2,%ymm14,%ymm2
+ vbroadcasti128 (%r11),%ymm14
+ vpaddd %ymm7,%ymm13,%ymm13
+ vpxor %ymm3,%ymm13,%ymm3
+ vpslld $12,%ymm3,%ymm15
+ vpsrld $20,%ymm3,%ymm3
+ vpor %ymm3,%ymm15,%ymm3
+ vpaddd %ymm2,%ymm10,%ymm10
+ vpxor %ymm6,%ymm10,%ymm6
+ vpshufb %ymm14,%ymm6,%ymm6
+ vpaddd %ymm3,%ymm11,%ymm11
+ vpxor %ymm7,%ymm11,%ymm7
+ vpshufb %ymm14,%ymm7,%ymm7
+ vpaddd %ymm6,%ymm12,%ymm12
+ vpxor %ymm2,%ymm12,%ymm2
+ vpslld $7,%ymm2,%ymm15
+ vpsrld $25,%ymm2,%ymm2
+ vpor %ymm2,%ymm15,%ymm2
+ vbroadcasti128 (%r9),%ymm15
+ vpaddd %ymm7,%ymm13,%ymm13
+ vpxor %ymm3,%ymm13,%ymm3
+ vpslld $7,%ymm3,%ymm14
+ vpsrld $25,%ymm3,%ymm3
+ vpor %ymm3,%ymm14,%ymm3
+ vpaddd %ymm1,%ymm8,%ymm8
+ vpxor %ymm7,%ymm8,%ymm7
+ vpshufb %ymm15,%ymm7,%ymm7
+ vpaddd %ymm2,%ymm9,%ymm9
+ vpxor %ymm4,%ymm9,%ymm4
+ vpshufb %ymm15,%ymm4,%ymm4
+ vpaddd %ymm7,%ymm12,%ymm12
+ vpxor %ymm1,%ymm12,%ymm1
+ vpslld $12,%ymm1,%ymm14
+ vpsrld $20,%ymm1,%ymm1
+ vpor %ymm1,%ymm14,%ymm1
+ vbroadcasti128 (%r11),%ymm14
+ vpaddd %ymm4,%ymm13,%ymm13
+ vpxor %ymm2,%ymm13,%ymm2
+ vpslld $12,%ymm2,%ymm15
+ vpsrld $20,%ymm2,%ymm2
+ vpor %ymm2,%ymm15,%ymm2
+ vpaddd %ymm1,%ymm8,%ymm8
+ vpxor %ymm7,%ymm8,%ymm7
+ vpshufb %ymm14,%ymm7,%ymm7
+ vpaddd %ymm2,%ymm9,%ymm9
+ vpxor %ymm4,%ymm9,%ymm4
+ vpshufb %ymm14,%ymm4,%ymm4
+ vpaddd %ymm7,%ymm12,%ymm12
+ vpxor %ymm1,%ymm12,%ymm1
+ vpslld $7,%ymm1,%ymm15
+ vpsrld $25,%ymm1,%ymm1
+ vpor %ymm1,%ymm15,%ymm1
+ vbroadcasti128 (%r9),%ymm15
+ vpaddd %ymm4,%ymm13,%ymm13
+ vpxor %ymm2,%ymm13,%ymm2
+ vpslld $7,%ymm2,%ymm14
+ vpsrld $25,%ymm2,%ymm2
+ vpor %ymm2,%ymm14,%ymm2
+ vmovdqa %ymm12,64(%rsp)
+ vmovdqa %ymm13,96(%rsp)
+ vmovdqa 0(%rsp),%ymm12
+ vmovdqa 32(%rsp),%ymm13
+ vpaddd %ymm3,%ymm10,%ymm10
+ vpxor %ymm5,%ymm10,%ymm5
+ vpshufb %ymm15,%ymm5,%ymm5
+ vpaddd %ymm0,%ymm11,%ymm11
+ vpxor %ymm6,%ymm11,%ymm6
+ vpshufb %ymm15,%ymm6,%ymm6
+ vpaddd %ymm5,%ymm12,%ymm12
+ vpxor %ymm3,%ymm12,%ymm3
+ vpslld $12,%ymm3,%ymm14
+ vpsrld $20,%ymm3,%ymm3
+ vpor %ymm3,%ymm14,%ymm3
+ vbroadcasti128 (%r11),%ymm14
+ vpaddd %ymm6,%ymm13,%ymm13
+ vpxor %ymm0,%ymm13,%ymm0
+ vpslld $12,%ymm0,%ymm15
+ vpsrld $20,%ymm0,%ymm0
+ vpor %ymm0,%ymm15,%ymm0
+ vpaddd %ymm3,%ymm10,%ymm10
+ vpxor %ymm5,%ymm10,%ymm5
+ vpshufb %ymm14,%ymm5,%ymm5
+ vpaddd %ymm0,%ymm11,%ymm11
+ vpxor %ymm6,%ymm11,%ymm6
+ vpshufb %ymm14,%ymm6,%ymm6
+ vpaddd %ymm5,%ymm12,%ymm12
+ vpxor %ymm3,%ymm12,%ymm3
+ vpslld $7,%ymm3,%ymm15
+ vpsrld $25,%ymm3,%ymm3
+ vpor %ymm3,%ymm15,%ymm3
+ vbroadcasti128 (%r9),%ymm15
+ vpaddd %ymm6,%ymm13,%ymm13
+ vpxor %ymm0,%ymm13,%ymm0
+ vpslld $7,%ymm0,%ymm14
+ vpsrld $25,%ymm0,%ymm0
+ vpor %ymm0,%ymm14,%ymm0
+ decl %eax
+ jnz .Loop8x
+
+ leaq 512(%rsp),%rax
+ vpaddd 128-256(%rcx),%ymm8,%ymm8
+ vpaddd 160-256(%rcx),%ymm9,%ymm9
+ vpaddd 192-256(%rcx),%ymm10,%ymm10
+ vpaddd 224-256(%rcx),%ymm11,%ymm11
+
+ vpunpckldq %ymm9,%ymm8,%ymm14
+ vpunpckldq %ymm11,%ymm10,%ymm15
+ vpunpckhdq %ymm9,%ymm8,%ymm8
+ vpunpckhdq %ymm11,%ymm10,%ymm10
+ vpunpcklqdq %ymm15,%ymm14,%ymm9
+ vpunpckhqdq %ymm15,%ymm14,%ymm14
+ vpunpcklqdq %ymm10,%ymm8,%ymm11
+ vpunpckhqdq %ymm10,%ymm8,%ymm8
+ vpaddd 256-256(%rcx),%ymm0,%ymm0
+ vpaddd 288-256(%rcx),%ymm1,%ymm1
+ vpaddd 320-256(%rcx),%ymm2,%ymm2
+ vpaddd 352-256(%rcx),%ymm3,%ymm3
+
+ vpunpckldq %ymm1,%ymm0,%ymm10
+ vpunpckldq %ymm3,%ymm2,%ymm15
+ vpunpckhdq %ymm1,%ymm0,%ymm0
+ vpunpckhdq %ymm3,%ymm2,%ymm2
+ vpunpcklqdq %ymm15,%ymm10,%ymm1
+ vpunpckhqdq %ymm15,%ymm10,%ymm10
+ vpunpcklqdq %ymm2,%ymm0,%ymm3
+ vpunpckhqdq %ymm2,%ymm0,%ymm0
+ vperm2i128 $0x20,%ymm1,%ymm9,%ymm15
+ vperm2i128 $0x31,%ymm1,%ymm9,%ymm1
+ vperm2i128 $0x20,%ymm10,%ymm14,%ymm9
+ vperm2i128 $0x31,%ymm10,%ymm14,%ymm10
+ vperm2i128 $0x20,%ymm3,%ymm11,%ymm14
+ vperm2i128 $0x31,%ymm3,%ymm11,%ymm3
+ vperm2i128 $0x20,%ymm0,%ymm8,%ymm11
+ vperm2i128 $0x31,%ymm0,%ymm8,%ymm0
+ vmovdqa %ymm15,0(%rsp)
+ vmovdqa %ymm9,32(%rsp)
+ vmovdqa 64(%rsp),%ymm15
+ vmovdqa 96(%rsp),%ymm9
+
+ vpaddd 384-512(%rax),%ymm12,%ymm12
+ vpaddd 416-512(%rax),%ymm13,%ymm13
+ vpaddd 448-512(%rax),%ymm15,%ymm15
+ vpaddd 480-512(%rax),%ymm9,%ymm9
+
+ vpunpckldq %ymm13,%ymm12,%ymm2
+ vpunpckldq %ymm9,%ymm15,%ymm8
+ vpunpckhdq %ymm13,%ymm12,%ymm12
+ vpunpckhdq %ymm9,%ymm15,%ymm15
+ vpunpcklqdq %ymm8,%ymm2,%ymm13
+ vpunpckhqdq %ymm8,%ymm2,%ymm2
+ vpunpcklqdq %ymm15,%ymm12,%ymm9
+ vpunpckhqdq %ymm15,%ymm12,%ymm12
+ vpaddd 512-512(%rax),%ymm4,%ymm4
+ vpaddd 544-512(%rax),%ymm5,%ymm5
+ vpaddd 576-512(%rax),%ymm6,%ymm6
+ vpaddd 608-512(%rax),%ymm7,%ymm7
+
+ vpunpckldq %ymm5,%ymm4,%ymm15
+ vpunpckldq %ymm7,%ymm6,%ymm8
+ vpunpckhdq %ymm5,%ymm4,%ymm4
+ vpunpckhdq %ymm7,%ymm6,%ymm6
+ vpunpcklqdq %ymm8,%ymm15,%ymm5
+ vpunpckhqdq %ymm8,%ymm15,%ymm15
+ vpunpcklqdq %ymm6,%ymm4,%ymm7
+ vpunpckhqdq %ymm6,%ymm4,%ymm4
+ vperm2i128 $0x20,%ymm5,%ymm13,%ymm8
+ vperm2i128 $0x31,%ymm5,%ymm13,%ymm5
+ vperm2i128 $0x20,%ymm15,%ymm2,%ymm13
+ vperm2i128 $0x31,%ymm15,%ymm2,%ymm15
+ vperm2i128 $0x20,%ymm7,%ymm9,%ymm2
+ vperm2i128 $0x31,%ymm7,%ymm9,%ymm7
+ vperm2i128 $0x20,%ymm4,%ymm12,%ymm9
+ vperm2i128 $0x31,%ymm4,%ymm12,%ymm4
+ vmovdqa 0(%rsp),%ymm6
+ vmovdqa 32(%rsp),%ymm12
+
+ cmpq $512,%rdx
+ jb .Ltail8x
+
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ leaq 128(%rsi),%rsi
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ leaq 128(%rdi),%rdi
+
+ vpxor 0(%rsi),%ymm12,%ymm12
+ vpxor 32(%rsi),%ymm13,%ymm13
+ vpxor 64(%rsi),%ymm10,%ymm10
+ vpxor 96(%rsi),%ymm15,%ymm15
+ leaq 128(%rsi),%rsi
+ vmovdqu %ymm12,0(%rdi)
+ vmovdqu %ymm13,32(%rdi)
+ vmovdqu %ymm10,64(%rdi)
+ vmovdqu %ymm15,96(%rdi)
+ leaq 128(%rdi),%rdi
+
+ vpxor 0(%rsi),%ymm14,%ymm14
+ vpxor 32(%rsi),%ymm2,%ymm2
+ vpxor 64(%rsi),%ymm3,%ymm3
+ vpxor 96(%rsi),%ymm7,%ymm7
+ leaq 128(%rsi),%rsi
+ vmovdqu %ymm14,0(%rdi)
+ vmovdqu %ymm2,32(%rdi)
+ vmovdqu %ymm3,64(%rdi)
+ vmovdqu %ymm7,96(%rdi)
+ leaq 128(%rdi),%rdi
+
+ vpxor 0(%rsi),%ymm11,%ymm11
+ vpxor 32(%rsi),%ymm9,%ymm9
+ vpxor 64(%rsi),%ymm0,%ymm0
+ vpxor 96(%rsi),%ymm4,%ymm4
+ leaq 128(%rsi),%rsi
+ vmovdqu %ymm11,0(%rdi)
+ vmovdqu %ymm9,32(%rdi)
+ vmovdqu %ymm0,64(%rdi)
+ vmovdqu %ymm4,96(%rdi)
+ leaq 128(%rdi),%rdi
+
+ subq $512,%rdx
+ jnz .Loop_outer8x
+
+ jmp .Ldone8x
+
+.Ltail8x:
+ cmpq $448,%rdx
+ jae .L448_or_more8x
+ cmpq $384,%rdx
+ jae .L384_or_more8x
+ cmpq $320,%rdx
+ jae .L320_or_more8x
+ cmpq $256,%rdx
+ jae .L256_or_more8x
+ cmpq $192,%rdx
+ jae .L192_or_more8x
+ cmpq $128,%rdx
+ jae .L128_or_more8x
+ cmpq $64,%rdx
+ jae .L64_or_more8x
+
+ xorq %r9,%r9
+ vmovdqa %ymm6,0(%rsp)
+ vmovdqa %ymm8,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L64_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ je .Ldone8x
+
+ leaq 64(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm1,0(%rsp)
+ leaq 64(%rdi),%rdi
+ subq $64,%rdx
+ vmovdqa %ymm5,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L128_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ je .Ldone8x
+
+ leaq 128(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm12,0(%rsp)
+ leaq 128(%rdi),%rdi
+ subq $128,%rdx
+ vmovdqa %ymm13,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L192_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vpxor 128(%rsi),%ymm12,%ymm12
+ vpxor 160(%rsi),%ymm13,%ymm13
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ vmovdqu %ymm12,128(%rdi)
+ vmovdqu %ymm13,160(%rdi)
+ je .Ldone8x
+
+ leaq 192(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm10,0(%rsp)
+ leaq 192(%rdi),%rdi
+ subq $192,%rdx
+ vmovdqa %ymm15,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L256_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vpxor 128(%rsi),%ymm12,%ymm12
+ vpxor 160(%rsi),%ymm13,%ymm13
+ vpxor 192(%rsi),%ymm10,%ymm10
+ vpxor 224(%rsi),%ymm15,%ymm15
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ vmovdqu %ymm12,128(%rdi)
+ vmovdqu %ymm13,160(%rdi)
+ vmovdqu %ymm10,192(%rdi)
+ vmovdqu %ymm15,224(%rdi)
+ je .Ldone8x
+
+ leaq 256(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm14,0(%rsp)
+ leaq 256(%rdi),%rdi
+ subq $256,%rdx
+ vmovdqa %ymm2,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L320_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vpxor 128(%rsi),%ymm12,%ymm12
+ vpxor 160(%rsi),%ymm13,%ymm13
+ vpxor 192(%rsi),%ymm10,%ymm10
+ vpxor 224(%rsi),%ymm15,%ymm15
+ vpxor 256(%rsi),%ymm14,%ymm14
+ vpxor 288(%rsi),%ymm2,%ymm2
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ vmovdqu %ymm12,128(%rdi)
+ vmovdqu %ymm13,160(%rdi)
+ vmovdqu %ymm10,192(%rdi)
+ vmovdqu %ymm15,224(%rdi)
+ vmovdqu %ymm14,256(%rdi)
+ vmovdqu %ymm2,288(%rdi)
+ je .Ldone8x
+
+ leaq 320(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm3,0(%rsp)
+ leaq 320(%rdi),%rdi
+ subq $320,%rdx
+ vmovdqa %ymm7,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L384_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vpxor 128(%rsi),%ymm12,%ymm12
+ vpxor 160(%rsi),%ymm13,%ymm13
+ vpxor 192(%rsi),%ymm10,%ymm10
+ vpxor 224(%rsi),%ymm15,%ymm15
+ vpxor 256(%rsi),%ymm14,%ymm14
+ vpxor 288(%rsi),%ymm2,%ymm2
+ vpxor 320(%rsi),%ymm3,%ymm3
+ vpxor 352(%rsi),%ymm7,%ymm7
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ vmovdqu %ymm12,128(%rdi)
+ vmovdqu %ymm13,160(%rdi)
+ vmovdqu %ymm10,192(%rdi)
+ vmovdqu %ymm15,224(%rdi)
+ vmovdqu %ymm14,256(%rdi)
+ vmovdqu %ymm2,288(%rdi)
+ vmovdqu %ymm3,320(%rdi)
+ vmovdqu %ymm7,352(%rdi)
+ je .Ldone8x
+
+ leaq 384(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm11,0(%rsp)
+ leaq 384(%rdi),%rdi
+ subq $384,%rdx
+ vmovdqa %ymm9,32(%rsp)
+ jmp .Loop_tail8x
+
+.align 32
+.L448_or_more8x:
+ vpxor 0(%rsi),%ymm6,%ymm6
+ vpxor 32(%rsi),%ymm8,%ymm8
+ vpxor 64(%rsi),%ymm1,%ymm1
+ vpxor 96(%rsi),%ymm5,%ymm5
+ vpxor 128(%rsi),%ymm12,%ymm12
+ vpxor 160(%rsi),%ymm13,%ymm13
+ vpxor 192(%rsi),%ymm10,%ymm10
+ vpxor 224(%rsi),%ymm15,%ymm15
+ vpxor 256(%rsi),%ymm14,%ymm14
+ vpxor 288(%rsi),%ymm2,%ymm2
+ vpxor 320(%rsi),%ymm3,%ymm3
+ vpxor 352(%rsi),%ymm7,%ymm7
+ vpxor 384(%rsi),%ymm11,%ymm11
+ vpxor 416(%rsi),%ymm9,%ymm9
+ vmovdqu %ymm6,0(%rdi)
+ vmovdqu %ymm8,32(%rdi)
+ vmovdqu %ymm1,64(%rdi)
+ vmovdqu %ymm5,96(%rdi)
+ vmovdqu %ymm12,128(%rdi)
+ vmovdqu %ymm13,160(%rdi)
+ vmovdqu %ymm10,192(%rdi)
+ vmovdqu %ymm15,224(%rdi)
+ vmovdqu %ymm14,256(%rdi)
+ vmovdqu %ymm2,288(%rdi)
+ vmovdqu %ymm3,320(%rdi)
+ vmovdqu %ymm7,352(%rdi)
+ vmovdqu %ymm11,384(%rdi)
+ vmovdqu %ymm9,416(%rdi)
+ je .Ldone8x
+
+ leaq 448(%rsi),%rsi
+ xorq %r9,%r9
+ vmovdqa %ymm0,0(%rsp)
+ leaq 448(%rdi),%rdi
+ subq $448,%rdx
+ vmovdqa %ymm4,32(%rsp)
+
+.Loop_tail8x:
+ movzbl (%rsi,%r9,1),%eax
+ movzbl (%rsp,%r9,1),%ecx
+ leaq 1(%r9),%r9
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r9,1)
+ decq %rdx
+ jnz .Loop_tail8x
+
+.Ldone8x:
+ vzeroall
+ leaq -8(%r10),%rsp
+
+.L8x_epilogue:
+ ret
+ENDPROC(chacha20_avx2)
+#endif /* CONFIG_AS_AVX2 */
+
+#ifdef CONFIG_AS_AVX512
+.align 32
+ENTRY(chacha20_avx512)
+.Lchacha20_avx512:
+ cmpq $0,%rdx
+ je .Lavx512_epilogue
+ leaq 8(%rsp),%r10
+
+ cmpq $512,%rdx
+ ja .Lchacha20_16x
+
+ subq $64+8,%rsp
+ andq $-64,%rsp
+ vbroadcasti32x4 .Lsigma(%rip),%zmm0
+ vbroadcasti32x4 (%rcx),%zmm1
+ vbroadcasti32x4 16(%rcx),%zmm2
+ vbroadcasti32x4 (%r8),%zmm3
+
+ vmovdqa32 %zmm0,%zmm16
+ vmovdqa32 %zmm1,%zmm17
+ vmovdqa32 %zmm2,%zmm18
+ vpaddd .Lzeroz(%rip),%zmm3,%zmm3
+ vmovdqa32 .Lfourz(%rip),%zmm20
+ movq $10,%r8
+ vmovdqa32 %zmm3,%zmm19
+ jmp .Loop_avx512
+
+.align 16
+.Loop_outer_avx512:
+ vmovdqa32 %zmm16,%zmm0
+ vmovdqa32 %zmm17,%zmm1
+ vmovdqa32 %zmm18,%zmm2
+ vpaddd %zmm20,%zmm19,%zmm3
+ movq $10,%r8
+ vmovdqa32 %zmm3,%zmm19
+ jmp .Loop_avx512
+
+.align 32
+.Loop_avx512:
+ vpaddd %zmm1,%zmm0,%zmm0
+ vpxord %zmm0,%zmm3,%zmm3
+ vprold $16,%zmm3,%zmm3
+ vpaddd %zmm3,%zmm2,%zmm2
+ vpxord %zmm2,%zmm1,%zmm1
+ vprold $12,%zmm1,%zmm1
+ vpaddd %zmm1,%zmm0,%zmm0
+ vpxord %zmm0,%zmm3,%zmm3
+ vprold $8,%zmm3,%zmm3
+ vpaddd %zmm3,%zmm2,%zmm2
+ vpxord %zmm2,%zmm1,%zmm1
+ vprold $7,%zmm1,%zmm1
+ vpshufd $78,%zmm2,%zmm2
+ vpshufd $57,%zmm1,%zmm1
+ vpshufd $147,%zmm3,%zmm3
+ vpaddd %zmm1,%zmm0,%zmm0
+ vpxord %zmm0,%zmm3,%zmm3
+ vprold $16,%zmm3,%zmm3
+ vpaddd %zmm3,%zmm2,%zmm2
+ vpxord %zmm2,%zmm1,%zmm1
+ vprold $12,%zmm1,%zmm1
+ vpaddd %zmm1,%zmm0,%zmm0
+ vpxord %zmm0,%zmm3,%zmm3
+ vprold $8,%zmm3,%zmm3
+ vpaddd %zmm3,%zmm2,%zmm2
+ vpxord %zmm2,%zmm1,%zmm1
+ vprold $7,%zmm1,%zmm1
+ vpshufd $78,%zmm2,%zmm2
+ vpshufd $147,%zmm1,%zmm1
+ vpshufd $57,%zmm3,%zmm3
+ decq %r8
+ jnz .Loop_avx512
+ vpaddd %zmm16,%zmm0,%zmm0
+ vpaddd %zmm17,%zmm1,%zmm1
+ vpaddd %zmm18,%zmm2,%zmm2
+ vpaddd %zmm19,%zmm3,%zmm3
+
+ subq $64,%rdx
+ jb .Ltail64_avx512
+
+ vpxor 0(%rsi),%xmm0,%xmm4
+ vpxor 16(%rsi),%xmm1,%xmm5
+ vpxor 32(%rsi),%xmm2,%xmm6
+ vpxor 48(%rsi),%xmm3,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ jz .Ldone_avx512
+
+ vextracti32x4 $1,%zmm0,%xmm4
+ vextracti32x4 $1,%zmm1,%xmm5
+ vextracti32x4 $1,%zmm2,%xmm6
+ vextracti32x4 $1,%zmm3,%xmm7
+
+ subq $64,%rdx
+ jb .Ltail_avx512
+
+ vpxor 0(%rsi),%xmm4,%xmm4
+ vpxor 16(%rsi),%xmm5,%xmm5
+ vpxor 32(%rsi),%xmm6,%xmm6
+ vpxor 48(%rsi),%xmm7,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ jz .Ldone_avx512
+
+ vextracti32x4 $2,%zmm0,%xmm4
+ vextracti32x4 $2,%zmm1,%xmm5
+ vextracti32x4 $2,%zmm2,%xmm6
+ vextracti32x4 $2,%zmm3,%xmm7
+
+ subq $64,%rdx
+ jb .Ltail_avx512
+
+ vpxor 0(%rsi),%xmm4,%xmm4
+ vpxor 16(%rsi),%xmm5,%xmm5
+ vpxor 32(%rsi),%xmm6,%xmm6
+ vpxor 48(%rsi),%xmm7,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ jz .Ldone_avx512
+
+ vextracti32x4 $3,%zmm0,%xmm4
+ vextracti32x4 $3,%zmm1,%xmm5
+ vextracti32x4 $3,%zmm2,%xmm6
+ vextracti32x4 $3,%zmm3,%xmm7
+
+ subq $64,%rdx
+ jb .Ltail_avx512
+
+ vpxor 0(%rsi),%xmm4,%xmm4
+ vpxor 16(%rsi),%xmm5,%xmm5
+ vpxor 32(%rsi),%xmm6,%xmm6
+ vpxor 48(%rsi),%xmm7,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ jnz .Loop_outer_avx512
+
+ jmp .Ldone_avx512
+
+.align 16
+.Ltail64_avx512:
+ vmovdqa %xmm0,0(%rsp)
+ vmovdqa %xmm1,16(%rsp)
+ vmovdqa %xmm2,32(%rsp)
+ vmovdqa %xmm3,48(%rsp)
+ addq $64,%rdx
+ jmp .Loop_tail_avx512
+
+.align 16
+.Ltail_avx512:
+ vmovdqa %xmm4,0(%rsp)
+ vmovdqa %xmm5,16(%rsp)
+ vmovdqa %xmm6,32(%rsp)
+ vmovdqa %xmm7,48(%rsp)
+ addq $64,%rdx
+
+.Loop_tail_avx512:
+ movzbl (%rsi,%r8,1),%eax
+ movzbl (%rsp,%r8,1),%ecx
+ leaq 1(%r8),%r8
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r8,1)
+ decq %rdx
+ jnz .Loop_tail_avx512
+
+ vmovdqa32 %zmm16,0(%rsp)
+
+.Ldone_avx512:
+ vzeroall
+ leaq -8(%r10),%rsp
+
+.Lavx512_epilogue:
+ ret
+
+.align 32
+.Lchacha20_16x:
+ leaq 8(%rsp),%r10
+
+ subq $64+8,%rsp
+ andq $-64,%rsp
+ vzeroupper
+
+ leaq .Lsigma(%rip),%r9
+ vbroadcasti32x4 (%r9),%zmm3
+ vbroadcasti32x4 (%rcx),%zmm7
+ vbroadcasti32x4 16(%rcx),%zmm11
+ vbroadcasti32x4 (%r8),%zmm15
+
+ vpshufd $0x00,%zmm3,%zmm0
+ vpshufd $0x55,%zmm3,%zmm1
+ vpshufd $0xaa,%zmm3,%zmm2
+ vpshufd $0xff,%zmm3,%zmm3
+ vmovdqa64 %zmm0,%zmm16
+ vmovdqa64 %zmm1,%zmm17
+ vmovdqa64 %zmm2,%zmm18
+ vmovdqa64 %zmm3,%zmm19
+
+ vpshufd $0x00,%zmm7,%zmm4
+ vpshufd $0x55,%zmm7,%zmm5
+ vpshufd $0xaa,%zmm7,%zmm6
+ vpshufd $0xff,%zmm7,%zmm7
+ vmovdqa64 %zmm4,%zmm20
+ vmovdqa64 %zmm5,%zmm21
+ vmovdqa64 %zmm6,%zmm22
+ vmovdqa64 %zmm7,%zmm23
+
+ vpshufd $0x00,%zmm11,%zmm8
+ vpshufd $0x55,%zmm11,%zmm9
+ vpshufd $0xaa,%zmm11,%zmm10
+ vpshufd $0xff,%zmm11,%zmm11
+ vmovdqa64 %zmm8,%zmm24
+ vmovdqa64 %zmm9,%zmm25
+ vmovdqa64 %zmm10,%zmm26
+ vmovdqa64 %zmm11,%zmm27
+
+ vpshufd $0x00,%zmm15,%zmm12
+ vpshufd $0x55,%zmm15,%zmm13
+ vpshufd $0xaa,%zmm15,%zmm14
+ vpshufd $0xff,%zmm15,%zmm15
+ vpaddd .Lincz(%rip),%zmm12,%zmm12
+ vmovdqa64 %zmm12,%zmm28
+ vmovdqa64 %zmm13,%zmm29
+ vmovdqa64 %zmm14,%zmm30
+ vmovdqa64 %zmm15,%zmm31
+
+ movl $10,%eax
+ jmp .Loop16x
+
+.align 32
+.Loop_outer16x:
+ vpbroadcastd 0(%r9),%zmm0
+ vpbroadcastd 4(%r9),%zmm1
+ vpbroadcastd 8(%r9),%zmm2
+ vpbroadcastd 12(%r9),%zmm3
+ vpaddd .Lsixteen(%rip),%zmm28,%zmm28
+ vmovdqa64 %zmm20,%zmm4
+ vmovdqa64 %zmm21,%zmm5
+ vmovdqa64 %zmm22,%zmm6
+ vmovdqa64 %zmm23,%zmm7
+ vmovdqa64 %zmm24,%zmm8
+ vmovdqa64 %zmm25,%zmm9
+ vmovdqa64 %zmm26,%zmm10
+ vmovdqa64 %zmm27,%zmm11
+ vmovdqa64 %zmm28,%zmm12
+ vmovdqa64 %zmm29,%zmm13
+ vmovdqa64 %zmm30,%zmm14
+ vmovdqa64 %zmm31,%zmm15
+
+ vmovdqa64 %zmm0,%zmm16
+ vmovdqa64 %zmm1,%zmm17
+ vmovdqa64 %zmm2,%zmm18
+ vmovdqa64 %zmm3,%zmm19
+
+ movl $10,%eax
+ jmp .Loop16x
+
+.align 32
+.Loop16x:
+ vpaddd %zmm4,%zmm0,%zmm0
+ vpaddd %zmm5,%zmm1,%zmm1
+ vpaddd %zmm6,%zmm2,%zmm2
+ vpaddd %zmm7,%zmm3,%zmm3
+ vpxord %zmm0,%zmm12,%zmm12
+ vpxord %zmm1,%zmm13,%zmm13
+ vpxord %zmm2,%zmm14,%zmm14
+ vpxord %zmm3,%zmm15,%zmm15
+ vprold $16,%zmm12,%zmm12
+ vprold $16,%zmm13,%zmm13
+ vprold $16,%zmm14,%zmm14
+ vprold $16,%zmm15,%zmm15
+ vpaddd %zmm12,%zmm8,%zmm8
+ vpaddd %zmm13,%zmm9,%zmm9
+ vpaddd %zmm14,%zmm10,%zmm10
+ vpaddd %zmm15,%zmm11,%zmm11
+ vpxord %zmm8,%zmm4,%zmm4
+ vpxord %zmm9,%zmm5,%zmm5
+ vpxord %zmm10,%zmm6,%zmm6
+ vpxord %zmm11,%zmm7,%zmm7
+ vprold $12,%zmm4,%zmm4
+ vprold $12,%zmm5,%zmm5
+ vprold $12,%zmm6,%zmm6
+ vprold $12,%zmm7,%zmm7
+ vpaddd %zmm4,%zmm0,%zmm0
+ vpaddd %zmm5,%zmm1,%zmm1
+ vpaddd %zmm6,%zmm2,%zmm2
+ vpaddd %zmm7,%zmm3,%zmm3
+ vpxord %zmm0,%zmm12,%zmm12
+ vpxord %zmm1,%zmm13,%zmm13
+ vpxord %zmm2,%zmm14,%zmm14
+ vpxord %zmm3,%zmm15,%zmm15
+ vprold $8,%zmm12,%zmm12
+ vprold $8,%zmm13,%zmm13
+ vprold $8,%zmm14,%zmm14
+ vprold $8,%zmm15,%zmm15
+ vpaddd %zmm12,%zmm8,%zmm8
+ vpaddd %zmm13,%zmm9,%zmm9
+ vpaddd %zmm14,%zmm10,%zmm10
+ vpaddd %zmm15,%zmm11,%zmm11
+ vpxord %zmm8,%zmm4,%zmm4
+ vpxord %zmm9,%zmm5,%zmm5
+ vpxord %zmm10,%zmm6,%zmm6
+ vpxord %zmm11,%zmm7,%zmm7
+ vprold $7,%zmm4,%zmm4
+ vprold $7,%zmm5,%zmm5
+ vprold $7,%zmm6,%zmm6
+ vprold $7,%zmm7,%zmm7
+ vpaddd %zmm5,%zmm0,%zmm0
+ vpaddd %zmm6,%zmm1,%zmm1
+ vpaddd %zmm7,%zmm2,%zmm2
+ vpaddd %zmm4,%zmm3,%zmm3
+ vpxord %zmm0,%zmm15,%zmm15
+ vpxord %zmm1,%zmm12,%zmm12
+ vpxord %zmm2,%zmm13,%zmm13
+ vpxord %zmm3,%zmm14,%zmm14
+ vprold $16,%zmm15,%zmm15
+ vprold $16,%zmm12,%zmm12
+ vprold $16,%zmm13,%zmm13
+ vprold $16,%zmm14,%zmm14
+ vpaddd %zmm15,%zmm10,%zmm10
+ vpaddd %zmm12,%zmm11,%zmm11
+ vpaddd %zmm13,%zmm8,%zmm8
+ vpaddd %zmm14,%zmm9,%zmm9
+ vpxord %zmm10,%zmm5,%zmm5
+ vpxord %zmm11,%zmm6,%zmm6
+ vpxord %zmm8,%zmm7,%zmm7
+ vpxord %zmm9,%zmm4,%zmm4
+ vprold $12,%zmm5,%zmm5
+ vprold $12,%zmm6,%zmm6
+ vprold $12,%zmm7,%zmm7
+ vprold $12,%zmm4,%zmm4
+ vpaddd %zmm5,%zmm0,%zmm0
+ vpaddd %zmm6,%zmm1,%zmm1
+ vpaddd %zmm7,%zmm2,%zmm2
+ vpaddd %zmm4,%zmm3,%zmm3
+ vpxord %zmm0,%zmm15,%zmm15
+ vpxord %zmm1,%zmm12,%zmm12
+ vpxord %zmm2,%zmm13,%zmm13
+ vpxord %zmm3,%zmm14,%zmm14
+ vprold $8,%zmm15,%zmm15
+ vprold $8,%zmm12,%zmm12
+ vprold $8,%zmm13,%zmm13
+ vprold $8,%zmm14,%zmm14
+ vpaddd %zmm15,%zmm10,%zmm10
+ vpaddd %zmm12,%zmm11,%zmm11
+ vpaddd %zmm13,%zmm8,%zmm8
+ vpaddd %zmm14,%zmm9,%zmm9
+ vpxord %zmm10,%zmm5,%zmm5
+ vpxord %zmm11,%zmm6,%zmm6
+ vpxord %zmm8,%zmm7,%zmm7
+ vpxord %zmm9,%zmm4,%zmm4
+ vprold $7,%zmm5,%zmm5
+ vprold $7,%zmm6,%zmm6
+ vprold $7,%zmm7,%zmm7
+ vprold $7,%zmm4,%zmm4
+ decl %eax
+ jnz .Loop16x
+
+ vpaddd %zmm16,%zmm0,%zmm0
+ vpaddd %zmm17,%zmm1,%zmm1
+ vpaddd %zmm18,%zmm2,%zmm2
+ vpaddd %zmm19,%zmm3,%zmm3
+
+ vpunpckldq %zmm1,%zmm0,%zmm18
+ vpunpckldq %zmm3,%zmm2,%zmm19
+ vpunpckhdq %zmm1,%zmm0,%zmm0
+ vpunpckhdq %zmm3,%zmm2,%zmm2
+ vpunpcklqdq %zmm19,%zmm18,%zmm1
+ vpunpckhqdq %zmm19,%zmm18,%zmm18
+ vpunpcklqdq %zmm2,%zmm0,%zmm3
+ vpunpckhqdq %zmm2,%zmm0,%zmm0
+ vpaddd %zmm20,%zmm4,%zmm4
+ vpaddd %zmm21,%zmm5,%zmm5
+ vpaddd %zmm22,%zmm6,%zmm6
+ vpaddd %zmm23,%zmm7,%zmm7
+
+ vpunpckldq %zmm5,%zmm4,%zmm2
+ vpunpckldq %zmm7,%zmm6,%zmm19
+ vpunpckhdq %zmm5,%zmm4,%zmm4
+ vpunpckhdq %zmm7,%zmm6,%zmm6
+ vpunpcklqdq %zmm19,%zmm2,%zmm5
+ vpunpckhqdq %zmm19,%zmm2,%zmm2
+ vpunpcklqdq %zmm6,%zmm4,%zmm7
+ vpunpckhqdq %zmm6,%zmm4,%zmm4
+ vshufi32x4 $0x44,%zmm5,%zmm1,%zmm19
+ vshufi32x4 $0xee,%zmm5,%zmm1,%zmm5
+ vshufi32x4 $0x44,%zmm2,%zmm18,%zmm1
+ vshufi32x4 $0xee,%zmm2,%zmm18,%zmm2
+ vshufi32x4 $0x44,%zmm7,%zmm3,%zmm18
+ vshufi32x4 $0xee,%zmm7,%zmm3,%zmm7
+ vshufi32x4 $0x44,%zmm4,%zmm0,%zmm3
+ vshufi32x4 $0xee,%zmm4,%zmm0,%zmm4
+ vpaddd %zmm24,%zmm8,%zmm8
+ vpaddd %zmm25,%zmm9,%zmm9
+ vpaddd %zmm26,%zmm10,%zmm10
+ vpaddd %zmm27,%zmm11,%zmm11
+
+ vpunpckldq %zmm9,%zmm8,%zmm6
+ vpunpckldq %zmm11,%zmm10,%zmm0
+ vpunpckhdq %zmm9,%zmm8,%zmm8
+ vpunpckhdq %zmm11,%zmm10,%zmm10
+ vpunpcklqdq %zmm0,%zmm6,%zmm9
+ vpunpckhqdq %zmm0,%zmm6,%zmm6
+ vpunpcklqdq %zmm10,%zmm8,%zmm11
+ vpunpckhqdq %zmm10,%zmm8,%zmm8
+ vpaddd %zmm28,%zmm12,%zmm12
+ vpaddd %zmm29,%zmm13,%zmm13
+ vpaddd %zmm30,%zmm14,%zmm14
+ vpaddd %zmm31,%zmm15,%zmm15
+
+ vpunpckldq %zmm13,%zmm12,%zmm10
+ vpunpckldq %zmm15,%zmm14,%zmm0
+ vpunpckhdq %zmm13,%zmm12,%zmm12
+ vpunpckhdq %zmm15,%zmm14,%zmm14
+ vpunpcklqdq %zmm0,%zmm10,%zmm13
+ vpunpckhqdq %zmm0,%zmm10,%zmm10
+ vpunpcklqdq %zmm14,%zmm12,%zmm15
+ vpunpckhqdq %zmm14,%zmm12,%zmm12
+ vshufi32x4 $0x44,%zmm13,%zmm9,%zmm0
+ vshufi32x4 $0xee,%zmm13,%zmm9,%zmm13
+ vshufi32x4 $0x44,%zmm10,%zmm6,%zmm9
+ vshufi32x4 $0xee,%zmm10,%zmm6,%zmm10
+ vshufi32x4 $0x44,%zmm15,%zmm11,%zmm6
+ vshufi32x4 $0xee,%zmm15,%zmm11,%zmm15
+ vshufi32x4 $0x44,%zmm12,%zmm8,%zmm11
+ vshufi32x4 $0xee,%zmm12,%zmm8,%zmm12
+ vshufi32x4 $0x88,%zmm0,%zmm19,%zmm16
+ vshufi32x4 $0xdd,%zmm0,%zmm19,%zmm19
+ vshufi32x4 $0x88,%zmm13,%zmm5,%zmm0
+ vshufi32x4 $0xdd,%zmm13,%zmm5,%zmm13
+ vshufi32x4 $0x88,%zmm9,%zmm1,%zmm17
+ vshufi32x4 $0xdd,%zmm9,%zmm1,%zmm1
+ vshufi32x4 $0x88,%zmm10,%zmm2,%zmm9
+ vshufi32x4 $0xdd,%zmm10,%zmm2,%zmm10
+ vshufi32x4 $0x88,%zmm6,%zmm18,%zmm14
+ vshufi32x4 $0xdd,%zmm6,%zmm18,%zmm18
+ vshufi32x4 $0x88,%zmm15,%zmm7,%zmm6
+ vshufi32x4 $0xdd,%zmm15,%zmm7,%zmm15
+ vshufi32x4 $0x88,%zmm11,%zmm3,%zmm8
+ vshufi32x4 $0xdd,%zmm11,%zmm3,%zmm3
+ vshufi32x4 $0x88,%zmm12,%zmm4,%zmm11
+ vshufi32x4 $0xdd,%zmm12,%zmm4,%zmm12
+ cmpq $1024,%rdx
+ jb .Ltail16x
+
+ vpxord 0(%rsi),%zmm16,%zmm16
+ vpxord 64(%rsi),%zmm17,%zmm17
+ vpxord 128(%rsi),%zmm14,%zmm14
+ vpxord 192(%rsi),%zmm8,%zmm8
+ vmovdqu32 %zmm16,0(%rdi)
+ vmovdqu32 %zmm17,64(%rdi)
+ vmovdqu32 %zmm14,128(%rdi)
+ vmovdqu32 %zmm8,192(%rdi)
+
+ vpxord 256(%rsi),%zmm19,%zmm19
+ vpxord 320(%rsi),%zmm1,%zmm1
+ vpxord 384(%rsi),%zmm18,%zmm18
+ vpxord 448(%rsi),%zmm3,%zmm3
+ vmovdqu32 %zmm19,256(%rdi)
+ vmovdqu32 %zmm1,320(%rdi)
+ vmovdqu32 %zmm18,384(%rdi)
+ vmovdqu32 %zmm3,448(%rdi)
+
+ vpxord 512(%rsi),%zmm0,%zmm0
+ vpxord 576(%rsi),%zmm9,%zmm9
+ vpxord 640(%rsi),%zmm6,%zmm6
+ vpxord 704(%rsi),%zmm11,%zmm11
+ vmovdqu32 %zmm0,512(%rdi)
+ vmovdqu32 %zmm9,576(%rdi)
+ vmovdqu32 %zmm6,640(%rdi)
+ vmovdqu32 %zmm11,704(%rdi)
+
+ vpxord 768(%rsi),%zmm13,%zmm13
+ vpxord 832(%rsi),%zmm10,%zmm10
+ vpxord 896(%rsi),%zmm15,%zmm15
+ vpxord 960(%rsi),%zmm12,%zmm12
+ leaq 1024(%rsi),%rsi
+ vmovdqu32 %zmm13,768(%rdi)
+ vmovdqu32 %zmm10,832(%rdi)
+ vmovdqu32 %zmm15,896(%rdi)
+ vmovdqu32 %zmm12,960(%rdi)
+ leaq 1024(%rdi),%rdi
+
+ subq $1024,%rdx
+ jnz .Loop_outer16x
+
+ jmp .Ldone16x
+
+.align 32
+.Ltail16x:
+ xorq %r9,%r9
+ subq %rsi,%rdi
+ cmpq $64,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm16,%zmm16
+ vmovdqu32 %zmm16,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm17,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $128,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm17,%zmm17
+ vmovdqu32 %zmm17,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm14,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $192,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm14,%zmm14
+ vmovdqu32 %zmm14,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm8,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $256,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm8,%zmm8
+ vmovdqu32 %zmm8,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm19,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $320,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm19,%zmm19
+ vmovdqu32 %zmm19,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm1,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $384,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm1,%zmm1
+ vmovdqu32 %zmm1,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm18,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $448,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm18,%zmm18
+ vmovdqu32 %zmm18,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm3,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $512,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm3,%zmm3
+ vmovdqu32 %zmm3,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm0,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $576,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm0,%zmm0
+ vmovdqu32 %zmm0,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm9,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $640,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm9,%zmm9
+ vmovdqu32 %zmm9,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm6,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $704,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm6,%zmm6
+ vmovdqu32 %zmm6,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm11,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $768,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm11,%zmm11
+ vmovdqu32 %zmm11,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm13,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $832,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm13,%zmm13
+ vmovdqu32 %zmm13,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm10,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $896,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm10,%zmm10
+ vmovdqu32 %zmm10,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm15,%zmm16
+ leaq 64(%rsi),%rsi
+
+ cmpq $960,%rdx
+ jb .Less_than_64_16x
+ vpxord (%rsi),%zmm15,%zmm15
+ vmovdqu32 %zmm15,(%rdi,%rsi,1)
+ je .Ldone16x
+ vmovdqa32 %zmm12,%zmm16
+ leaq 64(%rsi),%rsi
+
+.Less_than_64_16x:
+ vmovdqa32 %zmm16,0(%rsp)
+ leaq (%rdi,%rsi,1),%rdi
+ andq $63,%rdx
+
+.Loop_tail16x:
+ movzbl (%rsi,%r9,1),%eax
+ movzbl (%rsp,%r9,1),%ecx
+ leaq 1(%r9),%r9
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r9,1)
+ decq %rdx
+ jnz .Loop_tail16x
+
+ vpxord %zmm16,%zmm16,%zmm16
+ vmovdqa32 %zmm16,0(%rsp)
+
+.Ldone16x:
+ vzeroall
+ leaq -8(%r10),%rsp
+
+.L16x_epilogue:
+ ret
+ENDPROC(chacha20_avx512)
+
+.align 32
+ENTRY(chacha20_avx512vl)
+ cmpq $0,%rdx
+ je .Lavx512vl_epilogue
+
+ leaq 8(%rsp),%r10
+
+ cmpq $128,%rdx
+ ja .Lchacha20_8xvl
+
+ subq $64+8,%rsp
+ andq $-64,%rsp
+ vbroadcasti128 .Lsigma(%rip),%ymm0
+ vbroadcasti128 (%rcx),%ymm1
+ vbroadcasti128 16(%rcx),%ymm2
+ vbroadcasti128 (%r8),%ymm3
+
+ vmovdqa32 %ymm0,%ymm16
+ vmovdqa32 %ymm1,%ymm17
+ vmovdqa32 %ymm2,%ymm18
+ vpaddd .Lzeroz(%rip),%ymm3,%ymm3
+ vmovdqa32 .Ltwoy(%rip),%ymm20
+ movq $10,%r8
+ vmovdqa32 %ymm3,%ymm19
+ jmp .Loop_avx512vl
+
+.align 16
+.Loop_outer_avx512vl:
+ vmovdqa32 %ymm18,%ymm2
+ vpaddd %ymm20,%ymm19,%ymm3
+ movq $10,%r8
+ vmovdqa32 %ymm3,%ymm19
+ jmp .Loop_avx512vl
+
+.align 32
+.Loop_avx512vl:
+ vpaddd %ymm1,%ymm0,%ymm0
+ vpxor %ymm0,%ymm3,%ymm3
+ vprold $16,%ymm3,%ymm3
+ vpaddd %ymm3,%ymm2,%ymm2
+ vpxor %ymm2,%ymm1,%ymm1
+ vprold $12,%ymm1,%ymm1
+ vpaddd %ymm1,%ymm0,%ymm0
+ vpxor %ymm0,%ymm3,%ymm3
+ vprold $8,%ymm3,%ymm3
+ vpaddd %ymm3,%ymm2,%ymm2
+ vpxor %ymm2,%ymm1,%ymm1
+ vprold $7,%ymm1,%ymm1
+ vpshufd $78,%ymm2,%ymm2
+ vpshufd $57,%ymm1,%ymm1
+ vpshufd $147,%ymm3,%ymm3
+ vpaddd %ymm1,%ymm0,%ymm0
+ vpxor %ymm0,%ymm3,%ymm3
+ vprold $16,%ymm3,%ymm3
+ vpaddd %ymm3,%ymm2,%ymm2
+ vpxor %ymm2,%ymm1,%ymm1
+ vprold $12,%ymm1,%ymm1
+ vpaddd %ymm1,%ymm0,%ymm0
+ vpxor %ymm0,%ymm3,%ymm3
+ vprold $8,%ymm3,%ymm3
+ vpaddd %ymm3,%ymm2,%ymm2
+ vpxor %ymm2,%ymm1,%ymm1
+ vprold $7,%ymm1,%ymm1
+ vpshufd $78,%ymm2,%ymm2
+ vpshufd $147,%ymm1,%ymm1
+ vpshufd $57,%ymm3,%ymm3
+ decq %r8
+ jnz .Loop_avx512vl
+ vpaddd %ymm16,%ymm0,%ymm0
+ vpaddd %ymm17,%ymm1,%ymm1
+ vpaddd %ymm18,%ymm2,%ymm2
+ vpaddd %ymm19,%ymm3,%ymm3
+
+ subq $64,%rdx
+ jb .Ltail64_avx512vl
+
+ vpxor 0(%rsi),%xmm0,%xmm4
+ vpxor 16(%rsi),%xmm1,%xmm5
+ vpxor 32(%rsi),%xmm2,%xmm6
+ vpxor 48(%rsi),%xmm3,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ jz .Ldone_avx512vl
+
+ vextracti128 $1,%ymm0,%xmm4
+ vextracti128 $1,%ymm1,%xmm5
+ vextracti128 $1,%ymm2,%xmm6
+ vextracti128 $1,%ymm3,%xmm7
+
+ subq $64,%rdx
+ jb .Ltail_avx512vl
+
+ vpxor 0(%rsi),%xmm4,%xmm4
+ vpxor 16(%rsi),%xmm5,%xmm5
+ vpxor 32(%rsi),%xmm6,%xmm6
+ vpxor 48(%rsi),%xmm7,%xmm7
+ leaq 64(%rsi),%rsi
+
+ vmovdqu %xmm4,0(%rdi)
+ vmovdqu %xmm5,16(%rdi)
+ vmovdqu %xmm6,32(%rdi)
+ vmovdqu %xmm7,48(%rdi)
+ leaq 64(%rdi),%rdi
+
+ vmovdqa32 %ymm16,%ymm0
+ vmovdqa32 %ymm17,%ymm1
+ jnz .Loop_outer_avx512vl
+
+ jmp .Ldone_avx512vl
+
+.align 16
+.Ltail64_avx512vl:
+ vmovdqa %xmm0,0(%rsp)
+ vmovdqa %xmm1,16(%rsp)
+ vmovdqa %xmm2,32(%rsp)
+ vmovdqa %xmm3,48(%rsp)
+ addq $64,%rdx
+ jmp .Loop_tail_avx512vl
+
+.align 16
+.Ltail_avx512vl:
+ vmovdqa %xmm4,0(%rsp)
+ vmovdqa %xmm5,16(%rsp)
+ vmovdqa %xmm6,32(%rsp)
+ vmovdqa %xmm7,48(%rsp)
+ addq $64,%rdx
+
+.Loop_tail_avx512vl:
+ movzbl (%rsi,%r8,1),%eax
+ movzbl (%rsp,%r8,1),%ecx
+ leaq 1(%r8),%r8
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r8,1)
+ decq %rdx
+ jnz .Loop_tail_avx512vl
+
+ vmovdqa32 %ymm16,0(%rsp)
+ vmovdqa32 %ymm16,32(%rsp)
+
+.Ldone_avx512vl:
+ vzeroall
+ leaq -8(%r10),%rsp
+.Lavx512vl_epilogue:
+ ret
+
+.align 32
+.Lchacha20_8xvl:
+ leaq 8(%rsp),%r10
+ subq $64+8,%rsp
+ andq $-64,%rsp
+ vzeroupper
+
+ leaq .Lsigma(%rip),%r9
+ vbroadcasti128 (%r9),%ymm3
+ vbroadcasti128 (%rcx),%ymm7
+ vbroadcasti128 16(%rcx),%ymm11
+ vbroadcasti128 (%r8),%ymm15
+
+ vpshufd $0x00,%ymm3,%ymm0
+ vpshufd $0x55,%ymm3,%ymm1
+ vpshufd $0xaa,%ymm3,%ymm2
+ vpshufd $0xff,%ymm3,%ymm3
+ vmovdqa64 %ymm0,%ymm16
+ vmovdqa64 %ymm1,%ymm17
+ vmovdqa64 %ymm2,%ymm18
+ vmovdqa64 %ymm3,%ymm19
+
+ vpshufd $0x00,%ymm7,%ymm4
+ vpshufd $0x55,%ymm7,%ymm5
+ vpshufd $0xaa,%ymm7,%ymm6
+ vpshufd $0xff,%ymm7,%ymm7
+ vmovdqa64 %ymm4,%ymm20
+ vmovdqa64 %ymm5,%ymm21
+ vmovdqa64 %ymm6,%ymm22
+ vmovdqa64 %ymm7,%ymm23
+
+ vpshufd $0x00,%ymm11,%ymm8
+ vpshufd $0x55,%ymm11,%ymm9
+ vpshufd $0xaa,%ymm11,%ymm10
+ vpshufd $0xff,%ymm11,%ymm11
+ vmovdqa64 %ymm8,%ymm24
+ vmovdqa64 %ymm9,%ymm25
+ vmovdqa64 %ymm10,%ymm26
+ vmovdqa64 %ymm11,%ymm27
+
+ vpshufd $0x00,%ymm15,%ymm12
+ vpshufd $0x55,%ymm15,%ymm13
+ vpshufd $0xaa,%ymm15,%ymm14
+ vpshufd $0xff,%ymm15,%ymm15
+ vpaddd .Lincy(%rip),%ymm12,%ymm12
+ vmovdqa64 %ymm12,%ymm28
+ vmovdqa64 %ymm13,%ymm29
+ vmovdqa64 %ymm14,%ymm30
+ vmovdqa64 %ymm15,%ymm31
+
+ movl $10,%eax
+ jmp .Loop8xvl
+
+.align 32
+.Loop_outer8xvl:
+
+
+ vpbroadcastd 8(%r9),%ymm2
+ vpbroadcastd 12(%r9),%ymm3
+ vpaddd .Leight(%rip),%ymm28,%ymm28
+ vmovdqa64 %ymm20,%ymm4
+ vmovdqa64 %ymm21,%ymm5
+ vmovdqa64 %ymm22,%ymm6
+ vmovdqa64 %ymm23,%ymm7
+ vmovdqa64 %ymm24,%ymm8
+ vmovdqa64 %ymm25,%ymm9
+ vmovdqa64 %ymm26,%ymm10
+ vmovdqa64 %ymm27,%ymm11
+ vmovdqa64 %ymm28,%ymm12
+ vmovdqa64 %ymm29,%ymm13
+ vmovdqa64 %ymm30,%ymm14
+ vmovdqa64 %ymm31,%ymm15
+
+ vmovdqa64 %ymm0,%ymm16
+ vmovdqa64 %ymm1,%ymm17
+ vmovdqa64 %ymm2,%ymm18
+ vmovdqa64 %ymm3,%ymm19
+
+ movl $10,%eax
+ jmp .Loop8xvl
+
+.align 32
+.Loop8xvl:
+ vpaddd %ymm4,%ymm0,%ymm0
+ vpaddd %ymm5,%ymm1,%ymm1
+ vpaddd %ymm6,%ymm2,%ymm2
+ vpaddd %ymm7,%ymm3,%ymm3
+ vpxor %ymm0,%ymm12,%ymm12
+ vpxor %ymm1,%ymm13,%ymm13
+ vpxor %ymm2,%ymm14,%ymm14
+ vpxor %ymm3,%ymm15,%ymm15
+ vprold $16,%ymm12,%ymm12
+ vprold $16,%ymm13,%ymm13
+ vprold $16,%ymm14,%ymm14
+ vprold $16,%ymm15,%ymm15
+ vpaddd %ymm12,%ymm8,%ymm8
+ vpaddd %ymm13,%ymm9,%ymm9
+ vpaddd %ymm14,%ymm10,%ymm10
+ vpaddd %ymm15,%ymm11,%ymm11
+ vpxor %ymm8,%ymm4,%ymm4
+ vpxor %ymm9,%ymm5,%ymm5
+ vpxor %ymm10,%ymm6,%ymm6
+ vpxor %ymm11,%ymm7,%ymm7
+ vprold $12,%ymm4,%ymm4
+ vprold $12,%ymm5,%ymm5
+ vprold $12,%ymm6,%ymm6
+ vprold $12,%ymm7,%ymm7
+ vpaddd %ymm4,%ymm0,%ymm0
+ vpaddd %ymm5,%ymm1,%ymm1
+ vpaddd %ymm6,%ymm2,%ymm2
+ vpaddd %ymm7,%ymm3,%ymm3
+ vpxor %ymm0,%ymm12,%ymm12
+ vpxor %ymm1,%ymm13,%ymm13
+ vpxor %ymm2,%ymm14,%ymm14
+ vpxor %ymm3,%ymm15,%ymm15
+ vprold $8,%ymm12,%ymm12
+ vprold $8,%ymm13,%ymm13
+ vprold $8,%ymm14,%ymm14
+ vprold $8,%ymm15,%ymm15
+ vpaddd %ymm12,%ymm8,%ymm8
+ vpaddd %ymm13,%ymm9,%ymm9
+ vpaddd %ymm14,%ymm10,%ymm10
+ vpaddd %ymm15,%ymm11,%ymm11
+ vpxor %ymm8,%ymm4,%ymm4
+ vpxor %ymm9,%ymm5,%ymm5
+ vpxor %ymm10,%ymm6,%ymm6
+ vpxor %ymm11,%ymm7,%ymm7
+ vprold $7,%ymm4,%ymm4
+ vprold $7,%ymm5,%ymm5
+ vprold $7,%ymm6,%ymm6
+ vprold $7,%ymm7,%ymm7
+ vpaddd %ymm5,%ymm0,%ymm0
+ vpaddd %ymm6,%ymm1,%ymm1
+ vpaddd %ymm7,%ymm2,%ymm2
+ vpaddd %ymm4,%ymm3,%ymm3
+ vpxor %ymm0,%ymm15,%ymm15
+ vpxor %ymm1,%ymm12,%ymm12
+ vpxor %ymm2,%ymm13,%ymm13
+ vpxor %ymm3,%ymm14,%ymm14
+ vprold $16,%ymm15,%ymm15
+ vprold $16,%ymm12,%ymm12
+ vprold $16,%ymm13,%ymm13
+ vprold $16,%ymm14,%ymm14
+ vpaddd %ymm15,%ymm10,%ymm10
+ vpaddd %ymm12,%ymm11,%ymm11
+ vpaddd %ymm13,%ymm8,%ymm8
+ vpaddd %ymm14,%ymm9,%ymm9
+ vpxor %ymm10,%ymm5,%ymm5
+ vpxor %ymm11,%ymm6,%ymm6
+ vpxor %ymm8,%ymm7,%ymm7
+ vpxor %ymm9,%ymm4,%ymm4
+ vprold $12,%ymm5,%ymm5
+ vprold $12,%ymm6,%ymm6
+ vprold $12,%ymm7,%ymm7
+ vprold $12,%ymm4,%ymm4
+ vpaddd %ymm5,%ymm0,%ymm0
+ vpaddd %ymm6,%ymm1,%ymm1
+ vpaddd %ymm7,%ymm2,%ymm2
+ vpaddd %ymm4,%ymm3,%ymm3
+ vpxor %ymm0,%ymm15,%ymm15
+ vpxor %ymm1,%ymm12,%ymm12
+ vpxor %ymm2,%ymm13,%ymm13
+ vpxor %ymm3,%ymm14,%ymm14
+ vprold $8,%ymm15,%ymm15
+ vprold $8,%ymm12,%ymm12
+ vprold $8,%ymm13,%ymm13
+ vprold $8,%ymm14,%ymm14
+ vpaddd %ymm15,%ymm10,%ymm10
+ vpaddd %ymm12,%ymm11,%ymm11
+ vpaddd %ymm13,%ymm8,%ymm8
+ vpaddd %ymm14,%ymm9,%ymm9
+ vpxor %ymm10,%ymm5,%ymm5
+ vpxor %ymm11,%ymm6,%ymm6
+ vpxor %ymm8,%ymm7,%ymm7
+ vpxor %ymm9,%ymm4,%ymm4
+ vprold $7,%ymm5,%ymm5
+ vprold $7,%ymm6,%ymm6
+ vprold $7,%ymm7,%ymm7
+ vprold $7,%ymm4,%ymm4
+ decl %eax
+ jnz .Loop8xvl
+
+ vpaddd %ymm16,%ymm0,%ymm0
+ vpaddd %ymm17,%ymm1,%ymm1
+ vpaddd %ymm18,%ymm2,%ymm2
+ vpaddd %ymm19,%ymm3,%ymm3
+
+ vpunpckldq %ymm1,%ymm0,%ymm18
+ vpunpckldq %ymm3,%ymm2,%ymm19
+ vpunpckhdq %ymm1,%ymm0,%ymm0
+ vpunpckhdq %ymm3,%ymm2,%ymm2
+ vpunpcklqdq %ymm19,%ymm18,%ymm1
+ vpunpckhqdq %ymm19,%ymm18,%ymm18
+ vpunpcklqdq %ymm2,%ymm0,%ymm3
+ vpunpckhqdq %ymm2,%ymm0,%ymm0
+ vpaddd %ymm20,%ymm4,%ymm4
+ vpaddd %ymm21,%ymm5,%ymm5
+ vpaddd %ymm22,%ymm6,%ymm6
+ vpaddd %ymm23,%ymm7,%ymm7
+
+ vpunpckldq %ymm5,%ymm4,%ymm2
+ vpunpckldq %ymm7,%ymm6,%ymm19
+ vpunpckhdq %ymm5,%ymm4,%ymm4
+ vpunpckhdq %ymm7,%ymm6,%ymm6
+ vpunpcklqdq %ymm19,%ymm2,%ymm5
+ vpunpckhqdq %ymm19,%ymm2,%ymm2
+ vpunpcklqdq %ymm6,%ymm4,%ymm7
+ vpunpckhqdq %ymm6,%ymm4,%ymm4
+ vshufi32x4 $0,%ymm5,%ymm1,%ymm19
+ vshufi32x4 $3,%ymm5,%ymm1,%ymm5
+ vshufi32x4 $0,%ymm2,%ymm18,%ymm1
+ vshufi32x4 $3,%ymm2,%ymm18,%ymm2
+ vshufi32x4 $0,%ymm7,%ymm3,%ymm18
+ vshufi32x4 $3,%ymm7,%ymm3,%ymm7
+ vshufi32x4 $0,%ymm4,%ymm0,%ymm3
+ vshufi32x4 $3,%ymm4,%ymm0,%ymm4
+ vpaddd %ymm24,%ymm8,%ymm8
+ vpaddd %ymm25,%ymm9,%ymm9
+ vpaddd %ymm26,%ymm10,%ymm10
+ vpaddd %ymm27,%ymm11,%ymm11
+
+ vpunpckldq %ymm9,%ymm8,%ymm6
+ vpunpckldq %ymm11,%ymm10,%ymm0
+ vpunpckhdq %ymm9,%ymm8,%ymm8
+ vpunpckhdq %ymm11,%ymm10,%ymm10
+ vpunpcklqdq %ymm0,%ymm6,%ymm9
+ vpunpckhqdq %ymm0,%ymm6,%ymm6
+ vpunpcklqdq %ymm10,%ymm8,%ymm11
+ vpunpckhqdq %ymm10,%ymm8,%ymm8
+ vpaddd %ymm28,%ymm12,%ymm12
+ vpaddd %ymm29,%ymm13,%ymm13
+ vpaddd %ymm30,%ymm14,%ymm14
+ vpaddd %ymm31,%ymm15,%ymm15
+
+ vpunpckldq %ymm13,%ymm12,%ymm10
+ vpunpckldq %ymm15,%ymm14,%ymm0
+ vpunpckhdq %ymm13,%ymm12,%ymm12
+ vpunpckhdq %ymm15,%ymm14,%ymm14
+ vpunpcklqdq %ymm0,%ymm10,%ymm13
+ vpunpckhqdq %ymm0,%ymm10,%ymm10
+ vpunpcklqdq %ymm14,%ymm12,%ymm15
+ vpunpckhqdq %ymm14,%ymm12,%ymm12
+ vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
+ vperm2i128 $0x31,%ymm13,%ymm9,%ymm13
+ vperm2i128 $0x20,%ymm10,%ymm6,%ymm9
+ vperm2i128 $0x31,%ymm10,%ymm6,%ymm10
+ vperm2i128 $0x20,%ymm15,%ymm11,%ymm6
+ vperm2i128 $0x31,%ymm15,%ymm11,%ymm15
+ vperm2i128 $0x20,%ymm12,%ymm8,%ymm11
+ vperm2i128 $0x31,%ymm12,%ymm8,%ymm12
+ cmpq $512,%rdx
+ jb .Ltail8xvl
+
+ movl $0x80,%eax
+ vpxord 0(%rsi),%ymm19,%ymm19
+ vpxor 32(%rsi),%ymm0,%ymm0
+ vpxor 64(%rsi),%ymm5,%ymm5
+ vpxor 96(%rsi),%ymm13,%ymm13
+ leaq (%rsi,%rax,1),%rsi
+ vmovdqu32 %ymm19,0(%rdi)
+ vmovdqu %ymm0,32(%rdi)
+ vmovdqu %ymm5,64(%rdi)
+ vmovdqu %ymm13,96(%rdi)
+ leaq (%rdi,%rax,1),%rdi
+
+ vpxor 0(%rsi),%ymm1,%ymm1
+ vpxor 32(%rsi),%ymm9,%ymm9
+ vpxor 64(%rsi),%ymm2,%ymm2
+ vpxor 96(%rsi),%ymm10,%ymm10
+ leaq (%rsi,%rax,1),%rsi
+ vmovdqu %ymm1,0(%rdi)
+ vmovdqu %ymm9,32(%rdi)
+ vmovdqu %ymm2,64(%rdi)
+ vmovdqu %ymm10,96(%rdi)
+ leaq (%rdi,%rax,1),%rdi
+
+ vpxord 0(%rsi),%ymm18,%ymm18
+ vpxor 32(%rsi),%ymm6,%ymm6
+ vpxor 64(%rsi),%ymm7,%ymm7
+ vpxor 96(%rsi),%ymm15,%ymm15
+ leaq (%rsi,%rax,1),%rsi
+ vmovdqu32 %ymm18,0(%rdi)
+ vmovdqu %ymm6,32(%rdi)
+ vmovdqu %ymm7,64(%rdi)
+ vmovdqu %ymm15,96(%rdi)
+ leaq (%rdi,%rax,1),%rdi
+
+ vpxor 0(%rsi),%ymm3,%ymm3
+ vpxor 32(%rsi),%ymm11,%ymm11
+ vpxor 64(%rsi),%ymm4,%ymm4
+ vpxor 96(%rsi),%ymm12,%ymm12
+ leaq (%rsi,%rax,1),%rsi
+ vmovdqu %ymm3,0(%rdi)
+ vmovdqu %ymm11,32(%rdi)
+ vmovdqu %ymm4,64(%rdi)
+ vmovdqu %ymm12,96(%rdi)
+ leaq (%rdi,%rax,1),%rdi
+
+ vpbroadcastd 0(%r9),%ymm0
+ vpbroadcastd 4(%r9),%ymm1
+
+ subq $512,%rdx
+ jnz .Loop_outer8xvl
+
+ jmp .Ldone8xvl
+
+.align 32
+.Ltail8xvl:
+ vmovdqa64 %ymm19,%ymm8
+ xorq %r9,%r9
+ subq %rsi,%rdi
+ cmpq $64,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm8,%ymm8
+ vpxor 32(%rsi),%ymm0,%ymm0
+ vmovdqu %ymm8,0(%rdi,%rsi,1)
+ vmovdqu %ymm0,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm5,%ymm8
+ vmovdqa %ymm13,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $128,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm5,%ymm5
+ vpxor 32(%rsi),%ymm13,%ymm13
+ vmovdqu %ymm5,0(%rdi,%rsi,1)
+ vmovdqu %ymm13,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm1,%ymm8
+ vmovdqa %ymm9,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $192,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm1,%ymm1
+ vpxor 32(%rsi),%ymm9,%ymm9
+ vmovdqu %ymm1,0(%rdi,%rsi,1)
+ vmovdqu %ymm9,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm2,%ymm8
+ vmovdqa %ymm10,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $256,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm2,%ymm2
+ vpxor 32(%rsi),%ymm10,%ymm10
+ vmovdqu %ymm2,0(%rdi,%rsi,1)
+ vmovdqu %ymm10,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa32 %ymm18,%ymm8
+ vmovdqa %ymm6,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $320,%rdx
+ jb .Less_than_64_8xvl
+ vpxord 0(%rsi),%ymm18,%ymm18
+ vpxor 32(%rsi),%ymm6,%ymm6
+ vmovdqu32 %ymm18,0(%rdi,%rsi,1)
+ vmovdqu %ymm6,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm7,%ymm8
+ vmovdqa %ymm15,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $384,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm7,%ymm7
+ vpxor 32(%rsi),%ymm15,%ymm15
+ vmovdqu %ymm7,0(%rdi,%rsi,1)
+ vmovdqu %ymm15,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm3,%ymm8
+ vmovdqa %ymm11,%ymm0
+ leaq 64(%rsi),%rsi
+
+ cmpq $448,%rdx
+ jb .Less_than_64_8xvl
+ vpxor 0(%rsi),%ymm3,%ymm3
+ vpxor 32(%rsi),%ymm11,%ymm11
+ vmovdqu %ymm3,0(%rdi,%rsi,1)
+ vmovdqu %ymm11,32(%rdi,%rsi,1)
+ je .Ldone8xvl
+ vmovdqa %ymm4,%ymm8
+ vmovdqa %ymm12,%ymm0
+ leaq 64(%rsi),%rsi
+
+.Less_than_64_8xvl:
+ vmovdqa %ymm8,0(%rsp)
+ vmovdqa %ymm0,32(%rsp)
+ leaq (%rdi,%rsi,1),%rdi
+ andq $63,%rdx
+
+.Loop_tail8xvl:
+ movzbl (%rsi,%r9,1),%eax
+ movzbl (%rsp,%r9,1),%ecx
+ leaq 1(%r9),%r9
+ xorl %ecx,%eax
+ movb %al,-1(%rdi,%r9,1)
+ decq %rdx
+ jnz .Loop_tail8xvl
+
+ vpxor %ymm8,%ymm8,%ymm8
+ vmovdqa %ymm8,0(%rsp)
+ vmovdqa %ymm8,32(%rsp)
+
+.Ldone8xvl:
+ vzeroall
+ leaq -8(%r10),%rsp
+.L8xvl_epilogue:
+ ret
+ENDPROC(chacha20_avx512vl)
+
+#endif /* CONFIG_AS_AVX512 */
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
index c82d9fc71f21..4354b874a6a5 100644
--- a/lib/zinc/chacha20/chacha20.c
+++ b/lib/zinc/chacha20/chacha20.c
@@ -14,7 +14,9 @@
#include <linux/init.h>
#include <crypto/algapi.h>
-#ifndef HAVE_CHACHA20_ARCH_IMPLEMENTATION
+#if defined(CONFIG_ZINC_ARCH_X86_64)
+#include "chacha20-x86_64-glue.h"
+#else
void __init chacha20_fpu_init(void)
{
}
--
2.19.0
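[Editorial note, not part of the patch:] The chacha20.c hunk above replaces the generic HAVE_CHACHA20_ARCH_IMPLEMENTATION guard with a per-architecture include: when CONFIG_ZINC_ARCH_X86_64 is set, the x86_64 glue header supplies chacha20_fpu_init() and the SIMD dispatch, and otherwise the empty stub is compiled. The contents of chacha20-x86_64-glue.h are not shown in this hunk; the following is only a sketch of what a glue header following that pattern could look like, using the stock kernel FPU API and a prototype assumed from the argument registers of the assembly:

/* Sketch only; names, prototype, and feature checks are assumptions and do
 * not reproduce the real chacha20-x86_64-glue.h. */
#include <linux/linkage.h>
#include <linux/types.h>
#include <linux/init.h>
#include <asm/cpufeature.h>
#include <asm/fpu/api.h>

asmlinkage void chacha20_avx512vl(u8 *dst, const u8 *src, size_t len,
				  const u32 key[8], const u32 counter[4]);

static bool chacha20_use_avx512vl;

void __init chacha20_fpu_init(void)
{
	/* Illustrative feature probe only. */
	chacha20_use_avx512vl = boot_cpu_has(X86_FEATURE_AVX512VL);
}

static bool chacha20_arch_sketch(u8 *dst, const u8 *src, size_t len,
				 const u32 key[8], const u32 counter[4])
{
	if (!chacha20_use_avx512vl || !irq_fpu_usable())
		return false;		/* caller falls back to the generic C code */
	kernel_fpu_begin();
	chacha20_avx512vl(dst, src, len, key, counter);
	kernel_fpu_end();
	return true;
}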
* [PATCH net-next v6 05/23] zinc: import Andy Polyakov's ChaCha20 ARM and ARM64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Russell King,
linux-arm-kernel
These NEON and non-NEON implementations come from Andy Polyakov's
CRYPTOGAMS work, and are included here in raw form without modification,
so that subsequent commits that fix them up for the kernel show exactly
how they have changed. This awkward commit splitting has been requested
for the ARM[64] implementations in particular.
While this is CRYPTOGAMS code, the originating code for this happens to
be the same as OpenSSL's commit 87cc649f30aaf69b351701875b9dac07c29ce8a2.
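[Editorial note, not part of the patch:] For orientation while reading the raw files below: both the ARM and ARM64 versions export a single ChaCha20_ctr32 entry point whose arguments, judging from the r0-r3/[sp] (ARM) and x0-x4 (ARM64) register usage, are output, input, length, the 256-bit key as eight 32-bit words, and the counter-and-nonce block as four 32-bit words. A minimal declaration and call, written only as a sketch (the wrapper function is invented for illustration):

/* Assumed prototype, inferred from the register usage in the assembly;
 * the wrapper is purely illustrative. */
#include <linux/linkage.h>
#include <linux/types.h>

asmlinkage void ChaCha20_ctr32(u8 *out, const u8 *in, size_t len,
			       const u32 key[8], const u32 counter[4]);

static void chacha20_cryptogams_sketch(u8 *dst, const u8 *src, size_t len,
				       const u32 key[8], const u32 iv[4])
{
	/* iv[0] is the 32-bit block counter, iv[1..3] the nonce. */
	ChaCha20_ctr32(dst, src, len, key, iv);
}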
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/chacha20/chacha20-arm-cryptogams.S | 1440 ++++++++++++
lib/zinc/chacha20/chacha20-arm64-cryptogams.S | 1973 +++++++++++++++++
2 files changed, 3413 insertions(+)
create mode 100644 lib/zinc/chacha20/chacha20-arm-cryptogams.S
create mode 100644 lib/zinc/chacha20/chacha20-arm64-cryptogams.S
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
new file mode 100644
index 000000000000..05a3a9e6e93f
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
@@ -0,0 +1,1440 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+#if defined(__thumb2__) || defined(__clang__)
+.syntax unified
+#endif
+#if defined(__thumb2__)
+.thumb
+#else
+.code 32
+#endif
+
+#if defined(__thumb2__) || defined(__clang__)
+#define ldrhsb ldrbhs
+#endif
+
+.align 5
+.Lsigma:
+.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
+.Lone:
+.long 1,0,0,0
+.Lrot8:
+.long 0x02010003,0x06050407
+#if __ARM_MAX_ARCH__>=7
+.LOPENSSL_armcap:
+.word OPENSSL_armcap_P-.LChaCha20_ctr32
+#else
+.word -1
+#endif
+
+.globl ChaCha20_ctr32
+.type ChaCha20_ctr32,%function
+.align 5
+ChaCha20_ctr32:
+.LChaCha20_ctr32:
+ ldr r12,[sp,#0] @ pull pointer to counter and nonce
+ stmdb sp!,{r0-r2,r4-r11,lr}
+#if __ARM_ARCH__<7 && !defined(__thumb2__)
+ sub r14,pc,#16 @ ChaCha20_ctr32
+#else
+ adr r14,.LChaCha20_ctr32
+#endif
+ cmp r2,#0 @ len==0?
+#ifdef __thumb2__
+ itt eq
+#endif
+ addeq sp,sp,#4*3
+ beq .Lno_data
+#if __ARM_MAX_ARCH__>=7
+ cmp r2,#192 @ test len
+ bls .Lshort
+ ldr r4,[r14,#-24]
+ ldr r4,[r14,r4]
+# ifdef __APPLE__
+ ldr r4,[r4]
+# endif
+ tst r4,#ARMV7_NEON
+ bne .LChaCha20_neon
+.Lshort:
+#endif
+ ldmia r12,{r4-r7} @ load counter and nonce
+ sub sp,sp,#4*(16) @ off-load area
+ sub r14,r14,#64 @ .Lsigma
+ stmdb sp!,{r4-r7} @ copy counter and nonce
+ ldmia r3,{r4-r11} @ load key
+ ldmia r14,{r0-r3} @ load sigma
+ stmdb sp!,{r4-r11} @ copy key
+ stmdb sp!,{r0-r3} @ copy sigma
+ str r10,[sp,#4*(16+10)] @ off-load "rx"
+ str r11,[sp,#4*(16+11)] @ off-load "rx"
+ b .Loop_outer_enter
+
+.align 4
+.Loop_outer:
+ ldmia sp,{r0-r9} @ load key material
+ str r11,[sp,#4*(32+2)] @ save len
+ str r12, [sp,#4*(32+1)] @ save inp
+ str r14, [sp,#4*(32+0)] @ save out
+.Loop_outer_enter:
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(16+15)]
+ mov r11,#10
+ b .Loop
+
+.align 4
+.Loop:
+ subs r11,r11,#1
+ add r0,r0,r4,ror#13
+ add r1,r1,r5,ror#13
+ eor r12,r0,r12,ror#24
+ eor r10,r1,r10,ror#24
+ add r8,r8,r12,ror#16
+ add r9,r9,r10,ror#16
+ eor r4,r8,r4,ror#13
+ eor r5,r9,r5,ror#13
+ add r0,r0,r4,ror#20
+ add r1,r1,r5,ror#20
+ eor r12,r0,r12,ror#16
+ eor r10,r1,r10,ror#16
+ add r8,r8,r12,ror#24
+ str r10,[sp,#4*(16+13)]
+ add r9,r9,r10,ror#24
+ ldr r10,[sp,#4*(16+15)]
+ str r8,[sp,#4*(16+8)]
+ eor r4,r4,r8,ror#12
+ str r9,[sp,#4*(16+9)]
+ eor r5,r5,r9,ror#12
+ ldr r8,[sp,#4*(16+10)]
+ add r2,r2,r6,ror#13
+ ldr r9,[sp,#4*(16+11)]
+ add r3,r3,r7,ror#13
+ eor r14,r2,r14,ror#24
+ eor r10,r3,r10,ror#24
+ add r8,r8,r14,ror#16
+ add r9,r9,r10,ror#16
+ eor r6,r8,r6,ror#13
+ eor r7,r9,r7,ror#13
+ add r2,r2,r6,ror#20
+ add r3,r3,r7,ror#20
+ eor r14,r2,r14,ror#16
+ eor r10,r3,r10,ror#16
+ add r8,r8,r14,ror#24
+ add r9,r9,r10,ror#24
+ eor r6,r6,r8,ror#12
+ eor r7,r7,r9,ror#12
+ add r0,r0,r5,ror#13
+ add r1,r1,r6,ror#13
+ eor r10,r0,r10,ror#24
+ eor r12,r1,r12,ror#24
+ add r8,r8,r10,ror#16
+ add r9,r9,r12,ror#16
+ eor r5,r8,r5,ror#13
+ eor r6,r9,r6,ror#13
+ add r0,r0,r5,ror#20
+ add r1,r1,r6,ror#20
+ eor r10,r0,r10,ror#16
+ eor r12,r1,r12,ror#16
+ str r10,[sp,#4*(16+15)]
+ add r8,r8,r10,ror#24
+ ldr r10,[sp,#4*(16+13)]
+ add r9,r9,r12,ror#24
+ str r8,[sp,#4*(16+10)]
+ eor r5,r5,r8,ror#12
+ str r9,[sp,#4*(16+11)]
+ eor r6,r6,r9,ror#12
+ ldr r8,[sp,#4*(16+8)]
+ add r2,r2,r7,ror#13
+ ldr r9,[sp,#4*(16+9)]
+ add r3,r3,r4,ror#13
+ eor r10,r2,r10,ror#24
+ eor r14,r3,r14,ror#24
+ add r8,r8,r10,ror#16
+ add r9,r9,r14,ror#16
+ eor r7,r8,r7,ror#13
+ eor r4,r9,r4,ror#13
+ add r2,r2,r7,ror#20
+ add r3,r3,r4,ror#20
+ eor r10,r2,r10,ror#16
+ eor r14,r3,r14,ror#16
+ add r8,r8,r10,ror#24
+ add r9,r9,r14,ror#24
+ eor r7,r7,r8,ror#12
+ eor r4,r4,r9,ror#12
+ bne .Loop
+
+ ldr r11,[sp,#4*(32+2)] @ load len
+
+ str r8, [sp,#4*(16+8)] @ modulo-scheduled store
+ str r9, [sp,#4*(16+9)]
+ str r12,[sp,#4*(16+12)]
+ str r10, [sp,#4*(16+13)]
+ str r14,[sp,#4*(16+14)]
+
+ @ at this point we have first half of 512-bit result in
+ @ rx and second half at sp+4*(16+8)
+
+ cmp r11,#64 @ done yet?
+#ifdef __thumb2__
+ itete lo
+#endif
+ addlo r12,sp,#4*(0) @ shortcut or ...
+ ldrhs r12,[sp,#4*(32+1)] @ ... load inp
+ addlo r14,sp,#4*(0) @ shortcut or ...
+ ldrhs r14,[sp,#4*(32+0)] @ ... load out
+
+ ldr r8,[sp,#4*(0)] @ load key material
+ ldr r9,[sp,#4*(1)]
+
+#if __ARM_ARCH__>=6 || !defined(__ARMEB__)
+# if __ARM_ARCH__<7
+ orr r10,r12,r14
+ tst r10,#3 @ are input and output aligned?
+ ldr r10,[sp,#4*(2)]
+ bne .Lunaligned
+ cmp r11,#64 @ restore flags
+# else
+ ldr r10,[sp,#4*(2)]
+# endif
+ ldr r11,[sp,#4*(3)]
+
+ add r0,r0,r8 @ accumulate key material
+ add r1,r1,r9
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+
+ add r2,r2,r10
+ add r3,r3,r11
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r0,r0,r8 @ xor with input
+ eorhs r1,r1,r9
+ add r8,sp,#4*(4)
+ str r0,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r2,r2,r10
+ eorhs r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r1,[r14,#-12]
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r5,r9,r5,ror#13
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+ add r6,r10,r6,ror#13
+ add r7,r11,r7,ror#13
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r4,r4,r8
+ eorhs r5,r5,r9
+ add r8,sp,#4*(8)
+ str r4,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r6,r6,r10
+ eorhs r7,r7,r11
+ str r5,[r14,#-12]
+ ldmia r8,{r8-r11} @ load key material
+ str r6,[r14,#-8]
+ add r0,sp,#4*(16+8)
+ str r7,[r14,#-4]
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ add r1,r1,r9
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+# ifdef __thumb2__
+ itt hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
+ strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
+ add r2,r2,r10
+ add r3,r3,r11
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r0,r0,r8
+ eorhs r1,r1,r9
+ add r8,sp,#4*(12)
+ str r0,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r2,r2,r10
+ eorhs r3,r3,r11
+ str r1,[r14,#-12]
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r5,r9,r5,ror#24
+# ifdef __thumb2__
+ itt hi
+# endif
+ addhi r8,r8,#1 @ next counter value
+ strhi r8,[sp,#4*(12)] @ save next counter value
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+ add r6,r10,r6,ror#24
+ add r7,r11,r7,ror#24
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r4,r4,r8
+ eorhs r5,r5,r9
+# ifdef __thumb2__
+ it ne
+# endif
+ ldrne r8,[sp,#4*(32+2)] @ re-load len
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r6,r6,r10
+ eorhs r7,r7,r11
+ str r4,[r14],#16 @ store output
+ str r5,[r14,#-12]
+# ifdef __thumb2__
+ it hs
+# endif
+ subhs r11,r8,#64 @ len-=64
+ str r6,[r14,#-8]
+ str r7,[r14,#-4]
+ bhi .Loop_outer
+
+ beq .Ldone
+# if __ARM_ARCH__<7
+ b .Ltail
+
+.align 4
+.Lunaligned: @ unaligned endian-neutral path
+ cmp r11,#64 @ restore flags
+# endif
+#endif
+#if __ARM_ARCH__<7
+ ldr r11,[sp,#4*(3)]
+ add r0,r8,r0 @ accumulate key material
+ add r1,r9,r1
+ add r2,r10,r2
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r3,r11,r3
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r0,r8,r0 @ xor with input (or zero)
+ eor r1,r9,r1
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r2,r10,r2
+ strb r0,[r14],#16 @ store output
+ eor r3,r11,r3
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r1,[r14,#-12]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-8]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r3,[r14,#-4]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-15]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r1,[r14,#-11]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-7]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r3,[r14,#-3]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-14]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r1,[r14,#-10]
+ strb r2,[r14,#-6]
+ eor r0,r8,r0,lsr#8
+ strb r3,[r14,#-2]
+ eor r1,r9,r1,lsr#8
+ strb r0,[r14,#-13]
+ eor r2,r10,r2,lsr#8
+ strb r1,[r14,#-9]
+ eor r3,r11,r3,lsr#8
+ strb r2,[r14,#-5]
+ strb r3,[r14,#-1]
+ add r8,sp,#4*(4+0)
+ ldmia r8,{r8-r11} @ load key material
+ add r0,sp,#4*(16+8)
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r5,r9,r5,ror#13
+ add r6,r10,r6,ror#13
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r7,r11,r7,ror#13
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r4,r8,r4 @ xor with input (or zero)
+ eor r5,r9,r5
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r6,r10,r6
+ strb r4,[r14],#16 @ store output
+ eor r7,r11,r7
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r5,[r14,#-12]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-8]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r7,[r14,#-4]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-15]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r5,[r14,#-11]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-7]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r7,[r14,#-3]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-14]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r5,[r14,#-10]
+ strb r6,[r14,#-6]
+ eor r4,r8,r4,lsr#8
+ strb r7,[r14,#-2]
+ eor r5,r9,r5,lsr#8
+ strb r4,[r14,#-13]
+ eor r6,r10,r6,lsr#8
+ strb r5,[r14,#-9]
+ eor r7,r11,r7,lsr#8
+ strb r6,[r14,#-5]
+ strb r7,[r14,#-1]
+ add r8,sp,#4*(4+4)
+ ldmia r8,{r8-r11} @ load key material
+ ldmia r0,{r0-r7} @ load second half
+# ifdef __thumb2__
+ itt hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx"
+ strhi r11,[sp,#4*(16+11)] @ copy "rx"
+ add r0,r8,r0 @ accumulate key material
+ add r1,r9,r1
+ add r2,r10,r2
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r3,r11,r3
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r0,r8,r0 @ xor with input (or zero)
+ eor r1,r9,r1
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r2,r10,r2
+ strb r0,[r14],#16 @ store output
+ eor r3,r11,r3
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r1,[r14,#-12]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-8]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r3,[r14,#-4]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-15]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r1,[r14,#-11]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-7]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r3,[r14,#-3]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-14]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r1,[r14,#-10]
+ strb r2,[r14,#-6]
+ eor r0,r8,r0,lsr#8
+ strb r3,[r14,#-2]
+ eor r1,r9,r1,lsr#8
+ strb r0,[r14,#-13]
+ eor r2,r10,r2,lsr#8
+ strb r1,[r14,#-9]
+ eor r3,r11,r3,lsr#8
+ strb r2,[r14,#-5]
+ strb r3,[r14,#-1]
+ add r8,sp,#4*(4+8)
+ ldmia r8,{r8-r11} @ load key material
+ add r4,r8,r4,ror#24 @ accumulate key material
+# ifdef __thumb2__
+ itt hi
+# endif
+ addhi r8,r8,#1 @ next counter value
+ strhi r8,[sp,#4*(12)] @ save next counter value
+ add r5,r9,r5,ror#24
+ add r6,r10,r6,ror#24
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r7,r11,r7,ror#24
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r4,r8,r4 @ xor with input (or zero)
+ eor r5,r9,r5
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r6,r10,r6
+ strb r4,[r14],#16 @ store output
+ eor r7,r11,r7
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r5,[r14,#-12]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-8]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r7,[r14,#-4]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-15]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r5,[r14,#-11]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-7]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r7,[r14,#-3]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-14]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r5,[r14,#-10]
+ strb r6,[r14,#-6]
+ eor r4,r8,r4,lsr#8
+ strb r7,[r14,#-2]
+ eor r5,r9,r5,lsr#8
+ strb r4,[r14,#-13]
+ eor r6,r10,r6,lsr#8
+ strb r5,[r14,#-9]
+ eor r7,r11,r7,lsr#8
+ strb r6,[r14,#-5]
+ strb r7,[r14,#-1]
+# ifdef __thumb2__
+ it ne
+# endif
+ ldrne r8,[sp,#4*(32+2)] @ re-load len
+# ifdef __thumb2__
+ it hs
+# endif
+ subhs r11,r8,#64 @ len-=64
+ bhi .Loop_outer
+
+ beq .Ldone
+#endif
+
+.Ltail:
+ ldr r12,[sp,#4*(32+1)] @ load inp
+ add r9,sp,#4*(0)
+ ldr r14,[sp,#4*(32+0)] @ load out
+
+.Loop_tail:
+ ldrb r10,[r9],#1 @ read buffer on stack
+ ldrb r11,[r12],#1 @ read input
+ subs r8,r8,#1
+ eor r11,r11,r10
+ strb r11,[r14],#1 @ store output
+ bne .Loop_tail
+
+.Ldone:
+ add sp,sp,#4*(32+3)
+.Lno_data:
+ ldmia sp!,{r4-r11,pc}
+.size ChaCha20_ctr32,.-ChaCha20_ctr32
+#if __ARM_MAX_ARCH__>=7
+.arch armv7-a
+.fpu neon
+
+.type ChaCha20_neon,%function
+.align 5
+ChaCha20_neon:
+ ldr r12,[sp,#0] @ pull pointer to counter and nonce
+ stmdb sp!,{r0-r2,r4-r11,lr}
+.LChaCha20_neon:
+ adr r14,.Lsigma
+ vstmdb sp!,{d8-d15} @ ABI spec says so
+ stmdb sp!,{r0-r3}
+
+ vld1.32 {q1-q2},[r3] @ load key
+ ldmia r3,{r4-r11} @ load key
+
+ sub sp,sp,#4*(16+16)
+ vld1.32 {q3},[r12] @ load counter and nonce
+ add r12,sp,#4*8
+ ldmia r14,{r0-r3} @ load sigma
+ vld1.32 {q0},[r14]! @ load sigma
+ vld1.32 {q12},[r14]! @ one
+ @ vld1.32 {d30},[r14] @ rot8
+ vst1.32 {q2-q3},[r12] @ copy 1/2key|counter|nonce
+ vst1.32 {q0-q1},[sp] @ copy sigma|1/2key
+
+ str r10,[sp,#4*(16+10)] @ off-load "rx"
+ str r11,[sp,#4*(16+11)] @ off-load "rx"
+ vshl.i32 d26,d24,#1 @ two
+ vstr d24,[sp,#4*(16+0)]
+ vshl.i32 d28,d24,#2 @ four
+ vstr d26,[sp,#4*(16+2)]
+ vmov q4,q0
+ vstr d28,[sp,#4*(16+4)]
+ vmov q8,q0
+ @ vstr d30,[sp,#4*(16+6)]
+ vmov q5,q1
+ vmov q9,q1
+ b .Loop_neon_enter
+
+.align 4
+.Loop_neon_outer:
+ ldmia sp,{r0-r9} @ load key material
+ cmp r11,#64*2 @ if len<=64*2
+ bls .Lbreak_neon @ switch to integer-only
+ @ vldr d30,[sp,#4*(16+6)] @ rot8
+ vmov q4,q0
+ str r11,[sp,#4*(32+2)] @ save len
+ vmov q8,q0
+ str r12, [sp,#4*(32+1)] @ save inp
+ vmov q5,q1
+ str r14, [sp,#4*(32+0)] @ save out
+ vmov q9,q1
+.Loop_neon_enter:
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ vadd.i32 q7,q3,q12 @ counter+1
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ vmov q6,q2
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ vmov q10,q2
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ vadd.i32 q11,q7,q12 @ counter+2
+ add r12,r12,#3 @ counter+3
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(16+15)]
+ mov r11,#10
+ b .Loop_neon
+
+.align 4
+.Loop_neon:
+ subs r11,r11,#1
+ vadd.i32 q0,q0,q1
+ add r0,r0,r4,ror#13
+ vadd.i32 q4,q4,q5
+ add r1,r1,r5,ror#13
+ vadd.i32 q8,q8,q9
+ eor r12,r0,r12,ror#24
+ veor q3,q3,q0
+ eor r10,r1,r10,ror#24
+ veor q7,q7,q4
+ add r8,r8,r12,ror#16
+ veor q11,q11,q8
+ add r9,r9,r10,ror#16
+ vrev32.16 q3,q3
+ eor r4,r8,r4,ror#13
+ vrev32.16 q7,q7
+ eor r5,r9,r5,ror#13
+ vrev32.16 q11,q11
+ add r0,r0,r4,ror#20
+ vadd.i32 q2,q2,q3
+ add r1,r1,r5,ror#20
+ vadd.i32 q6,q6,q7
+ eor r12,r0,r12,ror#16
+ vadd.i32 q10,q10,q11
+ eor r10,r1,r10,ror#16
+ veor q12,q1,q2
+ add r8,r8,r12,ror#24
+ veor q13,q5,q6
+ str r10,[sp,#4*(16+13)]
+ veor q14,q9,q10
+ add r9,r9,r10,ror#24
+ vshr.u32 q1,q12,#20
+ ldr r10,[sp,#4*(16+15)]
+ vshr.u32 q5,q13,#20
+ str r8,[sp,#4*(16+8)]
+ vshr.u32 q9,q14,#20
+ eor r4,r4,r8,ror#12
+ vsli.32 q1,q12,#12
+ str r9,[sp,#4*(16+9)]
+ vsli.32 q5,q13,#12
+ eor r5,r5,r9,ror#12
+ vsli.32 q9,q14,#12
+ ldr r8,[sp,#4*(16+10)]
+ vadd.i32 q0,q0,q1
+ add r2,r2,r6,ror#13
+ vadd.i32 q4,q4,q5
+ ldr r9,[sp,#4*(16+11)]
+ vadd.i32 q8,q8,q9
+ add r3,r3,r7,ror#13
+ veor q12,q3,q0
+ eor r14,r2,r14,ror#24
+ veor q13,q7,q4
+ eor r10,r3,r10,ror#24
+ veor q14,q11,q8
+ add r8,r8,r14,ror#16
+ vshr.u32 q3,q12,#24
+ add r9,r9,r10,ror#16
+ vshr.u32 q7,q13,#24
+ eor r6,r8,r6,ror#13
+ vshr.u32 q11,q14,#24
+ eor r7,r9,r7,ror#13
+ vsli.32 q3,q12,#8
+ add r2,r2,r6,ror#20
+ vsli.32 q7,q13,#8
+ add r3,r3,r7,ror#20
+ vsli.32 q11,q14,#8
+ eor r14,r2,r14,ror#16
+ vadd.i32 q2,q2,q3
+ eor r10,r3,r10,ror#16
+ vadd.i32 q6,q6,q7
+ add r8,r8,r14,ror#24
+ vadd.i32 q10,q10,q11
+ add r9,r9,r10,ror#24
+ veor q12,q1,q2
+ eor r6,r6,r8,ror#12
+ veor q13,q5,q6
+ eor r7,r7,r9,ror#12
+ veor q14,q9,q10
+ vshr.u32 q1,q12,#25
+ vshr.u32 q5,q13,#25
+ vshr.u32 q9,q14,#25
+ vsli.32 q1,q12,#7
+ vsli.32 q5,q13,#7
+ vsli.32 q9,q14,#7
+ vext.8 q2,q2,q2,#8
+ vext.8 q6,q6,q6,#8
+ vext.8 q10,q10,q10,#8
+ vext.8 q1,q1,q1,#4
+ vext.8 q5,q5,q5,#4
+ vext.8 q9,q9,q9,#4
+ vext.8 q3,q3,q3,#12
+ vext.8 q7,q7,q7,#12
+ vext.8 q11,q11,q11,#12
+ vadd.i32 q0,q0,q1
+ add r0,r0,r5,ror#13
+ vadd.i32 q4,q4,q5
+ add r1,r1,r6,ror#13
+ vadd.i32 q8,q8,q9
+ eor r10,r0,r10,ror#24
+ veor q3,q3,q0
+ eor r12,r1,r12,ror#24
+ veor q7,q7,q4
+ add r8,r8,r10,ror#16
+ veor q11,q11,q8
+ add r9,r9,r12,ror#16
+ vrev32.16 q3,q3
+ eor r5,r8,r5,ror#13
+ vrev32.16 q7,q7
+ eor r6,r9,r6,ror#13
+ vrev32.16 q11,q11
+ add r0,r0,r5,ror#20
+ vadd.i32 q2,q2,q3
+ add r1,r1,r6,ror#20
+ vadd.i32 q6,q6,q7
+ eor r10,r0,r10,ror#16
+ vadd.i32 q10,q10,q11
+ eor r12,r1,r12,ror#16
+ veor q12,q1,q2
+ str r10,[sp,#4*(16+15)]
+ veor q13,q5,q6
+ add r8,r8,r10,ror#24
+ veor q14,q9,q10
+ ldr r10,[sp,#4*(16+13)]
+ vshr.u32 q1,q12,#20
+ add r9,r9,r12,ror#24
+ vshr.u32 q5,q13,#20
+ str r8,[sp,#4*(16+10)]
+ vshr.u32 q9,q14,#20
+ eor r5,r5,r8,ror#12
+ vsli.32 q1,q12,#12
+ str r9,[sp,#4*(16+11)]
+ vsli.32 q5,q13,#12
+ eor r6,r6,r9,ror#12
+ vsli.32 q9,q14,#12
+ ldr r8,[sp,#4*(16+8)]
+ vadd.i32 q0,q0,q1
+ add r2,r2,r7,ror#13
+ vadd.i32 q4,q4,q5
+ ldr r9,[sp,#4*(16+9)]
+ vadd.i32 q8,q8,q9
+ add r3,r3,r4,ror#13
+ veor q12,q3,q0
+ eor r10,r2,r10,ror#24
+ veor q13,q7,q4
+ eor r14,r3,r14,ror#24
+ veor q14,q11,q8
+ add r8,r8,r10,ror#16
+ vshr.u32 q3,q12,#24
+ add r9,r9,r14,ror#16
+ vshr.u32 q7,q13,#24
+ eor r7,r8,r7,ror#13
+ vshr.u32 q11,q14,#24
+ eor r4,r9,r4,ror#13
+ vsli.32 q3,q12,#8
+ add r2,r2,r7,ror#20
+ vsli.32 q7,q13,#8
+ add r3,r3,r4,ror#20
+ vsli.32 q11,q14,#8
+ eor r10,r2,r10,ror#16
+ vadd.i32 q2,q2,q3
+ eor r14,r3,r14,ror#16
+ vadd.i32 q6,q6,q7
+ add r8,r8,r10,ror#24
+ vadd.i32 q10,q10,q11
+ add r9,r9,r14,ror#24
+ veor q12,q1,q2
+ eor r7,r7,r8,ror#12
+ veor q13,q5,q6
+ eor r4,r4,r9,ror#12
+ veor q14,q9,q10
+ vshr.u32 q1,q12,#25
+ vshr.u32 q5,q13,#25
+ vshr.u32 q9,q14,#25
+ vsli.32 q1,q12,#7
+ vsli.32 q5,q13,#7
+ vsli.32 q9,q14,#7
+ vext.8 q2,q2,q2,#8
+ vext.8 q6,q6,q6,#8
+ vext.8 q10,q10,q10,#8
+ vext.8 q1,q1,q1,#12
+ vext.8 q5,q5,q5,#12
+ vext.8 q9,q9,q9,#12
+ vext.8 q3,q3,q3,#4
+ vext.8 q7,q7,q7,#4
+ vext.8 q11,q11,q11,#4
+ bne .Loop_neon
+
+ add r11,sp,#32
+ vld1.32 {q12-q13},[sp] @ load key material
+ vld1.32 {q14-q15},[r11]
+
+ ldr r11,[sp,#4*(32+2)] @ load len
+
+ str r8, [sp,#4*(16+8)] @ modulo-scheduled store
+ str r9, [sp,#4*(16+9)]
+ str r12,[sp,#4*(16+12)]
+ str r10, [sp,#4*(16+13)]
+ str r14,[sp,#4*(16+14)]
+
+ @ at this point we have first half of 512-bit result in
+ @ rx and second half at sp+4*(16+8)
+
+ ldr r12,[sp,#4*(32+1)] @ load inp
+ ldr r14,[sp,#4*(32+0)] @ load out
+
+ vadd.i32 q0,q0,q12 @ accumulate key material
+ vadd.i32 q4,q4,q12
+ vadd.i32 q8,q8,q12
+ vldr d24,[sp,#4*(16+0)] @ one
+
+ vadd.i32 q1,q1,q13
+ vadd.i32 q5,q5,q13
+ vadd.i32 q9,q9,q13
+ vldr d26,[sp,#4*(16+2)] @ two
+
+ vadd.i32 q2,q2,q14
+ vadd.i32 q6,q6,q14
+ vadd.i32 q10,q10,q14
+ vadd.i32 d14,d14,d24 @ counter+1
+ vadd.i32 d22,d22,d26 @ counter+2
+
+ vadd.i32 q3,q3,q15
+ vadd.i32 q7,q7,q15
+ vadd.i32 q11,q11,q15
+
+ cmp r11,#64*4
+ blo .Ltail_neon
+
+ vld1.8 {q12-q13},[r12]! @ load input
+ mov r11,sp
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12 @ xor with input
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ vst1.8 {q0-q1},[r14]! @ store output
+ veor q5,q5,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q6,q6,q14
+ vst1.8 {q2-q3},[r14]!
+ veor q7,q7,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q8,q8,q12
+ vld1.32 {q0-q1},[r11]! @ load for next iteration
+ veor d25,d25,d25
+ vldr d24,[sp,#4*(16+4)] @ four
+ veor q9,q9,q13
+ vld1.32 {q2-q3},[r11]
+ veor q10,q10,q14
+ vst1.8 {q4-q5},[r14]!
+ veor q11,q11,q15
+ vst1.8 {q6-q7},[r14]!
+
+ vadd.i32 d6,d6,d24 @ next counter value
+ vldr d24,[sp,#4*(16+0)] @ one
+
+ ldmia sp,{r8-r11} @ load key material
+ add r0,r0,r8 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ vst1.8 {q8-q9},[r14]!
+ add r1,r1,r9
+ ldr r9,[r12,#-12]
+ vst1.8 {q10-q11},[r14]!
+ add r2,r2,r10
+ ldr r10,[r12,#-8]
+ add r3,r3,r11
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ eor r0,r0,r8 @ xor with input
+ add r8,sp,#4*(4)
+ eor r1,r1,r9
+ str r0,[r14],#16 @ store output
+ eor r2,r2,r10
+ str r1,[r14,#-12]
+ eor r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ add r5,r9,r5,ror#13
+ ldr r9,[r12,#-12]
+ add r6,r10,r6,ror#13
+ ldr r10,[r12,#-8]
+ add r7,r11,r7,ror#13
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ eor r4,r4,r8
+ add r8,sp,#4*(8)
+ eor r5,r5,r9
+ str r4,[r14],#16 @ store output
+ eor r6,r6,r10
+ str r5,[r14,#-12]
+ eor r7,r7,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r6,[r14,#-8]
+ add r0,sp,#4*(16+8)
+ str r7,[r14,#-4]
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ add r1,r1,r9
+ ldr r9,[r12,#-12]
+# ifdef __thumb2__
+ it hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
+ add r2,r2,r10
+ ldr r10,[r12,#-8]
+# ifdef __thumb2__
+ it hi
+# endif
+ strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
+ add r3,r3,r11
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ eor r0,r0,r8
+ add r8,sp,#4*(12)
+ eor r1,r1,r9
+ str r0,[r14],#16 @ store output
+ eor r2,r2,r10
+ str r1,[r14,#-12]
+ eor r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r8,r8,#4 @ next counter value
+ add r5,r9,r5,ror#24
+ str r8,[sp,#4*(12)] @ save next counter value
+ ldr r8,[r12],#16 @ load input
+ add r6,r10,r6,ror#24
+ add r4,r4,#3 @ counter+3
+ ldr r9,[r12,#-12]
+ add r7,r11,r7,ror#24
+ ldr r10,[r12,#-8]
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ eor r4,r4,r8
+# ifdef __thumb2__
+ it hi
+# endif
+ ldrhi r8,[sp,#4*(32+2)] @ re-load len
+ eor r5,r5,r9
+ eor r6,r6,r10
+ str r4,[r14],#16 @ store output
+ eor r7,r7,r11
+ str r5,[r14,#-12]
+ sub r11,r8,#64*4 @ len-=64*4
+ str r6,[r14,#-8]
+ str r7,[r14,#-4]
+ bhi .Loop_neon_outer
+
+ b .Ldone_neon
+
+.align 4
+.Lbreak_neon:
+ @ harmonize NEON and integer-only stack frames: load data
+ @ from NEON frame, but save to integer-only one; distance
+ @ between the two is 4*(32+4+16-32)=4*(20).
+
+ str r11, [sp,#4*(20+32+2)] @ save len
+ add r11,sp,#4*(32+4)
+ str r12, [sp,#4*(20+32+1)] @ save inp
+ str r14, [sp,#4*(20+32+0)] @ save out
+
+ ldr r12,[sp,#4*(16+10)]
+ ldr r14,[sp,#4*(16+11)]
+ vldmia r11,{d8-d15} @ fulfill ABI requirement
+ str r12,[sp,#4*(20+16+10)] @ copy "rx"
+ str r14,[sp,#4*(20+16+11)] @ copy "rx"
+
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(20+16+15)]
+ add r11,sp,#4*(20)
+ vst1.32 {q0-q1},[r11]! @ copy key
+ add sp,sp,#4*(20) @ switch frame
+ vst1.32 {q2-q3},[r11]
+ mov r11,#10
+ b .Loop @ go integer-only
+
+.align 4
+.Ltail_neon:
+ cmp r11,#64*3
+ bhs .L192_or_more_neon
+ cmp r11,#64*2
+ bhs .L128_or_more_neon
+ cmp r11,#64*1
+ bhs .L64_or_more_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q0-q1},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q2-q3},[r8]
+ b .Loop_tail_neon
+
+.align 4
+.L64_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vst1.8 {q0-q1},[r14]!
+ vst1.8 {q2-q3},[r14]!
+
+ beq .Ldone_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q4-q5},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q6-q7},[r8]
+ sub r11,r11,#64*1 @ len-=64*1
+ b .Loop_tail_neon
+
+.align 4
+.L128_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ veor q5,q5,q13
+ vst1.8 {q0-q1},[r14]!
+ veor q6,q6,q14
+ vst1.8 {q2-q3},[r14]!
+ veor q7,q7,q15
+ vst1.8 {q4-q5},[r14]!
+ vst1.8 {q6-q7},[r14]!
+
+ beq .Ldone_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q8-q9},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q10-q11},[r8]
+ sub r11,r11,#64*2 @ len-=64*2
+ b .Loop_tail_neon
+
+.align 4
+.L192_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ veor q5,q5,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q6,q6,q14
+ vst1.8 {q0-q1},[r14]!
+ veor q7,q7,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q8,q8,q12
+ vst1.8 {q2-q3},[r14]!
+ veor q9,q9,q13
+ vst1.8 {q4-q5},[r14]!
+ veor q10,q10,q14
+ vst1.8 {q6-q7},[r14]!
+ veor q11,q11,q15
+ vst1.8 {q8-q9},[r14]!
+ vst1.8 {q10-q11},[r14]!
+
+ beq .Ldone_neon
+
+ ldmia sp,{r8-r11} @ load key material
+ add r0,r0,r8 @ accumulate key material
+ add r8,sp,#4*(4)
+ add r1,r1,r9
+ add r2,r2,r10
+ add r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r8,sp,#4*(8)
+ add r5,r9,r5,ror#13
+ add r6,r10,r6,ror#13
+ add r7,r11,r7,ror#13
+ ldmia r8,{r8-r11} @ load key material
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ stmia sp,{r0-r7}
+ add r0,sp,#4*(16+8)
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ add r8,sp,#4*(12)
+ add r1,r1,r9
+ add r2,r2,r10
+ add r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r8,sp,#4*(8)
+ add r5,r9,r5,ror#24
+ add r4,r4,#3 @ counter+3
+ add r6,r10,r6,ror#24
+ add r7,r11,r7,ror#24
+ ldr r11,[sp,#4*(32+2)] @ re-load len
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ stmia r8,{r0-r7}
+ add r10,sp,#4*(0)
+ sub r11,r11,#64*3 @ len-=64*3
+
+.Loop_tail_neon:
+ ldrb r8,[r10],#1 @ read buffer on stack
+ ldrb r9,[r12],#1 @ read input
+ subs r11,r11,#1
+ eor r8,r8,r9
+ strb r8,[r14],#1 @ store output
+ bne .Loop_tail_neon
+
+.Ldone_neon:
+ add sp,sp,#4*(32+4)
+ vldmia sp,{d8-d15}
+ add sp,sp,#4*(16+3)
+ ldmia sp!,{r4-r11,pc}
+.size ChaCha20_neon,.-ChaCha20_neon
+.comm OPENSSL_armcap_P,4,4
+#endif
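[Editorial note, not part of the patch:] As imported, the 32-bit entry point above makes its SIMD decision at run time: requests of 192 bytes or less stay on the integer-only code, and larger ones branch to ChaCha20_neon only when the ARMV7_NEON bit is set in OPENSSL_armcap_P. Expressed in C, that dispatch is roughly the following; this is a sketch only, the scalar helper name is invented (in the assembly both paths live inside ChaCha20_ctr32), and the OPENSSL_armcap_P probe is an OpenSSL-ism that a kernel adaptation would have to handle differently:

/* Illustrative sketch of the run-time dispatch at the top of ChaCha20_ctr32;
 * helper names and the ARMV7_NEON bit value are assumptions. */
#include <stddef.h>

extern unsigned int OPENSSL_armcap_P;		/* capability word, see arm_arch.h */
#define ARMV7_NEON (1 << 0)			/* assumed bit, per arm_arch.h */

void ChaCha20_ctr32_scalar(unsigned char *out, const unsigned char *in,
			   size_t len, const unsigned int key[8],
			   const unsigned int counter[4]);
void ChaCha20_neon(unsigned char *out, const unsigned char *in, size_t len,
		   const unsigned int key[8], const unsigned int counter[4]);

static void chacha20_dispatch_sketch(unsigned char *out, const unsigned char *in,
				     size_t len, const unsigned int key[8],
				     const unsigned int counter[4])
{
	if (len > 192 && (OPENSSL_armcap_P & ARMV7_NEON))
		ChaCha20_neon(out, in, len, key, counter);		/* NEON path */
	else
		ChaCha20_ctr32_scalar(out, in, len, key, counter);	/* integer-only */
}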
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
new file mode 100644
index 000000000000..4d029bfdad3a
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
@@ -0,0 +1,1973 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+
+
+
+.align 5
+.Lsigma:
+.quad 0x3320646e61707865,0x6b20657479622d32 // endian-neutral
+.Lone:
+.long 1,0,0,0
+.LOPENSSL_armcap_P:
+#ifdef __ILP32__
+.long OPENSSL_armcap_P-.
+#else
+.quad OPENSSL_armcap_P-.
+#endif
+.byte 67,104,97,67,104,97,50,48,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+.align 2
+
+.globl ChaCha20_ctr32
+.type ChaCha20_ctr32,%function
+.align 5
+ChaCha20_ctr32:
+ cbz x2,.Labort
+ adr x5,.LOPENSSL_armcap_P
+ cmp x2,#192
+ b.lo .Lshort
+#ifdef __ILP32__
+ ldrsw x6,[x5]
+#else
+ ldr x6,[x5]
+#endif
+ ldr w17,[x6,x5]
+ tst w17,#ARMV7_NEON
+ b.ne ChaCha20_neon
+
+.Lshort:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+ sub sp,sp,#64
+
+ ldp x22,x23,[x5] // load sigma
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ldp x28,x30,[x4] // load counter
+#ifdef __ARMEB__
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+
+.Loop_outer:
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ mov w7,w23
+ lsr x8,x23,#32
+ mov w9,w24
+ lsr x10,x24,#32
+ mov w11,w25
+ lsr x12,x25,#32
+ mov w13,w26
+ lsr x14,x26,#32
+ mov w15,w27
+ lsr x16,x27,#32
+ mov w17,w28
+ lsr x19,x28,#32
+ mov w20,w30
+ lsr x21,x30,#32
+
+ mov x4,#10
+ subs x2,x2,#64
+.Loop:
+ sub x4,x4,#1
+ add w5,w5,w9
+ add w6,w6,w10
+ add w7,w7,w11
+ add w8,w8,w12
+ eor w17,w17,w5
+ eor w19,w19,w6
+ eor w20,w20,w7
+ eor w21,w21,w8
+ ror w17,w17,#16
+ ror w19,w19,#16
+ ror w20,w20,#16
+ ror w21,w21,#16
+ add w13,w13,w17
+ add w14,w14,w19
+ add w15,w15,w20
+ add w16,w16,w21
+ eor w9,w9,w13
+ eor w10,w10,w14
+ eor w11,w11,w15
+ eor w12,w12,w16
+ ror w9,w9,#20
+ ror w10,w10,#20
+ ror w11,w11,#20
+ ror w12,w12,#20
+ add w5,w5,w9
+ add w6,w6,w10
+ add w7,w7,w11
+ add w8,w8,w12
+ eor w17,w17,w5
+ eor w19,w19,w6
+ eor w20,w20,w7
+ eor w21,w21,w8
+ ror w17,w17,#24
+ ror w19,w19,#24
+ ror w20,w20,#24
+ ror w21,w21,#24
+ add w13,w13,w17
+ add w14,w14,w19
+ add w15,w15,w20
+ add w16,w16,w21
+ eor w9,w9,w13
+ eor w10,w10,w14
+ eor w11,w11,w15
+ eor w12,w12,w16
+ ror w9,w9,#25
+ ror w10,w10,#25
+ ror w11,w11,#25
+ ror w12,w12,#25
+ add w5,w5,w10
+ add w6,w6,w11
+ add w7,w7,w12
+ add w8,w8,w9
+ eor w21,w21,w5
+ eor w17,w17,w6
+ eor w19,w19,w7
+ eor w20,w20,w8
+ ror w21,w21,#16
+ ror w17,w17,#16
+ ror w19,w19,#16
+ ror w20,w20,#16
+ add w15,w15,w21
+ add w16,w16,w17
+ add w13,w13,w19
+ add w14,w14,w20
+ eor w10,w10,w15
+ eor w11,w11,w16
+ eor w12,w12,w13
+ eor w9,w9,w14
+ ror w10,w10,#20
+ ror w11,w11,#20
+ ror w12,w12,#20
+ ror w9,w9,#20
+ add w5,w5,w10
+ add w6,w6,w11
+ add w7,w7,w12
+ add w8,w8,w9
+ eor w21,w21,w5
+ eor w17,w17,w6
+ eor w19,w19,w7
+ eor w20,w20,w8
+ ror w21,w21,#24
+ ror w17,w17,#24
+ ror w19,w19,#24
+ ror w20,w20,#24
+ add w15,w15,w21
+ add w16,w16,w17
+ add w13,w13,w19
+ add w14,w14,w20
+ eor w10,w10,w15
+ eor w11,w11,w16
+ eor w12,w12,w13
+ eor w9,w9,w14
+ ror w10,w10,#25
+ ror w11,w11,#25
+ ror w12,w12,#25
+ ror w9,w9,#25
+ cbnz x4,.Loop
+
+ add w5,w5,w22 // accumulate key block
+ add x6,x6,x22,lsr#32
+ add w7,w7,w23
+ add x8,x8,x23,lsr#32
+ add w9,w9,w24
+ add x10,x10,x24,lsr#32
+ add w11,w11,w25
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add x21,x21,x30,lsr#32
+
+ b.lo .Ltail
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#1 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+
+ b.hi .Loop_outer
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+.Labort:
+ ret
+
+.align 4
+.Ltail:
+ add x2,x2,#64
+.Less_than_64:
+ sub x0,x0,#1
+ add x1,x1,x2
+ add x0,x0,x2
+ add x4,sp,x2
+ neg x2,x2
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ stp x5,x7,[sp,#0]
+ stp x9,x11,[sp,#16]
+ stp x13,x15,[sp,#32]
+ stp x17,x20,[sp,#48]
+
+.Loop_tail:
+ ldrb w10,[x1,x2]
+ ldrb w11,[x4,x2]
+ add x2,x2,#1
+ eor w10,w10,w11
+ strb w10,[x0,x2]
+ cbnz x2,.Loop_tail
+
+ stp xzr,xzr,[sp,#0]
+ stp xzr,xzr,[sp,#16]
+ stp xzr,xzr,[sp,#32]
+ stp xzr,xzr,[sp,#48]
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_ctr32,.-ChaCha20_ctr32
+
+.type ChaCha20_neon,%function
+.align 5
+ChaCha20_neon:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+ cmp x2,#512
+ b.hs .L512_or_more_neon
+
+ sub sp,sp,#64
+
+ ldp x22,x23,[x5] // load sigma
+ ld1 {v24.4s},[x5],#16
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ld1 {v25.4s,v26.4s},[x3]
+ ldp x28,x30,[x4] // load counter
+ ld1 {v27.4s},[x4]
+ ld1 {v31.4s},[x5]
+#ifdef __ARMEB__
+ rev64 v24.4s,v24.4s
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+ add v27.4s,v27.4s,v31.4s // += 1
+ add v28.4s,v27.4s,v31.4s
+ add v29.4s,v28.4s,v31.4s
+ shl v31.4s,v31.4s,#2 // 1 -> 4
+
+.Loop_outer_neon:
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ mov v0.16b,v24.16b
+ mov w7,w23
+ lsr x8,x23,#32
+ mov v4.16b,v24.16b
+ mov w9,w24
+ lsr x10,x24,#32
+ mov v16.16b,v24.16b
+ mov w11,w25
+ mov v1.16b,v25.16b
+ lsr x12,x25,#32
+ mov v5.16b,v25.16b
+ mov w13,w26
+ mov v17.16b,v25.16b
+ lsr x14,x26,#32
+ mov v3.16b,v27.16b
+ mov w15,w27
+ mov v7.16b,v28.16b
+ lsr x16,x27,#32
+ mov v19.16b,v29.16b
+ mov w17,w28
+ mov v2.16b,v26.16b
+ lsr x19,x28,#32
+ mov v6.16b,v26.16b
+ mov w20,w30
+ mov v18.16b,v26.16b
+ lsr x21,x30,#32
+
+ mov x4,#10
+ subs x2,x2,#256
+.Loop_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v16.4s,v16.4s,v17.4s
+ add w7,w7,w11
+ eor v3.16b,v3.16b,v0.16b
+ add w8,w8,w12
+ eor v7.16b,v7.16b,v4.16b
+ eor w17,w17,w5
+ eor v19.16b,v19.16b,v16.16b
+ eor w19,w19,w6
+ rev32 v3.8h,v3.8h
+ eor w20,w20,w7
+ rev32 v7.8h,v7.8h
+ eor w21,w21,w8
+ rev32 v19.8h,v19.8h
+ ror w17,w17,#16
+ add v2.4s,v2.4s,v3.4s
+ ror w19,w19,#16
+ add v6.4s,v6.4s,v7.4s
+ ror w20,w20,#16
+ add v18.4s,v18.4s,v19.4s
+ ror w21,w21,#16
+ eor v20.16b,v1.16b,v2.16b
+ add w13,w13,w17
+ eor v21.16b,v5.16b,v6.16b
+ add w14,w14,w19
+ eor v22.16b,v17.16b,v18.16b
+ add w15,w15,w20
+ ushr v1.4s,v20.4s,#20
+ add w16,w16,w21
+ ushr v5.4s,v21.4s,#20
+ eor w9,w9,w13
+ ushr v17.4s,v22.4s,#20
+ eor w10,w10,w14
+ sli v1.4s,v20.4s,#12
+ eor w11,w11,w15
+ sli v5.4s,v21.4s,#12
+ eor w12,w12,w16
+ sli v17.4s,v22.4s,#12
+ ror w9,w9,#20
+ add v0.4s,v0.4s,v1.4s
+ ror w10,w10,#20
+ add v4.4s,v4.4s,v5.4s
+ ror w11,w11,#20
+ add v16.4s,v16.4s,v17.4s
+ ror w12,w12,#20
+ eor v20.16b,v3.16b,v0.16b
+ add w5,w5,w9
+ eor v21.16b,v7.16b,v4.16b
+ add w6,w6,w10
+ eor v22.16b,v19.16b,v16.16b
+ add w7,w7,w11
+ ushr v3.4s,v20.4s,#24
+ add w8,w8,w12
+ ushr v7.4s,v21.4s,#24
+ eor w17,w17,w5
+ ushr v19.4s,v22.4s,#24
+ eor w19,w19,w6
+ sli v3.4s,v20.4s,#8
+ eor w20,w20,w7
+ sli v7.4s,v21.4s,#8
+ eor w21,w21,w8
+ sli v19.4s,v22.4s,#8
+ ror w17,w17,#24
+ add v2.4s,v2.4s,v3.4s
+ ror w19,w19,#24
+ add v6.4s,v6.4s,v7.4s
+ ror w20,w20,#24
+ add v18.4s,v18.4s,v19.4s
+ ror w21,w21,#24
+ eor v20.16b,v1.16b,v2.16b
+ add w13,w13,w17
+ eor v21.16b,v5.16b,v6.16b
+ add w14,w14,w19
+ eor v22.16b,v17.16b,v18.16b
+ add w15,w15,w20
+ ushr v1.4s,v20.4s,#25
+ add w16,w16,w21
+ ushr v5.4s,v21.4s,#25
+ eor w9,w9,w13
+ ushr v17.4s,v22.4s,#25
+ eor w10,w10,w14
+ sli v1.4s,v20.4s,#7
+ eor w11,w11,w15
+ sli v5.4s,v21.4s,#7
+ eor w12,w12,w16
+ sli v17.4s,v22.4s,#7
+ ror w9,w9,#25
+ ext v2.16b,v2.16b,v2.16b,#8
+ ror w10,w10,#25
+ ext v6.16b,v6.16b,v6.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w10
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w11
+ add v16.4s,v16.4s,v17.4s
+ add w7,w7,w12
+ eor v3.16b,v3.16b,v0.16b
+ add w8,w8,w9
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w5
+ eor v19.16b,v19.16b,v16.16b
+ eor w17,w17,w6
+ rev32 v3.8h,v3.8h
+ eor w19,w19,w7
+ rev32 v7.8h,v7.8h
+ eor w20,w20,w8
+ rev32 v19.8h,v19.8h
+ ror w21,w21,#16
+ add v2.4s,v2.4s,v3.4s
+ ror w17,w17,#16
+ add v6.4s,v6.4s,v7.4s
+ ror w19,w19,#16
+ add v18.4s,v18.4s,v19.4s
+ ror w20,w20,#16
+ eor v20.16b,v1.16b,v2.16b
+ add w15,w15,w21
+ eor v21.16b,v5.16b,v6.16b
+ add w16,w16,w17
+ eor v22.16b,v17.16b,v18.16b
+ add w13,w13,w19
+ ushr v1.4s,v20.4s,#20
+ add w14,w14,w20
+ ushr v5.4s,v21.4s,#20
+ eor w10,w10,w15
+ ushr v17.4s,v22.4s,#20
+ eor w11,w11,w16
+ sli v1.4s,v20.4s,#12
+ eor w12,w12,w13
+ sli v5.4s,v21.4s,#12
+ eor w9,w9,w14
+ sli v17.4s,v22.4s,#12
+ ror w10,w10,#20
+ add v0.4s,v0.4s,v1.4s
+ ror w11,w11,#20
+ add v4.4s,v4.4s,v5.4s
+ ror w12,w12,#20
+ add v16.4s,v16.4s,v17.4s
+ ror w9,w9,#20
+ eor v20.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v21.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v22.16b,v19.16b,v16.16b
+ add w7,w7,w12
+ ushr v3.4s,v20.4s,#24
+ add w8,w8,w9
+ ushr v7.4s,v21.4s,#24
+ eor w21,w21,w5
+ ushr v19.4s,v22.4s,#24
+ eor w17,w17,w6
+ sli v3.4s,v20.4s,#8
+ eor w19,w19,w7
+ sli v7.4s,v21.4s,#8
+ eor w20,w20,w8
+ sli v19.4s,v22.4s,#8
+ ror w21,w21,#24
+ add v2.4s,v2.4s,v3.4s
+ ror w17,w17,#24
+ add v6.4s,v6.4s,v7.4s
+ ror w19,w19,#24
+ add v18.4s,v18.4s,v19.4s
+ ror w20,w20,#24
+ eor v20.16b,v1.16b,v2.16b
+ add w15,w15,w21
+ eor v21.16b,v5.16b,v6.16b
+ add w16,w16,w17
+ eor v22.16b,v17.16b,v18.16b
+ add w13,w13,w19
+ ushr v1.4s,v20.4s,#25
+ add w14,w14,w20
+ ushr v5.4s,v21.4s,#25
+ eor w10,w10,w15
+ ushr v17.4s,v22.4s,#25
+ eor w11,w11,w16
+ sli v1.4s,v20.4s,#7
+ eor w12,w12,w13
+ sli v5.4s,v21.4s,#7
+ eor w9,w9,w14
+ sli v17.4s,v22.4s,#7
+ ror w10,w10,#25
+ ext v2.16b,v2.16b,v2.16b,#8
+ ror w11,w11,#25
+ ext v6.16b,v6.16b,v6.16b,#8
+ ror w12,w12,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ cbnz x4,.Loop_neon
+
+ add w5,w5,w22 // accumulate key block
+ add v0.4s,v0.4s,v24.4s
+ add x6,x6,x22,lsr#32
+ add v4.4s,v4.4s,v24.4s
+ add w7,w7,w23
+ add v16.4s,v16.4s,v24.4s
+ add x8,x8,x23,lsr#32
+ add v2.4s,v2.4s,v26.4s
+ add w9,w9,w24
+ add v6.4s,v6.4s,v26.4s
+ add x10,x10,x24,lsr#32
+ add v18.4s,v18.4s,v26.4s
+ add w11,w11,w25
+ add v3.4s,v3.4s,v27.4s
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add v7.4s,v7.4s,v28.4s
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add v19.4s,v19.4s,v29.4s
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add v1.4s,v1.4s,v25.4s
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add v5.4s,v5.4s,v25.4s
+ add x21,x21,x30,lsr#32
+ add v17.4s,v17.4s,v25.4s
+
+ b.lo .Ltail_neon
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor v0.16b,v0.16b,v20.16b
+ eor x15,x15,x16
+ eor v1.16b,v1.16b,v21.16b
+ eor x17,x17,x19
+ eor v2.16b,v2.16b,v22.16b
+ eor x20,x20,x21
+ eor v3.16b,v3.16b,v23.16b
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#4 // increment counter
+ stp x9,x11,[x0,#16]
+ add v27.4s,v27.4s,v31.4s // += 4
+ stp x13,x15,[x0,#32]
+ add v28.4s,v28.4s,v31.4s
+ stp x17,x20,[x0,#48]
+ add v29.4s,v29.4s,v31.4s
+ add x0,x0,#64
+
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+ ld1 {v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+
+ eor v4.16b,v4.16b,v20.16b
+ eor v5.16b,v5.16b,v21.16b
+ eor v6.16b,v6.16b,v22.16b
+ eor v7.16b,v7.16b,v23.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+ eor v16.16b,v16.16b,v0.16b
+ eor v17.16b,v17.16b,v1.16b
+ eor v18.16b,v18.16b,v2.16b
+ eor v19.16b,v19.16b,v3.16b
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+ b.hi .Loop_outer_neon
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+
+.Ltail_neon:
+ add x2,x2,#256
+ cmp x2,#64
+ b.lo .Less_than_64
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#4 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+ cmp x2,#64
+ b.lo .Less_than_128
+
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor v0.16b,v0.16b,v20.16b
+ eor v1.16b,v1.16b,v21.16b
+ eor v2.16b,v2.16b,v22.16b
+ eor v3.16b,v3.16b,v23.16b
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+ cmp x2,#64
+ b.lo .Less_than_192
+
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor v4.16b,v4.16b,v20.16b
+ eor v5.16b,v5.16b,v21.16b
+ eor v6.16b,v6.16b,v22.16b
+ eor v7.16b,v7.16b,v23.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[sp]
+ b .Last_neon
+
+.Less_than_128:
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[sp]
+ b .Last_neon
+.Less_than_192:
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[sp]
+ b .Last_neon
+
+.align 4
+.Last_neon:
+ sub x0,x0,#1
+ add x1,x1,x2
+ add x0,x0,x2
+ add x4,sp,x2
+ neg x2,x2
+
+.Loop_tail_neon:
+ ldrb w10,[x1,x2]
+ ldrb w11,[x4,x2]
+ add x2,x2,#1
+ eor w10,w10,w11
+ strb w10,[x0,x2]
+ cbnz x2,.Loop_tail_neon
+
+ stp xzr,xzr,[sp,#0]
+ stp xzr,xzr,[sp,#16]
+ stp xzr,xzr,[sp,#32]
+ stp xzr,xzr,[sp,#48]
+
+.Ldone_neon:
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_neon,.-ChaCha20_neon
+.type ChaCha20_512_neon,%function
+.align 5
+ChaCha20_512_neon:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+
+.L512_or_more_neon:
+ sub sp,sp,#128+64
+
+ ldp x22,x23,[x5] // load sigma
+ ld1 {v24.4s},[x5],#16
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ld1 {v25.4s,v26.4s},[x3]
+ ldp x28,x30,[x4] // load counter
+ ld1 {v27.4s},[x4]
+ ld1 {v31.4s},[x5]
+#ifdef __ARMEB__
+ rev64 v24.4s,v24.4s
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+ add v27.4s,v27.4s,v31.4s // += 1
+ stp q24,q25,[sp,#0] // off-load key block, invariant part
+ add v27.4s,v27.4s,v31.4s // not typo
+ str q26,[sp,#32]
+ add v28.4s,v27.4s,v31.4s
+ add v29.4s,v28.4s,v31.4s
+ add v30.4s,v29.4s,v31.4s
+ shl v31.4s,v31.4s,#2 // 1 -> 4
+
+ stp d8,d9,[sp,#128+0] // meet ABI requirements
+ stp d10,d11,[sp,#128+16]
+ stp d12,d13,[sp,#128+32]
+ stp d14,d15,[sp,#128+48]
+
+ sub x2,x2,#512 // not typo
+
+.Loop_outer_512_neon:
+ mov v0.16b,v24.16b
+ mov v4.16b,v24.16b
+ mov v8.16b,v24.16b
+ mov v12.16b,v24.16b
+ mov v16.16b,v24.16b
+ mov v20.16b,v24.16b
+ mov v1.16b,v25.16b
+ mov w5,w22 // unpack key block
+ mov v5.16b,v25.16b
+ lsr x6,x22,#32
+ mov v9.16b,v25.16b
+ mov w7,w23
+ mov v13.16b,v25.16b
+ lsr x8,x23,#32
+ mov v17.16b,v25.16b
+ mov w9,w24
+ mov v21.16b,v25.16b
+ lsr x10,x24,#32
+ mov v3.16b,v27.16b
+ mov w11,w25
+ mov v7.16b,v28.16b
+ lsr x12,x25,#32
+ mov v11.16b,v29.16b
+ mov w13,w26
+ mov v15.16b,v30.16b
+ lsr x14,x26,#32
+ mov v2.16b,v26.16b
+ mov w15,w27
+ mov v6.16b,v26.16b
+ lsr x16,x27,#32
+ add v19.4s,v3.4s,v31.4s // +4
+ mov w17,w28
+ add v23.4s,v7.4s,v31.4s // +4
+ lsr x19,x28,#32
+ mov v10.16b,v26.16b
+ mov w20,w30
+ mov v14.16b,v26.16b
+ lsr x21,x30,#32
+ mov v18.16b,v26.16b
+ stp q27,q28,[sp,#48] // off-load key block, variable part
+ mov v22.16b,v26.16b
+ str q29,[sp,#80]
+
+ mov x4,#5
+ subs x2,x2,#512
+.Loop_upper_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v11.16b,v11.16b,v11.16b,#12
+ ext v15.16b,v15.16b,v15.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v23.16b,v23.16b,v23.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v9.16b,v9.16b,v9.16b,#4
+ ext v13.16b,v13.16b,v13.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ ext v21.16b,v21.16b,v21.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v11.16b,v11.16b,v11.16b,#4
+ ext v15.16b,v15.16b,v15.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v23.16b,v23.16b,v23.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v9.16b,v9.16b,v9.16b,#12
+ ext v13.16b,v13.16b,v13.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ ext v21.16b,v21.16b,v21.16b,#12
+ cbnz x4,.Loop_upper_neon
+
+ add w5,w5,w22 // accumulate key block
+ add x6,x6,x22,lsr#32
+ add w7,w7,w23
+ add x8,x8,x23,lsr#32
+ add w9,w9,w24
+ add x10,x10,x24,lsr#32
+ add w11,w11,w25
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add x21,x21,x30,lsr#32
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#1 // increment counter
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ stp x9,x11,[x0,#16]
+ mov w7,w23
+ lsr x8,x23,#32
+ stp x13,x15,[x0,#32]
+ mov w9,w24
+ lsr x10,x24,#32
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ mov w11,w25
+ lsr x12,x25,#32
+ mov w13,w26
+ lsr x14,x26,#32
+ mov w15,w27
+ lsr x16,x27,#32
+ mov w17,w28
+ lsr x19,x28,#32
+ mov w20,w30
+ lsr x21,x30,#32
+
+ mov x4,#5
+.Loop_lower_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v11.16b,v11.16b,v11.16b,#12
+ ext v15.16b,v15.16b,v15.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v23.16b,v23.16b,v23.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v9.16b,v9.16b,v9.16b,#4
+ ext v13.16b,v13.16b,v13.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ ext v21.16b,v21.16b,v21.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v11.16b,v11.16b,v11.16b,#4
+ ext v15.16b,v15.16b,v15.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v23.16b,v23.16b,v23.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v9.16b,v9.16b,v9.16b,#12
+ ext v13.16b,v13.16b,v13.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ ext v21.16b,v21.16b,v21.16b,#12
+ cbnz x4,.Loop_lower_neon
+
+ add w5,w5,w22 // accumulate key block
+ ldp q24,q25,[sp,#0]
+ add x6,x6,x22,lsr#32
+ ldp q26,q27,[sp,#32]
+ add w7,w7,w23
+ ldp q28,q29,[sp,#64]
+ add x8,x8,x23,lsr#32
+ add v0.4s,v0.4s,v24.4s
+ add w9,w9,w24
+ add v4.4s,v4.4s,v24.4s
+ add x10,x10,x24,lsr#32
+ add v8.4s,v8.4s,v24.4s
+ add w11,w11,w25
+ add v12.4s,v12.4s,v24.4s
+ add x12,x12,x25,lsr#32
+ add v16.4s,v16.4s,v24.4s
+ add w13,w13,w26
+ add v20.4s,v20.4s,v24.4s
+ add x14,x14,x26,lsr#32
+ add v2.4s,v2.4s,v26.4s
+ add w15,w15,w27
+ add v6.4s,v6.4s,v26.4s
+ add x16,x16,x27,lsr#32
+ add v10.4s,v10.4s,v26.4s
+ add w17,w17,w28
+ add v14.4s,v14.4s,v26.4s
+ add x19,x19,x28,lsr#32
+ add v18.4s,v18.4s,v26.4s
+ add w20,w20,w30
+ add v22.4s,v22.4s,v26.4s
+ add x21,x21,x30,lsr#32
+ add v19.4s,v19.4s,v31.4s // +4
+ add x5,x5,x6,lsl#32 // pack
+ add v23.4s,v23.4s,v31.4s // +4
+ add x7,x7,x8,lsl#32
+ add v3.4s,v3.4s,v27.4s
+ ldp x6,x8,[x1,#0] // load input
+ add v7.4s,v7.4s,v28.4s
+ add x9,x9,x10,lsl#32
+ add v11.4s,v11.4s,v29.4s
+ add x11,x11,x12,lsl#32
+ add v15.4s,v15.4s,v30.4s
+ ldp x10,x12,[x1,#16]
+ add v19.4s,v19.4s,v27.4s
+ add x13,x13,x14,lsl#32
+ add v23.4s,v23.4s,v28.4s
+ add x15,x15,x16,lsl#32
+ add v1.4s,v1.4s,v25.4s
+ ldp x14,x16,[x1,#32]
+ add v5.4s,v5.4s,v25.4s
+ add x17,x17,x19,lsl#32
+ add v9.4s,v9.4s,v25.4s
+ add x20,x20,x21,lsl#32
+ add v13.4s,v13.4s,v25.4s
+ ldp x19,x21,[x1,#48]
+ add v17.4s,v17.4s,v25.4s
+ add x1,x1,#64
+ add v21.4s,v21.4s,v25.4s
+
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ ld1 {v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor v0.16b,v0.16b,v24.16b
+ eor x15,x15,x16
+ eor v1.16b,v1.16b,v25.16b
+ eor x17,x17,x19
+ eor v2.16b,v2.16b,v26.16b
+ eor x20,x20,x21
+ eor v3.16b,v3.16b,v27.16b
+ ld1 {v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#7 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+
+ ld1 {v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+ eor v4.16b,v4.16b,v24.16b
+ eor v5.16b,v5.16b,v25.16b
+ eor v6.16b,v6.16b,v26.16b
+ eor v7.16b,v7.16b,v27.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+ ld1 {v4.16b,v5.16b,v6.16b,v7.16b},[x1],#64
+ eor v8.16b,v8.16b,v0.16b
+ ldp q24,q25,[sp,#0]
+ eor v9.16b,v9.16b,v1.16b
+ ldp q26,q27,[sp,#32]
+ eor v10.16b,v10.16b,v2.16b
+ eor v11.16b,v11.16b,v3.16b
+ st1 {v8.16b,v9.16b,v10.16b,v11.16b},[x0],#64
+
+ ld1 {v8.16b,v9.16b,v10.16b,v11.16b},[x1],#64
+ eor v12.16b,v12.16b,v4.16b
+ eor v13.16b,v13.16b,v5.16b
+ eor v14.16b,v14.16b,v6.16b
+ eor v15.16b,v15.16b,v7.16b
+ st1 {v12.16b,v13.16b,v14.16b,v15.16b},[x0],#64
+
+ ld1 {v12.16b,v13.16b,v14.16b,v15.16b},[x1],#64
+ eor v16.16b,v16.16b,v8.16b
+ eor v17.16b,v17.16b,v9.16b
+ eor v18.16b,v18.16b,v10.16b
+ eor v19.16b,v19.16b,v11.16b
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+ shl v0.4s,v31.4s,#1 // 4 -> 8
+ eor v20.16b,v20.16b,v12.16b
+ eor v21.16b,v21.16b,v13.16b
+ eor v22.16b,v22.16b,v14.16b
+ eor v23.16b,v23.16b,v15.16b
+ st1 {v20.16b,v21.16b,v22.16b,v23.16b},[x0],#64
+
+ add v27.4s,v27.4s,v0.4s // += 8
+ add v28.4s,v28.4s,v0.4s
+ add v29.4s,v29.4s,v0.4s
+ add v30.4s,v30.4s,v0.4s
+
+ b.hs .Loop_outer_512_neon
+
+ adds x2,x2,#512
+ ushr v0.4s,v31.4s,#2 // 4 -> 1
+
+ ldp d8,d9,[sp,#128+0] // meet ABI requirements
+ ldp d10,d11,[sp,#128+16]
+ ldp d12,d13,[sp,#128+32]
+ ldp d14,d15,[sp,#128+48]
+
+ stp q24,q31,[sp,#0] // wipe off-load area
+ stp q24,q31,[sp,#32]
+ stp q24,q31,[sp,#64]
+
+ b.eq .Ldone_512_neon
+
+ cmp x2,#192
+ sub v27.4s,v27.4s,v0.4s // -= 1
+ sub v28.4s,v28.4s,v0.4s
+ sub v29.4s,v29.4s,v0.4s
+ add sp,sp,#128
+ b.hs .Loop_outer_neon
+
+ eor v25.16b,v25.16b,v25.16b
+ eor v26.16b,v26.16b,v26.16b
+ eor v27.16b,v27.16b,v27.16b
+ eor v28.16b,v28.16b,v28.16b
+ eor v29.16b,v29.16b,v29.16b
+ eor v30.16b,v30.16b,v30.16b
+ b .Loop_outer
+
+.Ldone_512_neon:
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#128+64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_512_neon,.-ChaCha20_512_neon
--
2.19.0
* [PATCH net-next v6 05/23] zinc: import Andy Polyakov's ChaCha20 ARM and ARM64 implementations
@ 2018-09-25 14:56 ` Jason A. Donenfeld
0 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-arm-kernel
These NEON and non-NEON implementations come from Andy Polyakov's
CRYPTOGAMS work, and are included here in raw form without modification,
so that subsequent commits that fix them up for the kernel can show
exactly how they have changed. This awkward commit splitting has been
requested for the ARM[64] implementations in particular.
While this is CRYPTOGAMS code, the originating code happens to be the
same as OpenSSL's commit 87cc649f30aaf69b351701875b9dac07c29ce8a2.
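(For readers following the heavily interleaved scalar/NEON scheduling in
the assembly below, here is a minimal C sketch of the quarter-round and
double-round that every code path in these files computes. It is
illustrative only and not part of the patch itself; the helper names are
made up.)

    #include <stdint.h>

    static inline uint32_t rotl32(uint32_t v, int n)
    {
            return (v << n) | (v >> (32 - n));
    }

    /* ChaCha20 quarter-round with the 16/12/8/7 rotations from RFC 7539. */
    static inline void chacha_qr(uint32_t x[16], int a, int b, int c, int d)
    {
            x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
            x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
            x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
            x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
    }

    /*
     * One double round: four column rounds followed by four diagonal
     * rounds; ChaCha20 performs ten of these per 64-byte block.
     */
    static inline void chacha_double_round(uint32_t x[16])
    {
            chacha_qr(x, 0, 4,  8, 12);
            chacha_qr(x, 1, 5,  9, 13);
            chacha_qr(x, 2, 6, 10, 14);
            chacha_qr(x, 3, 7, 11, 15);
            chacha_qr(x, 0, 5, 10, 15);
            chacha_qr(x, 1, 6, 11, 12);
            chacha_qr(x, 2, 7,  8, 13);
            chacha_qr(x, 3, 4,  9, 14);
    }

Roughly speaking, the NEON paths run several such 16-word states in
parallel across vector registers, and the ARM64 code additionally keeps
one more state in general-purpose registers, interleaving its scalar
rounds between the vector instructions.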
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/chacha20/chacha20-arm-cryptogams.S | 1440 ++++++++++++
lib/zinc/chacha20/chacha20-arm64-cryptogams.S | 1973 +++++++++++++++++
2 files changed, 3413 insertions(+)
create mode 100644 lib/zinc/chacha20/chacha20-arm-cryptogams.S
create mode 100644 lib/zinc/chacha20/chacha20-arm64-cryptogams.S
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
new file mode 100644
index 000000000000..05a3a9e6e93f
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
@@ -0,0 +1,1440 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+#if defined(__thumb2__) || defined(__clang__)
+.syntax unified
+#endif
+#if defined(__thumb2__)
+.thumb
+#else
+.code 32
+#endif
+
+#if defined(__thumb2__) || defined(__clang__)
+#define ldrhsb ldrbhs
+#endif
+
+.align 5
+.Lsigma:
+.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
+.Lone:
+.long 1,0,0,0
+.Lrot8:
+.long 0x02010003,0x06050407
+#if __ARM_MAX_ARCH__>=7
+.LOPENSSL_armcap:
+.word OPENSSL_armcap_P-.LChaCha20_ctr32
+#else
+.word -1
+#endif
+
+.globl ChaCha20_ctr32
+.type ChaCha20_ctr32,%function
+.align 5
+ChaCha20_ctr32:
+.LChaCha20_ctr32:
+ ldr r12,[sp,#0] @ pull pointer to counter and nonce
+ stmdb sp!,{r0-r2,r4-r11,lr}
+#if __ARM_ARCH__<7 && !defined(__thumb2__)
+ sub r14,pc,#16 @ ChaCha20_ctr32
+#else
+ adr r14,.LChaCha20_ctr32
+#endif
+ cmp r2,#0 @ len==0?
+#ifdef __thumb2__
+ itt eq
+#endif
+ addeq sp,sp,#4*3
+ beq .Lno_data
+#if __ARM_MAX_ARCH__>=7
+ cmp r2,#192 @ test len
+ bls .Lshort
+ ldr r4,[r14,#-24]
+ ldr r4,[r14,r4]
+# ifdef __APPLE__
+ ldr r4,[r4]
+# endif
+ tst r4,#ARMV7_NEON
+ bne .LChaCha20_neon
+.Lshort:
+#endif
+ ldmia r12,{r4-r7} @ load counter and nonce
+ sub sp,sp,#4*(16) @ off-load area
+ sub r14,r14,#64 @ .Lsigma
+ stmdb sp!,{r4-r7} @ copy counter and nonce
+ ldmia r3,{r4-r11} @ load key
+ ldmia r14,{r0-r3} @ load sigma
+ stmdb sp!,{r4-r11} @ copy key
+ stmdb sp!,{r0-r3} @ copy sigma
+ str r10,[sp,#4*(16+10)] @ off-load "rx"
+ str r11,[sp,#4*(16+11)] @ off-load "rx"
+ b .Loop_outer_enter
+
+.align 4
+.Loop_outer:
+ ldmia sp,{r0-r9} @ load key material
+ str r11,[sp,#4*(32+2)] @ save len
+ str r12, [sp,#4*(32+1)] @ save inp
+ str r14, [sp,#4*(32+0)] @ save out
+.Loop_outer_enter:
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(16+15)]
+ mov r11,#10
+ b .Loop
+
+.align 4
+.Loop:
+ subs r11,r11,#1
+ add r0,r0,r4,ror#13
+ add r1,r1,r5,ror#13
+ eor r12,r0,r12,ror#24
+ eor r10,r1,r10,ror#24
+ add r8,r8,r12,ror#16
+ add r9,r9,r10,ror#16
+ eor r4,r8,r4,ror#13
+ eor r5,r9,r5,ror#13
+ add r0,r0,r4,ror#20
+ add r1,r1,r5,ror#20
+ eor r12,r0,r12,ror#16
+ eor r10,r1,r10,ror#16
+ add r8,r8,r12,ror#24
+ str r10,[sp,#4*(16+13)]
+ add r9,r9,r10,ror#24
+ ldr r10,[sp,#4*(16+15)]
+ str r8,[sp,#4*(16+8)]
+ eor r4,r4,r8,ror#12
+ str r9,[sp,#4*(16+9)]
+ eor r5,r5,r9,ror#12
+ ldr r8,[sp,#4*(16+10)]
+ add r2,r2,r6,ror#13
+ ldr r9,[sp,#4*(16+11)]
+ add r3,r3,r7,ror#13
+ eor r14,r2,r14,ror#24
+ eor r10,r3,r10,ror#24
+ add r8,r8,r14,ror#16
+ add r9,r9,r10,ror#16
+ eor r6,r8,r6,ror#13
+ eor r7,r9,r7,ror#13
+ add r2,r2,r6,ror#20
+ add r3,r3,r7,ror#20
+ eor r14,r2,r14,ror#16
+ eor r10,r3,r10,ror#16
+ add r8,r8,r14,ror#24
+ add r9,r9,r10,ror#24
+ eor r6,r6,r8,ror#12
+ eor r7,r7,r9,ror#12
+ add r0,r0,r5,ror#13
+ add r1,r1,r6,ror#13
+ eor r10,r0,r10,ror#24
+ eor r12,r1,r12,ror#24
+ add r8,r8,r10,ror#16
+ add r9,r9,r12,ror#16
+ eor r5,r8,r5,ror#13
+ eor r6,r9,r6,ror#13
+ add r0,r0,r5,ror#20
+ add r1,r1,r6,ror#20
+ eor r10,r0,r10,ror#16
+ eor r12,r1,r12,ror#16
+ str r10,[sp,#4*(16+15)]
+ add r8,r8,r10,ror#24
+ ldr r10,[sp,#4*(16+13)]
+ add r9,r9,r12,ror#24
+ str r8,[sp,#4*(16+10)]
+ eor r5,r5,r8,ror#12
+ str r9,[sp,#4*(16+11)]
+ eor r6,r6,r9,ror#12
+ ldr r8,[sp,#4*(16+8)]
+ add r2,r2,r7,ror#13
+ ldr r9,[sp,#4*(16+9)]
+ add r3,r3,r4,ror#13
+ eor r10,r2,r10,ror#24
+ eor r14,r3,r14,ror#24
+ add r8,r8,r10,ror#16
+ add r9,r9,r14,ror#16
+ eor r7,r8,r7,ror#13
+ eor r4,r9,r4,ror#13
+ add r2,r2,r7,ror#20
+ add r3,r3,r4,ror#20
+ eor r10,r2,r10,ror#16
+ eor r14,r3,r14,ror#16
+ add r8,r8,r10,ror#24
+ add r9,r9,r14,ror#24
+ eor r7,r7,r8,ror#12
+ eor r4,r4,r9,ror#12
+ bne .Loop
+
+ ldr r11,[sp,#4*(32+2)] @ load len
+
+ str r8, [sp,#4*(16+8)] @ modulo-scheduled store
+ str r9, [sp,#4*(16+9)]
+ str r12,[sp,#4*(16+12)]
+ str r10, [sp,#4*(16+13)]
+ str r14,[sp,#4*(16+14)]
+
+ @ at this point we have first half of 512-bit result in
+ @ rx and second half at sp+4*(16+8)
+
+ cmp r11,#64 @ done yet?
+#ifdef __thumb2__
+ itete lo
+#endif
+ addlo r12,sp,#4*(0) @ shortcut or ...
+ ldrhs r12,[sp,#4*(32+1)] @ ... load inp
+ addlo r14,sp,#4*(0) @ shortcut or ...
+ ldrhs r14,[sp,#4*(32+0)] @ ... load out
+
+ ldr r8,[sp,#4*(0)] @ load key material
+ ldr r9,[sp,#4*(1)]
+
+#if __ARM_ARCH__>=6 || !defined(__ARMEB__)
+# if __ARM_ARCH__<7
+ orr r10,r12,r14
+ tst r10,#3 @ are input and output aligned?
+ ldr r10,[sp,#4*(2)]
+ bne .Lunaligned
+ cmp r11,#64 @ restore flags
+# else
+ ldr r10,[sp,#4*(2)]
+# endif
+ ldr r11,[sp,#4*(3)]
+
+ add r0,r0,r8 @ accumulate key material
+ add r1,r1,r9
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+
+ add r2,r2,r10
+ add r3,r3,r11
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r0,r0,r8 @ xor with input
+ eorhs r1,r1,r9
+ add r8,sp,#4*(4)
+ str r0,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r2,r2,r10
+ eorhs r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r1,[r14,#-12]
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r5,r9,r5,ror#13
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+ add r6,r10,r6,ror#13
+ add r7,r11,r7,ror#13
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r4,r4,r8
+ eorhs r5,r5,r9
+ add r8,sp,#4*(8)
+ str r4,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r6,r6,r10
+ eorhs r7,r7,r11
+ str r5,[r14,#-12]
+ ldmia r8,{r8-r11} @ load key material
+ str r6,[r14,#-8]
+ add r0,sp,#4*(16+8)
+ str r7,[r14,#-4]
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ add r1,r1,r9
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+# ifdef __thumb2__
+ itt hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
+ strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
+ add r2,r2,r10
+ add r3,r3,r11
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r0,r0,r8
+ eorhs r1,r1,r9
+ add r8,sp,#4*(12)
+ str r0,[r14],#16 @ store output
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r2,r2,r10
+ eorhs r3,r3,r11
+ str r1,[r14,#-12]
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r5,r9,r5,ror#24
+# ifdef __thumb2__
+ itt hi
+# endif
+ addhi r8,r8,#1 @ next counter value
+ strhi r8,[sp,#4*(12)] @ save next counter value
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r8,[r12],#16 @ load input
+ ldrhs r9,[r12,#-12]
+ add r6,r10,r6,ror#24
+ add r7,r11,r7,ror#24
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhs r10,[r12,#-8]
+ ldrhs r11,[r12,#-4]
+# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r4,r4,r8
+ eorhs r5,r5,r9
+# ifdef __thumb2__
+ it ne
+# endif
+ ldrne r8,[sp,#4*(32+2)] @ re-load len
+# ifdef __thumb2__
+ itt hs
+# endif
+ eorhs r6,r6,r10
+ eorhs r7,r7,r11
+ str r4,[r14],#16 @ store output
+ str r5,[r14,#-12]
+# ifdef __thumb2__
+ it hs
+# endif
+ subhs r11,r8,#64 @ len-=64
+ str r6,[r14,#-8]
+ str r7,[r14,#-4]
+ bhi .Loop_outer
+
+ beq .Ldone
+# if __ARM_ARCH__<7
+ b .Ltail
+
+.align 4
+.Lunaligned: @ unaligned endian-neutral path
+ cmp r11,#64 @ restore flags
+# endif
+#endif
+#if __ARM_ARCH__<7
+ ldr r11,[sp,#4*(3)]
+ add r0,r8,r0 @ accumulate key material
+ add r1,r9,r1
+ add r2,r10,r2
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r3,r11,r3
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r0,r8,r0 @ xor with input (or zero)
+ eor r1,r9,r1
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r2,r10,r2
+ strb r0,[r14],#16 @ store output
+ eor r3,r11,r3
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r1,[r14,#-12]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-8]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r3,[r14,#-4]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-15]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r1,[r14,#-11]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-7]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r3,[r14,#-3]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-14]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r1,[r14,#-10]
+ strb r2,[r14,#-6]
+ eor r0,r8,r0,lsr#8
+ strb r3,[r14,#-2]
+ eor r1,r9,r1,lsr#8
+ strb r0,[r14,#-13]
+ eor r2,r10,r2,lsr#8
+ strb r1,[r14,#-9]
+ eor r3,r11,r3,lsr#8
+ strb r2,[r14,#-5]
+ strb r3,[r14,#-1]
+ add r8,sp,#4*(4+0)
+ ldmia r8,{r8-r11} @ load key material
+ add r0,sp,#4*(16+8)
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r5,r9,r5,ror#13
+ add r6,r10,r6,ror#13
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r7,r11,r7,ror#13
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r4,r8,r4 @ xor with input (or zero)
+ eor r5,r9,r5
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r6,r10,r6
+ strb r4,[r14],#16 @ store output
+ eor r7,r11,r7
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r5,[r14,#-12]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-8]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r7,[r14,#-4]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-15]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r5,[r14,#-11]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-7]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r7,[r14,#-3]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-14]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r5,[r14,#-10]
+ strb r6,[r14,#-6]
+ eor r4,r8,r4,lsr#8
+ strb r7,[r14,#-2]
+ eor r5,r9,r5,lsr#8
+ strb r4,[r14,#-13]
+ eor r6,r10,r6,lsr#8
+ strb r5,[r14,#-9]
+ eor r7,r11,r7,lsr#8
+ strb r6,[r14,#-5]
+ strb r7,[r14,#-1]
+ add r8,sp,#4*(4+4)
+ ldmia r8,{r8-r11} @ load key material
+ ldmia r0,{r0-r7} @ load second half
+# ifdef __thumb2__
+ itt hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx"
+ strhi r11,[sp,#4*(16+11)] @ copy "rx"
+ add r0,r8,r0 @ accumulate key material
+ add r1,r9,r1
+ add r2,r10,r2
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r3,r11,r3
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r0,r8,r0 @ xor with input (or zero)
+ eor r1,r9,r1
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r2,r10,r2
+ strb r0,[r14],#16 @ store output
+ eor r3,r11,r3
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r1,[r14,#-12]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-8]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r3,[r14,#-4]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-15]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r1,[r14,#-11]
+ eor r0,r8,r0,lsr#8
+ strb r2,[r14,#-7]
+ eor r1,r9,r1,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r3,[r14,#-3]
+ eor r2,r10,r2,lsr#8
+ strb r0,[r14,#-14]
+ eor r3,r11,r3,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r1,[r14,#-10]
+ strb r2,[r14,#-6]
+ eor r0,r8,r0,lsr#8
+ strb r3,[r14,#-2]
+ eor r1,r9,r1,lsr#8
+ strb r0,[r14,#-13]
+ eor r2,r10,r2,lsr#8
+ strb r1,[r14,#-9]
+ eor r3,r11,r3,lsr#8
+ strb r2,[r14,#-5]
+ strb r3,[r14,#-1]
+ add r8,sp,#4*(4+8)
+ ldmia r8,{r8-r11} @ load key material
+ add r4,r8,r4,ror#24 @ accumulate key material
+# ifdef __thumb2__
+ itt hi
+# endif
+ addhi r8,r8,#1 @ next counter value
+ strhi r8,[sp,#4*(12)] @ save next counter value
+ add r5,r9,r5,ror#24
+ add r6,r10,r6,ror#24
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r8,r8,r8 @ zero or ...
+ ldrhsb r8,[r12],#16 @ ... load input
+ eorlo r9,r9,r9
+ ldrhsb r9,[r12,#-12]
+
+ add r7,r11,r7,ror#24
+# ifdef __thumb2__
+ itete lo
+# endif
+ eorlo r10,r10,r10
+ ldrhsb r10,[r12,#-8]
+ eorlo r11,r11,r11
+ ldrhsb r11,[r12,#-4]
+
+ eor r4,r8,r4 @ xor with input (or zero)
+ eor r5,r9,r5
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-15] @ load more input
+ ldrhsb r9,[r12,#-11]
+ eor r6,r10,r6
+ strb r4,[r14],#16 @ store output
+ eor r7,r11,r7
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-7]
+ ldrhsb r11,[r12,#-3]
+ strb r5,[r14,#-12]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-8]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-14] @ load more input
+ ldrhsb r9,[r12,#-10]
+ strb r7,[r14,#-4]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-15]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-6]
+ ldrhsb r11,[r12,#-2]
+ strb r5,[r14,#-11]
+ eor r4,r8,r4,lsr#8
+ strb r6,[r14,#-7]
+ eor r5,r9,r5,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r8,[r12,#-13] @ load more input
+ ldrhsb r9,[r12,#-9]
+ strb r7,[r14,#-3]
+ eor r6,r10,r6,lsr#8
+ strb r4,[r14,#-14]
+ eor r7,r11,r7,lsr#8
+# ifdef __thumb2__
+ itt hs
+# endif
+ ldrhsb r10,[r12,#-5]
+ ldrhsb r11,[r12,#-1]
+ strb r5,[r14,#-10]
+ strb r6,[r14,#-6]
+ eor r4,r8,r4,lsr#8
+ strb r7,[r14,#-2]
+ eor r5,r9,r5,lsr#8
+ strb r4,[r14,#-13]
+ eor r6,r10,r6,lsr#8
+ strb r5,[r14,#-9]
+ eor r7,r11,r7,lsr#8
+ strb r6,[r14,#-5]
+ strb r7,[r14,#-1]
+# ifdef __thumb2__
+ it ne
+# endif
+ ldrne r8,[sp,#4*(32+2)] @ re-load len
+# ifdef __thumb2__
+ it hs
+# endif
+ subhs r11,r8,#64 @ len-=64
+ bhi .Loop_outer
+
+ beq .Ldone
+#endif
+
+.Ltail:
+ ldr r12,[sp,#4*(32+1)] @ load inp
+ add r9,sp,#4*(0)
+ ldr r14,[sp,#4*(32+0)] @ load out
+
+.Loop_tail:
+ ldrb r10,[r9],#1 @ read buffer on stack
+ ldrb r11,[r12],#1 @ read input
+ subs r8,r8,#1
+ eor r11,r11,r10
+ strb r11,[r14],#1 @ store output
+ bne .Loop_tail
+
+.Ldone:
+ add sp,sp,#4*(32+3)
+.Lno_data:
+ ldmia sp!,{r4-r11,pc}
+.size ChaCha20_ctr32,.-ChaCha20_ctr32
+#if __ARM_MAX_ARCH__>=7
+.arch armv7-a
+.fpu neon
+
+.type ChaCha20_neon,%function
+.align 5
+ChaCha20_neon:
+ ldr r12,[sp,#0] @ pull pointer to counter and nonce
+ stmdb sp!,{r0-r2,r4-r11,lr}
+.LChaCha20_neon:
+ adr r14,.Lsigma
+ vstmdb sp!,{d8-d15} @ ABI spec says so
+ stmdb sp!,{r0-r3}
+
+ vld1.32 {q1-q2},[r3] @ load key
+ ldmia r3,{r4-r11} @ load key
+
+ sub sp,sp,#4*(16+16)
+ vld1.32 {q3},[r12] @ load counter and nonce
+ add r12,sp,#4*8
+ ldmia r14,{r0-r3} @ load sigma
+ vld1.32 {q0},[r14]! @ load sigma
+ vld1.32 {q12},[r14]! @ one
+ @ vld1.32 {d30},[r14] @ rot8
+ vst1.32 {q2-q3},[r12] @ copy 1/2key|counter|nonce
+ vst1.32 {q0-q1},[sp] @ copy sigma|1/2key
+
+ str r10,[sp,#4*(16+10)] @ off-load "rx"
+ str r11,[sp,#4*(16+11)] @ off-load "rx"
+ vshl.i32 d26,d24,#1 @ two
+ vstr d24,[sp,#4*(16+0)]
+ vshl.i32 d28,d24,#2 @ four
+ vstr d26,[sp,#4*(16+2)]
+ vmov q4,q0
+ vstr d28,[sp,#4*(16+4)]
+ vmov q8,q0
+ @ vstr d30,[sp,#4*(16+6)]
+ vmov q5,q1
+ vmov q9,q1
+ b .Loop_neon_enter
+
+.align 4
+.Loop_neon_outer:
+ ldmia sp,{r0-r9} @ load key material
+ cmp r11,#64*2 @ if len<=64*2
+ bls .Lbreak_neon @ switch to integer-only
+ @ vldr d30,[sp,#4*(16+6)] @ rot8
+ vmov q4,q0
+ str r11,[sp,#4*(32+2)] @ save len
+ vmov q8,q0
+ str r12, [sp,#4*(32+1)] @ save inp
+ vmov q5,q1
+ str r14, [sp,#4*(32+0)] @ save out
+ vmov q9,q1
+.Loop_neon_enter:
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ vadd.i32 q7,q3,q12 @ counter+1
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ vmov q6,q2
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ vmov q10,q2
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ vadd.i32 q11,q7,q12 @ counter+2
+ add r12,r12,#3 @ counter+3
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(16+15)]
+ mov r11,#10
+ b .Loop_neon
+
+.align 4
+.Loop_neon:
+ subs r11,r11,#1
+ vadd.i32 q0,q0,q1
+ add r0,r0,r4,ror#13
+ vadd.i32 q4,q4,q5
+ add r1,r1,r5,ror#13
+ vadd.i32 q8,q8,q9
+ eor r12,r0,r12,ror#24
+ veor q3,q3,q0
+ eor r10,r1,r10,ror#24
+ veor q7,q7,q4
+ add r8,r8,r12,ror#16
+ veor q11,q11,q8
+ add r9,r9,r10,ror#16
+ vrev32.16 q3,q3
+ eor r4,r8,r4,ror#13
+ vrev32.16 q7,q7
+ eor r5,r9,r5,ror#13
+ vrev32.16 q11,q11
+ add r0,r0,r4,ror#20
+ vadd.i32 q2,q2,q3
+ add r1,r1,r5,ror#20
+ vadd.i32 q6,q6,q7
+ eor r12,r0,r12,ror#16
+ vadd.i32 q10,q10,q11
+ eor r10,r1,r10,ror#16
+ veor q12,q1,q2
+ add r8,r8,r12,ror#24
+ veor q13,q5,q6
+ str r10,[sp,#4*(16+13)]
+ veor q14,q9,q10
+ add r9,r9,r10,ror#24
+ vshr.u32 q1,q12,#20
+ ldr r10,[sp,#4*(16+15)]
+ vshr.u32 q5,q13,#20
+ str r8,[sp,#4*(16+8)]
+ vshr.u32 q9,q14,#20
+ eor r4,r4,r8,ror#12
+ vsli.32 q1,q12,#12
+ str r9,[sp,#4*(16+9)]
+ vsli.32 q5,q13,#12
+ eor r5,r5,r9,ror#12
+ vsli.32 q9,q14,#12
+ ldr r8,[sp,#4*(16+10)]
+ vadd.i32 q0,q0,q1
+ add r2,r2,r6,ror#13
+ vadd.i32 q4,q4,q5
+ ldr r9,[sp,#4*(16+11)]
+ vadd.i32 q8,q8,q9
+ add r3,r3,r7,ror#13
+ veor q12,q3,q0
+ eor r14,r2,r14,ror#24
+ veor q13,q7,q4
+ eor r10,r3,r10,ror#24
+ veor q14,q11,q8
+ add r8,r8,r14,ror#16
+ vshr.u32 q3,q12,#24
+ add r9,r9,r10,ror#16
+ vshr.u32 q7,q13,#24
+ eor r6,r8,r6,ror#13
+ vshr.u32 q11,q14,#24
+ eor r7,r9,r7,ror#13
+ vsli.32 q3,q12,#8
+ add r2,r2,r6,ror#20
+ vsli.32 q7,q13,#8
+ add r3,r3,r7,ror#20
+ vsli.32 q11,q14,#8
+ eor r14,r2,r14,ror#16
+ vadd.i32 q2,q2,q3
+ eor r10,r3,r10,ror#16
+ vadd.i32 q6,q6,q7
+ add r8,r8,r14,ror#24
+ vadd.i32 q10,q10,q11
+ add r9,r9,r10,ror#24
+ veor q12,q1,q2
+ eor r6,r6,r8,ror#12
+ veor q13,q5,q6
+ eor r7,r7,r9,ror#12
+ veor q14,q9,q10
+ vshr.u32 q1,q12,#25
+ vshr.u32 q5,q13,#25
+ vshr.u32 q9,q14,#25
+ vsli.32 q1,q12,#7
+ vsli.32 q5,q13,#7
+ vsli.32 q9,q14,#7
+ vext.8 q2,q2,q2,#8
+ vext.8 q6,q6,q6,#8
+ vext.8 q10,q10,q10,#8
+ vext.8 q1,q1,q1,#4
+ vext.8 q5,q5,q5,#4
+ vext.8 q9,q9,q9,#4
+ vext.8 q3,q3,q3,#12
+ vext.8 q7,q7,q7,#12
+ vext.8 q11,q11,q11,#12
+ vadd.i32 q0,q0,q1
+ add r0,r0,r5,ror#13
+ vadd.i32 q4,q4,q5
+ add r1,r1,r6,ror#13
+ vadd.i32 q8,q8,q9
+ eor r10,r0,r10,ror#24
+ veor q3,q3,q0
+ eor r12,r1,r12,ror#24
+ veor q7,q7,q4
+ add r8,r8,r10,ror#16
+ veor q11,q11,q8
+ add r9,r9,r12,ror#16
+ vrev32.16 q3,q3
+ eor r5,r8,r5,ror#13
+ vrev32.16 q7,q7
+ eor r6,r9,r6,ror#13
+ vrev32.16 q11,q11
+ add r0,r0,r5,ror#20
+ vadd.i32 q2,q2,q3
+ add r1,r1,r6,ror#20
+ vadd.i32 q6,q6,q7
+ eor r10,r0,r10,ror#16
+ vadd.i32 q10,q10,q11
+ eor r12,r1,r12,ror#16
+ veor q12,q1,q2
+ str r10,[sp,#4*(16+15)]
+ veor q13,q5,q6
+ add r8,r8,r10,ror#24
+ veor q14,q9,q10
+ ldr r10,[sp,#4*(16+13)]
+ vshr.u32 q1,q12,#20
+ add r9,r9,r12,ror#24
+ vshr.u32 q5,q13,#20
+ str r8,[sp,#4*(16+10)]
+ vshr.u32 q9,q14,#20
+ eor r5,r5,r8,ror#12
+ vsli.32 q1,q12,#12
+ str r9,[sp,#4*(16+11)]
+ vsli.32 q5,q13,#12
+ eor r6,r6,r9,ror#12
+ vsli.32 q9,q14,#12
+ ldr r8,[sp,#4*(16+8)]
+ vadd.i32 q0,q0,q1
+ add r2,r2,r7,ror#13
+ vadd.i32 q4,q4,q5
+ ldr r9,[sp,#4*(16+9)]
+ vadd.i32 q8,q8,q9
+ add r3,r3,r4,ror#13
+ veor q12,q3,q0
+ eor r10,r2,r10,ror#24
+ veor q13,q7,q4
+ eor r14,r3,r14,ror#24
+ veor q14,q11,q8
+ add r8,r8,r10,ror#16
+ vshr.u32 q3,q12,#24
+ add r9,r9,r14,ror#16
+ vshr.u32 q7,q13,#24
+ eor r7,r8,r7,ror#13
+ vshr.u32 q11,q14,#24
+ eor r4,r9,r4,ror#13
+ vsli.32 q3,q12,#8
+ add r2,r2,r7,ror#20
+ vsli.32 q7,q13,#8
+ add r3,r3,r4,ror#20
+ vsli.32 q11,q14,#8
+ eor r10,r2,r10,ror#16
+ vadd.i32 q2,q2,q3
+ eor r14,r3,r14,ror#16
+ vadd.i32 q6,q6,q7
+ add r8,r8,r10,ror#24
+ vadd.i32 q10,q10,q11
+ add r9,r9,r14,ror#24
+ veor q12,q1,q2
+ eor r7,r7,r8,ror#12
+ veor q13,q5,q6
+ eor r4,r4,r9,ror#12
+ veor q14,q9,q10
+ vshr.u32 q1,q12,#25
+ vshr.u32 q5,q13,#25
+ vshr.u32 q9,q14,#25
+ vsli.32 q1,q12,#7
+ vsli.32 q5,q13,#7
+ vsli.32 q9,q14,#7
+ vext.8 q2,q2,q2,#8
+ vext.8 q6,q6,q6,#8
+ vext.8 q10,q10,q10,#8
+ vext.8 q1,q1,q1,#12
+ vext.8 q5,q5,q5,#12
+ vext.8 q9,q9,q9,#12
+ vext.8 q3,q3,q3,#4
+ vext.8 q7,q7,q7,#4
+ vext.8 q11,q11,q11,#4
+ bne .Loop_neon
+
+ add r11,sp,#32
+ vld1.32 {q12-q13},[sp] @ load key material
+ vld1.32 {q14-q15},[r11]
+
+ ldr r11,[sp,#4*(32+2)] @ load len
+
+ str r8, [sp,#4*(16+8)] @ modulo-scheduled store
+ str r9, [sp,#4*(16+9)]
+ str r12,[sp,#4*(16+12)]
+ str r10, [sp,#4*(16+13)]
+ str r14,[sp,#4*(16+14)]
+
+ @ at this point we have first half of 512-bit result in
+ @ rx and second half at sp+4*(16+8)
+
+ ldr r12,[sp,#4*(32+1)] @ load inp
+ ldr r14,[sp,#4*(32+0)] @ load out
+
+ vadd.i32 q0,q0,q12 @ accumulate key material
+ vadd.i32 q4,q4,q12
+ vadd.i32 q8,q8,q12
+ vldr d24,[sp,#4*(16+0)] @ one
+
+ vadd.i32 q1,q1,q13
+ vadd.i32 q5,q5,q13
+ vadd.i32 q9,q9,q13
+ vldr d26,[sp,#4*(16+2)] @ two
+
+ vadd.i32 q2,q2,q14
+ vadd.i32 q6,q6,q14
+ vadd.i32 q10,q10,q14
+ vadd.i32 d14,d14,d24 @ counter+1
+ vadd.i32 d22,d22,d26 @ counter+2
+
+ vadd.i32 q3,q3,q15
+ vadd.i32 q7,q7,q15
+ vadd.i32 q11,q11,q15
+
+ cmp r11,#64*4
+ blo .Ltail_neon
+
+ vld1.8 {q12-q13},[r12]! @ load input
+ mov r11,sp
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12 @ xor with input
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ vst1.8 {q0-q1},[r14]! @ store output
+ veor q5,q5,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q6,q6,q14
+ vst1.8 {q2-q3},[r14]!
+ veor q7,q7,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q8,q8,q12
+ vld1.32 {q0-q1},[r11]! @ load for next iteration
+ veor d25,d25,d25
+ vldr d24,[sp,#4*(16+4)] @ four
+ veor q9,q9,q13
+ vld1.32 {q2-q3},[r11]
+ veor q10,q10,q14
+ vst1.8 {q4-q5},[r14]!
+ veor q11,q11,q15
+ vst1.8 {q6-q7},[r14]!
+
+ vadd.i32 d6,d6,d24 @ next counter value
+ vldr d24,[sp,#4*(16+0)] @ one
+
+ ldmia sp,{r8-r11} @ load key material
+ add r0,r0,r8 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ vst1.8 {q8-q9},[r14]!
+ add r1,r1,r9
+ ldr r9,[r12,#-12]
+ vst1.8 {q10-q11},[r14]!
+ add r2,r2,r10
+ ldr r10,[r12,#-8]
+ add r3,r3,r11
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ eor r0,r0,r8 @ xor with input
+ add r8,sp,#4*(4)
+ eor r1,r1,r9
+ str r0,[r14],#16 @ store output
+ eor r2,r2,r10
+ str r1,[r14,#-12]
+ eor r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ add r5,r9,r5,ror#13
+ ldr r9,[r12,#-12]
+ add r6,r10,r6,ror#13
+ ldr r10,[r12,#-8]
+ add r7,r11,r7,ror#13
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ eor r4,r4,r8
+ add r8,sp,#4*(8)
+ eor r5,r5,r9
+ str r4,[r14],#16 @ store output
+ eor r6,r6,r10
+ str r5,[r14,#-12]
+ eor r7,r7,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r6,[r14,#-8]
+ add r0,sp,#4*(16+8)
+ str r7,[r14,#-4]
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ ldr r8,[r12],#16 @ load input
+ add r1,r1,r9
+ ldr r9,[r12,#-12]
+# ifdef __thumb2__
+ it hi
+# endif
+ strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
+ add r2,r2,r10
+ ldr r10,[r12,#-8]
+# ifdef __thumb2__
+ it hi
+# endif
+ strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
+ add r3,r3,r11
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ eor r0,r0,r8
+ add r8,sp,#4*(12)
+ eor r1,r1,r9
+ str r0,[r14],#16 @ store output
+ eor r2,r2,r10
+ str r1,[r14,#-12]
+ eor r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+ str r2,[r14,#-8]
+ str r3,[r14,#-4]
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r8,r8,#4 @ next counter value
+ add r5,r9,r5,ror#24
+ str r8,[sp,#4*(12)] @ save next counter value
+ ldr r8,[r12],#16 @ load input
+ add r6,r10,r6,ror#24
+ add r4,r4,#3 @ counter+3
+ ldr r9,[r12,#-12]
+ add r7,r11,r7,ror#24
+ ldr r10,[r12,#-8]
+ ldr r11,[r12,#-4]
+# ifdef __ARMEB__
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ eor r4,r4,r8
+# ifdef __thumb2__
+ it hi
+# endif
+ ldrhi r8,[sp,#4*(32+2)] @ re-load len
+ eor r5,r5,r9
+ eor r6,r6,r10
+ str r4,[r14],#16 @ store output
+ eor r7,r7,r11
+ str r5,[r14,#-12]
+ sub r11,r8,#64*4 @ len-=64*4
+ str r6,[r14,#-8]
+ str r7,[r14,#-4]
+ bhi .Loop_neon_outer
+
+ b .Ldone_neon
+
+.align 4
+.Lbreak_neon:
+ @ harmonize NEON and integer-only stack frames: load data
+ @ from NEON frame, but save to integer-only one; distance
+ @ between the two is 4*(32+4+16-32)=4*(20).
+
+ str r11, [sp,#4*(20+32+2)] @ save len
+ add r11,sp,#4*(32+4)
+ str r12, [sp,#4*(20+32+1)] @ save inp
+ str r14, [sp,#4*(20+32+0)] @ save out
+
+ ldr r12,[sp,#4*(16+10)]
+ ldr r14,[sp,#4*(16+11)]
+ vldmia r11,{d8-d15} @ fulfill ABI requirement
+ str r12,[sp,#4*(20+16+10)] @ copy "rx"
+ str r14,[sp,#4*(20+16+11)] @ copy "rx"
+
+ ldr r11, [sp,#4*(15)]
+ mov r4,r4,ror#19 @ twist b[0..3]
+ ldr r12,[sp,#4*(12)] @ modulo-scheduled load
+ mov r5,r5,ror#19
+ ldr r10, [sp,#4*(13)]
+ mov r6,r6,ror#19
+ ldr r14,[sp,#4*(14)]
+ mov r7,r7,ror#19
+ mov r11,r11,ror#8 @ twist d[0..3]
+ mov r12,r12,ror#8
+ mov r10,r10,ror#8
+ mov r14,r14,ror#8
+ str r11, [sp,#4*(20+16+15)]
+ add r11,sp,#4*(20)
+ vst1.32 {q0-q1},[r11]! @ copy key
+ add sp,sp,#4*(20) @ switch frame
+ vst1.32 {q2-q3},[r11]
+ mov r11,#10
+ b .Loop @ go integer-only
+
+.align 4
+.Ltail_neon:
+ cmp r11,#64*3
+ bhs .L192_or_more_neon
+ cmp r11,#64*2
+ bhs .L128_or_more_neon
+ cmp r11,#64*1
+ bhs .L64_or_more_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q0-q1},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q2-q3},[r8]
+ b .Loop_tail_neon
+
+.align 4
+.L64_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vst1.8 {q0-q1},[r14]!
+ vst1.8 {q2-q3},[r14]!
+
+ beq .Ldone_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q4-q5},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q6-q7},[r8]
+ sub r11,r11,#64*1 @ len-=64*1
+ b .Loop_tail_neon
+
+.align 4
+.L128_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ veor q5,q5,q13
+ vst1.8 {q0-q1},[r14]!
+ veor q6,q6,q14
+ vst1.8 {q2-q3},[r14]!
+ veor q7,q7,q15
+ vst1.8 {q4-q5},[r14]!
+ vst1.8 {q6-q7},[r14]!
+
+ beq .Ldone_neon
+
+ add r8,sp,#4*(8)
+ vst1.8 {q8-q9},[sp]
+ add r10,sp,#4*(0)
+ vst1.8 {q10-q11},[r8]
+ sub r11,r11,#64*2 @ len-=64*2
+ b .Loop_tail_neon
+
+.align 4
+.L192_or_more_neon:
+ vld1.8 {q12-q13},[r12]!
+ vld1.8 {q14-q15},[r12]!
+ veor q0,q0,q12
+ veor q1,q1,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q2,q2,q14
+ veor q3,q3,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q4,q4,q12
+ veor q5,q5,q13
+ vld1.8 {q12-q13},[r12]!
+ veor q6,q6,q14
+ vst1.8 {q0-q1},[r14]!
+ veor q7,q7,q15
+ vld1.8 {q14-q15},[r12]!
+
+ veor q8,q8,q12
+ vst1.8 {q2-q3},[r14]!
+ veor q9,q9,q13
+ vst1.8 {q4-q5},[r14]!
+ veor q10,q10,q14
+ vst1.8 {q6-q7},[r14]!
+ veor q11,q11,q15
+ vst1.8 {q8-q9},[r14]!
+ vst1.8 {q10-q11},[r14]!
+
+ beq .Ldone_neon
+
+ ldmia sp,{r8-r11} @ load key material
+ add r0,r0,r8 @ accumulate key material
+ add r8,sp,#4*(4)
+ add r1,r1,r9
+ add r2,r2,r10
+ add r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+
+ add r4,r8,r4,ror#13 @ accumulate key material
+ add r8,sp,#4*(8)
+ add r5,r9,r5,ror#13
+ add r6,r10,r6,ror#13
+ add r7,r11,r7,ror#13
+ ldmia r8,{r8-r11} @ load key material
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ stmia sp,{r0-r7}
+ add r0,sp,#4*(16+8)
+
+ ldmia r0,{r0-r7} @ load second half
+
+ add r0,r0,r8 @ accumulate key material
+ add r8,sp,#4*(12)
+ add r1,r1,r9
+ add r2,r2,r10
+ add r3,r3,r11
+ ldmia r8,{r8-r11} @ load key material
+
+ add r4,r8,r4,ror#24 @ accumulate key material
+ add r8,sp,#4*(8)
+ add r5,r9,r5,ror#24
+ add r4,r4,#3 @ counter+3
+ add r6,r10,r6,ror#24
+ add r7,r11,r7,ror#24
+ ldr r11,[sp,#4*(32+2)] @ re-load len
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+ rev r7,r7
+# endif
+ stmia r8,{r0-r7}
+ add r10,sp,#4*(0)
+ sub r11,r11,#64*3 @ len-=64*3
+
+.Loop_tail_neon:
+ ldrb r8,[r10],#1 @ read buffer on stack
+ ldrb r9,[r12],#1 @ read input
+ subs r11,r11,#1
+ eor r8,r8,r9
+ strb r8,[r14],#1 @ store output
+ bne .Loop_tail_neon
+
+.Ldone_neon:
+ add sp,sp,#4*(32+4)
+ vldmia sp,{d8-d15}
+ add sp,sp,#4*(16+3)
+ ldmia sp!,{r4-r11,pc}
+.size ChaCha20_neon,.-ChaCha20_neon
+.comm OPENSSL_armcap_P,4,4
+#endif
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
new file mode 100644
index 000000000000..4d029bfdad3a
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
@@ -0,0 +1,1973 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+
+
+
+.align 5
+.Lsigma:
+.quad 0x3320646e61707865,0x6b20657479622d32 // endian-neutral
+.Lone:
+.long 1,0,0,0
+.LOPENSSL_armcap_P:
+#ifdef __ILP32__
+.long OPENSSL_armcap_P-.
+#else
+.quad OPENSSL_armcap_P-.
+#endif
+.byte 67,104,97,67,104,97,50,48,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+.align 2
+
+.globl ChaCha20_ctr32
+.type ChaCha20_ctr32,%function
+.align 5
+ChaCha20_ctr32:
+ cbz x2,.Labort
+ adr x5,.LOPENSSL_armcap_P
+ cmp x2,#192
+ b.lo .Lshort
+#ifdef __ILP32__
+ ldrsw x6,[x5]
+#else
+ ldr x6,[x5]
+#endif
+ ldr w17,[x6,x5]
+ tst w17,#ARMV7_NEON
+ b.ne ChaCha20_neon
+
+.Lshort:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+ sub sp,sp,#64
+
+ ldp x22,x23,[x5] // load sigma
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ldp x28,x30,[x4] // load counter
+#ifdef __ARMEB__
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+
+.Loop_outer:
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ mov w7,w23
+ lsr x8,x23,#32
+ mov w9,w24
+ lsr x10,x24,#32
+ mov w11,w25
+ lsr x12,x25,#32
+ mov w13,w26
+ lsr x14,x26,#32
+ mov w15,w27
+ lsr x16,x27,#32
+ mov w17,w28
+ lsr x19,x28,#32
+ mov w20,w30
+ lsr x21,x30,#32
+
+ mov x4,#10
+ subs x2,x2,#64
+.Loop:
+ sub x4,x4,#1
+ add w5,w5,w9
+ add w6,w6,w10
+ add w7,w7,w11
+ add w8,w8,w12
+ eor w17,w17,w5
+ eor w19,w19,w6
+ eor w20,w20,w7
+ eor w21,w21,w8
+ ror w17,w17,#16
+ ror w19,w19,#16
+ ror w20,w20,#16
+ ror w21,w21,#16
+ add w13,w13,w17
+ add w14,w14,w19
+ add w15,w15,w20
+ add w16,w16,w21
+ eor w9,w9,w13
+ eor w10,w10,w14
+ eor w11,w11,w15
+ eor w12,w12,w16
+ ror w9,w9,#20
+ ror w10,w10,#20
+ ror w11,w11,#20
+ ror w12,w12,#20
+ add w5,w5,w9
+ add w6,w6,w10
+ add w7,w7,w11
+ add w8,w8,w12
+ eor w17,w17,w5
+ eor w19,w19,w6
+ eor w20,w20,w7
+ eor w21,w21,w8
+ ror w17,w17,#24
+ ror w19,w19,#24
+ ror w20,w20,#24
+ ror w21,w21,#24
+ add w13,w13,w17
+ add w14,w14,w19
+ add w15,w15,w20
+ add w16,w16,w21
+ eor w9,w9,w13
+ eor w10,w10,w14
+ eor w11,w11,w15
+ eor w12,w12,w16
+ ror w9,w9,#25
+ ror w10,w10,#25
+ ror w11,w11,#25
+ ror w12,w12,#25
+ add w5,w5,w10
+ add w6,w6,w11
+ add w7,w7,w12
+ add w8,w8,w9
+ eor w21,w21,w5
+ eor w17,w17,w6
+ eor w19,w19,w7
+ eor w20,w20,w8
+ ror w21,w21,#16
+ ror w17,w17,#16
+ ror w19,w19,#16
+ ror w20,w20,#16
+ add w15,w15,w21
+ add w16,w16,w17
+ add w13,w13,w19
+ add w14,w14,w20
+ eor w10,w10,w15
+ eor w11,w11,w16
+ eor w12,w12,w13
+ eor w9,w9,w14
+ ror w10,w10,#20
+ ror w11,w11,#20
+ ror w12,w12,#20
+ ror w9,w9,#20
+ add w5,w5,w10
+ add w6,w6,w11
+ add w7,w7,w12
+ add w8,w8,w9
+ eor w21,w21,w5
+ eor w17,w17,w6
+ eor w19,w19,w7
+ eor w20,w20,w8
+ ror w21,w21,#24
+ ror w17,w17,#24
+ ror w19,w19,#24
+ ror w20,w20,#24
+ add w15,w15,w21
+ add w16,w16,w17
+ add w13,w13,w19
+ add w14,w14,w20
+ eor w10,w10,w15
+ eor w11,w11,w16
+ eor w12,w12,w13
+ eor w9,w9,w14
+ ror w10,w10,#25
+ ror w11,w11,#25
+ ror w12,w12,#25
+ ror w9,w9,#25
+ cbnz x4,.Loop
+
+ add w5,w5,w22 // accumulate key block
+ add x6,x6,x22,lsr#32
+ add w7,w7,w23
+ add x8,x8,x23,lsr#32
+ add w9,w9,w24
+ add x10,x10,x24,lsr#32
+ add w11,w11,w25
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add x21,x21,x30,lsr#32
+
+ b.lo .Ltail
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#1 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+
+ b.hi .Loop_outer
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+.Labort:
+ ret
+
+.align 4
+.Ltail:
+ add x2,x2,#64
+.Less_than_64:
+ sub x0,x0,#1
+ add x1,x1,x2
+ add x0,x0,x2
+ add x4,sp,x2
+ neg x2,x2
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ stp x5,x7,[sp,#0]
+ stp x9,x11,[sp,#16]
+ stp x13,x15,[sp,#32]
+ stp x17,x20,[sp,#48]
+
+.Loop_tail:
+ ldrb w10,[x1,x2]
+ ldrb w11,[x4,x2]
+ add x2,x2,#1
+ eor w10,w10,w11
+ strb w10,[x0,x2]
+ cbnz x2,.Loop_tail
+
+ stp xzr,xzr,[sp,#0]
+ stp xzr,xzr,[sp,#16]
+ stp xzr,xzr,[sp,#32]
+ stp xzr,xzr,[sp,#48]
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_ctr32,.-ChaCha20_ctr32
+
+.type ChaCha20_neon,%function
+.align 5
+ChaCha20_neon:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+ cmp x2,#512
+ b.hs .L512_or_more_neon
+
+ sub sp,sp,#64
+
+ ldp x22,x23,[x5] // load sigma
+ ld1 {v24.4s},[x5],#16
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ld1 {v25.4s,v26.4s},[x3]
+ ldp x28,x30,[x4] // load counter
+ ld1 {v27.4s},[x4]
+ ld1 {v31.4s},[x5]
+#ifdef __ARMEB__
+ rev64 v24.4s,v24.4s
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+ add v27.4s,v27.4s,v31.4s // += 1
+ add v28.4s,v27.4s,v31.4s
+ add v29.4s,v28.4s,v31.4s
+ shl v31.4s,v31.4s,#2 // 1 -> 4
+
+.Loop_outer_neon:
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ mov v0.16b,v24.16b
+ mov w7,w23
+ lsr x8,x23,#32
+ mov v4.16b,v24.16b
+ mov w9,w24
+ lsr x10,x24,#32
+ mov v16.16b,v24.16b
+ mov w11,w25
+ mov v1.16b,v25.16b
+ lsr x12,x25,#32
+ mov v5.16b,v25.16b
+ mov w13,w26
+ mov v17.16b,v25.16b
+ lsr x14,x26,#32
+ mov v3.16b,v27.16b
+ mov w15,w27
+ mov v7.16b,v28.16b
+ lsr x16,x27,#32
+ mov v19.16b,v29.16b
+ mov w17,w28
+ mov v2.16b,v26.16b
+ lsr x19,x28,#32
+ mov v6.16b,v26.16b
+ mov w20,w30
+ mov v18.16b,v26.16b
+ lsr x21,x30,#32
+
+ mov x4,#10
+ subs x2,x2,#256
+.Loop_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v16.4s,v16.4s,v17.4s
+ add w7,w7,w11
+ eor v3.16b,v3.16b,v0.16b
+ add w8,w8,w12
+ eor v7.16b,v7.16b,v4.16b
+ eor w17,w17,w5
+ eor v19.16b,v19.16b,v16.16b
+ eor w19,w19,w6
+ rev32 v3.8h,v3.8h
+ eor w20,w20,w7
+ rev32 v7.8h,v7.8h
+ eor w21,w21,w8
+ rev32 v19.8h,v19.8h
+ ror w17,w17,#16
+ add v2.4s,v2.4s,v3.4s
+ ror w19,w19,#16
+ add v6.4s,v6.4s,v7.4s
+ ror w20,w20,#16
+ add v18.4s,v18.4s,v19.4s
+ ror w21,w21,#16
+ eor v20.16b,v1.16b,v2.16b
+ add w13,w13,w17
+ eor v21.16b,v5.16b,v6.16b
+ add w14,w14,w19
+ eor v22.16b,v17.16b,v18.16b
+ add w15,w15,w20
+ ushr v1.4s,v20.4s,#20
+ add w16,w16,w21
+ ushr v5.4s,v21.4s,#20
+ eor w9,w9,w13
+ ushr v17.4s,v22.4s,#20
+ eor w10,w10,w14
+ sli v1.4s,v20.4s,#12
+ eor w11,w11,w15
+ sli v5.4s,v21.4s,#12
+ eor w12,w12,w16
+ sli v17.4s,v22.4s,#12
+ ror w9,w9,#20
+ add v0.4s,v0.4s,v1.4s
+ ror w10,w10,#20
+ add v4.4s,v4.4s,v5.4s
+ ror w11,w11,#20
+ add v16.4s,v16.4s,v17.4s
+ ror w12,w12,#20
+ eor v20.16b,v3.16b,v0.16b
+ add w5,w5,w9
+ eor v21.16b,v7.16b,v4.16b
+ add w6,w6,w10
+ eor v22.16b,v19.16b,v16.16b
+ add w7,w7,w11
+ ushr v3.4s,v20.4s,#24
+ add w8,w8,w12
+ ushr v7.4s,v21.4s,#24
+ eor w17,w17,w5
+ ushr v19.4s,v22.4s,#24
+ eor w19,w19,w6
+ sli v3.4s,v20.4s,#8
+ eor w20,w20,w7
+ sli v7.4s,v21.4s,#8
+ eor w21,w21,w8
+ sli v19.4s,v22.4s,#8
+ ror w17,w17,#24
+ add v2.4s,v2.4s,v3.4s
+ ror w19,w19,#24
+ add v6.4s,v6.4s,v7.4s
+ ror w20,w20,#24
+ add v18.4s,v18.4s,v19.4s
+ ror w21,w21,#24
+ eor v20.16b,v1.16b,v2.16b
+ add w13,w13,w17
+ eor v21.16b,v5.16b,v6.16b
+ add w14,w14,w19
+ eor v22.16b,v17.16b,v18.16b
+ add w15,w15,w20
+ ushr v1.4s,v20.4s,#25
+ add w16,w16,w21
+ ushr v5.4s,v21.4s,#25
+ eor w9,w9,w13
+ ushr v17.4s,v22.4s,#25
+ eor w10,w10,w14
+ sli v1.4s,v20.4s,#7
+ eor w11,w11,w15
+ sli v5.4s,v21.4s,#7
+ eor w12,w12,w16
+ sli v17.4s,v22.4s,#7
+ ror w9,w9,#25
+ ext v2.16b,v2.16b,v2.16b,#8
+ ror w10,w10,#25
+ ext v6.16b,v6.16b,v6.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w10
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w11
+ add v16.4s,v16.4s,v17.4s
+ add w7,w7,w12
+ eor v3.16b,v3.16b,v0.16b
+ add w8,w8,w9
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w5
+ eor v19.16b,v19.16b,v16.16b
+ eor w17,w17,w6
+ rev32 v3.8h,v3.8h
+ eor w19,w19,w7
+ rev32 v7.8h,v7.8h
+ eor w20,w20,w8
+ rev32 v19.8h,v19.8h
+ ror w21,w21,#16
+ add v2.4s,v2.4s,v3.4s
+ ror w17,w17,#16
+ add v6.4s,v6.4s,v7.4s
+ ror w19,w19,#16
+ add v18.4s,v18.4s,v19.4s
+ ror w20,w20,#16
+ eor v20.16b,v1.16b,v2.16b
+ add w15,w15,w21
+ eor v21.16b,v5.16b,v6.16b
+ add w16,w16,w17
+ eor v22.16b,v17.16b,v18.16b
+ add w13,w13,w19
+ ushr v1.4s,v20.4s,#20
+ add w14,w14,w20
+ ushr v5.4s,v21.4s,#20
+ eor w10,w10,w15
+ ushr v17.4s,v22.4s,#20
+ eor w11,w11,w16
+ sli v1.4s,v20.4s,#12
+ eor w12,w12,w13
+ sli v5.4s,v21.4s,#12
+ eor w9,w9,w14
+ sli v17.4s,v22.4s,#12
+ ror w10,w10,#20
+ add v0.4s,v0.4s,v1.4s
+ ror w11,w11,#20
+ add v4.4s,v4.4s,v5.4s
+ ror w12,w12,#20
+ add v16.4s,v16.4s,v17.4s
+ ror w9,w9,#20
+ eor v20.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v21.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v22.16b,v19.16b,v16.16b
+ add w7,w7,w12
+ ushr v3.4s,v20.4s,#24
+ add w8,w8,w9
+ ushr v7.4s,v21.4s,#24
+ eor w21,w21,w5
+ ushr v19.4s,v22.4s,#24
+ eor w17,w17,w6
+ sli v3.4s,v20.4s,#8
+ eor w19,w19,w7
+ sli v7.4s,v21.4s,#8
+ eor w20,w20,w8
+ sli v19.4s,v22.4s,#8
+ ror w21,w21,#24
+ add v2.4s,v2.4s,v3.4s
+ ror w17,w17,#24
+ add v6.4s,v6.4s,v7.4s
+ ror w19,w19,#24
+ add v18.4s,v18.4s,v19.4s
+ ror w20,w20,#24
+ eor v20.16b,v1.16b,v2.16b
+ add w15,w15,w21
+ eor v21.16b,v5.16b,v6.16b
+ add w16,w16,w17
+ eor v22.16b,v17.16b,v18.16b
+ add w13,w13,w19
+ ushr v1.4s,v20.4s,#25
+ add w14,w14,w20
+ ushr v5.4s,v21.4s,#25
+ eor w10,w10,w15
+ ushr v17.4s,v22.4s,#25
+ eor w11,w11,w16
+ sli v1.4s,v20.4s,#7
+ eor w12,w12,w13
+ sli v5.4s,v21.4s,#7
+ eor w9,w9,w14
+ sli v17.4s,v22.4s,#7
+ ror w10,w10,#25
+ ext v2.16b,v2.16b,v2.16b,#8
+ ror w11,w11,#25
+ ext v6.16b,v6.16b,v6.16b,#8
+ ror w12,w12,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ cbnz x4,.Loop_neon
+
+ add w5,w5,w22 // accumulate key block
+ add v0.4s,v0.4s,v24.4s
+ add x6,x6,x22,lsr#32
+ add v4.4s,v4.4s,v24.4s
+ add w7,w7,w23
+ add v16.4s,v16.4s,v24.4s
+ add x8,x8,x23,lsr#32
+ add v2.4s,v2.4s,v26.4s
+ add w9,w9,w24
+ add v6.4s,v6.4s,v26.4s
+ add x10,x10,x24,lsr#32
+ add v18.4s,v18.4s,v26.4s
+ add w11,w11,w25
+ add v3.4s,v3.4s,v27.4s
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add v7.4s,v7.4s,v28.4s
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add v19.4s,v19.4s,v29.4s
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add v1.4s,v1.4s,v25.4s
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add v5.4s,v5.4s,v25.4s
+ add x21,x21,x30,lsr#32
+ add v17.4s,v17.4s,v25.4s
+
+ b.lo .Ltail_neon
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor v0.16b,v0.16b,v20.16b
+ eor x15,x15,x16
+ eor v1.16b,v1.16b,v21.16b
+ eor x17,x17,x19
+ eor v2.16b,v2.16b,v22.16b
+ eor x20,x20,x21
+ eor v3.16b,v3.16b,v23.16b
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#4 // increment counter
+ stp x9,x11,[x0,#16]
+ add v27.4s,v27.4s,v31.4s // += 4
+ stp x13,x15,[x0,#32]
+ add v28.4s,v28.4s,v31.4s
+ stp x17,x20,[x0,#48]
+ add v29.4s,v29.4s,v31.4s
+ add x0,x0,#64
+
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+ ld1 {v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+
+ eor v4.16b,v4.16b,v20.16b
+ eor v5.16b,v5.16b,v21.16b
+ eor v6.16b,v6.16b,v22.16b
+ eor v7.16b,v7.16b,v23.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+ eor v16.16b,v16.16b,v0.16b
+ eor v17.16b,v17.16b,v1.16b
+ eor v18.16b,v18.16b,v2.16b
+ eor v19.16b,v19.16b,v3.16b
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+ b.hi .Loop_outer_neon
+
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+
+.Ltail_neon:
+ add x2,x2,#256
+ cmp x2,#64
+ b.lo .Less_than_64
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#4 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+ cmp x2,#64
+ b.lo .Less_than_128
+
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor v0.16b,v0.16b,v20.16b
+ eor v1.16b,v1.16b,v21.16b
+ eor v2.16b,v2.16b,v22.16b
+ eor v3.16b,v3.16b,v23.16b
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+ cmp x2,#64
+ b.lo .Less_than_192
+
+ ld1 {v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+ eor v4.16b,v4.16b,v20.16b
+ eor v5.16b,v5.16b,v21.16b
+ eor v6.16b,v6.16b,v22.16b
+ eor v7.16b,v7.16b,v23.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+ b.eq .Ldone_neon
+ sub x2,x2,#64
+
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[sp]
+ b .Last_neon
+
+.Less_than_128:
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[sp]
+ b .Last_neon
+.Less_than_192:
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[sp]
+ b .Last_neon
+
+.align 4
+.Last_neon:
+ sub x0,x0,#1
+ add x1,x1,x2
+ add x0,x0,x2
+ add x4,sp,x2
+ neg x2,x2
+
+.Loop_tail_neon:
+ ldrb w10,[x1,x2]
+ ldrb w11,[x4,x2]
+ add x2,x2,#1
+ eor w10,w10,w11
+ strb w10,[x0,x2]
+ cbnz x2,.Loop_tail_neon
+
+ stp xzr,xzr,[sp,#0]
+ stp xzr,xzr,[sp,#16]
+ stp xzr,xzr,[sp,#32]
+ stp xzr,xzr,[sp,#48]
+
+.Ldone_neon:
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_neon,.-ChaCha20_neon
+.type ChaCha20_512_neon,%function
+.align 5
+ChaCha20_512_neon:
+ stp x29,x30,[sp,#-96]!
+ add x29,sp,#0
+
+ adr x5,.Lsigma
+ stp x19,x20,[sp,#16]
+ stp x21,x22,[sp,#32]
+ stp x23,x24,[sp,#48]
+ stp x25,x26,[sp,#64]
+ stp x27,x28,[sp,#80]
+
+.L512_or_more_neon:
+ sub sp,sp,#128+64
+
+ ldp x22,x23,[x5] // load sigma
+ ld1 {v24.4s},[x5],#16
+ ldp x24,x25,[x3] // load key
+ ldp x26,x27,[x3,#16]
+ ld1 {v25.4s,v26.4s},[x3]
+ ldp x28,x30,[x4] // load counter
+ ld1 {v27.4s},[x4]
+ ld1 {v31.4s},[x5]
+#ifdef __ARMEB__
+ rev64 v24.4s,v24.4s
+ ror x24,x24,#32
+ ror x25,x25,#32
+ ror x26,x26,#32
+ ror x27,x27,#32
+ ror x28,x28,#32
+ ror x30,x30,#32
+#endif
+ add v27.4s,v27.4s,v31.4s // += 1
+ stp q24,q25,[sp,#0] // off-load key block, invariant part
+ add v27.4s,v27.4s,v31.4s // not typo
+ str q26,[sp,#32]
+ add v28.4s,v27.4s,v31.4s
+ add v29.4s,v28.4s,v31.4s
+ add v30.4s,v29.4s,v31.4s
+ shl v31.4s,v31.4s,#2 // 1 -> 4
+
+ stp d8,d9,[sp,#128+0] // meet ABI requirements
+ stp d10,d11,[sp,#128+16]
+ stp d12,d13,[sp,#128+32]
+ stp d14,d15,[sp,#128+48]
+
+ sub x2,x2,#512 // not typo
+
+.Loop_outer_512_neon:
+ mov v0.16b,v24.16b
+ mov v4.16b,v24.16b
+ mov v8.16b,v24.16b
+ mov v12.16b,v24.16b
+ mov v16.16b,v24.16b
+ mov v20.16b,v24.16b
+ mov v1.16b,v25.16b
+ mov w5,w22 // unpack key block
+ mov v5.16b,v25.16b
+ lsr x6,x22,#32
+ mov v9.16b,v25.16b
+ mov w7,w23
+ mov v13.16b,v25.16b
+ lsr x8,x23,#32
+ mov v17.16b,v25.16b
+ mov w9,w24
+ mov v21.16b,v25.16b
+ lsr x10,x24,#32
+ mov v3.16b,v27.16b
+ mov w11,w25
+ mov v7.16b,v28.16b
+ lsr x12,x25,#32
+ mov v11.16b,v29.16b
+ mov w13,w26
+ mov v15.16b,v30.16b
+ lsr x14,x26,#32
+ mov v2.16b,v26.16b
+ mov w15,w27
+ mov v6.16b,v26.16b
+ lsr x16,x27,#32
+ add v19.4s,v3.4s,v31.4s // +4
+ mov w17,w28
+ add v23.4s,v7.4s,v31.4s // +4
+ lsr x19,x28,#32
+ mov v10.16b,v26.16b
+ mov w20,w30
+ mov v14.16b,v26.16b
+ lsr x21,x30,#32
+ mov v18.16b,v26.16b
+ stp q27,q28,[sp,#48] // off-load key block, variable part
+ mov v22.16b,v26.16b
+ str q29,[sp,#80]
+
+ mov x4,#5
+ subs x2,x2,#512
+.Loop_upper_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v11.16b,v11.16b,v11.16b,#12
+ ext v15.16b,v15.16b,v15.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v23.16b,v23.16b,v23.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v9.16b,v9.16b,v9.16b,#4
+ ext v13.16b,v13.16b,v13.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ ext v21.16b,v21.16b,v21.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v11.16b,v11.16b,v11.16b,#4
+ ext v15.16b,v15.16b,v15.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v23.16b,v23.16b,v23.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v9.16b,v9.16b,v9.16b,#12
+ ext v13.16b,v13.16b,v13.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ ext v21.16b,v21.16b,v21.16b,#12
+ cbnz x4,.Loop_upper_neon
+
+ add w5,w5,w22 // accumulate key block
+ add x6,x6,x22,lsr#32
+ add w7,w7,w23
+ add x8,x8,x23,lsr#32
+ add w9,w9,w24
+ add x10,x10,x24,lsr#32
+ add w11,w11,w25
+ add x12,x12,x25,lsr#32
+ add w13,w13,w26
+ add x14,x14,x26,lsr#32
+ add w15,w15,w27
+ add x16,x16,x27,lsr#32
+ add w17,w17,w28
+ add x19,x19,x28,lsr#32
+ add w20,w20,w30
+ add x21,x21,x30,lsr#32
+
+ add x5,x5,x6,lsl#32 // pack
+ add x7,x7,x8,lsl#32
+ ldp x6,x8,[x1,#0] // load input
+ add x9,x9,x10,lsl#32
+ add x11,x11,x12,lsl#32
+ ldp x10,x12,[x1,#16]
+ add x13,x13,x14,lsl#32
+ add x15,x15,x16,lsl#32
+ ldp x14,x16,[x1,#32]
+ add x17,x17,x19,lsl#32
+ add x20,x20,x21,lsl#32
+ ldp x19,x21,[x1,#48]
+ add x1,x1,#64
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor x15,x15,x16
+ eor x17,x17,x19
+ eor x20,x20,x21
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#1 // increment counter
+ mov w5,w22 // unpack key block
+ lsr x6,x22,#32
+ stp x9,x11,[x0,#16]
+ mov w7,w23
+ lsr x8,x23,#32
+ stp x13,x15,[x0,#32]
+ mov w9,w24
+ lsr x10,x24,#32
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ mov w11,w25
+ lsr x12,x25,#32
+ mov w13,w26
+ lsr x14,x26,#32
+ mov w15,w27
+ lsr x16,x27,#32
+ mov w17,w28
+ lsr x19,x28,#32
+ mov w20,w30
+ lsr x21,x30,#32
+
+ mov x4,#5
+.Loop_lower_neon:
+ sub x4,x4,#1
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#12
+ ext v7.16b,v7.16b,v7.16b,#12
+ ext v11.16b,v11.16b,v11.16b,#12
+ ext v15.16b,v15.16b,v15.16b,#12
+ ext v19.16b,v19.16b,v19.16b,#12
+ ext v23.16b,v23.16b,v23.16b,#12
+ ext v1.16b,v1.16b,v1.16b,#4
+ ext v5.16b,v5.16b,v5.16b,#4
+ ext v9.16b,v9.16b,v9.16b,#4
+ ext v13.16b,v13.16b,v13.16b,#4
+ ext v17.16b,v17.16b,v17.16b,#4
+ ext v21.16b,v21.16b,v21.16b,#4
+ add v0.4s,v0.4s,v1.4s
+ add w5,w5,w9
+ add v4.4s,v4.4s,v5.4s
+ add w6,w6,w10
+ add v8.4s,v8.4s,v9.4s
+ add w7,w7,w11
+ add v12.4s,v12.4s,v13.4s
+ add w8,w8,w12
+ add v16.4s,v16.4s,v17.4s
+ eor w17,w17,w5
+ add v20.4s,v20.4s,v21.4s
+ eor w19,w19,w6
+ eor v3.16b,v3.16b,v0.16b
+ eor w20,w20,w7
+ eor v7.16b,v7.16b,v4.16b
+ eor w21,w21,w8
+ eor v11.16b,v11.16b,v8.16b
+ ror w17,w17,#16
+ eor v15.16b,v15.16b,v12.16b
+ ror w19,w19,#16
+ eor v19.16b,v19.16b,v16.16b
+ ror w20,w20,#16
+ eor v23.16b,v23.16b,v20.16b
+ ror w21,w21,#16
+ rev32 v3.8h,v3.8h
+ add w13,w13,w17
+ rev32 v7.8h,v7.8h
+ add w14,w14,w19
+ rev32 v11.8h,v11.8h
+ add w15,w15,w20
+ rev32 v15.8h,v15.8h
+ add w16,w16,w21
+ rev32 v19.8h,v19.8h
+ eor w9,w9,w13
+ rev32 v23.8h,v23.8h
+ eor w10,w10,w14
+ add v2.4s,v2.4s,v3.4s
+ eor w11,w11,w15
+ add v6.4s,v6.4s,v7.4s
+ eor w12,w12,w16
+ add v10.4s,v10.4s,v11.4s
+ ror w9,w9,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w10,w10,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w11,w11,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w12,w12,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w9
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w10
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w11
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w12
+ eor v28.16b,v17.16b,v18.16b
+ eor w17,w17,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w19,w19,w6
+ ushr v1.4s,v24.4s,#20
+ eor w20,w20,w7
+ ushr v5.4s,v25.4s,#20
+ eor w21,w21,w8
+ ushr v9.4s,v26.4s,#20
+ ror w17,w17,#24
+ ushr v13.4s,v27.4s,#20
+ ror w19,w19,#24
+ ushr v17.4s,v28.4s,#20
+ ror w20,w20,#24
+ ushr v21.4s,v29.4s,#20
+ ror w21,w21,#24
+ sli v1.4s,v24.4s,#12
+ add w13,w13,w17
+ sli v5.4s,v25.4s,#12
+ add w14,w14,w19
+ sli v9.4s,v26.4s,#12
+ add w15,w15,w20
+ sli v13.4s,v27.4s,#12
+ add w16,w16,w21
+ sli v17.4s,v28.4s,#12
+ eor w9,w9,w13
+ sli v21.4s,v29.4s,#12
+ eor w10,w10,w14
+ add v0.4s,v0.4s,v1.4s
+ eor w11,w11,w15
+ add v4.4s,v4.4s,v5.4s
+ eor w12,w12,w16
+ add v8.4s,v8.4s,v9.4s
+ ror w9,w9,#25
+ add v12.4s,v12.4s,v13.4s
+ ror w10,w10,#25
+ add v16.4s,v16.4s,v17.4s
+ ror w11,w11,#25
+ add v20.4s,v20.4s,v21.4s
+ ror w12,w12,#25
+ eor v24.16b,v3.16b,v0.16b
+ add w5,w5,w10
+ eor v25.16b,v7.16b,v4.16b
+ add w6,w6,w11
+ eor v26.16b,v11.16b,v8.16b
+ add w7,w7,w12
+ eor v27.16b,v15.16b,v12.16b
+ add w8,w8,w9
+ eor v28.16b,v19.16b,v16.16b
+ eor w21,w21,w5
+ eor v29.16b,v23.16b,v20.16b
+ eor w17,w17,w6
+ ushr v3.4s,v24.4s,#24
+ eor w19,w19,w7
+ ushr v7.4s,v25.4s,#24
+ eor w20,w20,w8
+ ushr v11.4s,v26.4s,#24
+ ror w21,w21,#16
+ ushr v15.4s,v27.4s,#24
+ ror w17,w17,#16
+ ushr v19.4s,v28.4s,#24
+ ror w19,w19,#16
+ ushr v23.4s,v29.4s,#24
+ ror w20,w20,#16
+ sli v3.4s,v24.4s,#8
+ add w15,w15,w21
+ sli v7.4s,v25.4s,#8
+ add w16,w16,w17
+ sli v11.4s,v26.4s,#8
+ add w13,w13,w19
+ sli v15.4s,v27.4s,#8
+ add w14,w14,w20
+ sli v19.4s,v28.4s,#8
+ eor w10,w10,w15
+ sli v23.4s,v29.4s,#8
+ eor w11,w11,w16
+ add v2.4s,v2.4s,v3.4s
+ eor w12,w12,w13
+ add v6.4s,v6.4s,v7.4s
+ eor w9,w9,w14
+ add v10.4s,v10.4s,v11.4s
+ ror w10,w10,#20
+ add v14.4s,v14.4s,v15.4s
+ ror w11,w11,#20
+ add v18.4s,v18.4s,v19.4s
+ ror w12,w12,#20
+ add v22.4s,v22.4s,v23.4s
+ ror w9,w9,#20
+ eor v24.16b,v1.16b,v2.16b
+ add w5,w5,w10
+ eor v25.16b,v5.16b,v6.16b
+ add w6,w6,w11
+ eor v26.16b,v9.16b,v10.16b
+ add w7,w7,w12
+ eor v27.16b,v13.16b,v14.16b
+ add w8,w8,w9
+ eor v28.16b,v17.16b,v18.16b
+ eor w21,w21,w5
+ eor v29.16b,v21.16b,v22.16b
+ eor w17,w17,w6
+ ushr v1.4s,v24.4s,#25
+ eor w19,w19,w7
+ ushr v5.4s,v25.4s,#25
+ eor w20,w20,w8
+ ushr v9.4s,v26.4s,#25
+ ror w21,w21,#24
+ ushr v13.4s,v27.4s,#25
+ ror w17,w17,#24
+ ushr v17.4s,v28.4s,#25
+ ror w19,w19,#24
+ ushr v21.4s,v29.4s,#25
+ ror w20,w20,#24
+ sli v1.4s,v24.4s,#7
+ add w15,w15,w21
+ sli v5.4s,v25.4s,#7
+ add w16,w16,w17
+ sli v9.4s,v26.4s,#7
+ add w13,w13,w19
+ sli v13.4s,v27.4s,#7
+ add w14,w14,w20
+ sli v17.4s,v28.4s,#7
+ eor w10,w10,w15
+ sli v21.4s,v29.4s,#7
+ eor w11,w11,w16
+ ext v2.16b,v2.16b,v2.16b,#8
+ eor w12,w12,w13
+ ext v6.16b,v6.16b,v6.16b,#8
+ eor w9,w9,w14
+ ext v10.16b,v10.16b,v10.16b,#8
+ ror w10,w10,#25
+ ext v14.16b,v14.16b,v14.16b,#8
+ ror w11,w11,#25
+ ext v18.16b,v18.16b,v18.16b,#8
+ ror w12,w12,#25
+ ext v22.16b,v22.16b,v22.16b,#8
+ ror w9,w9,#25
+ ext v3.16b,v3.16b,v3.16b,#4
+ ext v7.16b,v7.16b,v7.16b,#4
+ ext v11.16b,v11.16b,v11.16b,#4
+ ext v15.16b,v15.16b,v15.16b,#4
+ ext v19.16b,v19.16b,v19.16b,#4
+ ext v23.16b,v23.16b,v23.16b,#4
+ ext v1.16b,v1.16b,v1.16b,#12
+ ext v5.16b,v5.16b,v5.16b,#12
+ ext v9.16b,v9.16b,v9.16b,#12
+ ext v13.16b,v13.16b,v13.16b,#12
+ ext v17.16b,v17.16b,v17.16b,#12
+ ext v21.16b,v21.16b,v21.16b,#12
+ cbnz x4,.Loop_lower_neon
+
+ add w5,w5,w22 // accumulate key block
+ ldp q24,q25,[sp,#0]
+ add x6,x6,x22,lsr#32
+ ldp q26,q27,[sp,#32]
+ add w7,w7,w23
+ ldp q28,q29,[sp,#64]
+ add x8,x8,x23,lsr#32
+ add v0.4s,v0.4s,v24.4s
+ add w9,w9,w24
+ add v4.4s,v4.4s,v24.4s
+ add x10,x10,x24,lsr#32
+ add v8.4s,v8.4s,v24.4s
+ add w11,w11,w25
+ add v12.4s,v12.4s,v24.4s
+ add x12,x12,x25,lsr#32
+ add v16.4s,v16.4s,v24.4s
+ add w13,w13,w26
+ add v20.4s,v20.4s,v24.4s
+ add x14,x14,x26,lsr#32
+ add v2.4s,v2.4s,v26.4s
+ add w15,w15,w27
+ add v6.4s,v6.4s,v26.4s
+ add x16,x16,x27,lsr#32
+ add v10.4s,v10.4s,v26.4s
+ add w17,w17,w28
+ add v14.4s,v14.4s,v26.4s
+ add x19,x19,x28,lsr#32
+ add v18.4s,v18.4s,v26.4s
+ add w20,w20,w30
+ add v22.4s,v22.4s,v26.4s
+ add x21,x21,x30,lsr#32
+ add v19.4s,v19.4s,v31.4s // +4
+ add x5,x5,x6,lsl#32 // pack
+ add v23.4s,v23.4s,v31.4s // +4
+ add x7,x7,x8,lsl#32
+ add v3.4s,v3.4s,v27.4s
+ ldp x6,x8,[x1,#0] // load input
+ add v7.4s,v7.4s,v28.4s
+ add x9,x9,x10,lsl#32
+ add v11.4s,v11.4s,v29.4s
+ add x11,x11,x12,lsl#32
+ add v15.4s,v15.4s,v30.4s
+ ldp x10,x12,[x1,#16]
+ add v19.4s,v19.4s,v27.4s
+ add x13,x13,x14,lsl#32
+ add v23.4s,v23.4s,v28.4s
+ add x15,x15,x16,lsl#32
+ add v1.4s,v1.4s,v25.4s
+ ldp x14,x16,[x1,#32]
+ add v5.4s,v5.4s,v25.4s
+ add x17,x17,x19,lsl#32
+ add v9.4s,v9.4s,v25.4s
+ add x20,x20,x21,lsl#32
+ add v13.4s,v13.4s,v25.4s
+ ldp x19,x21,[x1,#48]
+ add v17.4s,v17.4s,v25.4s
+ add x1,x1,#64
+ add v21.4s,v21.4s,v25.4s
+
+#ifdef __ARMEB__
+ rev x5,x5
+ rev x7,x7
+ rev x9,x9
+ rev x11,x11
+ rev x13,x13
+ rev x15,x15
+ rev x17,x17
+ rev x20,x20
+#endif
+ ld1 {v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+ eor x5,x5,x6
+ eor x7,x7,x8
+ eor x9,x9,x10
+ eor x11,x11,x12
+ eor x13,x13,x14
+ eor v0.16b,v0.16b,v24.16b
+ eor x15,x15,x16
+ eor v1.16b,v1.16b,v25.16b
+ eor x17,x17,x19
+ eor v2.16b,v2.16b,v26.16b
+ eor x20,x20,x21
+ eor v3.16b,v3.16b,v27.16b
+ ld1 {v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+
+ stp x5,x7,[x0,#0] // store output
+ add x28,x28,#7 // increment counter
+ stp x9,x11,[x0,#16]
+ stp x13,x15,[x0,#32]
+ stp x17,x20,[x0,#48]
+ add x0,x0,#64
+ st1 {v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+
+ ld1 {v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+ eor v4.16b,v4.16b,v24.16b
+ eor v5.16b,v5.16b,v25.16b
+ eor v6.16b,v6.16b,v26.16b
+ eor v7.16b,v7.16b,v27.16b
+ st1 {v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+ ld1 {v4.16b,v5.16b,v6.16b,v7.16b},[x1],#64
+ eor v8.16b,v8.16b,v0.16b
+ ldp q24,q25,[sp,#0]
+ eor v9.16b,v9.16b,v1.16b
+ ldp q26,q27,[sp,#32]
+ eor v10.16b,v10.16b,v2.16b
+ eor v11.16b,v11.16b,v3.16b
+ st1 {v8.16b,v9.16b,v10.16b,v11.16b},[x0],#64
+
+ ld1 {v8.16b,v9.16b,v10.16b,v11.16b},[x1],#64
+ eor v12.16b,v12.16b,v4.16b
+ eor v13.16b,v13.16b,v5.16b
+ eor v14.16b,v14.16b,v6.16b
+ eor v15.16b,v15.16b,v7.16b
+ st1 {v12.16b,v13.16b,v14.16b,v15.16b},[x0],#64
+
+ ld1 {v12.16b,v13.16b,v14.16b,v15.16b},[x1],#64
+ eor v16.16b,v16.16b,v8.16b
+ eor v17.16b,v17.16b,v9.16b
+ eor v18.16b,v18.16b,v10.16b
+ eor v19.16b,v19.16b,v11.16b
+ st1 {v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+ shl v0.4s,v31.4s,#1 // 4 -> 8
+ eor v20.16b,v20.16b,v12.16b
+ eor v21.16b,v21.16b,v13.16b
+ eor v22.16b,v22.16b,v14.16b
+ eor v23.16b,v23.16b,v15.16b
+ st1 {v20.16b,v21.16b,v22.16b,v23.16b},[x0],#64
+
+ add v27.4s,v27.4s,v0.4s // += 8
+ add v28.4s,v28.4s,v0.4s
+ add v29.4s,v29.4s,v0.4s
+ add v30.4s,v30.4s,v0.4s
+
+ b.hs .Loop_outer_512_neon
+
+ adds x2,x2,#512
+ ushr v0.4s,v31.4s,#2 // 4 -> 1
+
+ ldp d8,d9,[sp,#128+0] // meet ABI requirements
+ ldp d10,d11,[sp,#128+16]
+ ldp d12,d13,[sp,#128+32]
+ ldp d14,d15,[sp,#128+48]
+
+ stp q24,q31,[sp,#0] // wipe off-load area
+ stp q24,q31,[sp,#32]
+ stp q24,q31,[sp,#64]
+
+ b.eq .Ldone_512_neon
+
+ cmp x2,#192
+ sub v27.4s,v27.4s,v0.4s // -= 1
+ sub v28.4s,v28.4s,v0.4s
+ sub v29.4s,v29.4s,v0.4s
+ add sp,sp,#128
+ b.hs .Loop_outer_neon
+
+ eor v25.16b,v25.16b,v25.16b
+ eor v26.16b,v26.16b,v26.16b
+ eor v27.16b,v27.16b,v27.16b
+ eor v28.16b,v28.16b,v28.16b
+ eor v29.16b,v29.16b,v29.16b
+ eor v30.16b,v30.16b,v30.16b
+ b .Loop_outer
+
+.Ldone_512_neon:
+ ldp x19,x20,[x29,#16]
+ add sp,sp,#128+64
+ ldp x21,x22,[x29,#32]
+ ldp x23,x24,[x29,#48]
+ ldp x25,x26,[x29,#64]
+ ldp x27,x28,[x29,#80]
+ ldp x29,x30,[sp],#96
+ ret
+.size ChaCha20_512_neon,.-ChaCha20_512_neon
--
2.19.0
* [PATCH net-next v6 06/23] zinc: port Andy Polyakov's ChaCha20 ARM and ARM64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Russell King, linux-arm-kernel
These port and prepare Andy Polyakov's implementations for the kernel,
but don't actually wire up any of the code yet. The wiring will be done
in a subsequent commit, since we'll need to merge these implementations
with another one. We make a few small changes to the assembly:
- Entries and exits use the proper kernel convention macros (ENTRY/ENDPROC).
- CPU feature checking is done in C by the glue code, so that has been
removed from the assembly; see the sketch after this list for the rough
shape of that C-side dispatch.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed (prefixed with .L) to fit kernel conventions.
- Constants have been rearranged so that they are closer to the code
that is using them. [ARM only]
- The neon code can jump to the scalar code when it makes sense to do
so.
- The separate neon_512 function has been removed, leaving the decision up
to the main neon entry point. [ARM64 only]
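[ Illustrative only: the glue code that performs this dispatch is wired up
in a later commit. Below is a minimal sketch of what such C-side NEON
dispatch can look like; the chacha20() wrapper name, the argument types,
and the elf_hwcap/may_use_simd()/kernel_neon_begin() usage are assumptions
made for the sake of the sketch, not necessarily the interface this series
ends up using. ]

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/linkage.h>
#include <linux/types.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>

/* Assembly entry points from chacha20-arm-cryptogams.S (assumed prototypes). */
asmlinkage void chacha20_arm(u8 *out, const u8 *in, u32 len,
			     const u32 key[8], const u32 counter[4]);
asmlinkage void chacha20_neon(u8 *out, const u8 *in, u32 len,
			      const u32 key[8], const u32 counter[4]);

static bool chacha20_use_neon __ro_after_init;

static void __init chacha20_fpu_init(void)
{
	/* Probe for NEON once, in C, instead of in the assembly. */
	chacha20_use_neon = elf_hwcap & HWCAP_NEON;
}

static void chacha20(u8 *out, const u8 *in, u32 len,
		     const u32 key[8], const u32 counter[4])
{
	/* Per-call check: fall back to the scalar code when SIMD is unusable. */
	if (chacha20_use_neon && may_use_simd()) {
		kernel_neon_begin();
		chacha20_neon(out, in, len, key, counter);
		kernel_neon_end();
	} else {
		chacha20_arm(out, in, len, key, counter);
	}
}

Probing hwcaps once at init and checking may_use_simd() per call is what
lets all runtime CPU detection move out of the .S files, which is what the
bullet above refers to.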
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/chacha20/chacha20-arm-cryptogams.S | 367 +++++++++---------
lib/zinc/chacha20/chacha20-arm64-cryptogams.S | 57 +--
2 files changed, 193 insertions(+), 231 deletions(-)
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
index 05a3a9e6e93f..770bab469171 100644
--- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
@@ -1,9 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
+#include <linux/linkage.h>
.text
#if defined(__thumb2__) || defined(__clang__)
@@ -24,48 +27,25 @@
.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
.Lone:
.long 1,0,0,0
-.Lrot8:
-.long 0x02010003,0x06050407
-#if __ARM_MAX_ARCH__>=7
-.LOPENSSL_armcap:
-.word OPENSSL_armcap_P-.LChaCha20_ctr32
-#else
.word -1
-#endif
-.globl ChaCha20_ctr32
-.type ChaCha20_ctr32,%function
.align 5
-ChaCha20_ctr32:
-.LChaCha20_ctr32:
+ENTRY(chacha20_arm)
ldr r12,[sp,#0] @ pull pointer to counter and nonce
stmdb sp!,{r0-r2,r4-r11,lr}
-#if __ARM_ARCH__<7 && !defined(__thumb2__)
- sub r14,pc,#16 @ ChaCha20_ctr32
-#else
- adr r14,.LChaCha20_ctr32
-#endif
cmp r2,#0 @ len==0?
-#ifdef __thumb2__
+#ifdef __thumb2__
itt eq
#endif
addeq sp,sp,#4*3
- beq .Lno_data
-#if __ARM_MAX_ARCH__>=7
- cmp r2,#192 @ test len
- bls .Lshort
- ldr r4,[r14,#-24]
- ldr r4,[r14,r4]
-# ifdef __APPLE__
- ldr r4,[r4]
-# endif
- tst r4,#ARMV7_NEON
- bne .LChaCha20_neon
-.Lshort:
-#endif
+ beq .Lno_data_arm
ldmia r12,{r4-r7} @ load counter and nonce
sub sp,sp,#4*(16) @ off-load area
- sub r14,r14,#64 @ .Lsigma
+#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__)
+ sub r14,pc,#100 @ .Lsigma
+#else
+ adr r14,.Lsigma @ .Lsigma
+#endif
stmdb sp!,{r4-r7} @ copy counter and nonce
ldmia r3,{r4-r11} @ load key
ldmia r14,{r0-r3} @ load sigma
@@ -191,7 +171,7 @@ ChaCha20_ctr32:
@ rx and second half at sp+4*(16+8)
cmp r11,#64 @ done yet?
-#ifdef __thumb2__
+#ifdef __thumb2__
itete lo
#endif
addlo r12,sp,#4*(0) @ shortcut or ...
@@ -202,49 +182,49 @@ ChaCha20_ctr32:
ldr r8,[sp,#4*(0)] @ load key material
ldr r9,[sp,#4*(1)]
-#if __ARM_ARCH__>=6 || !defined(__ARMEB__)
-# if __ARM_ARCH__<7
+#if __LINUX_ARM_ARCH__ >= 6 || !defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ < 7
orr r10,r12,r14
tst r10,#3 @ are input and output aligned?
ldr r10,[sp,#4*(2)]
bne .Lunaligned
cmp r11,#64 @ restore flags
-# else
+#else
ldr r10,[sp,#4*(2)]
-# endif
+#endif
ldr r11,[sp,#4*(3)]
add r0,r0,r8 @ accumulate key material
add r1,r1,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r2,r2,r10
add r3,r3,r11
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r0,r0,r8 @ xor with input
eorhs r1,r1,r9
add r8,sp,#4*(4)
str r0,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r2,r2,r10
eorhs r3,r3,r11
ldmia r8,{r8-r11} @ load key material
@@ -254,34 +234,34 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#13 @ accumulate key material
add r5,r9,r5,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r6,r10,r6,ror#13
add r7,r11,r7,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r4,r4,r8
eorhs r5,r5,r9
add r8,sp,#4*(8)
str r4,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r6,r6,r10
eorhs r7,r7,r11
str r5,[r14,#-12]
@@ -294,39 +274,39 @@ ChaCha20_ctr32:
add r0,r0,r8 @ accumulate key material
add r1,r1,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
add r2,r2,r10
add r3,r3,r11
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r0,r0,r8
eorhs r1,r1,r9
add r8,sp,#4*(12)
str r0,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r2,r2,r10
eorhs r3,r3,r11
str r1,[r14,#-12]
@@ -336,79 +316,79 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#24 @ accumulate key material
add r5,r9,r5,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
addhi r8,r8,#1 @ next counter value
strhi r8,[sp,#4*(12)] @ save next counter value
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r6,r10,r6,ror#24
add r7,r11,r7,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r4,r4,r8
eorhs r5,r5,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
it ne
-# endif
+#endif
ldrne r8,[sp,#4*(32+2)] @ re-load len
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r6,r6,r10
eorhs r7,r7,r11
str r4,[r14],#16 @ store output
str r5,[r14,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hs
-# endif
+#endif
subhs r11,r8,#64 @ len-=64
str r6,[r14,#-8]
str r7,[r14,#-4]
bhi .Loop_outer
beq .Ldone
-# if __ARM_ARCH__<7
+#if __LINUX_ARM_ARCH__ < 7
b .Ltail
.align 4
.Lunaligned: @ unaligned endian-neutral path
cmp r11,#64 @ restore flags
-# endif
#endif
-#if __ARM_ARCH__<7
+#endif
+#if __LINUX_ARM_ARCH__ < 7
ldr r11,[sp,#4*(3)]
add r0,r8,r0 @ accumulate key material
add r1,r9,r1
add r2,r10,r2
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -416,53 +396,53 @@ ChaCha20_ctr32:
eor r0,r8,r0 @ xor with input (or zero)
eor r1,r9,r1
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r2,r10,r2
strb r0,[r14],#16 @ store output
eor r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r1,[r14,#-12]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-8]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r3,[r14,#-4]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-15]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r1,[r14,#-11]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-7]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r3,[r14,#-3]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-14]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r1,[r14,#-10]
@@ -482,18 +462,18 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#13 @ accumulate key material
add r5,r9,r5,ror#13
add r6,r10,r6,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r7,r11,r7,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -501,53 +481,53 @@ ChaCha20_ctr32:
eor r4,r8,r4 @ xor with input (or zero)
eor r5,r9,r5
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r6,r10,r6
strb r4,[r14],#16 @ store output
eor r7,r11,r7
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r5,[r14,#-12]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-8]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r7,[r14,#-4]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-15]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r5,[r14,#-11]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-7]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r7,[r14,#-3]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-14]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r5,[r14,#-10]
@@ -564,26 +544,26 @@ ChaCha20_ctr32:
add r8,sp,#4*(4+4)
ldmia r8,{r8-r11} @ load key material
ldmia r0,{r0-r7} @ load second half
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx"
strhi r11,[sp,#4*(16+11)] @ copy "rx"
add r0,r8,r0 @ accumulate key material
add r1,r9,r1
add r2,r10,r2
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -591,53 +571,53 @@ ChaCha20_ctr32:
eor r0,r8,r0 @ xor with input (or zero)
eor r1,r9,r1
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r2,r10,r2
strb r0,[r14],#16 @ store output
eor r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r1,[r14,#-12]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-8]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r3,[r14,#-4]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-15]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r1,[r14,#-11]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-7]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r3,[r14,#-3]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-14]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r1,[r14,#-10]
@@ -654,25 +634,25 @@ ChaCha20_ctr32:
add r8,sp,#4*(4+8)
ldmia r8,{r8-r11} @ load key material
add r4,r8,r4,ror#24 @ accumulate key material
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
addhi r8,r8,#1 @ next counter value
strhi r8,[sp,#4*(12)] @ save next counter value
add r5,r9,r5,ror#24
add r6,r10,r6,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r7,r11,r7,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -680,53 +660,53 @@ ChaCha20_ctr32:
eor r4,r8,r4 @ xor with input (or zero)
eor r5,r9,r5
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r6,r10,r6
strb r4,[r14],#16 @ store output
eor r7,r11,r7
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r5,[r14,#-12]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-8]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r7,[r14,#-4]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-15]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r5,[r14,#-11]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-7]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r7,[r14,#-3]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-14]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r5,[r14,#-10]
@@ -740,13 +720,13 @@ ChaCha20_ctr32:
eor r7,r11,r7,lsr#8
strb r6,[r14,#-5]
strb r7,[r14,#-1]
-# ifdef __thumb2__
+#ifdef __thumb2__
it ne
-# endif
+#endif
ldrne r8,[sp,#4*(32+2)] @ re-load len
-# ifdef __thumb2__
+#ifdef __thumb2__
it hs
-# endif
+#endif
subhs r11,r8,#64 @ len-=64
bhi .Loop_outer
@@ -768,20 +748,33 @@ ChaCha20_ctr32:
.Ldone:
add sp,sp,#4*(32+3)
-.Lno_data:
+.Lno_data_arm:
ldmia sp!,{r4-r11,pc}
-.size ChaCha20_ctr32,.-ChaCha20_ctr32
-#if __ARM_MAX_ARCH__>=7
+ENDPROC(chacha20_arm)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+.align 5
+.Lsigma2:
+.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
+.Lone2:
+.long 1,0,0,0
+.word -1
+
.arch armv7-a
.fpu neon
-.type ChaCha20_neon,%function
.align 5
-ChaCha20_neon:
+ENTRY(chacha20_neon)
ldr r12,[sp,#0] @ pull pointer to counter and nonce
stmdb sp!,{r0-r2,r4-r11,lr}
-.LChaCha20_neon:
- adr r14,.Lsigma
+ cmp r2,#0 @ len==0?
+#ifdef __thumb2__
+ itt eq
+#endif
+ addeq sp,sp,#4*3
+ beq .Lno_data_neon
+.Lchacha20_neon_begin:
+ adr r14,.Lsigma2
vstmdb sp!,{d8-d15} @ ABI spec says so
stmdb sp!,{r0-r3}
@@ -1121,12 +1114,12 @@ ChaCha20_neon:
ldr r10,[r12,#-8]
add r3,r3,r11
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
+#endif
eor r0,r0,r8 @ xor with input
add r8,sp,#4*(4)
eor r1,r1,r9
@@ -1146,12 +1139,12 @@ ChaCha20_neon:
ldr r10,[r12,#-8]
add r7,r11,r7,ror#13
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
eor r4,r4,r8
add r8,sp,#4*(8)
eor r5,r5,r9
@@ -1170,24 +1163,24 @@ ChaCha20_neon:
ldr r8,[r12],#16 @ load input
add r1,r1,r9
ldr r9,[r12,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
add r2,r2,r10
ldr r10,[r12,#-8]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
add r3,r3,r11
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
+#endif
eor r0,r0,r8
add r8,sp,#4*(12)
eor r1,r1,r9
@@ -1210,16 +1203,16 @@ ChaCha20_neon:
add r7,r11,r7,ror#24
ldr r10,[r12,#-8]
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
eor r4,r4,r8
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
ldrhi r8,[sp,#4*(32+2)] @ re-load len
eor r5,r5,r9
eor r6,r6,r10
@@ -1379,7 +1372,7 @@ ChaCha20_neon:
add r6,r10,r6,ror#13
add r7,r11,r7,ror#13
ldmia r8,{r8-r11} @ load key material
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
@@ -1388,7 +1381,7 @@ ChaCha20_neon:
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
stmia sp,{r0-r7}
add r0,sp,#4*(16+8)
@@ -1408,7 +1401,7 @@ ChaCha20_neon:
add r6,r10,r6,ror#24
add r7,r11,r7,ror#24
ldr r11,[sp,#4*(32+2)] @ re-load len
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
@@ -1417,7 +1410,7 @@ ChaCha20_neon:
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
stmia r8,{r0-r7}
add r10,sp,#4*(0)
sub r11,r11,#64*3 @ len-=64*3
@@ -1434,7 +1427,7 @@ ChaCha20_neon:
add sp,sp,#4*(32+4)
vldmia sp,{d8-d15}
add sp,sp,#4*(16+3)
+.Lno_data_neon:
ldmia sp!,{r4-r11,pc}
-.size ChaCha20_neon,.-ChaCha20_neon
-.comm OPENSSL_armcap_P,4,4
+ENDPROC(chacha20_neon)
#endif
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
index 4d029bfdad3a..5037510edc6f 100644
--- a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
@@ -1,46 +1,24 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
+#include <linux/linkage.h>
.text
-
-
-
.align 5
.Lsigma:
.quad 0x3320646e61707865,0x6b20657479622d32 // endian-neutral
.Lone:
.long 1,0,0,0
-.LOPENSSL_armcap_P:
-#ifdef __ILP32__
-.long OPENSSL_armcap_P-.
-#else
-.quad OPENSSL_armcap_P-.
-#endif
-.byte 67,104,97,67,104,97,50,48,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
-.align 2
-.globl ChaCha20_ctr32
-.type ChaCha20_ctr32,%function
.align 5
-ChaCha20_ctr32:
+ENTRY(chacha20_arm)
cbz x2,.Labort
- adr x5,.LOPENSSL_armcap_P
- cmp x2,#192
- b.lo .Lshort
-#ifdef __ILP32__
- ldrsw x6,[x5]
-#else
- ldr x6,[x5]
-#endif
- ldr w17,[x6,x5]
- tst w17,#ARMV7_NEON
- b.ne ChaCha20_neon
-.Lshort:
stp x29,x30,[sp,#-96]!
add x29,sp,#0
@@ -309,11 +287,13 @@ ChaCha20_ctr32:
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
ret
-.size ChaCha20_ctr32,.-ChaCha20_ctr32
+ENDPROC(chacha20_arm)
-.type ChaCha20_neon,%function
+#ifdef CONFIG_KERNEL_MODE_NEON
.align 5
-ChaCha20_neon:
+ENTRY(chacha20_neon)
+ cbz x2,.Labort_neon
+
stp x29,x30,[sp,#-96]!
add x29,sp,#0
@@ -803,19 +783,6 @@ ChaCha20_neon:
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
ret
-.size ChaCha20_neon,.-ChaCha20_neon
-.type ChaCha20_512_neon,%function
-.align 5
-ChaCha20_512_neon:
- stp x29,x30,[sp,#-96]!
- add x29,sp,#0
-
- adr x5,.Lsigma
- stp x19,x20,[sp,#16]
- stp x21,x22,[sp,#32]
- stp x23,x24,[sp,#48]
- stp x25,x26,[sp,#64]
- stp x27,x28,[sp,#80]
.L512_or_more_neon:
sub sp,sp,#128+64
@@ -1969,5 +1936,7 @@ ChaCha20_512_neon:
ldp x25,x26,[x29,#64]
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
+.Labort_neon:
ret
-.size ChaCha20_512_neon,.-ChaCha20_512_neon
+ENDPROC(chacha20_neon)
+#endif
--
2.19.0
* [PATCH net-next v6 06/23] zinc: port Andy Polyakov's ChaCha20 ARM and ARM64 implementations
@ 2018-09-25 14:56 ` Jason A. Donenfeld
0 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-arm-kernel
These port and prepare Andy Polyakov's implementations for the kernel,
but don't actually wire up any of the code yet. The wiring will be done
in a subsequent commit, since we'll need to merge these implementations
with another one. We make a few small changes to the assembly:
- Entries and exits use the proper kernel convention macros (ENTRY/ENDPROC).
- CPU feature checking is done in C by the glue code, so that has been
removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed (prefixed with .L) to fit kernel conventions.
- Constants have been rearranged so that they are closer to the code
that is using them. [ARM only]
- The neon code can jump to the scalar code when it makes sense to do
so.
- The separate neon_512 function has been removed, leaving the decision up
to the main neon entry point. [ARM64 only]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/chacha20/chacha20-arm-cryptogams.S | 367 +++++++++---------
lib/zinc/chacha20/chacha20-arm64-cryptogams.S | 57 +--
2 files changed, 193 insertions(+), 231 deletions(-)
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
index 05a3a9e6e93f..770bab469171 100644
--- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm-cryptogams.S
@@ -1,9 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
+#include <linux/linkage.h>
.text
#if defined(__thumb2__) || defined(__clang__)
@@ -24,48 +27,25 @@
.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
.Lone:
.long 1,0,0,0
-.Lrot8:
-.long 0x02010003,0x06050407
-#if __ARM_MAX_ARCH__>=7
-.LOPENSSL_armcap:
-.word OPENSSL_armcap_P-.LChaCha20_ctr32
-#else
.word -1
-#endif
-.globl ChaCha20_ctr32
-.type ChaCha20_ctr32,%function
.align 5
-ChaCha20_ctr32:
-.LChaCha20_ctr32:
+ENTRY(chacha20_arm)
ldr r12,[sp,#0] @ pull pointer to counter and nonce
stmdb sp!,{r0-r2,r4-r11,lr}
-#if __ARM_ARCH__<7 && !defined(__thumb2__)
- sub r14,pc,#16 @ ChaCha20_ctr32
-#else
- adr r14,.LChaCha20_ctr32
-#endif
cmp r2,#0 @ len==0?
-#ifdef __thumb2__
+#ifdef __thumb2__
itt eq
#endif
addeq sp,sp,#4*3
- beq .Lno_data
-#if __ARM_MAX_ARCH__>=7
- cmp r2,#192 @ test len
- bls .Lshort
- ldr r4,[r14,#-24]
- ldr r4,[r14,r4]
-# ifdef __APPLE__
- ldr r4,[r4]
-# endif
- tst r4,#ARMV7_NEON
- bne .LChaCha20_neon
-.Lshort:
-#endif
+ beq .Lno_data_arm
ldmia r12,{r4-r7} @ load counter and nonce
sub sp,sp,#4*(16) @ off-load area
- sub r14,r14,#64 @ .Lsigma
+#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__)
+ sub r14,pc,#100 @ .Lsigma
+#else
+ adr r14,.Lsigma @ .Lsigma
+#endif
stmdb sp!,{r4-r7} @ copy counter and nonce
ldmia r3,{r4-r11} @ load key
ldmia r14,{r0-r3} @ load sigma
@@ -191,7 +171,7 @@ ChaCha20_ctr32:
@ rx and second half at sp+4*(16+8)
cmp r11,#64 @ done yet?
-#ifdef __thumb2__
+#ifdef __thumb2__
itete lo
#endif
addlo r12,sp,#4*(0) @ shortcut or ...
@@ -202,49 +182,49 @@ ChaCha20_ctr32:
ldr r8,[sp,#4*(0)] @ load key material
ldr r9,[sp,#4*(1)]
-#if __ARM_ARCH__>=6 || !defined(__ARMEB__)
-# if __ARM_ARCH__<7
+#if __LINUX_ARM_ARCH__ >= 6 || !defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ < 7
orr r10,r12,r14
tst r10,#3 @ are input and output aligned?
ldr r10,[sp,#4*(2)]
bne .Lunaligned
cmp r11,#64 @ restore flags
-# else
+#else
ldr r10,[sp,#4*(2)]
-# endif
+#endif
ldr r11,[sp,#4*(3)]
add r0,r0,r8 @ accumulate key material
add r1,r1,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r2,r2,r10
add r3,r3,r11
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r0,r0,r8 @ xor with input
eorhs r1,r1,r9
add r8,sp,#4*(4)
str r0,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r2,r2,r10
eorhs r3,r3,r11
ldmia r8,{r8-r11} @ load key material
@@ -254,34 +234,34 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#13 @ accumulate key material
add r5,r9,r5,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r6,r10,r6,ror#13
add r7,r11,r7,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r4,r4,r8
eorhs r5,r5,r9
add r8,sp,#4*(8)
str r4,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r6,r6,r10
eorhs r7,r7,r11
str r5,[r14,#-12]
@@ -294,39 +274,39 @@ ChaCha20_ctr32:
add r0,r0,r8 @ accumulate key material
add r1,r1,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
add r2,r2,r10
add r3,r3,r11
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r0,r0,r8
eorhs r1,r1,r9
add r8,sp,#4*(12)
str r0,[r14],#16 @ store output
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r2,r2,r10
eorhs r3,r3,r11
str r1,[r14,#-12]
@@ -336,79 +316,79 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#24 @ accumulate key material
add r5,r9,r5,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
addhi r8,r8,#1 @ next counter value
strhi r8,[sp,#4*(12)] @ save next counter value
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r8,[r12],#16 @ load input
ldrhs r9,[r12,#-12]
add r6,r10,r6,ror#24
add r7,r11,r7,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhs r10,[r12,#-8]
ldrhs r11,[r12,#-4]
-# if __ARM_ARCH__>=6 && defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
-# ifdef __thumb2__
+#endif
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r4,r4,r8
eorhs r5,r5,r9
-# ifdef __thumb2__
+#ifdef __thumb2__
it ne
-# endif
+#endif
ldrne r8,[sp,#4*(32+2)] @ re-load len
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
eorhs r6,r6,r10
eorhs r7,r7,r11
str r4,[r14],#16 @ store output
str r5,[r14,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hs
-# endif
+#endif
subhs r11,r8,#64 @ len-=64
str r6,[r14,#-8]
str r7,[r14,#-4]
bhi .Loop_outer
beq .Ldone
-# if __ARM_ARCH__<7
+#if __LINUX_ARM_ARCH__ < 7
b .Ltail
.align 4
.Lunaligned: @ unaligned endian-neutral path
cmp r11,#64 @ restore flags
-# endif
#endif
-#if __ARM_ARCH__<7
+#endif
+#if __LINUX_ARM_ARCH__ < 7
ldr r11,[sp,#4*(3)]
add r0,r8,r0 @ accumulate key material
add r1,r9,r1
add r2,r10,r2
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -416,53 +396,53 @@ ChaCha20_ctr32:
eor r0,r8,r0 @ xor with input (or zero)
eor r1,r9,r1
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r2,r10,r2
strb r0,[r14],#16 @ store output
eor r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r1,[r14,#-12]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-8]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r3,[r14,#-4]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-15]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r1,[r14,#-11]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-7]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r3,[r14,#-3]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-14]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r1,[r14,#-10]
@@ -482,18 +462,18 @@ ChaCha20_ctr32:
add r4,r8,r4,ror#13 @ accumulate key material
add r5,r9,r5,ror#13
add r6,r10,r6,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r7,r11,r7,ror#13
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -501,53 +481,53 @@ ChaCha20_ctr32:
eor r4,r8,r4 @ xor with input (or zero)
eor r5,r9,r5
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r6,r10,r6
strb r4,[r14],#16 @ store output
eor r7,r11,r7
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r5,[r14,#-12]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-8]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r7,[r14,#-4]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-15]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r5,[r14,#-11]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-7]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r7,[r14,#-3]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-14]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r5,[r14,#-10]
@@ -564,26 +544,26 @@ ChaCha20_ctr32:
add r8,sp,#4*(4+4)
ldmia r8,{r8-r11} @ load key material
ldmia r0,{r0-r7} @ load second half
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx"
strhi r11,[sp,#4*(16+11)] @ copy "rx"
add r0,r8,r0 @ accumulate key material
add r1,r9,r1
add r2,r10,r2
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -591,53 +571,53 @@ ChaCha20_ctr32:
eor r0,r8,r0 @ xor with input (or zero)
eor r1,r9,r1
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r2,r10,r2
strb r0,[r14],#16 @ store output
eor r3,r11,r3
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r1,[r14,#-12]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-8]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r3,[r14,#-4]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-15]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r1,[r14,#-11]
eor r0,r8,r0,lsr#8
strb r2,[r14,#-7]
eor r1,r9,r1,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r3,[r14,#-3]
eor r2,r10,r2,lsr#8
strb r0,[r14,#-14]
eor r3,r11,r3,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r1,[r14,#-10]
@@ -654,25 +634,25 @@ ChaCha20_ctr32:
add r8,sp,#4*(4+8)
ldmia r8,{r8-r11} @ load key material
add r4,r8,r4,ror#24 @ accumulate key material
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hi
-# endif
+#endif
addhi r8,r8,#1 @ next counter value
strhi r8,[sp,#4*(12)] @ save next counter value
add r5,r9,r5,ror#24
add r6,r10,r6,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r8,r8,r8 @ zero or ...
ldrhsb r8,[r12],#16 @ ... load input
eorlo r9,r9,r9
ldrhsb r9,[r12,#-12]
add r7,r11,r7,ror#24
-# ifdef __thumb2__
+#ifdef __thumb2__
itete lo
-# endif
+#endif
eorlo r10,r10,r10
ldrhsb r10,[r12,#-8]
eorlo r11,r11,r11
@@ -680,53 +660,53 @@ ChaCha20_ctr32:
eor r4,r8,r4 @ xor with input (or zero)
eor r5,r9,r5
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-15] @ load more input
ldrhsb r9,[r12,#-11]
eor r6,r10,r6
strb r4,[r14],#16 @ store output
eor r7,r11,r7
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-7]
ldrhsb r11,[r12,#-3]
strb r5,[r14,#-12]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-8]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-14] @ load more input
ldrhsb r9,[r12,#-10]
strb r7,[r14,#-4]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-15]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-6]
ldrhsb r11,[r12,#-2]
strb r5,[r14,#-11]
eor r4,r8,r4,lsr#8
strb r6,[r14,#-7]
eor r5,r9,r5,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r8,[r12,#-13] @ load more input
ldrhsb r9,[r12,#-9]
strb r7,[r14,#-3]
eor r6,r10,r6,lsr#8
strb r4,[r14,#-14]
eor r7,r11,r7,lsr#8
-# ifdef __thumb2__
+#ifdef __thumb2__
itt hs
-# endif
+#endif
ldrhsb r10,[r12,#-5]
ldrhsb r11,[r12,#-1]
strb r5,[r14,#-10]
@@ -740,13 +720,13 @@ ChaCha20_ctr32:
eor r7,r11,r7,lsr#8
strb r6,[r14,#-5]
strb r7,[r14,#-1]
-# ifdef __thumb2__
+#ifdef __thumb2__
it ne
-# endif
+#endif
ldrne r8,[sp,#4*(32+2)] @ re-load len
-# ifdef __thumb2__
+#ifdef __thumb2__
it hs
-# endif
+#endif
subhs r11,r8,#64 @ len-=64
bhi .Loop_outer
@@ -768,20 +748,33 @@ ChaCha20_ctr32:
.Ldone:
add sp,sp,#4*(32+3)
-.Lno_data:
+.Lno_data_arm:
ldmia sp!,{r4-r11,pc}
-.size ChaCha20_ctr32,.-ChaCha20_ctr32
-#if __ARM_MAX_ARCH__>=7
+ENDPROC(chacha20_arm)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+.align 5
+.Lsigma2:
+.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
+.Lone2:
+.long 1,0,0,0
+.word -1
+
.arch armv7-a
.fpu neon
-.type ChaCha20_neon,%function
.align 5
-ChaCha20_neon:
+ENTRY(chacha20_neon)
ldr r12,[sp,#0] @ pull pointer to counter and nonce
stmdb sp!,{r0-r2,r4-r11,lr}
-.LChaCha20_neon:
- adr r14,.Lsigma
+ cmp r2,#0 @ len==0?
+#ifdef __thumb2__
+ itt eq
+#endif
+ addeq sp,sp,#4*3
+ beq .Lno_data_neon
+.Lchacha20_neon_begin:
+ adr r14,.Lsigma2
vstmdb sp!,{d8-d15} @ ABI spec says so
stmdb sp!,{r0-r3}
@@ -1121,12 +1114,12 @@ ChaCha20_neon:
ldr r10,[r12,#-8]
add r3,r3,r11
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
+#endif
eor r0,r0,r8 @ xor with input
add r8,sp,#4*(4)
eor r1,r1,r9
@@ -1146,12 +1139,12 @@ ChaCha20_neon:
ldr r10,[r12,#-8]
add r7,r11,r7,ror#13
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
eor r4,r4,r8
add r8,sp,#4*(8)
eor r5,r5,r9
@@ -1170,24 +1163,24 @@ ChaCha20_neon:
ldr r8,[r12],#16 @ load input
add r1,r1,r9
ldr r9,[r12,#-12]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it
add r2,r2,r10
ldr r10,[r12,#-8]
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it
add r3,r3,r11
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
+#endif
eor r0,r0,r8
add r8,sp,#4*(12)
eor r1,r1,r9
@@ -1210,16 +1203,16 @@ ChaCha20_neon:
add r7,r11,r7,ror#24
ldr r10,[r12,#-8]
ldr r11,[r12,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r4,r4
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
eor r4,r4,r8
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
ldrhi r8,[sp,#4*(32+2)] @ re-load len
eor r5,r5,r9
eor r6,r6,r10
@@ -1379,7 +1372,7 @@ ChaCha20_neon:
add r6,r10,r6,ror#13
add r7,r11,r7,ror#13
ldmia r8,{r8-r11} @ load key material
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
@@ -1388,7 +1381,7 @@ ChaCha20_neon:
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
stmia sp,{r0-r7}
add r0,sp,#4*(16+8)
@@ -1408,7 +1401,7 @@ ChaCha20_neon:
add r6,r10,r6,ror#24
add r7,r11,r7,ror#24
ldr r11,[sp,#4*(32+2)] @ re-load len
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
@@ -1417,7 +1410,7 @@ ChaCha20_neon:
rev r5,r5
rev r6,r6
rev r7,r7
-# endif
+#endif
stmia r8,{r0-r7}
add r10,sp,#4*(0)
sub r11,r11,#64*3 @ len-=64*3
@@ -1434,7 +1427,7 @@ ChaCha20_neon:
add sp,sp,#4*(32+4)
vldmia sp,{d8-d15}
add sp,sp,#4*(16+3)
+.Lno_data_neon:
ldmia sp!,{r4-r11,pc}
-.size ChaCha20_neon,.-ChaCha20_neon
-.comm OPENSSL_armcap_P,4,4
+ENDPROC(chacha20_neon)
#endif
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
index 4d029bfdad3a..5037510edc6f 100644
--- a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S
@@ -1,46 +1,24 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
+#include <linux/linkage.h>
.text
-
-
-
.align 5
.Lsigma:
.quad 0x3320646e61707865,0x6b20657479622d32 // endian-neutral
.Lone:
.long 1,0,0,0
-.LOPENSSL_armcap_P:
-#ifdef __ILP32__
-.long OPENSSL_armcap_P-.
-#else
-.quad OPENSSL_armcap_P-.
-#endif
-.byte 67,104,97,67,104,97,50,48,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
-.align 2
-.globl ChaCha20_ctr32
-.type ChaCha20_ctr32,%function
.align 5
-ChaCha20_ctr32:
+ENTRY(chacha20_arm)
cbz x2,.Labort
- adr x5,.LOPENSSL_armcap_P
- cmp x2,#192
- b.lo .Lshort
-#ifdef __ILP32__
- ldrsw x6,[x5]
-#else
- ldr x6,[x5]
-#endif
- ldr w17,[x6,x5]
- tst w17,#ARMV7_NEON
- b.ne ChaCha20_neon
-.Lshort:
stp x29,x30,[sp,#-96]!
add x29,sp,#0
@@ -309,11 +287,13 @@ ChaCha20_ctr32:
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
ret
-.size ChaCha20_ctr32,.-ChaCha20_ctr32
+ENDPROC(chacha20_arm)
-.type ChaCha20_neon,%function
+#ifdef CONFIG_KERNEL_MODE_NEON
.align 5
-ChaCha20_neon:
+ENTRY(chacha20_neon)
+ cbz x2,.Labort_neon
+
stp x29,x30,[sp,#-96]!
add x29,sp,#0
@@ -803,19 +783,6 @@ ChaCha20_neon:
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
ret
-.size ChaCha20_neon,.-ChaCha20_neon
-.type ChaCha20_512_neon,%function
-.align 5
-ChaCha20_512_neon:
- stp x29,x30,[sp,#-96]!
- add x29,sp,#0
-
- adr x5,.Lsigma
- stp x19,x20,[sp,#16]
- stp x21,x22,[sp,#32]
- stp x23,x24,[sp,#48]
- stp x25,x26,[sp,#64]
- stp x27,x28,[sp,#80]
.L512_or_more_neon:
sub sp,sp,#128+64
@@ -1969,5 +1936,7 @@ ChaCha20_512_neon:
ldp x25,x26,[x29,#64]
ldp x27,x28,[x29,#80]
ldp x29,x30,[sp],#96
+.Labort_neon:
ret
-.size ChaCha20_512_neon,.-ChaCha20_512_neon
+ENDPROC(chacha20_neon)
+#endif
--
2.19.0
* [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Russell King, linux-arm-kernel
These wire Andy Polyakov's implementations up to the kernel for ARMv7,8
NEON, and introduce Eric Biggers' ultra-fast scalar implementation for
CPUs without NEON or for CPUs with slow NEON (Cortex-A5,7).
This commit does the following:
- Adds the glue code for the assembly implementations.
- Renames the ARMv8 code into place, since it can at this point be
used wholesale.
- Merges Andy Polyakov's ARMv7 NEON code with Eric Biggers' <=ARMv7
scalar code.
Commit note: Eric Biggers' scalar code is brand new, and quite possibly
prematurely added to this commit, and so it may require a bit of revision.
This commit delivers approximately the same or much better performance than
the existing crypto API's code and has been measured to do so on:
- ARM1176JZF-S [ARMv6]
- Cortex-A7 [ARMv7]
- Cortex-A8 [ARMv7]
- Cortex-A9 [ARMv7]
- Cortex-A17 [ARMv7]
- Cortex-A53 [ARMv8]
- Cortex-A55 [ARMv8]
- Cortex-A73 [ARMv8]
- Cortex-A75 [ARMv8]
Interestingly, Andy Polyakov's scalar code is slower than Eric Biggers',
but is also significantly shorter. This has the advantage that it does
not evict other code from L1 cache -- particularly on ARM11 chips -- and
so in certain circumstances it can actually be faster. However, it wasn't
found that this had an effect on any code existing in the kernel today.
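For orientation before the full diff: the runtime choice between the NEON and
scalar entry points lives in the new chacha20-arm-glue.h, and its core is
essentially the following lightly annotated restatement of chacha20_arch()
from the diff below. CHACHA20_BLOCK_SIZE is the 64-byte ChaCha20 block size
from zinc's headers, and simd_use() is the helper from the simd context API
introduced earlier in this series:

static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
                                 const u8 *src, size_t len,
                                 simd_context_t *simd_context)
{
#if defined(CONFIG_KERNEL_MODE_NEON)
        /* NEON pays off only when the kernel may use SIMD right now and at
         * least three 64-byte blocks (192 bytes) remain -- the same cutoff
         * the original CRYPTOGAMS assembly applied internally. */
        if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 &&
            simd_use(simd_context))
                chacha20_neon(dst, src, len, state->key, state->counter);
        else
#endif
                chacha20_arm(dst, src, len, state->key, state->counter);

        state->counter[0] += (len + 63) / 64;   /* advance the block counter */
        return true;
}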
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Co-authored-by: Eric Biggers <ebiggers@google.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/Makefile | 2 +
lib/zinc/chacha20/chacha20-arm-glue.h | 88 +++
...acha20-arm-cryptogams.S => chacha20-arm.S} | 502 ++++++++++++++++--
...20-arm64-cryptogams.S => chacha20-arm64.S} | 0
lib/zinc/chacha20/chacha20.c | 2 +
5 files changed, 556 insertions(+), 38 deletions(-)
create mode 100644 lib/zinc/chacha20/chacha20-arm-glue.h
rename lib/zinc/chacha20/{chacha20-arm-cryptogams.S => chacha20-arm.S} (71%)
rename lib/zinc/chacha20/{chacha20-arm64-cryptogams.S => chacha20-arm64.S} (100%)
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 223a0816c918..e47f64e12bbd 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -4,4 +4,6 @@ ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
zinc_chacha20-y := chacha20/chacha20.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM) += chacha20/chacha20-arm.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
diff --git a/lib/zinc/chacha20/chacha20-arm-glue.h b/lib/zinc/chacha20/chacha20-arm-glue.h
new file mode 100644
index 000000000000..86cce851ed02
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm-glue.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#if defined(CONFIG_ARM)
+#include <asm/system_info.h>
+#include <asm/cputype.h>
+#endif
+
+asmlinkage void chacha20_arm(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#if defined(CONFIG_ARM)
+asmlinkage void hchacha20_arm(const u32 state[16], u32 out[8]);
+#endif
+#if defined(CONFIG_KERNEL_MODE_NEON)
+asmlinkage void chacha20_neon(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#endif
+
+static bool chacha20_use_neon __ro_after_init;
+
+static void __init chacha20_fpu_init(void)
+{
+#if defined(CONFIG_ARM64)
+ chacha20_use_neon = elf_hwcap & HWCAP_ASIMD;
+#elif defined(CONFIG_ARM)
+ switch (read_cpuid_part()) {
+ case ARM_CPU_PART_CORTEX_A7:
+ case ARM_CPU_PART_CORTEX_A5:
+ /* The Cortex-A7 and Cortex-A5 do not perform well with the NEON
+ * implementation but do incredibly with the scalar one and use
+ * less power.
+ */
+ break;
+ default:
+ chacha20_use_neon = elf_hwcap & HWCAP_NEON;
+ }
+#endif
+}
+
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
+ const u8 *src, size_t len,
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_KERNEL_MODE_NEON)
+ if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 &&
+ simd_use(simd_context))
+ chacha20_neon(dst, src, len, state->key, state->counter);
+ else
+#endif
+ chacha20_arm(dst, src, len, state->key, state->counter);
+
+ state->counter[0] += (len + 63) / 64;
+ return true;
+}
+
+static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_ARM)
+ u32 x[] = { CHACHA20_CONSTANT_EXPA,
+ CHACHA20_CONSTANT_ND_3,
+ CHACHA20_CONSTANT_2_BY,
+ CHACHA20_CONSTANT_TE_K,
+ get_unaligned_le32(key + 0),
+ get_unaligned_le32(key + 4),
+ get_unaligned_le32(key + 8),
+ get_unaligned_le32(key + 12),
+ get_unaligned_le32(key + 16),
+ get_unaligned_le32(key + 20),
+ get_unaligned_le32(key + 24),
+ get_unaligned_le32(key + 28),
+ get_unaligned_le32(nonce + 0),
+ get_unaligned_le32(nonce + 4),
+ get_unaligned_le32(nonce + 8),
+ get_unaligned_le32(nonce + 12)
+ };
+ hchacha20_arm(x, derived_key);
+ return true;
+#else
+ return false;
+#endif
+}
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm.S
similarity index 71%
rename from lib/zinc/chacha20/chacha20-arm-cryptogams.S
rename to lib/zinc/chacha20/chacha20-arm.S
index 770bab469171..5abedafcf129 100644
--- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm.S
@@ -1,13 +1,475 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2018 Google, Inc.
* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
- *
- * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
#include <linux/linkage.h>
+/*
+ * The following scalar routine was written by Eric Biggers.
+ *
+ * Design notes:
+ *
+ * 16 registers would be needed to hold the state matrix, but only 14 are
+ * available because 'sp' and 'pc' cannot be used. So we spill the elements
+ * (x8, x9) to the stack and swap them out with (x10, x11). This adds one
+ * 'ldrd' and one 'strd' instruction per round.
+ *
+ * All rotates are performed using the implicit rotate operand accepted by the
+ * 'add' and 'eor' instructions. This is faster than using explicit rotate
+ * instructions. To make this work, we allow the values in the second and last
+ * rows of the ChaCha state matrix (rows 'b' and 'd') to temporarily have the
+ * wrong rotation amount. The rotation amount is then fixed up just in time
+ * when the values are used. 'brot' is the number of bits the values in row 'b'
+ * need to be rotated right to arrive at the correct values, and 'drot'
+ * similarly for row 'd'. (brot, drot) start out as (0, 0) but we make it such
+ * that they end up as (25, 24) after every round.
+ */
+
+ // ChaCha state registers
+ X0 .req r0
+ X1 .req r1
+ X2 .req r2
+ X3 .req r3
+ X4 .req r4
+ X5 .req r5
+ X6 .req r6
+ X7 .req r7
+ X8_X10 .req r8 // shared by x8 and x10
+ X9_X11 .req r9 // shared by x9 and x11
+ X12 .req r10
+ X13 .req r11
+ X14 .req r12
+ X15 .req r14
+
+.Lexpand_32byte_k:
+ // "expand 32-byte k"
+ .word 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
+
+#ifdef __thumb2__
+# define adrl adr
+#endif
+
+.macro __rev out, in, t0, t1, t2
+.if __LINUX_ARM_ARCH__ >= 6
+ rev \out, \in
+.else
+ lsl \t0, \in, #24
+ and \t1, \in, #0xff00
+ and \t2, \in, #0xff0000
+ orr \out, \t0, \in, lsr #24
+ orr \out, \out, \t1, lsl #8
+ orr \out, \out, \t2, lsr #8
+.endif
+.endm
+
+.macro _le32_bswap x, t0, t1, t2
+#ifdef __ARMEB__
+ __rev \x, \x, \t0, \t1, \t2
+#endif
+.endm
+
+.macro _le32_bswap_4x a, b, c, d, t0, t1, t2
+ _le32_bswap \a, \t0, \t1, \t2
+ _le32_bswap \b, \t0, \t1, \t2
+ _le32_bswap \c, \t0, \t1, \t2
+ _le32_bswap \d, \t0, \t1, \t2
+.endm
+
+.macro __ldrd a, b, src, offset
+#if __LINUX_ARM_ARCH__ >= 6
+ ldrd \a, \b, [\src, #\offset]
+#else
+ ldr \a, [\src, #\offset]
+ ldr \b, [\src, #\offset + 4]
+#endif
+.endm
+
+.macro __strd a, b, dst, offset
+#if __LINUX_ARM_ARCH__ >= 6
+ strd \a, \b, [\dst, #\offset]
+#else
+ str \a, [\dst, #\offset]
+ str \b, [\dst, #\offset + 4]
+#endif
+.endm
+
+.macro _halfround a1, b1, c1, d1, a2, b2, c2, d2
+
+ // a += b; d ^= a; d = rol(d, 16);
+ add \a1, \a1, \b1, ror #brot
+ add \a2, \a2, \b2, ror #brot
+ eor \d1, \a1, \d1, ror #drot
+ eor \d2, \a2, \d2, ror #drot
+ // drot == 32 - 16 == 16
+
+ // c += d; b ^= c; b = rol(b, 12);
+ add \c1, \c1, \d1, ror #16
+ add \c2, \c2, \d2, ror #16
+ eor \b1, \c1, \b1, ror #brot
+ eor \b2, \c2, \b2, ror #brot
+ // brot == 32 - 12 == 20
+
+ // a += b; d ^= a; d = rol(d, 8);
+ add \a1, \a1, \b1, ror #20
+ add \a2, \a2, \b2, ror #20
+ eor \d1, \a1, \d1, ror #16
+ eor \d2, \a2, \d2, ror #16
+ // drot == 32 - 8 == 24
+
+ // c += d; b ^= c; b = rol(b, 7);
+ add \c1, \c1, \d1, ror #24
+ add \c2, \c2, \d2, ror #24
+ eor \b1, \c1, \b1, ror #20
+ eor \b2, \c2, \b2, ror #20
+ // brot == 32 - 7 == 25
+.endm
+
+.macro _doubleround
+
+ // column round
+
+ // quarterrounds: (x0, x4, x8, x12) and (x1, x5, x9, x13)
+ _halfround X0, X4, X8_X10, X12, X1, X5, X9_X11, X13
+
+ // save (x8, x9); restore (x10, x11)
+ __strd X8_X10, X9_X11, sp, 0
+ __ldrd X8_X10, X9_X11, sp, 8
+
+ // quarterrounds: (x2, x6, x10, x14) and (x3, x7, x11, x15)
+ _halfround X2, X6, X8_X10, X14, X3, X7, X9_X11, X15
+
+ .set brot, 25
+ .set drot, 24
+
+ // diagonal round
+
+ // quarterrounds: (x0, x5, x10, x15) and (x1, x6, x11, x12)
+ _halfround X0, X5, X8_X10, X15, X1, X6, X9_X11, X12
+
+ // save (x10, x11); restore (x8, x9)
+ __strd X8_X10, X9_X11, sp, 8
+ __ldrd X8_X10, X9_X11, sp, 0
+
+ // quarterrounds: (x2, x7, x8, x13) and (x3, x4, x9, x14)
+ _halfround X2, X7, X8_X10, X13, X3, X4, X9_X11, X14
+.endm
+
+.macro _chacha_permute nrounds
+ .set brot, 0
+ .set drot, 0
+ .rept \nrounds / 2
+ _doubleround
+ .endr
+.endm
+
+.macro _chacha nrounds
+
+.Lnext_block\@:
+ // Stack: unused0-unused1 x10-x11 x0-x15 OUT IN LEN
+ // Registers contain x0-x9,x12-x15.
+
+ // Do the core ChaCha permutation to update x0-x15.
+ _chacha_permute \nrounds
+
+ add sp, #8
+ // Stack: x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers contain x0-x9,x12-x15.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // Free up some registers (r8-r12,r14) by pushing (x8-x9,x12-x15).
+ push {X8_X10, X9_X11, X12, X13, X14, X15}
+
+ // Load (OUT, IN, LEN).
+ ldr r14, [sp, #96]
+ ldr r12, [sp, #100]
+ ldr r11, [sp, #104]
+
+ orr r10, r14, r12
+
+ // Use slow path if fewer than 64 bytes remain.
+ cmp r11, #64
+ blt .Lxor_slowpath\@
+
+ // Use slow path if IN and/or OUT isn't 4-byte aligned. Needed even on
+ // ARMv6+, since ldmia and stmia (used below) still require alignment.
+ tst r10, #3
+ bne .Lxor_slowpath\@
+
+ // Fast path: XOR 64 bytes of aligned data.
+
+ // Stack: x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is OUT.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // x0-x3
+ __ldrd r8, r9, sp, 32
+ __ldrd r10, r11, sp, 40
+ add X0, X0, r8
+ add X1, X1, r9
+ add X2, X2, r10
+ add X3, X3, r11
+ _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10
+ ldmia r12!, {r8-r11}
+ eor X0, X0, r8
+ eor X1, X1, r9
+ eor X2, X2, r10
+ eor X3, X3, r11
+ stmia r14!, {X0-X3}
+
+ // x4-x7
+ __ldrd r8, r9, sp, 48
+ __ldrd r10, r11, sp, 56
+ add X4, r8, X4, ror #brot
+ add X5, r9, X5, ror #brot
+ ldmia r12!, {X0-X3}
+ add X6, r10, X6, ror #brot
+ add X7, r11, X7, ror #brot
+ _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10
+ eor X4, X4, X0
+ eor X5, X5, X1
+ eor X6, X6, X2
+ eor X7, X7, X3
+ stmia r14!, {X4-X7}
+
+ // x8-x15
+ pop {r0-r7} // (x8-x9,x12-x15,x10-x11)
+ __ldrd r8, r9, sp, 32
+ __ldrd r10, r11, sp, 40
+ add r0, r0, r8 // x8
+ add r1, r1, r9 // x9
+ add r6, r6, r10 // x10
+ add r7, r7, r11 // x11
+ _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10
+ ldmia r12!, {r8-r11}
+ eor r0, r0, r8 // x8
+ eor r1, r1, r9 // x9
+ eor r6, r6, r10 // x10
+ eor r7, r7, r11 // x11
+ stmia r14!, {r0,r1,r6,r7}
+ ldmia r12!, {r0,r1,r6,r7}
+ __ldrd r8, r9, sp, 48
+ __ldrd r10, r11, sp, 56
+ add r2, r8, r2, ror #drot // x12
+ add r3, r9, r3, ror #drot // x13
+ add r4, r10, r4, ror #drot // x14
+ add r5, r11, r5, ror #drot // x15
+ _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11
+ ldr r9, [sp, #72] // load LEN
+ eor r2, r2, r0 // x12
+ eor r3, r3, r1 // x13
+ eor r4, r4, r6 // x14
+ eor r5, r5, r7 // x15
+ subs r9, #64 // decrement and check LEN
+ stmia r14!, {r2-r5}
+
+ beq .Ldone\@
+
+.Lprepare_for_next_block\@:
+
+ // Stack: x0-x15 OUT IN LEN
+
+ // Increment block counter (x12)
+ add r8, #1
+
+ // Store updated (OUT, IN, LEN)
+ str r14, [sp, #64]
+ str r12, [sp, #68]
+ str r9, [sp, #72]
+
+ mov r14, sp
+
+ // Store updated block counter (x12)
+ str r8, [sp, #48]
+
+ sub sp, #16
+
+ // Reload state and do next block
+ ldmia r14!, {r0-r11} // load x0-x11
+ __strd r10, r11, sp, 8 // store x10-x11 before state
+ ldmia r14, {r10-r12,r14} // load x12-x15
+ b .Lnext_block\@
+
+.Lxor_slowpath\@:
+ // Slow path: < 64 bytes remaining, or unaligned input or output buffer.
+ // We handle it by storing the 64 bytes of keystream to the stack, then
+ // XOR-ing the needed portion with the data.
+
+ // Allocate keystream buffer
+ sub sp, #64
+ mov r14, sp
+
+ // Stack: ks0-ks15 x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is &ks0.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // Save keystream for x0-x3
+ __ldrd r8, r9, sp, 96
+ __ldrd r10, r11, sp, 104
+ add X0, X0, r8
+ add X1, X1, r9
+ add X2, X2, r10
+ add X3, X3, r11
+ _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10
+ stmia r14!, {X0-X3}
+
+ // Save keystream for x4-x7
+ __ldrd r8, r9, sp, 112
+ __ldrd r10, r11, sp, 120
+ add X4, r8, X4, ror #brot
+ add X5, r9, X5, ror #brot
+ add X6, r10, X6, ror #brot
+ add X7, r11, X7, ror #brot
+ _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10
+ add r8, sp, #64
+ stmia r14!, {X4-X7}
+
+ // Save keystream for x8-x15
+ ldm r8, {r0-r7} // (x8-x9,x12-x15,x10-x11)
+ __ldrd r8, r9, sp, 128
+ __ldrd r10, r11, sp, 136
+ add r0, r0, r8 // x8
+ add r1, r1, r9 // x9
+ add r6, r6, r10 // x10
+ add r7, r7, r11 // x11
+ _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10
+ stmia r14!, {r0,r1,r6,r7}
+ __ldrd r8, r9, sp, 144
+ __ldrd r10, r11, sp, 152
+ add r2, r8, r2, ror #drot // x12
+ add r3, r9, r3, ror #drot // x13
+ add r4, r10, r4, ror #drot // x14
+ add r5, r11, r5, ror #drot // x15
+ _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11
+ stmia r14, {r2-r5}
+
+ // Stack: ks0-ks15 unused0-unused7 x0-x15 OUT IN LEN
+ // Registers: r8 is block counter, r12 is IN.
+
+ ldr r9, [sp, #168] // LEN
+ ldr r14, [sp, #160] // OUT
+ cmp r9, #64
+ mov r0, sp
+ movle r1, r9
+ movgt r1, #64
+ // r1 is number of bytes to XOR, in range [1, 64]
+
+.if __LINUX_ARM_ARCH__ < 6
+ orr r2, r12, r14
+ tst r2, #3 // IN or OUT misaligned?
+ bne .Lxor_next_byte\@
+.endif
+
+ // XOR a word at a time
+.rept 16
+ subs r1, #4
+ blt .Lxor_words_done\@
+ ldr r2, [r12], #4
+ ldr r3, [r0], #4
+ eor r2, r2, r3
+ str r2, [r14], #4
+.endr
+ b .Lxor_slowpath_done\@
+.Lxor_words_done\@:
+ ands r1, r1, #3
+ beq .Lxor_slowpath_done\@
+
+ // XOR a byte at a time
+.Lxor_next_byte\@:
+ ldrb r2, [r12], #1
+ ldrb r3, [r0], #1
+ eor r2, r2, r3
+ strb r2, [r14], #1
+ subs r1, #1
+ bne .Lxor_next_byte\@
+
+.Lxor_slowpath_done\@:
+ subs r9, #64
+ add sp, #96
+ bgt .Lprepare_for_next_block\@
+
+.Ldone\@:
+.endm // _chacha
+
+/*
+ * void chacha20_arm(u8 *out, const u8 *in, size_t len, const u32 key[8],
+ * const u32 iv[4]);
+ */
+ENTRY(chacha20_arm)
+ cmp r2, #0 // len == 0?
+ bxeq lr
+
+ push {r0-r2,r4-r11,lr}
+
+ // Push state x0-x15 onto stack.
+ // Also store an extra copy of x10-x11 just before the state.
+
+ ldr r4, [sp, #48] // iv
+ mov r0, sp
+ sub sp, #80
+
+ // iv: x12-x15
+ ldm r4, {X12,X13,X14,X15}
+ stmdb r0!, {X12,X13,X14,X15}
+
+ // key: x4-x11
+ __ldrd X8_X10, X9_X11, r3, 24
+ __strd X8_X10, X9_X11, sp, 8
+ stmdb r0!, {X8_X10, X9_X11}
+ ldm r3, {X4-X9_X11}
+ stmdb r0!, {X4-X9_X11}
+
+ // constants: x0-x3
+ adrl X3, .Lexpand_32byte_k
+ ldm X3, {X0-X3}
+ __strd X0, X1, sp, 16
+ __strd X2, X3, sp, 24
+
+ _chacha 20
+
+ add sp, #76
+ pop {r4-r11, pc}
+ENDPROC(chacha20_arm)
+
+/*
+ * void hchacha20_arm(const u32 state[16], u32 out[8]);
+ */
+ENTRY(hchacha20_arm)
+ push {r1,r4-r11,lr}
+
+ mov r14, r0
+ ldmia r14!, {r0-r11} // load x0-x11
+ push {r10-r11} // store x10-x11 to stack
+ ldm r14, {r10-r12,r14} // load x12-x15
+ sub sp, #8
+
+ _chacha_permute 20
+
+ // Skip over (unused0-unused1, x10-x11)
+ add sp, #16
+
+ // Fix up rotations of x12-x15
+ ror X12, X12, #drot
+ ror X13, X13, #drot
+ pop {r4} // load 'out'
+ ror X14, X14, #drot
+ ror X15, X15, #drot
+
+ // Store (x0-x3,x12-x15) to 'out'
+ stm r4, {X0,X1,X2,X3,X12,X13,X14,X15}
+
+ pop {r4-r11,pc}
+ENDPROC(hchacha20_arm)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+/*
+ * This following NEON routine was ported from Andy Polyakov's implementation
+ * from CRYPTOGAMS. It begins with parts of the CRYPTOGAMS scalar routine,
+ * since certain NEON code paths actually branch to it.
+ */
+
.text
#if defined(__thumb2__) || defined(__clang__)
.syntax unified
@@ -22,39 +484,6 @@
#define ldrhsb ldrbhs
#endif
-.align 5
-.Lsigma:
-.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
-.Lone:
-.long 1,0,0,0
-.word -1
-
-.align 5
-ENTRY(chacha20_arm)
- ldr r12,[sp,#0] @ pull pointer to counter and nonce
- stmdb sp!,{r0-r2,r4-r11,lr}
- cmp r2,#0 @ len==0?
-#ifdef __thumb2__
- itt eq
-#endif
- addeq sp,sp,#4*3
- beq .Lno_data_arm
- ldmia r12,{r4-r7} @ load counter and nonce
- sub sp,sp,#4*(16) @ off-load area
-#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__)
- sub r14,pc,#100 @ .Lsigma
-#else
- adr r14,.Lsigma @ .Lsigma
-#endif
- stmdb sp!,{r4-r7} @ copy counter and nonce
- ldmia r3,{r4-r11} @ load key
- ldmia r14,{r0-r3} @ load sigma
- stmdb sp!,{r4-r11} @ copy key
- stmdb sp!,{r0-r3} @ copy sigma
- str r10,[sp,#4*(16+10)] @ off-load "rx"
- str r11,[sp,#4*(16+11)] @ off-load "rx"
- b .Loop_outer_enter
-
.align 4
.Loop_outer:
ldmia sp,{r0-r9} @ load key material
@@ -748,11 +1177,8 @@ ENTRY(chacha20_arm)
.Ldone:
add sp,sp,#4*(32+3)
-.Lno_data_arm:
ldmia sp!,{r4-r11,pc}
-ENDPROC(chacha20_arm)
-#ifdef CONFIG_KERNEL_MODE_NEON
.align 5
.Lsigma2:
.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64.S
similarity index 100%
rename from lib/zinc/chacha20/chacha20-arm64-cryptogams.S
rename to lib/zinc/chacha20/chacha20-arm64.S
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
index 4354b874a6a5..fc4f74fca653 100644
--- a/lib/zinc/chacha20/chacha20.c
+++ b/lib/zinc/chacha20/chacha20.c
@@ -16,6 +16,8 @@
#if defined(CONFIG_ZINC_ARCH_X86_64)
#include "chacha20-x86_64-glue.h"
+#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64)
+#include "chacha20-arm-glue.h"
#else
void __init chacha20_fpu_init(void)
{
--
2.19.0
* [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
@ 2018-09-25 14:56 ` Jason A. Donenfeld
0 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-arm-kernel
These wire Andy Polyakov's implementations up to the kernel for ARMv7,8
NEON, and introduce Eric Biggers' ultra-fast scalar implementation for
CPUs without NEON or for CPUs with slow NEON (Cortex-A5,7).
This commit does the following:
- Adds the glue code for the assembly implementations.
- Renames the ARMv8 code into place, since it can at this point be
used wholesale.
- Merges Andy Polyakov's ARMv7 NEON code with Eric Biggers' <=ARMv7
scalar code.
Commit note: Eric Biggers' scalar code is brand new, and quite possibly
prematurely added to this commit, and so it may require a bit of revision.
This commit delivers approximately the same or much better performance than
the existing crypto API's code and has been measured to do so on:
- ARM1176JZF-S [ARMv6]
- Cortex-A7 [ARMv7]
- Cortex-A8 [ARMv7]
- Cortex-A9 [ARMv7]
- Cortex-A17 [ARMv7]
- Cortex-A53 [ARMv8]
- Cortex-A55 [ARMv8]
- Cortex-A73 [ARMv8]
- Cortex-A75 [ARMv8]
Interestingly, Andy Polyakov's scalar code is slower than Eric Biggers',
but is also significantly shorter. This has the advantage that it does
not evict other code from L1 cache -- particularly on ARM11 chips -- and
so in certain circumstances it can actually be faster. However, it wasn't
found that this had an effect on any code existing in the kernel today.
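One part of the new glue header that the commit message does not spell out is
the HChaCha20 path (ARM only), which is plain C: it fills in the sixteen-word
ChaCha20 input state and calls the scalar assembly to derive the subkey. The
unrolled array initializer in hchacha20_arch() in the diff below is equivalent
to the loop form sketched here, where key, nonce and derived_key are the
parameters of that function and the CHACHA20_CONSTANT_* words spell
"expand 32-byte k":

        u32 x[16];
        int i;

        x[0] = CHACHA20_CONSTANT_EXPA;  /* "expa" */
        x[1] = CHACHA20_CONSTANT_ND_3;  /* "nd 3" */
        x[2] = CHACHA20_CONSTANT_2_BY;  /* "2-by" */
        x[3] = CHACHA20_CONSTANT_TE_K;  /* "te k" */
        for (i = 0; i < 8; ++i)         /* 256-bit key, little-endian words */
                x[4 + i] = get_unaligned_le32(key + 4 * i);
        for (i = 0; i < 4; ++i)         /* 128-bit nonce, little-endian words */
                x[12 + i] = get_unaligned_le32(nonce + 4 * i);
        hchacha20_arm(x, derived_key);  /* scalar assembly derives the subkey */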
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Co-authored-by: Eric Biggers <ebiggers@google.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/Makefile | 2 +
lib/zinc/chacha20/chacha20-arm-glue.h | 88 +++
...acha20-arm-cryptogams.S => chacha20-arm.S} | 502 ++++++++++++++++--
...20-arm64-cryptogams.S => chacha20-arm64.S} | 0
lib/zinc/chacha20/chacha20.c | 2 +
5 files changed, 556 insertions(+), 38 deletions(-)
create mode 100644 lib/zinc/chacha20/chacha20-arm-glue.h
rename lib/zinc/chacha20/{chacha20-arm-cryptogams.S => chacha20-arm.S} (71%)
rename lib/zinc/chacha20/{chacha20-arm64-cryptogams.S => chacha20-arm64.S} (100%)
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 223a0816c918..e47f64e12bbd 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -4,4 +4,6 @@ ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
zinc_chacha20-y := chacha20/chacha20.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM) += chacha20/chacha20-arm.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
diff --git a/lib/zinc/chacha20/chacha20-arm-glue.h b/lib/zinc/chacha20/chacha20-arm-glue.h
new file mode 100644
index 000000000000..86cce851ed02
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm-glue.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#if defined(CONFIG_ARM)
+#include <asm/system_info.h>
+#include <asm/cputype.h>
+#endif
+
+asmlinkage void chacha20_arm(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#if defined(CONFIG_ARM)
+asmlinkage void hchacha20_arm(const u32 state[16], u32 out[8]);
+#endif
+#if defined(CONFIG_KERNEL_MODE_NEON)
+asmlinkage void chacha20_neon(u8 *out, const u8 *in, const size_t len,
+ const u32 key[8], const u32 counter[4]);
+#endif
+
+static bool chacha20_use_neon __ro_after_init;
+
+static void __init chacha20_fpu_init(void)
+{
+#if defined(CONFIG_ARM64)
+ chacha20_use_neon = elf_hwcap & HWCAP_ASIMD;
+#elif defined(CONFIG_ARM)
+ switch (read_cpuid_part()) {
+ case ARM_CPU_PART_CORTEX_A7:
+ case ARM_CPU_PART_CORTEX_A5:
+ /* The Cortex-A7 and Cortex-A5 do not perform well with the NEON
+ * implementation but do incredibly with the scalar one and use
+ * less power.
+ */
+ break;
+ default:
+ chacha20_use_neon = elf_hwcap & HWCAP_NEON;
+ }
+#endif
+}
+
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
+ const u8 *src, size_t len,
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_KERNEL_MODE_NEON)
+ if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 &&
+ simd_use(simd_context))
+ chacha20_neon(dst, src, len, state->key, state->counter);
+ else
+#endif
+ chacha20_arm(dst, src, len, state->key, state->counter);
+
+ state->counter[0] += (len + 63) / 64;
+ return true;
+}
+
+static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_ARM)
+ u32 x[] = { CHACHA20_CONSTANT_EXPA,
+ CHACHA20_CONSTANT_ND_3,
+ CHACHA20_CONSTANT_2_BY,
+ CHACHA20_CONSTANT_TE_K,
+ get_unaligned_le32(key + 0),
+ get_unaligned_le32(key + 4),
+ get_unaligned_le32(key + 8),
+ get_unaligned_le32(key + 12),
+ get_unaligned_le32(key + 16),
+ get_unaligned_le32(key + 20),
+ get_unaligned_le32(key + 24),
+ get_unaligned_le32(key + 28),
+ get_unaligned_le32(nonce + 0),
+ get_unaligned_le32(nonce + 4),
+ get_unaligned_le32(nonce + 8),
+ get_unaligned_le32(nonce + 12)
+ };
+ hchacha20_arm(x, derived_key);
+ return true;
+#else
+ return false;
+#endif
+}
diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm.S
similarity index 71%
rename from lib/zinc/chacha20/chacha20-arm-cryptogams.S
rename to lib/zinc/chacha20/chacha20-arm.S
index 770bab469171..5abedafcf129 100644
--- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S
+++ b/lib/zinc/chacha20/chacha20-arm.S
@@ -1,13 +1,475 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2018 Google, Inc.
* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
- *
- * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
#include <linux/linkage.h>
+/*
+ * The following scalar routine was written by Eric Biggers.
+ *
+ * Design notes:
+ *
+ * 16 registers would be needed to hold the state matrix, but only 14 are
+ * available because 'sp' and 'pc' cannot be used. So we spill the elements
+ * (x8, x9) to the stack and swap them out with (x10, x11). This adds one
+ * 'ldrd' and one 'strd' instruction per round.
+ *
+ * All rotates are performed using the implicit rotate operand accepted by the
+ * 'add' and 'eor' instructions. This is faster than using explicit rotate
+ * instructions. To make this work, we allow the values in the second and last
+ * rows of the ChaCha state matrix (rows 'b' and 'd') to temporarily have the
+ * wrong rotation amount. The rotation amount is then fixed up just in time
+ * when the values are used. 'brot' is the number of bits the values in row 'b'
+ * need to be rotated right to arrive at the correct values, and 'drot'
+ * similarly for row 'd'. (brot, drot) start out as (0, 0) but we make it such
+ * that they end up as (25, 24) after every round.
+ */
+
+ // ChaCha state registers
+ X0 .req r0
+ X1 .req r1
+ X2 .req r2
+ X3 .req r3
+ X4 .req r4
+ X5 .req r5
+ X6 .req r6
+ X7 .req r7
+ X8_X10 .req r8 // shared by x8 and x10
+ X9_X11 .req r9 // shared by x9 and x11
+ X12 .req r10
+ X13 .req r11
+ X14 .req r12
+ X15 .req r14
+
+.Lexpand_32byte_k:
+ // "expand 32-byte k"
+ .word 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
+
+#ifdef __thumb2__
+# define adrl adr
+#endif
+
+.macro __rev out, in, t0, t1, t2
+.if __LINUX_ARM_ARCH__ >= 6
+ rev \out, \in
+.else
+ lsl \t0, \in, #24
+ and \t1, \in, #0xff00
+ and \t2, \in, #0xff0000
+ orr \out, \t0, \in, lsr #24
+ orr \out, \out, \t1, lsl #8
+ orr \out, \out, \t2, lsr #8
+.endif
+.endm
+
+.macro _le32_bswap x, t0, t1, t2
+#ifdef __ARMEB__
+ __rev \x, \x, \t0, \t1, \t2
+#endif
+.endm
+
+.macro _le32_bswap_4x a, b, c, d, t0, t1, t2
+ _le32_bswap \a, \t0, \t1, \t2
+ _le32_bswap \b, \t0, \t1, \t2
+ _le32_bswap \c, \t0, \t1, \t2
+ _le32_bswap \d, \t0, \t1, \t2
+.endm
+
+.macro __ldrd a, b, src, offset
+#if __LINUX_ARM_ARCH__ >= 6
+ ldrd \a, \b, [\src, #\offset]
+#else
+ ldr \a, [\src, #\offset]
+ ldr \b, [\src, #\offset + 4]
+#endif
+.endm
+
+.macro __strd a, b, dst, offset
+#if __LINUX_ARM_ARCH__ >= 6
+ strd \a, \b, [\dst, #\offset]
+#else
+ str \a, [\dst, #\offset]
+ str \b, [\dst, #\offset + 4]
+#endif
+.endm
+
+.macro _halfround a1, b1, c1, d1, a2, b2, c2, d2
+
+ // a += b; d ^= a; d = rol(d, 16);
+ add \a1, \a1, \b1, ror #brot
+ add \a2, \a2, \b2, ror #brot
+ eor \d1, \a1, \d1, ror #drot
+ eor \d2, \a2, \d2, ror #drot
+ // drot == 32 - 16 == 16
+
+ // c += d; b ^= c; b = rol(b, 12);
+ add \c1, \c1, \d1, ror #16
+ add \c2, \c2, \d2, ror #16
+ eor \b1, \c1, \b1, ror #brot
+ eor \b2, \c2, \b2, ror #brot
+ // brot == 32 - 12 == 20
+
+ // a += b; d ^= a; d = rol(d, 8);
+ add \a1, \a1, \b1, ror #20
+ add \a2, \a2, \b2, ror #20
+ eor \d1, \a1, \d1, ror #16
+ eor \d2, \a2, \d2, ror #16
+ // drot == 32 - 8 == 24
+
+ // c += d; b ^= c; b = rol(b, 7);
+ add \c1, \c1, \d1, ror #24
+ add \c2, \c2, \d2, ror #24
+ eor \b1, \c1, \b1, ror #20
+ eor \b2, \c2, \b2, ror #20
+ // brot == 32 - 7 == 25
+.endm
+
+.macro _doubleround
+
+ // column round
+
+ // quarterrounds: (x0, x4, x8, x12) and (x1, x5, x9, x13)
+ _halfround X0, X4, X8_X10, X12, X1, X5, X9_X11, X13
+
+ // save (x8, x9); restore (x10, x11)
+ __strd X8_X10, X9_X11, sp, 0
+ __ldrd X8_X10, X9_X11, sp, 8
+
+ // quarterrounds: (x2, x6, x10, x14) and (x3, x7, x11, x15)
+ _halfround X2, X6, X8_X10, X14, X3, X7, X9_X11, X15
+
+ .set brot, 25
+ .set drot, 24
+
+ // diagonal round
+
+ // quarterrounds: (x0, x5, x10, x15) and (x1, x6, x11, x12)
+ _halfround X0, X5, X8_X10, X15, X1, X6, X9_X11, X12
+
+ // save (x10, x11); restore (x8, x9)
+ __strd X8_X10, X9_X11, sp, 8
+ __ldrd X8_X10, X9_X11, sp, 0
+
+ // quarterrounds: (x2, x7, x8, x13) and (x3, x4, x9, x14)
+ _halfround X2, X7, X8_X10, X13, X3, X4, X9_X11, X14
+.endm
+
+.macro _chacha_permute nrounds
+ .set brot, 0
+ .set drot, 0
+ .rept \nrounds / 2
+ _doubleround
+ .endr
+.endm
+
+.macro _chacha nrounds
+
+.Lnext_block\@:
+ // Stack: unused0-unused1 x10-x11 x0-x15 OUT IN LEN
+ // Registers contain x0-x9,x12-x15.
+
+ // Do the core ChaCha permutation to update x0-x15.
+ _chacha_permute \nrounds
+
+ add sp, #8
+ // Stack: x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers contain x0-x9,x12-x15.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // Free up some registers (r8-r12,r14) by pushing (x8-x9,x12-x15).
+ push {X8_X10, X9_X11, X12, X13, X14, X15}
+
+ // Load (OUT, IN, LEN).
+ ldr r14, [sp, #96]
+ ldr r12, [sp, #100]
+ ldr r11, [sp, #104]
+
+ orr r10, r14, r12
+
+ // Use slow path if fewer than 64 bytes remain.
+ cmp r11, #64
+ blt .Lxor_slowpath\@
+
+ // Use slow path if IN and/or OUT isn't 4-byte aligned. Needed even on
+ // ARMv6+, since ldmia and stmia (used below) still require alignment.
+ tst r10, #3
+ bne .Lxor_slowpath\@
+
+ // Fast path: XOR 64 bytes of aligned data.
+
+ // Stack: x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is OUT.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // x0-x3
+ __ldrd r8, r9, sp, 32
+ __ldrd r10, r11, sp, 40
+ add X0, X0, r8
+ add X1, X1, r9
+ add X2, X2, r10
+ add X3, X3, r11
+ _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10
+ ldmia r12!, {r8-r11}
+ eor X0, X0, r8
+ eor X1, X1, r9
+ eor X2, X2, r10
+ eor X3, X3, r11
+ stmia r14!, {X0-X3}
+
+ // x4-x7
+ __ldrd r8, r9, sp, 48
+ __ldrd r10, r11, sp, 56
+ add X4, r8, X4, ror #brot
+ add X5, r9, X5, ror #brot
+ ldmia r12!, {X0-X3}
+ add X6, r10, X6, ror #brot
+ add X7, r11, X7, ror #brot
+ _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10
+ eor X4, X4, X0
+ eor X5, X5, X1
+ eor X6, X6, X2
+ eor X7, X7, X3
+ stmia r14!, {X4-X7}
+
+ // x8-x15
+ pop {r0-r7} // (x8-x9,x12-x15,x10-x11)
+ __ldrd r8, r9, sp, 32
+ __ldrd r10, r11, sp, 40
+ add r0, r0, r8 // x8
+ add r1, r1, r9 // x9
+ add r6, r6, r10 // x10
+ add r7, r7, r11 // x11
+ _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10
+ ldmia r12!, {r8-r11}
+ eor r0, r0, r8 // x8
+ eor r1, r1, r9 // x9
+ eor r6, r6, r10 // x10
+ eor r7, r7, r11 // x11
+ stmia r14!, {r0,r1,r6,r7}
+ ldmia r12!, {r0,r1,r6,r7}
+ __ldrd r8, r9, sp, 48
+ __ldrd r10, r11, sp, 56
+ add r2, r8, r2, ror #drot // x12
+ add r3, r9, r3, ror #drot // x13
+ add r4, r10, r4, ror #drot // x14
+ add r5, r11, r5, ror #drot // x15
+ _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11
+ ldr r9, [sp, #72] // load LEN
+ eor r2, r2, r0 // x12
+ eor r3, r3, r1 // x13
+ eor r4, r4, r6 // x14
+ eor r5, r5, r7 // x15
+ subs r9, #64 // decrement and check LEN
+ stmia r14!, {r2-r5}
+
+ beq .Ldone\@
+
+.Lprepare_for_next_block\@:
+
+ // Stack: x0-x15 OUT IN LEN
+
+ // Increment block counter (x12)
+ add r8, #1
+
+ // Store updated (OUT, IN, LEN)
+ str r14, [sp, #64]
+ str r12, [sp, #68]
+ str r9, [sp, #72]
+
+ mov r14, sp
+
+ // Store updated block counter (x12)
+ str r8, [sp, #48]
+
+ sub sp, #16
+
+ // Reload state and do next block
+ ldmia r14!, {r0-r11} // load x0-x11
+ __strd r10, r11, sp, 8 // store x10-x11 before state
+ ldmia r14, {r10-r12,r14} // load x12-x15
+ b .Lnext_block\@
+
+.Lxor_slowpath\@:
+ // Slow path: < 64 bytes remaining, or unaligned input or output buffer.
+ // We handle it by storing the 64 bytes of keystream to the stack, then
+ // XOR-ing the needed portion with the data.
+
+ // Allocate keystream buffer
+ sub sp, #64
+ mov r14, sp
+
+ // Stack: ks0-ks15 x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN
+ // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is &ks0.
+ // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'.
+
+ // Save keystream for x0-x3
+ __ldrd r8, r9, sp, 96
+ __ldrd r10, r11, sp, 104
+ add X0, X0, r8
+ add X1, X1, r9
+ add X2, X2, r10
+ add X3, X3, r11
+ _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10
+ stmia r14!, {X0-X3}
+
+ // Save keystream for x4-x7
+ __ldrd r8, r9, sp, 112
+ __ldrd r10, r11, sp, 120
+ add X4, r8, X4, ror #brot
+ add X5, r9, X5, ror #brot
+ add X6, r10, X6, ror #brot
+ add X7, r11, X7, ror #brot
+ _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10
+ add r8, sp, #64
+ stmia r14!, {X4-X7}
+
+ // Save keystream for x8-x15
+ ldm r8, {r0-r7} // (x8-x9,x12-x15,x10-x11)
+ __ldrd r8, r9, sp, 128
+ __ldrd r10, r11, sp, 136
+ add r0, r0, r8 // x8
+ add r1, r1, r9 // x9
+ add r6, r6, r10 // x10
+ add r7, r7, r11 // x11
+ _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10
+ stmia r14!, {r0,r1,r6,r7}
+ __ldrd r8, r9, sp, 144
+ __ldrd r10, r11, sp, 152
+ add r2, r8, r2, ror #drot // x12
+ add r3, r9, r3, ror #drot // x13
+ add r4, r10, r4, ror #drot // x14
+ add r5, r11, r5, ror #drot // x15
+ _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11
+ stmia r14, {r2-r5}
+
+ // Stack: ks0-ks15 unused0-unused7 x0-x15 OUT IN LEN
+ // Registers: r8 is block counter, r12 is IN.
+
+ ldr r9, [sp, #168] // LEN
+ ldr r14, [sp, #160] // OUT
+ cmp r9, #64
+ mov r0, sp
+ movle r1, r9
+ movgt r1, #64
+ // r1 is number of bytes to XOR, in range [1, 64]
+
+.if __LINUX_ARM_ARCH__ < 6
+ orr r2, r12, r14
+ tst r2, #3 // IN or OUT misaligned?
+ bne .Lxor_next_byte\@
+.endif
+
+ // XOR a word at a time
+.rept 16
+ subs r1, #4
+ blt .Lxor_words_done\@
+ ldr r2, [r12], #4
+ ldr r3, [r0], #4
+ eor r2, r2, r3
+ str r2, [r14], #4
+.endr
+ b .Lxor_slowpath_done\@
+.Lxor_words_done\@:
+ ands r1, r1, #3
+ beq .Lxor_slowpath_done\@
+
+ // XOR a byte at a time
+.Lxor_next_byte\@:
+ ldrb r2, [r12], #1
+ ldrb r3, [r0], #1
+ eor r2, r2, r3
+ strb r2, [r14], #1
+ subs r1, #1
+ bne .Lxor_next_byte\@
+
+.Lxor_slowpath_done\@:
+ subs r9, #64
+ add sp, #96
+ bgt .Lprepare_for_next_block\@
+
+.Ldone\@:
+.endm // _chacha
+
+/*
+ * void chacha20_arm(u8 *out, const u8 *in, size_t len, const u32 key[8],
+ * const u32 iv[4]);
+ */
+ENTRY(chacha20_arm)
+ cmp r2, #0 // len == 0?
+ bxeq lr
+
+ push {r0-r2,r4-r11,lr}
+
+ // Push state x0-x15 onto stack.
+ // Also store an extra copy of x10-x11 just before the state.
+
+ ldr r4, [sp, #48] // iv
+ mov r0, sp
+ sub sp, #80
+
+ // iv: x12-x15
+ ldm r4, {X12,X13,X14,X15}
+ stmdb r0!, {X12,X13,X14,X15}
+
+ // key: x4-x11
+ __ldrd X8_X10, X9_X11, r3, 24
+ __strd X8_X10, X9_X11, sp, 8
+ stmdb r0!, {X8_X10, X9_X11}
+ ldm r3, {X4-X9_X11}
+ stmdb r0!, {X4-X9_X11}
+
+ // constants: x0-x3
+ adrl X3, .Lexpand_32byte_k
+ ldm X3, {X0-X3}
+ __strd X0, X1, sp, 16
+ __strd X2, X3, sp, 24
+
+ _chacha 20
+
+ add sp, #76
+ pop {r4-r11, pc}
+ENDPROC(chacha20_arm)
+
+/*
+ * void hchacha20_arm(const u32 state[16], u32 out[8]);
+ */
+ENTRY(hchacha20_arm)
+ push {r1,r4-r11,lr}
+
+ mov r14, r0
+ ldmia r14!, {r0-r11} // load x0-x11
+ push {r10-r11} // store x10-x11 to stack
+ ldm r14, {r10-r12,r14} // load x12-x15
+ sub sp, #8
+
+ _chacha_permute 20
+
+ // Skip over (unused0-unused1, x10-x11)
+ add sp, #16
+
+ // Fix up rotations of x12-x15
+ ror X12, X12, #drot
+ ror X13, X13, #drot
+ pop {r4} // load 'out'
+ ror X14, X14, #drot
+ ror X15, X15, #drot
+
+ // Store (x0-x3,x12-x15) to 'out'
+ stm r4, {X0,X1,X2,X3,X12,X13,X14,X15}
+
+ pop {r4-r11,pc}
+ENDPROC(hchacha20_arm)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+/*
+ * This following NEON routine was ported from Andy Polyakov's implementation
+ * from CRYPTOGAMS. It begins with parts of the CRYPTOGAMS scalar routine,
+ * since certain NEON code paths actually branch to it.
+ */
+
.text
#if defined(__thumb2__) || defined(__clang__)
.syntax unified
@@ -22,39 +484,6 @@
#define ldrhsb ldrbhs
#endif
-.align 5
-.Lsigma:
-.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
-.Lone:
-.long 1,0,0,0
-.word -1
-
-.align 5
-ENTRY(chacha20_arm)
- ldr r12,[sp,#0] @ pull pointer to counter and nonce
- stmdb sp!,{r0-r2,r4-r11,lr}
- cmp r2,#0 @ len==0?
-#ifdef __thumb2__
- itt eq
-#endif
- addeq sp,sp,#4*3
- beq .Lno_data_arm
- ldmia r12,{r4-r7} @ load counter and nonce
- sub sp,sp,#4*(16) @ off-load area
-#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__)
- sub r14,pc,#100 @ .Lsigma
-#else
- adr r14,.Lsigma @ .Lsigma
-#endif
- stmdb sp!,{r4-r7} @ copy counter and nonce
- ldmia r3,{r4-r11} @ load key
- ldmia r14,{r0-r3} @ load sigma
- stmdb sp!,{r4-r11} @ copy key
- stmdb sp!,{r0-r3} @ copy sigma
- str r10,[sp,#4*(16+10)] @ off-load "rx"
- str r11,[sp,#4*(16+11)] @ off-load "rx"
- b .Loop_outer_enter
-
.align 4
.Loop_outer:
ldmia sp,{r0-r9} @ load key material
@@ -748,11 +1177,8 @@ ENTRY(chacha20_arm)
.Ldone:
add sp,sp,#4*(32+3)
-.Lno_data_arm:
ldmia sp!,{r4-r11,pc}
-ENDPROC(chacha20_arm)
-#ifdef CONFIG_KERNEL_MODE_NEON
.align 5
.Lsigma2:
.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral
diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64.S
similarity index 100%
rename from lib/zinc/chacha20/chacha20-arm64-cryptogams.S
rename to lib/zinc/chacha20/chacha20-arm64.S
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
index 4354b874a6a5..fc4f74fca653 100644
--- a/lib/zinc/chacha20/chacha20.c
+++ b/lib/zinc/chacha20/chacha20.c
@@ -16,6 +16,8 @@
#if defined(CONFIG_ZINC_ARCH_X86_64)
#include "chacha20-x86_64-glue.h"
+#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64)
+#include "chacha20-arm-glue.h"
#else
void __init chacha20_fpu_init(void)
{
--
2.19.0
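A note on the hchacha20_arm routine added above: HChaCha20 runs the 20-round
ChaCha permutation over the constants/key/nonce state without the final
feed-forward addition, and the derived key is simply words 0-3 and 12-15 of
the permuted state -- which is why the routine only needs to undo the
deferred rotations on x12-x15 before storing. A rough C sketch, not part of
the patch (the chacha20_permute() helper name is hypothetical and stands in
for the 20-round permutation; memcpy comes from <linux/string.h>):

	static void hchacha20_sketch(u32 out[8], const u32 state[16])
	{
		u32 x[16];

		memcpy(x, state, sizeof(x));
		chacha20_permute(x);            /* 20 rounds, no feed-forward */
		memcpy(out, x, 4 * sizeof(u32));            /* x0..x3   */
		memcpy(out + 4, x + 12, 4 * sizeof(u32));   /* x12..x15 */
	}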
* [PATCH net-next v6 08/23] zinc: ChaCha20 MIPS32r2 implementation
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (6 preceding siblings ...)
2018-09-25 14:56 ` Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 09/23] zinc: Poly1305 generic C implementations and selftest Jason A. Donenfeld
` (15 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, René van Dorst, Samuel Neves,
Andy Lutomirski, Jean-Philippe Aumasson, Ralf Baechle,
Paul Burton, James Hogan, linux-mips
This MIPS32r2 implementation comes from René van Dorst and me and
results in a nice speedup on the usual OpenWRT targets.
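As a point of reference for the assembly below (this sketch is not part of
the patch): each AXR() invocation in chacha20-mips.S performs one
add-xor-rotate step of the ChaCha20 quarter round across all four columns
(or diagonals) at once, using the usual rotation constants 16, 12, 8 and 7.
The equivalent scalar quarter round in C -- the helper name here is purely
illustrative -- looks like:

	#include <linux/bitops.h>	/* rol32() */

	static inline void chacha20_quarter_round(u32 x[16], int a, int b,
						  int c, int d)
	{
		x[a] += x[b]; x[d] = rol32(x[d] ^ x[a], 16);
		x[c] += x[d]; x[b] = rol32(x[b] ^ x[c], 12);
		x[a] += x[b]; x[d] = rol32(x[d] ^ x[a], 8);
		x[c] += x[d]; x[b] = rol32(x[b] ^ x[c], 7);
	}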
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Co-developed-by: René van Dorst <opensource@vdorst.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
---
lib/zinc/Makefile | 2 +
lib/zinc/chacha20/chacha20-mips-glue.h | 27 ++
lib/zinc/chacha20/chacha20-mips.S | 424 +++++++++++++++++++++++++
lib/zinc/chacha20/chacha20.c | 2 +
4 files changed, 455 insertions(+)
create mode 100644 lib/zinc/chacha20/chacha20-mips-glue.h
create mode 100644 lib/zinc/chacha20/chacha20-mips.S
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index e47f64e12bbd..60d568cf5206 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -6,4 +6,6 @@ zinc_chacha20-y := chacha20/chacha20.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM) += chacha20/chacha20-arm.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o
+zinc_chacha20-$(CONFIG_ZINC_ARCH_MIPS) += chacha20/chacha20-mips.o
+AFLAGS_chacha20-mips.o += -O2 # This is required to fill the branch delay slots
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
diff --git a/lib/zinc/chacha20/chacha20-mips-glue.h b/lib/zinc/chacha20/chacha20-mips-glue.h
new file mode 100644
index 000000000000..6e70dd6817d2
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-mips-glue.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+asmlinkage void chacha20_mips(u32 state[16], u8 *out, const u8 *in,
+ const size_t len);
+static void __init chacha20_fpu_init(void)
+{
+}
+
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
+ const u8 *src, const size_t len,
+ simd_context_t *simd_context)
+{
+ chacha20_mips((u32 *)state, dst, src, len);
+ return true;
+}
+
+
+static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS],
+ const u8 nonce[HCHACHA20_NONCE_SIZE],
+ const u8 key[HCHACHA20_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ return false;
+}
diff --git a/lib/zinc/chacha20/chacha20-mips.S b/lib/zinc/chacha20/chacha20-mips.S
new file mode 100644
index 000000000000..031ee5e794df
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-mips.S
@@ -0,0 +1,424 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2016-2018 René van Dorst <opensource@vdorst.com>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#define MASK_U32 0x3c
+#define CHACHA20_BLOCK_SIZE 64
+#define STACK_SIZE 32
+
+#define X0 $t0
+#define X1 $t1
+#define X2 $t2
+#define X3 $t3
+#define X4 $t4
+#define X5 $t5
+#define X6 $t6
+#define X7 $t7
+#define X8 $t8
+#define X9 $t9
+#define X10 $v1
+#define X11 $s6
+#define X12 $s5
+#define X13 $s4
+#define X14 $s3
+#define X15 $s2
+/* Use regs which are overwritten on exit for Tx so we don't leak clear data. */
+#define T0 $s1
+#define T1 $s0
+#define T(n) T ## n
+#define X(n) X ## n
+
+/* Input arguments */
+#define STATE $a0
+#define OUT $a1
+#define IN $a2
+#define BYTES $a3
+
+/* Output argument */
+/* NONCE[0] is kept in a register and not in memory.
+ * We don't want to touch the original value in memory.
+ * It must be incremented every loop iteration.
+ */
+#define NONCE_0 $v0
+
+/* SAVED_X and SAVED_CA are set in the jump table.
+ * Use regs which are overwritten on exit so we don't leak clear data.
+ * They are used for handling the last bytes, which are not a multiple of 4.
+ */
+#define SAVED_X X15
+#define SAVED_CA $s7
+
+#define IS_UNALIGNED $s7
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define MSB 0
+#define LSB 3
+#define ROTx rotl
+#define ROTR(n) rotr n, 24
+#define CPU_TO_LE32(n) \
+ wsbh n; \
+ rotr n, 16;
+#else
+#define MSB 3
+#define LSB 0
+#define ROTx rotr
+#define CPU_TO_LE32(n)
+#define ROTR(n)
+#endif
+
+#define FOR_EACH_WORD(x) \
+ x( 0); \
+ x( 1); \
+ x( 2); \
+ x( 3); \
+ x( 4); \
+ x( 5); \
+ x( 6); \
+ x( 7); \
+ x( 8); \
+ x( 9); \
+ x(10); \
+ x(11); \
+ x(12); \
+ x(13); \
+ x(14); \
+ x(15);
+
+#define FOR_EACH_WORD_REV(x) \
+ x(15); \
+ x(14); \
+ x(13); \
+ x(12); \
+ x(11); \
+ x(10); \
+ x( 9); \
+ x( 8); \
+ x( 7); \
+ x( 6); \
+ x( 5); \
+ x( 4); \
+ x( 3); \
+ x( 2); \
+ x( 1); \
+ x( 0);
+
+#define PLUS_ONE_0 1
+#define PLUS_ONE_1 2
+#define PLUS_ONE_2 3
+#define PLUS_ONE_3 4
+#define PLUS_ONE_4 5
+#define PLUS_ONE_5 6
+#define PLUS_ONE_6 7
+#define PLUS_ONE_7 8
+#define PLUS_ONE_8 9
+#define PLUS_ONE_9 10
+#define PLUS_ONE_10 11
+#define PLUS_ONE_11 12
+#define PLUS_ONE_12 13
+#define PLUS_ONE_13 14
+#define PLUS_ONE_14 15
+#define PLUS_ONE_15 16
+#define PLUS_ONE(x) PLUS_ONE_ ## x
+#define _CONCAT3(a,b,c) a ## b ## c
+#define CONCAT3(a,b,c) _CONCAT3(a,b,c)
+
+#define STORE_UNALIGNED(x) \
+CONCAT3(.Lchacha20_mips_xor_unaligned_, PLUS_ONE(x), _b: ;) \
+ .if (x != 12); \
+ lw T0, (x*4)(STATE); \
+ .endif; \
+ lwl T1, (x*4)+MSB ## (IN); \
+ lwr T1, (x*4)+LSB ## (IN); \
+ .if (x == 12); \
+ addu X ## x, NONCE_0; \
+ .else; \
+ addu X ## x, T0; \
+ .endif; \
+ CPU_TO_LE32(X ## x); \
+ xor X ## x, T1; \
+ swl X ## x, (x*4)+MSB ## (OUT); \
+ swr X ## x, (x*4)+LSB ## (OUT);
+
+#define STORE_ALIGNED(x) \
+CONCAT3(.Lchacha20_mips_xor_aligned_, PLUS_ONE(x), _b: ;) \
+ .if (x != 12); \
+ lw T0, (x*4)(STATE); \
+ .endif; \
+ lw T1, (x*4) ## (IN); \
+ .if (x == 12); \
+ addu X ## x, NONCE_0; \
+ .else; \
+ addu X ## x, T0; \
+ .endif; \
+ CPU_TO_LE32(X ## x); \
+ xor X ## x, T1; \
+ sw X ## x, (x*4) ## (OUT);
+
+/* Jump table macro.
+ * Used for setup and for handling the last bytes, which are not a multiple of 4.
+ * X15 is free to store Xn.
+ * Every jump table entry must be equal in size.
+ */
+#define JMPTBL_ALIGNED(x) \
+.Lchacha20_mips_jmptbl_aligned_ ## x: ; \
+ .set noreorder; \
+ b .Lchacha20_mips_xor_aligned_ ## x ## _b; \
+ .if (x == 12); \
+ addu SAVED_X, X ## x, NONCE_0; \
+ .else; \
+ addu SAVED_X, X ## x, SAVED_CA; \
+ .endif; \
+ .set reorder
+
+#define JMPTBL_UNALIGNED(x) \
+.Lchacha20_mips_jmptbl_unaligned_ ## x: ; \
+ .set noreorder; \
+ b .Lchacha20_mips_xor_unaligned_ ## x ## _b; \
+ .if (x == 12); \
+ addu SAVED_X, X ## x, NONCE_0; \
+ .else; \
+ addu SAVED_X, X ## x, SAVED_CA; \
+ .endif; \
+ .set reorder
+
+#define AXR(A, B, C, D, K, L, M, N, V, W, Y, Z, S) \
+ addu X(A), X(K); \
+ addu X(B), X(L); \
+ addu X(C), X(M); \
+ addu X(D), X(N); \
+ xor X(V), X(A); \
+ xor X(W), X(B); \
+ xor X(Y), X(C); \
+ xor X(Z), X(D); \
+ rotl X(V), S; \
+ rotl X(W), S; \
+ rotl X(Y), S; \
+ rotl X(Z), S;
+
+.text
+.set reorder
+.set noat
+.globl chacha20_mips
+.ent chacha20_mips
+chacha20_mips:
+ .frame $sp, STACK_SIZE, $ra
+
+ addiu $sp, -STACK_SIZE
+
+	/* Return early if BYTES == 0. */
+ beqz BYTES, .Lchacha20_mips_end
+
+ lw NONCE_0, 48(STATE)
+
+ /* Save s0-s7 */
+ sw $s0, 0($sp)
+ sw $s1, 4($sp)
+ sw $s2, 8($sp)
+ sw $s3, 12($sp)
+ sw $s4, 16($sp)
+ sw $s5, 20($sp)
+ sw $s6, 24($sp)
+ sw $s7, 28($sp)
+
+	/* Test whether IN or OUT is unaligned.
+ * IS_UNALIGNED = ( IN | OUT ) & 0x00000003
+ */
+ or IS_UNALIGNED, IN, OUT
+ andi IS_UNALIGNED, 0x3
+
+ /* Set number of rounds */
+ li $at, 20
+
+ b .Lchacha20_rounds_start
+
+.align 4
+.Loop_chacha20_rounds:
+ addiu IN, CHACHA20_BLOCK_SIZE
+ addiu OUT, CHACHA20_BLOCK_SIZE
+ addiu NONCE_0, 1
+
+.Lchacha20_rounds_start:
+ lw X0, 0(STATE)
+ lw X1, 4(STATE)
+ lw X2, 8(STATE)
+ lw X3, 12(STATE)
+
+ lw X4, 16(STATE)
+ lw X5, 20(STATE)
+ lw X6, 24(STATE)
+ lw X7, 28(STATE)
+ lw X8, 32(STATE)
+ lw X9, 36(STATE)
+ lw X10, 40(STATE)
+ lw X11, 44(STATE)
+
+ move X12, NONCE_0
+ lw X13, 52(STATE)
+ lw X14, 56(STATE)
+ lw X15, 60(STATE)
+
+.Loop_chacha20_xor_rounds:
+ addiu $at, -2
+ AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 16);
+ AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 12);
+ AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 8);
+ AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 7);
+ AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 16);
+ AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 12);
+ AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 8);
+ AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 7);
+ bnez $at, .Loop_chacha20_xor_rounds
+
+ addiu BYTES, -(CHACHA20_BLOCK_SIZE)
+
+ /* Is data src/dst unaligned? Jump */
+ bnez IS_UNALIGNED, .Loop_chacha20_unaligned
+
+	/* Set the number of rounds here to fill the delay slot. */
+ li $at, 20
+
+	/* BYTES < 0? There is no full block left. */
+ bltz BYTES, .Lchacha20_mips_no_full_block_aligned
+
+ FOR_EACH_WORD_REV(STORE_ALIGNED)
+
+ /* BYTES > 0? Loop again. */
+ bgtz BYTES, .Loop_chacha20_rounds
+
+ /* Place this here to fill delay slot */
+ addiu NONCE_0, 1
+
+ /* BYTES < 0? Handle last bytes */
+ bltz BYTES, .Lchacha20_mips_xor_bytes
+
+.Lchacha20_mips_xor_done:
+ /* Restore used registers */
+ lw $s0, 0($sp)
+ lw $s1, 4($sp)
+ lw $s2, 8($sp)
+ lw $s3, 12($sp)
+ lw $s4, 16($sp)
+ lw $s5, 20($sp)
+ lw $s6, 24($sp)
+ lw $s7, 28($sp)
+
+	/* Write NONCE_0 back to the right location in the state */
+ sw NONCE_0, 48(STATE)
+
+.Lchacha20_mips_end:
+ addiu $sp, STACK_SIZE
+ jr $ra
+
+.Lchacha20_mips_no_full_block_aligned:
+ /* Restore the offset on BYTES */
+ addiu BYTES, CHACHA20_BLOCK_SIZE
+
+ /* Get number of full WORDS */
+ andi $at, BYTES, MASK_U32
+
+ /* Load upper half of jump table addr */
+ lui T0, %hi(.Lchacha20_mips_jmptbl_aligned_0)
+
+ /* Calculate lower half jump table offset */
+ ins T0, $at, 1, 6
+
+ /* Add offset to STATE */
+ addu T1, STATE, $at
+
+ /* Add lower half jump table addr */
+ addiu T0, %lo(.Lchacha20_mips_jmptbl_aligned_0)
+
+ /* Read value from STATE */
+ lw SAVED_CA, 0(T1)
+
+	/* Store the remaining byte counter as a negative value */
+ subu BYTES, $at, BYTES
+
+ jr T0
+
+ /* Jump table */
+ FOR_EACH_WORD(JMPTBL_ALIGNED)
+
+
+.Loop_chacha20_unaligned:
+	/* Set the number of rounds here to fill the delay slot. */
+ li $at, 20
+
+	/* BYTES < 0? There is no full block left. */
+ bltz BYTES, .Lchacha20_mips_no_full_block_unaligned
+
+ FOR_EACH_WORD_REV(STORE_UNALIGNED)
+
+ /* BYTES > 0? Loop again. */
+ bgtz BYTES, .Loop_chacha20_rounds
+
+	/* Write NONCE_0 back to the right location in the state */
+ sw NONCE_0, 48(STATE)
+
+ .set noreorder
+ /* Fall through to byte handling */
+ bgez BYTES, .Lchacha20_mips_xor_done
+.Lchacha20_mips_xor_unaligned_0_b:
+.Lchacha20_mips_xor_aligned_0_b:
+ /* Place this here to fill delay slot */
+ addiu NONCE_0, 1
+ .set reorder
+
+.Lchacha20_mips_xor_bytes:
+ addu IN, $at
+ addu OUT, $at
+ /* First byte */
+ lbu T1, 0(IN)
+ addiu $at, BYTES, 1
+ CPU_TO_LE32(SAVED_X)
+ ROTR(SAVED_X)
+ xor T1, SAVED_X
+ sb T1, 0(OUT)
+ beqz $at, .Lchacha20_mips_xor_done
+ /* Second byte */
+ lbu T1, 1(IN)
+ addiu $at, BYTES, 2
+ ROTx SAVED_X, 8
+ xor T1, SAVED_X
+ sb T1, 1(OUT)
+ beqz $at, .Lchacha20_mips_xor_done
+ /* Third byte */
+ lbu T1, 2(IN)
+ ROTx SAVED_X, 8
+ xor T1, SAVED_X
+ sb T1, 2(OUT)
+ b .Lchacha20_mips_xor_done
+
+.Lchacha20_mips_no_full_block_unaligned:
+ /* Restore the offset on BYTES */
+ addiu BYTES, CHACHA20_BLOCK_SIZE
+
+ /* Get number of full WORDS */
+ andi $at, BYTES, MASK_U32
+
+ /* Load upper half of jump table addr */
+ lui T0, %hi(.Lchacha20_mips_jmptbl_unaligned_0)
+
+ /* Calculate lower half jump table offset */
+ ins T0, $at, 1, 6
+
+ /* Add offset to STATE */
+ addu T1, STATE, $at
+
+ /* Add lower half jump table addr */
+ addiu T0, %lo(.Lchacha20_mips_jmptbl_unaligned_0)
+
+ /* Read value from STATE */
+ lw SAVED_CA, 0(T1)
+
+	/* Store the remaining byte counter as a negative value */
+ subu BYTES, $at, BYTES
+
+ jr T0
+
+ /* Jump table */
+ FOR_EACH_WORD(JMPTBL_UNALIGNED)
+.end chacha20_mips
+.set at
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
index fc4f74fca653..34cb35528c3d 100644
--- a/lib/zinc/chacha20/chacha20.c
+++ b/lib/zinc/chacha20/chacha20.c
@@ -18,6 +18,8 @@
#include "chacha20-x86_64-glue.h"
#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64)
#include "chacha20-arm-glue.h"
+#elif defined(CONFIG_ZINC_ARCH_MIPS)
+#include "chacha20-mips-glue.h"
#else
void __init chacha20_fpu_init(void)
{
--
2.19.0
* [PATCH net-next v6 09/23] zinc: Poly1305 generic C implementations and selftest
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (7 preceding siblings ...)
2018-09-25 14:56 ` [PATCH net-next v6 08/23] zinc: ChaCha20 MIPS32r2 implementation Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 10/23] zinc: Poly1305 x86_64 implementation Jason A. Donenfeld
` (14 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson
These two C implementations -- a 32x32 one and a 64x64 one, depending on
the platform -- come from Andrew Moon's public domain poly1305-donna
portable code, modified for usage in the kernel and for usage with
accelerated primitives.
Information: https://cr.yp.to/mac.html
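For reference, both implementations compute the same MAC, namely (in notation
not taken from the patch):

	tag = ((sum_{i=1..n} m_i * r^(n-i+1)) mod (2^130 - 5) + s) mod 2^128

where each m_i is a 16-byte little-endian block with a pad bit appended, r is
the clamped first half of the key and s is the second half. The donna32
variant keeps the accumulator in five 26-bit limbs and only needs
32x32->64-bit multiplies, while the donna64 variant uses three 44/44/42-bit
limbs and 64x64->128-bit multiplies via __uint128_t. A minimal usage sketch
of the API added here (not part of the patch; the caller is assumed to have
obtained a simd_context_t via the simd helpers from earlier in the series):

	#include <zinc/poly1305.h>

	static void poly1305_oneshot(u8 mac[POLY1305_MAC_SIZE],
				     const u8 *msg, size_t len,
				     const u8 key[POLY1305_KEY_SIZE],
				     simd_context_t *simd_context)
	{
		struct poly1305_ctx ctx;

		poly1305_init(&ctx, key);
		poly1305_update(&ctx, msg, len, simd_context);
		poly1305_final(&ctx, mac, simd_context);
	}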
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
include/zinc/poly1305.h | 31 +
lib/zinc/Kconfig | 3 +
lib/zinc/Makefile | 3 +
lib/zinc/poly1305/poly1305-donna32.h | 205 +++++
lib/zinc/poly1305/poly1305-donna64.h | 182 +++++
lib/zinc/poly1305/poly1305.c | 155 ++++
lib/zinc/selftest/poly1305.h | 1112 ++++++++++++++++++++++++++
7 files changed, 1691 insertions(+)
create mode 100644 include/zinc/poly1305.h
create mode 100644 lib/zinc/poly1305/poly1305-donna32.h
create mode 100644 lib/zinc/poly1305/poly1305-donna64.h
create mode 100644 lib/zinc/poly1305/poly1305.c
create mode 100644 lib/zinc/selftest/poly1305.h
diff --git a/include/zinc/poly1305.h b/include/zinc/poly1305.h
new file mode 100644
index 000000000000..13fe0e50fc3c
--- /dev/null
+++ b/include/zinc/poly1305.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_POLY1305_H
+#define _ZINC_POLY1305_H
+
+#include <linux/simd.h>
+#include <linux/types.h>
+
+enum poly1305_lengths {
+ POLY1305_BLOCK_SIZE = 16,
+ POLY1305_KEY_SIZE = 32,
+ POLY1305_MAC_SIZE = 16
+};
+
+struct poly1305_ctx {
+ u8 opaque[24 * sizeof(u64)];
+ u32 nonce[4];
+ u8 data[POLY1305_BLOCK_SIZE];
+ size_t num;
+} __aligned(8);
+
+void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE]);
+void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len,
+ simd_context_t *simd_context);
+void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE],
+ simd_context_t *simd_context);
+
+#endif /* _ZINC_POLY1305_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index 1ca1ae1e9ea9..f9ef8f7e3c25 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -2,6 +2,9 @@ config ZINC_CHACHA20
tristate
select CRYPTO_ALGAPI
+config ZINC_POLY1305
+ tristate
+
config ZINC_DEBUG
bool "Zinc cryptography library debugging and self-tests"
help
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 60d568cf5206..6fc9626c55fa 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -9,3 +9,6 @@ zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o
zinc_chacha20-$(CONFIG_ZINC_ARCH_MIPS) += chacha20/chacha20-mips.o
AFLAGS_chacha20-mips.o += -O2 # This is required to fill the branch delay slots
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
+
+zinc_poly1305-y := poly1305/poly1305.o
+obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o
diff --git a/lib/zinc/poly1305/poly1305-donna32.h b/lib/zinc/poly1305/poly1305-donna32.h
new file mode 100644
index 000000000000..2e7565dd9679
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-donna32.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is based in part on Andrew Moon's poly1305-donna, which is in the
+ * public domain.
+ */
+
+struct poly1305_internal {
+ u32 h[5];
+ u32 r[5];
+ u32 s[4];
+};
+
+static void poly1305_init_generic(void *ctx, const u8 key[16])
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+
+ /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
+ st->r[0] = (get_unaligned_le32(&key[0])) & 0x3ffffff;
+ st->r[1] = (get_unaligned_le32(&key[3]) >> 2) & 0x3ffff03;
+ st->r[2] = (get_unaligned_le32(&key[6]) >> 4) & 0x3ffc0ff;
+ st->r[3] = (get_unaligned_le32(&key[9]) >> 6) & 0x3f03fff;
+ st->r[4] = (get_unaligned_le32(&key[12]) >> 8) & 0x00fffff;
+
+ /* s = 5*r */
+ st->s[0] = st->r[1] * 5;
+ st->s[1] = st->r[2] * 5;
+ st->s[2] = st->r[3] * 5;
+ st->s[3] = st->r[4] * 5;
+
+ /* h = 0 */
+ st->h[0] = 0;
+ st->h[1] = 0;
+ st->h[2] = 0;
+ st->h[3] = 0;
+ st->h[4] = 0;
+}
+
+static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len,
+ const u32 padbit)
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+ const u32 hibit = padbit << 24;
+ u32 r0, r1, r2, r3, r4;
+ u32 s1, s2, s3, s4;
+ u32 h0, h1, h2, h3, h4;
+ u64 d0, d1, d2, d3, d4;
+ u32 c;
+
+ r0 = st->r[0];
+ r1 = st->r[1];
+ r2 = st->r[2];
+ r3 = st->r[3];
+ r4 = st->r[4];
+
+ s1 = st->s[0];
+ s2 = st->s[1];
+ s3 = st->s[2];
+ s4 = st->s[3];
+
+ h0 = st->h[0];
+ h1 = st->h[1];
+ h2 = st->h[2];
+ h3 = st->h[3];
+ h4 = st->h[4];
+
+ while (len >= POLY1305_BLOCK_SIZE) {
+ /* h += m[i] */
+ h0 += (get_unaligned_le32(&input[0])) & 0x3ffffff;
+ h1 += (get_unaligned_le32(&input[3]) >> 2) & 0x3ffffff;
+ h2 += (get_unaligned_le32(&input[6]) >> 4) & 0x3ffffff;
+ h3 += (get_unaligned_le32(&input[9]) >> 6) & 0x3ffffff;
+ h4 += (get_unaligned_le32(&input[12]) >> 8) | hibit;
+
+ /* h *= r */
+ d0 = ((u64)h0 * r0) + ((u64)h1 * s4) +
+ ((u64)h2 * s3) + ((u64)h3 * s2) +
+ ((u64)h4 * s1);
+ d1 = ((u64)h0 * r1) + ((u64)h1 * r0) +
+ ((u64)h2 * s4) + ((u64)h3 * s3) +
+ ((u64)h4 * s2);
+ d2 = ((u64)h0 * r2) + ((u64)h1 * r1) +
+ ((u64)h2 * r0) + ((u64)h3 * s4) +
+ ((u64)h4 * s3);
+ d3 = ((u64)h0 * r3) + ((u64)h1 * r2) +
+ ((u64)h2 * r1) + ((u64)h3 * r0) +
+ ((u64)h4 * s4);
+ d4 = ((u64)h0 * r4) + ((u64)h1 * r3) +
+ ((u64)h2 * r2) + ((u64)h3 * r1) +
+ ((u64)h4 * r0);
+
+ /* (partial) h %= p */
+ c = (u32)(d0 >> 26);
+ h0 = (u32)d0 & 0x3ffffff;
+ d1 += c;
+ c = (u32)(d1 >> 26);
+ h1 = (u32)d1 & 0x3ffffff;
+ d2 += c;
+ c = (u32)(d2 >> 26);
+ h2 = (u32)d2 & 0x3ffffff;
+ d3 += c;
+ c = (u32)(d3 >> 26);
+ h3 = (u32)d3 & 0x3ffffff;
+ d4 += c;
+ c = (u32)(d4 >> 26);
+ h4 = (u32)d4 & 0x3ffffff;
+ h0 += c * 5;
+ c = (h0 >> 26);
+ h0 = h0 & 0x3ffffff;
+ h1 += c;
+
+ input += POLY1305_BLOCK_SIZE;
+ len -= POLY1305_BLOCK_SIZE;
+ }
+
+ st->h[0] = h0;
+ st->h[1] = h1;
+ st->h[2] = h2;
+ st->h[3] = h3;
+ st->h[4] = h4;
+}
+
+static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4])
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+ u32 h0, h1, h2, h3, h4, c;
+ u32 g0, g1, g2, g3, g4;
+ u64 f;
+ u32 mask;
+
+ /* fully carry h */
+ h0 = st->h[0];
+ h1 = st->h[1];
+ h2 = st->h[2];
+ h3 = st->h[3];
+ h4 = st->h[4];
+
+ c = h1 >> 26;
+ h1 = h1 & 0x3ffffff;
+ h2 += c;
+ c = h2 >> 26;
+ h2 = h2 & 0x3ffffff;
+ h3 += c;
+ c = h3 >> 26;
+ h3 = h3 & 0x3ffffff;
+ h4 += c;
+ c = h4 >> 26;
+ h4 = h4 & 0x3ffffff;
+ h0 += c * 5;
+ c = h0 >> 26;
+ h0 = h0 & 0x3ffffff;
+ h1 += c;
+
+ /* compute h + -p */
+ g0 = h0 + 5;
+ c = g0 >> 26;
+ g0 &= 0x3ffffff;
+ g1 = h1 + c;
+ c = g1 >> 26;
+ g1 &= 0x3ffffff;
+ g2 = h2 + c;
+ c = g2 >> 26;
+ g2 &= 0x3ffffff;
+ g3 = h3 + c;
+ c = g3 >> 26;
+ g3 &= 0x3ffffff;
+ g4 = h4 + c - (1UL << 26);
+
+ /* select h if h < p, or h + -p if h >= p */
+ mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1;
+ g0 &= mask;
+ g1 &= mask;
+ g2 &= mask;
+ g3 &= mask;
+ g4 &= mask;
+ mask = ~mask;
+
+ h0 = (h0 & mask) | g0;
+ h1 = (h1 & mask) | g1;
+ h2 = (h2 & mask) | g2;
+ h3 = (h3 & mask) | g3;
+ h4 = (h4 & mask) | g4;
+
+ /* h = h % (2^128) */
+ h0 = ((h0) | (h1 << 26)) & 0xffffffff;
+ h1 = ((h1 >> 6) | (h2 << 20)) & 0xffffffff;
+ h2 = ((h2 >> 12) | (h3 << 14)) & 0xffffffff;
+ h3 = ((h3 >> 18) | (h4 << 8)) & 0xffffffff;
+
+ /* mac = (h + nonce) % (2^128) */
+ f = (u64)h0 + nonce[0];
+ h0 = (u32)f;
+ f = (u64)h1 + nonce[1] + (f >> 32);
+ h1 = (u32)f;
+ f = (u64)h2 + nonce[2] + (f >> 32);
+ h2 = (u32)f;
+ f = (u64)h3 + nonce[3] + (f >> 32);
+ h3 = (u32)f;
+
+ put_unaligned_le32(h0, &mac[0]);
+ put_unaligned_le32(h1, &mac[4]);
+ put_unaligned_le32(h2, &mac[8]);
+ put_unaligned_le32(h3, &mac[12]);
+}
diff --git a/lib/zinc/poly1305/poly1305-donna64.h b/lib/zinc/poly1305/poly1305-donna64.h
new file mode 100644
index 000000000000..7850e0a11600
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-donna64.h
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is based in part on Andrew Moon's poly1305-donna, which is in the
+ * public domain.
+ */
+
+typedef __uint128_t u128;
+
+struct poly1305_internal {
+ u64 r[3];
+ u64 h[3];
+ u64 s[2];
+};
+
+static void poly1305_init_generic(void *ctx, const u8 key[16])
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+ u64 t0, t1;
+
+ /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
+ t0 = get_unaligned_le64(&key[0]);
+ t1 = get_unaligned_le64(&key[8]);
+
+ st->r[0] = t0 & 0xffc0fffffff;
+ st->r[1] = ((t0 >> 44) | (t1 << 20)) & 0xfffffc0ffff;
+ st->r[2] = ((t1 >> 24)) & 0x00ffffffc0f;
+
+ /* s = 20*r */
+ st->s[0] = st->r[1] * 20;
+ st->s[1] = st->r[2] * 20;
+
+ /* h = 0 */
+ st->h[0] = 0;
+ st->h[1] = 0;
+ st->h[2] = 0;
+}
+
+static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len,
+ const u32 padbit)
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+ const u64 hibit = ((u64)padbit) << 40;
+ u64 r0, r1, r2;
+ u64 s1, s2;
+ u64 h0, h1, h2;
+ u64 c;
+ u128 d0, d1, d2, d;
+
+ r0 = st->r[0];
+ r1 = st->r[1];
+ r2 = st->r[2];
+
+ h0 = st->h[0];
+ h1 = st->h[1];
+ h2 = st->h[2];
+
+ s1 = st->s[0];
+ s2 = st->s[1];
+
+ while (len >= POLY1305_BLOCK_SIZE) {
+ u64 t0, t1;
+
+ /* h += m[i] */
+ t0 = get_unaligned_le64(&input[0]);
+ t1 = get_unaligned_le64(&input[8]);
+
+ h0 += t0 & 0xfffffffffff;
+ h1 += ((t0 >> 44) | (t1 << 20)) & 0xfffffffffff;
+ h2 += (((t1 >> 24)) & 0x3ffffffffff) | hibit;
+
+ /* h *= r */
+ d0 = (u128)h0 * r0;
+ d = (u128)h1 * s2;
+ d0 += d;
+ d = (u128)h2 * s1;
+ d0 += d;
+ d1 = (u128)h0 * r1;
+ d = (u128)h1 * r0;
+ d1 += d;
+ d = (u128)h2 * s2;
+ d1 += d;
+ d2 = (u128)h0 * r2;
+ d = (u128)h1 * r1;
+ d2 += d;
+ d = (u128)h2 * r0;
+ d2 += d;
+
+ /* (partial) h %= p */
+ c = (u64)(d0 >> 44);
+ h0 = (u64)d0 & 0xfffffffffff;
+ d1 += c;
+ c = (u64)(d1 >> 44);
+ h1 = (u64)d1 & 0xfffffffffff;
+ d2 += c;
+ c = (u64)(d2 >> 42);
+ h2 = (u64)d2 & 0x3ffffffffff;
+ h0 += c * 5;
+ c = h0 >> 44;
+ h0 = h0 & 0xfffffffffff;
+ h1 += c;
+
+ input += POLY1305_BLOCK_SIZE;
+ len -= POLY1305_BLOCK_SIZE;
+ }
+
+ st->h[0] = h0;
+ st->h[1] = h1;
+ st->h[2] = h2;
+}
+
+static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4])
+{
+ struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+ u64 h0, h1, h2, c;
+ u64 g0, g1, g2;
+ u64 t0, t1;
+
+ /* fully carry h */
+ h0 = st->h[0];
+ h1 = st->h[1];
+ h2 = st->h[2];
+
+ c = h1 >> 44;
+ h1 &= 0xfffffffffff;
+ h2 += c;
+ c = h2 >> 42;
+ h2 &= 0x3ffffffffff;
+ h0 += c * 5;
+ c = h0 >> 44;
+ h0 &= 0xfffffffffff;
+ h1 += c;
+ c = h1 >> 44;
+ h1 &= 0xfffffffffff;
+ h2 += c;
+ c = h2 >> 42;
+ h2 &= 0x3ffffffffff;
+ h0 += c * 5;
+ c = h0 >> 44;
+ h0 &= 0xfffffffffff;
+ h1 += c;
+
+ /* compute h + -p */
+ g0 = h0 + 5;
+ c = g0 >> 44;
+ g0 &= 0xfffffffffff;
+ g1 = h1 + c;
+ c = g1 >> 44;
+ g1 &= 0xfffffffffff;
+ g2 = h2 + c - (1ULL << 42);
+
+ /* select h if h < p, or h + -p if h >= p */
+ c = (g2 >> ((sizeof(u64) * 8) - 1)) - 1;
+ g0 &= c;
+ g1 &= c;
+ g2 &= c;
+ c = ~c;
+ h0 = (h0 & c) | g0;
+ h1 = (h1 & c) | g1;
+ h2 = (h2 & c) | g2;
+
+ /* h = (h + nonce) */
+ t0 = ((u64)nonce[1] << 32) | nonce[0];
+ t1 = ((u64)nonce[3] << 32) | nonce[2];
+
+ h0 += t0 & 0xfffffffffff;
+ c = h0 >> 44;
+ h0 &= 0xfffffffffff;
+ h1 += (((t0 >> 44) | (t1 << 20)) & 0xfffffffffff) + c;
+ c = h1 >> 44;
+ h1 &= 0xfffffffffff;
+ h2 += (((t1 >> 24)) & 0x3ffffffffff) + c;
+ h2 &= 0x3ffffffffff;
+
+ /* mac = h % (2^128) */
+ h0 = h0 | (h1 << 44);
+ h1 = (h1 >> 20) | (h2 << 24);
+
+ put_unaligned_le64(h0, &mac[0]);
+ put_unaligned_le64(h1, &mac[8]);
+}
diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c
new file mode 100644
index 000000000000..538abc359d1d
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Implementation of the Poly1305 message authenticator.
+ *
+ * Information: https://cr.yp.to/mac.html
+ */
+
+#include <zinc/poly1305.h>
+
+#include <asm/unaligned.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+#ifndef HAVE_POLY1305_ARCH_IMPLEMENTATION
+static inline bool poly1305_init_arch(void *ctx,
+ const u8 key[POLY1305_KEY_SIZE])
+{
+ return false;
+}
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *input,
+ const size_t len, const u32 padbit,
+ simd_context_t *simd_context)
+{
+ return false;
+}
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4],
+ simd_context_t *simd_context)
+{
+ return false;
+}
+void __init poly1305_fpu_init(void)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
+#include "poly1305-donna64.h"
+#else
+#include "poly1305-donna32.h"
+#endif
+
+void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE])
+{
+ ctx->nonce[0] = get_unaligned_le32(&key[16]);
+ ctx->nonce[1] = get_unaligned_le32(&key[20]);
+ ctx->nonce[2] = get_unaligned_le32(&key[24]);
+ ctx->nonce[3] = get_unaligned_le32(&key[28]);
+
+ if (!poly1305_init_arch(ctx->opaque, key))
+ poly1305_init_generic(ctx->opaque, key);
+
+ ctx->num = 0;
+}
+EXPORT_SYMBOL(poly1305_init);
+
+static inline void poly1305_blocks(void *ctx, const u8 *input, const size_t len,
+ const u32 padbit,
+ simd_context_t *simd_context)
+{
+ if (!poly1305_blocks_arch(ctx, input, len, padbit, simd_context))
+ poly1305_blocks_generic(ctx, input, len, padbit);
+}
+
+static inline void poly1305_emit(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4],
+ simd_context_t *simd_context)
+{
+ if (!poly1305_emit_arch(ctx, mac, nonce, simd_context))
+ poly1305_emit_generic(ctx, mac, nonce);
+}
+
+void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len,
+ simd_context_t *simd_context)
+{
+ const size_t num = ctx->num;
+ size_t rem;
+
+ if (num) {
+ rem = POLY1305_BLOCK_SIZE - num;
+ if (len < rem) {
+ memcpy(ctx->data + num, input, len);
+ ctx->num = num + len;
+ return;
+ }
+ memcpy(ctx->data + num, input, rem);
+ poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 1,
+ simd_context);
+ input += rem;
+ len -= rem;
+ }
+
+ rem = len % POLY1305_BLOCK_SIZE;
+ len -= rem;
+
+ if (len >= POLY1305_BLOCK_SIZE) {
+ poly1305_blocks(ctx->opaque, input, len, 1, simd_context);
+ input += len;
+ }
+
+ if (rem)
+ memcpy(ctx->data, input, rem);
+
+ ctx->num = rem;
+}
+EXPORT_SYMBOL(poly1305_update);
+
+void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE],
+ simd_context_t *simd_context)
+{
+ size_t num = ctx->num;
+
+ if (num) {
+ ctx->data[num++] = 1;
+ while (num < POLY1305_BLOCK_SIZE)
+ ctx->data[num++] = 0;
+ poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 0,
+ simd_context);
+ }
+
+ poly1305_emit(ctx->opaque, mac, ctx->nonce, simd_context);
+
+ memzero_explicit(ctx, sizeof(*ctx));
+}
+EXPORT_SYMBOL(poly1305_final);
+
+#include "../selftest/poly1305.h"
+
+static bool nosimd __initdata = false;
+
+static int __init mod_init(void)
+{
+ if (!nosimd)
+ poly1305_fpu_init();
+#ifdef DEBUG
+ if (!poly1305_selftest())
+ return -ENOTRECOVERABLE;
+#endif
+ return 0;
+}
+
+static void __exit mod_exit(void)
+{
+}
+
+module_param(nosimd, bool, 0);
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Poly1305 one-time authenticator");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
diff --git a/lib/zinc/selftest/poly1305.h b/lib/zinc/selftest/poly1305.h
new file mode 100644
index 000000000000..e806ab2856fe
--- /dev/null
+++ b/lib/zinc/selftest/poly1305.h
@@ -0,0 +1,1112 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+struct poly1305_testvec {
+ const u8 *input, *output, *key;
+ size_t ilen;
+};
+
+/* RFC7539 */
+static const u8 input01[] __initconst = {
+ 0x43, 0x72, 0x79, 0x70, 0x74, 0x6f, 0x67, 0x72,
+ 0x61, 0x70, 0x68, 0x69, 0x63, 0x20, 0x46, 0x6f,
+ 0x72, 0x75, 0x6d, 0x20, 0x52, 0x65, 0x73, 0x65,
+ 0x61, 0x72, 0x63, 0x68, 0x20, 0x47, 0x72, 0x6f,
+ 0x75, 0x70
+};
+static const u8 output01[] __initconst = {
+ 0xa8, 0x06, 0x1d, 0xc1, 0x30, 0x51, 0x36, 0xc6,
+ 0xc2, 0x2b, 0x8b, 0xaf, 0x0c, 0x01, 0x27, 0xa9
+};
+static const u8 key01[] __initconst = {
+ 0x85, 0xd6, 0xbe, 0x78, 0x57, 0x55, 0x6d, 0x33,
+ 0x7f, 0x44, 0x52, 0xfe, 0x42, 0xd5, 0x06, 0xa8,
+ 0x01, 0x03, 0x80, 0x8a, 0xfb, 0x0d, 0xb2, 0xfd,
+ 0x4a, 0xbf, 0xf6, 0xaf, 0x41, 0x49, 0xf5, 0x1b
+};
+
+/* "The Poly1305-AES message-authentication code" */
+static const u8 input02[] __initconst = {
+ 0xf3, 0xf6
+};
+static const u8 output02[] __initconst = {
+ 0xf4, 0xc6, 0x33, 0xc3, 0x04, 0x4f, 0xc1, 0x45,
+ 0xf8, 0x4f, 0x33, 0x5c, 0xb8, 0x19, 0x53, 0xde
+};
+static const u8 key02[] __initconst = {
+ 0x85, 0x1f, 0xc4, 0x0c, 0x34, 0x67, 0xac, 0x0b,
+ 0xe0, 0x5c, 0xc2, 0x04, 0x04, 0xf3, 0xf7, 0x00,
+ 0x58, 0x0b, 0x3b, 0x0f, 0x94, 0x47, 0xbb, 0x1e,
+ 0x69, 0xd0, 0x95, 0xb5, 0x92, 0x8b, 0x6d, 0xbc
+};
+
+static const u8 input03[] __initconst = { };
+static const u8 output03[] __initconst = {
+ 0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7,
+ 0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7
+};
+static const u8 key03[] __initconst = {
+ 0xa0, 0xf3, 0x08, 0x00, 0x00, 0xf4, 0x64, 0x00,
+ 0xd0, 0xc7, 0xe9, 0x07, 0x6c, 0x83, 0x44, 0x03,
+ 0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7,
+ 0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7
+};
+
+static const u8 input04[] __initconst = {
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36
+};
+static const u8 output04[] __initconst = {
+ 0x0e, 0xe1, 0xc1, 0x6b, 0xb7, 0x3f, 0x0f, 0x4f,
+ 0xd1, 0x98, 0x81, 0x75, 0x3c, 0x01, 0xcd, 0xbe
+};
+static const u8 key04[] __initconst = {
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef
+};
+
+static const u8 input05[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9
+};
+static const u8 output05[] __initconst = {
+ 0x51, 0x54, 0xad, 0x0d, 0x2c, 0xb2, 0x6e, 0x01,
+ 0x27, 0x4f, 0xc5, 0x11, 0x48, 0x49, 0x1f, 0x1b
+};
+static const u8 key05[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+/* self-generated vectors exercise "significant" lengths, such that they
+ * are handled by different code paths */
+static const u8 input06[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf
+};
+static const u8 output06[] __initconst = {
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66
+};
+static const u8 key06[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input07[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67
+};
+static const u8 output07[] __initconst = {
+ 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+ 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61
+};
+static const u8 key07[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input08[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36
+};
+static const u8 output08[] __initconst = {
+ 0xbb, 0xb6, 0x13, 0xb2, 0xb6, 0xd7, 0x53, 0xba,
+ 0x07, 0x39, 0x5b, 0x91, 0x6a, 0xae, 0xce, 0x15
+};
+static const u8 key08[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input09[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24
+};
+static const u8 output09[] __initconst = {
+ 0xc7, 0x94, 0xd7, 0x05, 0x7d, 0x17, 0x78, 0xc4,
+ 0xbb, 0xee, 0x0a, 0x39, 0xb3, 0xd9, 0x73, 0x42
+};
+static const u8 key09[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input10[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36
+};
+static const u8 output10[] __initconst = {
+ 0xff, 0xbc, 0xb9, 0xb3, 0x71, 0x42, 0x31, 0x52,
+ 0xd7, 0xfc, 0xa5, 0xad, 0x04, 0x2f, 0xba, 0xa9
+};
+static const u8 key10[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input11[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66
+};
+static const u8 output11[] __initconst = {
+ 0x06, 0x9e, 0xd6, 0xb8, 0xef, 0x0f, 0x20, 0x7b,
+ 0x3e, 0x24, 0x3b, 0xb1, 0x01, 0x9f, 0xe6, 0x32
+};
+static const u8 key11[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input12[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+ 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+ 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61
+};
+static const u8 output12[] __initconst = {
+ 0xcc, 0xa3, 0x39, 0xd9, 0xa4, 0x5f, 0xa2, 0x36,
+ 0x8c, 0x2c, 0x68, 0xb3, 0xa4, 0x17, 0x91, 0x33
+};
+static const u8 key12[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input13[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+ 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+ 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61,
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36
+};
+static const u8 output13[] __initconst = {
+ 0x53, 0xf6, 0xe8, 0x28, 0xa2, 0xf0, 0xfe, 0x0e,
+ 0xe8, 0x15, 0xbf, 0x0b, 0xd5, 0x84, 0x1a, 0x34
+};
+static const u8 key13[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+static const u8 input14[] __initconst = {
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+ 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+ 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61,
+ 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+ 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+ 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+ 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+ 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+ 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+ 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+ 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+ 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+ 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+ 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+ 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+ 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+ 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+ 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+ 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+ 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+ 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+ 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+ 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61
+};
+static const u8 output14[] __initconst = {
+ 0xb8, 0x46, 0xd4, 0x4e, 0x9b, 0xbd, 0x53, 0xce,
+ 0xdf, 0xfb, 0xfb, 0xb6, 0xb7, 0xfa, 0x49, 0x33
+};
+static const u8 key14[] __initconst = {
+ 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+ 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+ 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+ 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57
+};
+
+/* 4th power of the key spills to 131th bit in SIMD key setup */
+static const u8 input15[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 output15[] __initconst = {
+ 0x07, 0x14, 0x5a, 0x4c, 0x02, 0xfe, 0x5f, 0xa3,
+ 0x20, 0x36, 0xde, 0x68, 0xfa, 0xbe, 0x90, 0x66
+};
+static const u8 key15[] __initconst = {
+ 0xad, 0x62, 0x81, 0x07, 0xe8, 0x35, 0x1d, 0x0f,
+ 0x2c, 0x23, 0x1a, 0x05, 0xdc, 0x4a, 0x41, 0x06,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* OpenSSL's poly1305_ieee754.c failed this in final stage */
+static const u8 input16[] __initconst = {
+ 0x84, 0x23, 0x64, 0xe1, 0x56, 0x33, 0x6c, 0x09,
+ 0x98, 0xb9, 0x33, 0xa6, 0x23, 0x77, 0x26, 0x18,
+ 0x0d, 0x9e, 0x3f, 0xdc, 0xbd, 0xe4, 0xcd, 0x5d,
+ 0x17, 0x08, 0x0f, 0xc3, 0xbe, 0xb4, 0x96, 0x14,
+ 0xd7, 0x12, 0x2c, 0x03, 0x74, 0x63, 0xff, 0x10,
+ 0x4d, 0x73, 0xf1, 0x9c, 0x12, 0x70, 0x46, 0x28,
+ 0xd4, 0x17, 0xc4, 0xc5, 0x4a, 0x3f, 0xe3, 0x0d,
+ 0x3c, 0x3d, 0x77, 0x14, 0x38, 0x2d, 0x43, 0xb0,
+ 0x38, 0x2a, 0x50, 0xa5, 0xde, 0xe5, 0x4b, 0xe8,
+ 0x44, 0xb0, 0x76, 0xe8, 0xdf, 0x88, 0x20, 0x1a,
+ 0x1c, 0xd4, 0x3b, 0x90, 0xeb, 0x21, 0x64, 0x3f,
+ 0xa9, 0x6f, 0x39, 0xb5, 0x18, 0xaa, 0x83, 0x40,
+ 0xc9, 0x42, 0xff, 0x3c, 0x31, 0xba, 0xf7, 0xc9,
+ 0xbd, 0xbf, 0x0f, 0x31, 0xae, 0x3f, 0xa0, 0x96,
+ 0xbf, 0x8c, 0x63, 0x03, 0x06, 0x09, 0x82, 0x9f,
+ 0xe7, 0x2e, 0x17, 0x98, 0x24, 0x89, 0x0b, 0xc8,
+ 0xe0, 0x8c, 0x31, 0x5c, 0x1c, 0xce, 0x2a, 0x83,
+ 0x14, 0x4d, 0xbb, 0xff, 0x09, 0xf7, 0x4e, 0x3e,
+ 0xfc, 0x77, 0x0b, 0x54, 0xd0, 0x98, 0x4a, 0x8f,
+ 0x19, 0xb1, 0x47, 0x19, 0xe6, 0x36, 0x35, 0x64,
+ 0x1d, 0x6b, 0x1e, 0xed, 0xf6, 0x3e, 0xfb, 0xf0,
+ 0x80, 0xe1, 0x78, 0x3d, 0x32, 0x44, 0x54, 0x12,
+ 0x11, 0x4c, 0x20, 0xde, 0x0b, 0x83, 0x7a, 0x0d,
+ 0xfa, 0x33, 0xd6, 0xb8, 0x28, 0x25, 0xff, 0xf4,
+ 0x4c, 0x9a, 0x70, 0xea, 0x54, 0xce, 0x47, 0xf0,
+ 0x7d, 0xf6, 0x98, 0xe6, 0xb0, 0x33, 0x23, 0xb5,
+ 0x30, 0x79, 0x36, 0x4a, 0x5f, 0xc3, 0xe9, 0xdd,
+ 0x03, 0x43, 0x92, 0xbd, 0xde, 0x86, 0xdc, 0xcd,
+ 0xda, 0x94, 0x32, 0x1c, 0x5e, 0x44, 0x06, 0x04,
+ 0x89, 0x33, 0x6c, 0xb6, 0x5b, 0xf3, 0x98, 0x9c,
+ 0x36, 0xf7, 0x28, 0x2c, 0x2f, 0x5d, 0x2b, 0x88,
+ 0x2c, 0x17, 0x1e, 0x74
+};
+static const u8 output16[] __initconst = {
+ 0xf2, 0x48, 0x31, 0x2e, 0x57, 0x8d, 0x9d, 0x58,
+ 0xf8, 0xb7, 0xbb, 0x4d, 0x19, 0x10, 0x54, 0x31
+};
+static const u8 key16[] __initconst = {
+ 0x95, 0xd5, 0xc0, 0x05, 0x50, 0x3e, 0x51, 0x0d,
+ 0x8c, 0xd0, 0xaa, 0x07, 0x2c, 0x4a, 0x4d, 0x06,
+ 0x6e, 0xab, 0xc5, 0x2d, 0x11, 0x65, 0x3d, 0xf4,
+ 0x7f, 0xbf, 0x63, 0xab, 0x19, 0x8b, 0xcc, 0x26
+};
+
+/* AVX2 in OpenSSL's poly1305-x86.pl failed this with 176+32 split */
+static const u8 input17[] __initconst = {
+ 0x24, 0x8a, 0xc3, 0x10, 0x85, 0xb6, 0xc2, 0xad,
+ 0xaa, 0xa3, 0x82, 0x59, 0xa0, 0xd7, 0x19, 0x2c,
+ 0x5c, 0x35, 0xd1, 0xbb, 0x4e, 0xf3, 0x9a, 0xd9,
+ 0x4c, 0x38, 0xd1, 0xc8, 0x24, 0x79, 0xe2, 0xdd,
+ 0x21, 0x59, 0xa0, 0x77, 0x02, 0x4b, 0x05, 0x89,
+ 0xbc, 0x8a, 0x20, 0x10, 0x1b, 0x50, 0x6f, 0x0a,
+ 0x1a, 0xd0, 0xbb, 0xab, 0x76, 0xe8, 0x3a, 0x83,
+ 0xf1, 0xb9, 0x4b, 0xe6, 0xbe, 0xae, 0x74, 0xe8,
+ 0x74, 0xca, 0xb6, 0x92, 0xc5, 0x96, 0x3a, 0x75,
+ 0x43, 0x6b, 0x77, 0x61, 0x21, 0xec, 0x9f, 0x62,
+ 0x39, 0x9a, 0x3e, 0x66, 0xb2, 0xd2, 0x27, 0x07,
+ 0xda, 0xe8, 0x19, 0x33, 0xb6, 0x27, 0x7f, 0x3c,
+ 0x85, 0x16, 0xbc, 0xbe, 0x26, 0xdb, 0xbd, 0x86,
+ 0xf3, 0x73, 0x10, 0x3d, 0x7c, 0xf4, 0xca, 0xd1,
+ 0x88, 0x8c, 0x95, 0x21, 0x18, 0xfb, 0xfb, 0xd0,
+ 0xd7, 0xb4, 0xbe, 0xdc, 0x4a, 0xe4, 0x93, 0x6a,
+ 0xff, 0x91, 0x15, 0x7e, 0x7a, 0xa4, 0x7c, 0x54,
+ 0x44, 0x2e, 0xa7, 0x8d, 0x6a, 0xc2, 0x51, 0xd3,
+ 0x24, 0xa0, 0xfb, 0xe4, 0x9d, 0x89, 0xcc, 0x35,
+ 0x21, 0xb6, 0x6d, 0x16, 0xe9, 0xc6, 0x6a, 0x37,
+ 0x09, 0x89, 0x4e, 0x4e, 0xb0, 0xa4, 0xee, 0xdc,
+ 0x4a, 0xe1, 0x94, 0x68, 0xe6, 0x6b, 0x81, 0xf2,
+ 0x71, 0x35, 0x1b, 0x1d, 0x92, 0x1e, 0xa5, 0x51,
+ 0x04, 0x7a, 0xbc, 0xc6, 0xb8, 0x7a, 0x90, 0x1f,
+ 0xde, 0x7d, 0xb7, 0x9f, 0xa1, 0x81, 0x8c, 0x11,
+ 0x33, 0x6d, 0xbc, 0x07, 0x24, 0x4a, 0x40, 0xeb
+};
+static const u8 output17[] __initconst = {
+ 0xbc, 0x93, 0x9b, 0xc5, 0x28, 0x14, 0x80, 0xfa,
+ 0x99, 0xc6, 0xd6, 0x8c, 0x25, 0x8e, 0xc4, 0x2f
+};
+static const u8 key17[] __initconst = {
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* test vectors from Google */
+static const u8 input18[] __initconst = { };
+static const u8 output18[] __initconst = {
+ 0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d,
+ 0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c
+};
+static const u8 key18[] __initconst = {
+ 0xc8, 0xaf, 0xaa, 0xc3, 0x31, 0xee, 0x37, 0x2c,
+ 0xd6, 0x08, 0x2d, 0xe1, 0x34, 0x94, 0x3b, 0x17,
+ 0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d,
+ 0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c
+};
+
+static const u8 input19[] __initconst = {
+ 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
+ 0x72, 0x6c, 0x64, 0x21
+};
+static const u8 output19[] __initconst = {
+ 0xa6, 0xf7, 0x45, 0x00, 0x8f, 0x81, 0xc9, 0x16,
+ 0xa2, 0x0d, 0xcc, 0x74, 0xee, 0xf2, 0xb2, 0xf0
+};
+static const u8 key19[] __initconst = {
+ 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20,
+ 0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20,
+ 0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20,
+ 0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35
+};
+
+static const u8 input20[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 output20[] __initconst = {
+ 0x49, 0xec, 0x78, 0x09, 0x0e, 0x48, 0x1e, 0xc6,
+ 0xc2, 0x6b, 0x33, 0xb9, 0x1c, 0xcc, 0x03, 0x07
+};
+static const u8 key20[] __initconst = {
+ 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20,
+ 0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20,
+ 0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20,
+ 0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35
+};
+
+static const u8 input21[] __initconst = {
+ 0x89, 0xda, 0xb8, 0x0b, 0x77, 0x17, 0xc1, 0xdb,
+ 0x5d, 0xb4, 0x37, 0x86, 0x0a, 0x3f, 0x70, 0x21,
+ 0x8e, 0x93, 0xe1, 0xb8, 0xf4, 0x61, 0xfb, 0x67,
+ 0x7f, 0x16, 0xf3, 0x5f, 0x6f, 0x87, 0xe2, 0xa9,
+ 0x1c, 0x99, 0xbc, 0x3a, 0x47, 0xac, 0xe4, 0x76,
+ 0x40, 0xcc, 0x95, 0xc3, 0x45, 0xbe, 0x5e, 0xcc,
+ 0xa5, 0xa3, 0x52, 0x3c, 0x35, 0xcc, 0x01, 0x89,
+ 0x3a, 0xf0, 0xb6, 0x4a, 0x62, 0x03, 0x34, 0x27,
+ 0x03, 0x72, 0xec, 0x12, 0x48, 0x2d, 0x1b, 0x1e,
+ 0x36, 0x35, 0x61, 0x69, 0x8a, 0x57, 0x8b, 0x35,
+ 0x98, 0x03, 0x49, 0x5b, 0xb4, 0xe2, 0xef, 0x19,
+ 0x30, 0xb1, 0x7a, 0x51, 0x90, 0xb5, 0x80, 0xf1,
+ 0x41, 0x30, 0x0d, 0xf3, 0x0a, 0xdb, 0xec, 0xa2,
+ 0x8f, 0x64, 0x27, 0xa8, 0xbc, 0x1a, 0x99, 0x9f,
+ 0xd5, 0x1c, 0x55, 0x4a, 0x01, 0x7d, 0x09, 0x5d,
+ 0x8c, 0x3e, 0x31, 0x27, 0xda, 0xf9, 0xf5, 0x95
+};
+static const u8 output21[] __initconst = {
+ 0xc8, 0x5d, 0x15, 0xed, 0x44, 0xc3, 0x78, 0xd6,
+ 0xb0, 0x0e, 0x23, 0x06, 0x4c, 0x7b, 0xcd, 0x51
+};
+static const u8 key21[] __initconst = {
+ 0x2d, 0x77, 0x3b, 0xe3, 0x7a, 0xdb, 0x1e, 0x4d,
+ 0x68, 0x3b, 0xf0, 0x07, 0x5e, 0x79, 0xc4, 0xee,
+ 0x03, 0x79, 0x18, 0x53, 0x5a, 0x7f, 0x99, 0xcc,
+ 0xb7, 0x04, 0x0f, 0xb5, 0xf5, 0xf4, 0x3a, 0xea
+};
+
+static const u8 input22[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0b,
+ 0x17, 0x03, 0x03, 0x02, 0x00, 0x00, 0x00, 0x00,
+ 0x06, 0xdb, 0x1f, 0x1f, 0x36, 0x8d, 0x69, 0x6a,
+ 0x81, 0x0a, 0x34, 0x9c, 0x0c, 0x71, 0x4c, 0x9a,
+ 0x5e, 0x78, 0x50, 0xc2, 0x40, 0x7d, 0x72, 0x1a,
+ 0xcd, 0xed, 0x95, 0xe0, 0x18, 0xd7, 0xa8, 0x52,
+ 0x66, 0xa6, 0xe1, 0x28, 0x9c, 0xdb, 0x4a, 0xeb,
+ 0x18, 0xda, 0x5a, 0xc8, 0xa2, 0xb0, 0x02, 0x6d,
+ 0x24, 0xa5, 0x9a, 0xd4, 0x85, 0x22, 0x7f, 0x3e,
+ 0xae, 0xdb, 0xb2, 0xe7, 0xe3, 0x5e, 0x1c, 0x66,
+ 0xcd, 0x60, 0xf9, 0xab, 0xf7, 0x16, 0xdc, 0xc9,
+ 0xac, 0x42, 0x68, 0x2d, 0xd7, 0xda, 0xb2, 0x87,
+ 0xa7, 0x02, 0x4c, 0x4e, 0xef, 0xc3, 0x21, 0xcc,
+ 0x05, 0x74, 0xe1, 0x67, 0x93, 0xe3, 0x7c, 0xec,
+ 0x03, 0xc5, 0xbd, 0xa4, 0x2b, 0x54, 0xc1, 0x14,
+ 0xa8, 0x0b, 0x57, 0xaf, 0x26, 0x41, 0x6c, 0x7b,
+ 0xe7, 0x42, 0x00, 0x5e, 0x20, 0x85, 0x5c, 0x73,
+ 0xe2, 0x1d, 0xc8, 0xe2, 0xed, 0xc9, 0xd4, 0x35,
+ 0xcb, 0x6f, 0x60, 0x59, 0x28, 0x00, 0x11, 0xc2,
+ 0x70, 0xb7, 0x15, 0x70, 0x05, 0x1c, 0x1c, 0x9b,
+ 0x30, 0x52, 0x12, 0x66, 0x20, 0xbc, 0x1e, 0x27,
+ 0x30, 0xfa, 0x06, 0x6c, 0x7a, 0x50, 0x9d, 0x53,
+ 0xc6, 0x0e, 0x5a, 0xe1, 0xb4, 0x0a, 0xa6, 0xe3,
+ 0x9e, 0x49, 0x66, 0x92, 0x28, 0xc9, 0x0e, 0xec,
+ 0xb4, 0xa5, 0x0d, 0xb3, 0x2a, 0x50, 0xbc, 0x49,
+ 0xe9, 0x0b, 0x4f, 0x4b, 0x35, 0x9a, 0x1d, 0xfd,
+ 0x11, 0x74, 0x9c, 0xd3, 0x86, 0x7f, 0xcf, 0x2f,
+ 0xb7, 0xbb, 0x6c, 0xd4, 0x73, 0x8f, 0x6a, 0x4a,
+ 0xd6, 0xf7, 0xca, 0x50, 0x58, 0xf7, 0x61, 0x88,
+ 0x45, 0xaf, 0x9f, 0x02, 0x0f, 0x6c, 0x3b, 0x96,
+ 0x7b, 0x8f, 0x4c, 0xd4, 0xa9, 0x1e, 0x28, 0x13,
+ 0xb5, 0x07, 0xae, 0x66, 0xf2, 0xd3, 0x5c, 0x18,
+ 0x28, 0x4f, 0x72, 0x92, 0x18, 0x60, 0x62, 0xe1,
+ 0x0f, 0xd5, 0x51, 0x0d, 0x18, 0x77, 0x53, 0x51,
+ 0xef, 0x33, 0x4e, 0x76, 0x34, 0xab, 0x47, 0x43,
+ 0xf5, 0xb6, 0x8f, 0x49, 0xad, 0xca, 0xb3, 0x84,
+ 0xd3, 0xfd, 0x75, 0xf7, 0x39, 0x0f, 0x40, 0x06,
+ 0xef, 0x2a, 0x29, 0x5c, 0x8c, 0x7a, 0x07, 0x6a,
+ 0xd5, 0x45, 0x46, 0xcd, 0x25, 0xd2, 0x10, 0x7f,
+ 0xbe, 0x14, 0x36, 0xc8, 0x40, 0x92, 0x4a, 0xae,
+ 0xbe, 0x5b, 0x37, 0x08, 0x93, 0xcd, 0x63, 0xd1,
+ 0x32, 0x5b, 0x86, 0x16, 0xfc, 0x48, 0x10, 0x88,
+ 0x6b, 0xc1, 0x52, 0xc5, 0x32, 0x21, 0xb6, 0xdf,
+ 0x37, 0x31, 0x19, 0x39, 0x32, 0x55, 0xee, 0x72,
+ 0xbc, 0xaa, 0x88, 0x01, 0x74, 0xf1, 0x71, 0x7f,
+ 0x91, 0x84, 0xfa, 0x91, 0x64, 0x6f, 0x17, 0xa2,
+ 0x4a, 0xc5, 0x5d, 0x16, 0xbf, 0xdd, 0xca, 0x95,
+ 0x81, 0xa9, 0x2e, 0xda, 0x47, 0x92, 0x01, 0xf0,
+ 0xed, 0xbf, 0x63, 0x36, 0x00, 0xd6, 0x06, 0x6d,
+ 0x1a, 0xb3, 0x6d, 0x5d, 0x24, 0x15, 0xd7, 0x13,
+ 0x51, 0xbb, 0xcd, 0x60, 0x8a, 0x25, 0x10, 0x8d,
+ 0x25, 0x64, 0x19, 0x92, 0xc1, 0xf2, 0x6c, 0x53,
+ 0x1c, 0xf9, 0xf9, 0x02, 0x03, 0xbc, 0x4c, 0xc1,
+ 0x9f, 0x59, 0x27, 0xd8, 0x34, 0xb0, 0xa4, 0x71,
+ 0x16, 0xd3, 0x88, 0x4b, 0xbb, 0x16, 0x4b, 0x8e,
+ 0xc8, 0x83, 0xd1, 0xac, 0x83, 0x2e, 0x56, 0xb3,
+ 0x91, 0x8a, 0x98, 0x60, 0x1a, 0x08, 0xd1, 0x71,
+ 0x88, 0x15, 0x41, 0xd5, 0x94, 0xdb, 0x39, 0x9c,
+ 0x6a, 0xe6, 0x15, 0x12, 0x21, 0x74, 0x5a, 0xec,
+ 0x81, 0x4c, 0x45, 0xb0, 0xb0, 0x5b, 0x56, 0x54,
+ 0x36, 0xfd, 0x6f, 0x13, 0x7a, 0xa1, 0x0a, 0x0c,
+ 0x0b, 0x64, 0x37, 0x61, 0xdb, 0xd6, 0xf9, 0xa9,
+ 0xdc, 0xb9, 0x9b, 0x1a, 0x6e, 0x69, 0x08, 0x54,
+ 0xce, 0x07, 0x69, 0xcd, 0xe3, 0x97, 0x61, 0xd8,
+ 0x2f, 0xcd, 0xec, 0x15, 0xf0, 0xd9, 0x2d, 0x7d,
+ 0x8e, 0x94, 0xad, 0xe8, 0xeb, 0x83, 0xfb, 0xe0
+};
+static const u8 output22[] __initconst = {
+ 0x26, 0x37, 0x40, 0x8f, 0xe1, 0x30, 0x86, 0xea,
+ 0x73, 0xf9, 0x71, 0xe3, 0x42, 0x5e, 0x28, 0x20
+};
+static const u8 key22[] __initconst = {
+ 0x99, 0xe5, 0x82, 0x2d, 0xd4, 0x17, 0x3c, 0x99,
+ 0x5e, 0x3d, 0xae, 0x0d, 0xde, 0xfb, 0x97, 0x74,
+ 0x3f, 0xde, 0x3b, 0x08, 0x01, 0x34, 0xb3, 0x9f,
+ 0x76, 0xe9, 0xbf, 0x8d, 0x0e, 0x88, 0xd5, 0x46
+};
+
+/* test vectors from Hanno Böck */
+static const u8 input23[] __initconst = {
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0x80, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xce, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xc5,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe3, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xac, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe6,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x00, 0x00, 0x00,
+ 0xaf, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+ 0xcc, 0xcc, 0xff, 0xff, 0xff, 0xf5, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0xff, 0xff, 0xff, 0xe7, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x71, 0x92, 0x05, 0xa8, 0x52, 0x1d,
+ 0xfc
+};
+static const u8 output23[] __initconst = {
+ 0x85, 0x59, 0xb8, 0x76, 0xec, 0xee, 0xd6, 0x6e,
+ 0xb3, 0x77, 0x98, 0xc0, 0x45, 0x7b, 0xaf, 0xf9
+};
+static const u8 key23[] __initconst = {
+ 0x7f, 0x1b, 0x02, 0x64, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc
+};
+
+static const u8 input24[] __initconst = {
+ 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x64
+};
+static const u8 output24[] __initconst = {
+ 0x00, 0xbd, 0x12, 0x58, 0x97, 0x8e, 0x20, 0x54,
+ 0x44, 0xc9, 0xaa, 0xaa, 0x82, 0x00, 0x6f, 0xed
+};
+static const u8 key24[] __initconst = {
+ 0xe0, 0x00, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa
+};
+
+static const u8 input25[] __initconst = {
+ 0x02, 0xfc
+};
+static const u8 output25[] __initconst = {
+ 0x06, 0x12, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c
+};
+static const u8 key25[] __initconst = {
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c
+};
+
+static const u8 input26[] __initconst = {
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7a, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x5c, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x6e, 0x7b, 0x00, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7a, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x5c,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+ 0x7b, 0x6e, 0x7b, 0x00, 0x13, 0x00, 0x00, 0x00,
+ 0x00, 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0xf2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00,
+ 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00,
+ 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x13, 0x00, 0x00, 0x00, 0x00,
+ 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf2,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00, 0x09,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x7a, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,
+ 0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfc
+};
+static const u8 output26[] __initconst = {
+ 0x33, 0x20, 0x5b, 0xbf, 0x9e, 0x9f, 0x8f, 0x72,
+ 0x12, 0xab, 0x9e, 0x2a, 0xb9, 0xb7, 0xe4, 0xa5
+};
+static const u8 key26[] __initconst = {
+ 0x00, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x1e, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7b, 0x7b
+};
+
+static const u8 input27[] __initconst = {
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0xff, 0xff, 0xff, 0xe9,
+ 0xe9, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac,
+ 0xac, 0xac, 0xac, 0xac, 0x00, 0x00, 0xac, 0xac,
+ 0xec, 0x01, 0x00, 0xac, 0xac, 0xac, 0x2c, 0xac,
+ 0xa2, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac,
+ 0xac, 0xac, 0xac, 0xac, 0x64, 0xf2
+};
+static const u8 output27[] __initconst = {
+ 0x02, 0xee, 0x7c, 0x8c, 0x54, 0x6d, 0xde, 0xb1,
+ 0xa4, 0x67, 0xe4, 0xc3, 0x98, 0x11, 0x58, 0xb9
+};
+static const u8 key27[] __initconst = {
+ 0x00, 0x00, 0x00, 0x7f, 0x00, 0x00, 0x00, 0x7f,
+ 0x01, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0xcf, 0x77, 0x77, 0x77, 0x77, 0x77,
+ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77
+};
+
+/* nacl */
+static const u8 input28[] __initconst = {
+ 0x8e, 0x99, 0x3b, 0x9f, 0x48, 0x68, 0x12, 0x73,
+ 0xc2, 0x96, 0x50, 0xba, 0x32, 0xfc, 0x76, 0xce,
+ 0x48, 0x33, 0x2e, 0xa7, 0x16, 0x4d, 0x96, 0xa4,
+ 0x47, 0x6f, 0xb8, 0xc5, 0x31, 0xa1, 0x18, 0x6a,
+ 0xc0, 0xdf, 0xc1, 0x7c, 0x98, 0xdc, 0xe8, 0x7b,
+ 0x4d, 0xa7, 0xf0, 0x11, 0xec, 0x48, 0xc9, 0x72,
+ 0x71, 0xd2, 0xc2, 0x0f, 0x9b, 0x92, 0x8f, 0xe2,
+ 0x27, 0x0d, 0x6f, 0xb8, 0x63, 0xd5, 0x17, 0x38,
+ 0xb4, 0x8e, 0xee, 0xe3, 0x14, 0xa7, 0xcc, 0x8a,
+ 0xb9, 0x32, 0x16, 0x45, 0x48, 0xe5, 0x26, 0xae,
+ 0x90, 0x22, 0x43, 0x68, 0x51, 0x7a, 0xcf, 0xea,
+ 0xbd, 0x6b, 0xb3, 0x73, 0x2b, 0xc0, 0xe9, 0xda,
+ 0x99, 0x83, 0x2b, 0x61, 0xca, 0x01, 0xb6, 0xde,
+ 0x56, 0x24, 0x4a, 0x9e, 0x88, 0xd5, 0xf9, 0xb3,
+ 0x79, 0x73, 0xf6, 0x22, 0xa4, 0x3d, 0x14, 0xa6,
+ 0x59, 0x9b, 0x1f, 0x65, 0x4c, 0xb4, 0x5a, 0x74,
+ 0xe3, 0x55, 0xa5
+};
+static const u8 output28[] __initconst = {
+ 0xf3, 0xff, 0xc7, 0x70, 0x3f, 0x94, 0x00, 0xe5,
+ 0x2a, 0x7d, 0xfb, 0x4b, 0x3d, 0x33, 0x05, 0xd9
+};
+static const u8 key28[] __initconst = {
+ 0xee, 0xa6, 0xa7, 0x25, 0x1c, 0x1e, 0x72, 0x91,
+ 0x6d, 0x11, 0xc2, 0xcb, 0x21, 0x4d, 0x3c, 0x25,
+ 0x25, 0x39, 0x12, 0x1d, 0x8e, 0x23, 0x4e, 0x65,
+ 0x2d, 0x65, 0x1f, 0xa4, 0xc8, 0xcf, 0xf8, 0x80
+};
+
+/* wrap 2^130-5 */
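+/*
+ * With r clamped to 2 and s = 0, the single full block of 0xff bytes plus
+ * its 2^128 pad bit is 2^129 - 1, so h = (2^129 - 1) * 2 = 2^130 - 2, which
+ * reduces to 3 mod 2^130 - 5; the tag (h + s) mod 2^128 is then 0x03
+ * followed by zeros, matching output29 below.
+ */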
+static const u8 input29[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 output29[] __initconst = {
+ 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key29[] __initconst = {
+ 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* wrap 2^128 */
+static const u8 input30[] __initconst = {
+ 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 output30[] __initconst = {
+ 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key30[] __initconst = {
+ 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+/* limb carry */
+static const u8 input31[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 output31[] __initconst = {
+ 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key31[] __initconst = {
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* 2^130-5 */
+static const u8 input32[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xfb, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe,
+ 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe,
+ 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+ 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01
+};
+static const u8 output32[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key32[] __initconst = {
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* 2^130-6 */
+static const u8 input33[] __initconst = {
+ 0xfd, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 output33[] __initconst = {
+ 0xfa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 key33[] __initconst = {
+ 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* 5*H+L reduction intermediate */
+static const u8 input34[] __initconst = {
+ 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd,
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 output34[] __initconst = {
+ 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key34[] __initconst = {
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+/* 5*H+L reduction final */
+static const u8 input35[] __initconst = {
+ 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd,
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 output35[] __initconst = {
+ 0x13, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 key35[] __initconst = {
+ 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+static const struct poly1305_testvec poly1305_testvecs[] __initconst = {
+ { input01, output01, key01, sizeof(input01) },
+ { input02, output02, key02, sizeof(input02) },
+ { input03, output03, key03, sizeof(input03) },
+ { input04, output04, key04, sizeof(input04) },
+ { input05, output05, key05, sizeof(input05) },
+ { input06, output06, key06, sizeof(input06) },
+ { input07, output07, key07, sizeof(input07) },
+ { input08, output08, key08, sizeof(input08) },
+ { input09, output09, key09, sizeof(input09) },
+ { input10, output10, key10, sizeof(input10) },
+ { input11, output11, key11, sizeof(input11) },
+ { input12, output12, key12, sizeof(input12) },
+ { input13, output13, key13, sizeof(input13) },
+ { input14, output14, key14, sizeof(input14) },
+ { input15, output15, key15, sizeof(input15) },
+ { input16, output16, key16, sizeof(input16) },
+ { input17, output17, key17, sizeof(input17) },
+ { input18, output18, key18, sizeof(input18) },
+ { input19, output19, key19, sizeof(input19) },
+ { input20, output20, key20, sizeof(input20) },
+ { input21, output21, key21, sizeof(input21) },
+ { input22, output22, key22, sizeof(input22) },
+ { input23, output23, key23, sizeof(input23) },
+ { input24, output24, key24, sizeof(input24) },
+ { input25, output25, key25, sizeof(input25) },
+ { input26, output26, key26, sizeof(input26) },
+ { input27, output27, key27, sizeof(input27) },
+ { input28, output28, key28, sizeof(input28) },
+ { input29, output29, key29, sizeof(input29) },
+ { input30, output30, key30, sizeof(input30) },
+ { input31, output31, key31, sizeof(input31) },
+ { input32, output32, key32, sizeof(input32) },
+ { input33, output33, key33, sizeof(input33) },
+ { input34, output34, key34, sizeof(input34) },
+ { input35, output35, key35, sizeof(input35) }
+};
+
+static bool __init poly1305_selftest(void)
+{
+ simd_context_t simd_context;
+ bool success = true;
+ size_t i, j;
+
+ simd_get(&simd_context);
+ for (i = 0; i < ARRAY_SIZE(poly1305_testvecs); ++i) {
+ struct poly1305_ctx poly1305;
+ u8 out[POLY1305_MAC_SIZE];
+
+ memset(out, 0, sizeof(out));
+ memset(&poly1305, 0, sizeof(poly1305));
+ poly1305_init(&poly1305, poly1305_testvecs[i].key);
+ poly1305_update(&poly1305, poly1305_testvecs[i].input,
+ poly1305_testvecs[i].ilen, &simd_context);
+ poly1305_final(&poly1305, out, &simd_context);
+ if (memcmp(out, poly1305_testvecs[i].output,
+ POLY1305_MAC_SIZE)) {
+ pr_info("poly1305 self-test %zu: FAIL\n", i + 1);
+ success = false;
+ }
+ simd_relax(&simd_context);
+
+ if (poly1305_testvecs[i].ilen <= 1)
+ continue;
+
+ for (j = 1; j < poly1305_testvecs[i].ilen - 1; ++j) {
+ memset(out, 0, sizeof(out));
+ memset(&poly1305, 0, sizeof(poly1305));
+ poly1305_init(&poly1305, poly1305_testvecs[i].key);
+ poly1305_update(&poly1305, poly1305_testvecs[i].input,
+ j, &simd_context);
+ poly1305_update(&poly1305,
+ poly1305_testvecs[i].input + j,
+ poly1305_testvecs[i].ilen - j,
+ &simd_context);
+ poly1305_final(&poly1305, out, &simd_context);
+ if (memcmp(out, poly1305_testvecs[i].output,
+ POLY1305_MAC_SIZE)) {
+ pr_info("poly1305 self-test %zu (split %zu): FAIL\n",
+ i + 1, j);
+ success = false;
+ }
+
+ memset(out, 0, sizeof(out));
+ memset(&poly1305, 0, sizeof(poly1305));
+ poly1305_init(&poly1305, poly1305_testvecs[i].key);
+ poly1305_update(&poly1305, poly1305_testvecs[i].input,
+ j, &simd_context);
+ poly1305_update(&poly1305,
+ poly1305_testvecs[i].input + j,
+ poly1305_testvecs[i].ilen - j,
+ (simd_context_t []){ HAVE_NO_SIMD });
+ poly1305_final(&poly1305, out, &simd_context);
+ if (memcmp(out, poly1305_testvecs[i].output,
+ POLY1305_MAC_SIZE)) {
+ pr_info("poly1305 self-test %zu (split %zu, mixed simd): FAIL\n",
+ i + 1, j);
+ success = false;
+ }
+ simd_relax(&simd_context);
+ }
+ }
+ simd_put(&simd_context);
+
+ if (success)
+ pr_info("poly1305 self-tests: pass\n");
+
+ return success;
+}
+#endif
--
2.19.0
* [PATCH net-next v6 10/23] zinc: Poly1305 x86_64 implementation
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (8 preceding siblings ...)
2018-09-25 14:56 ` [PATCH net-next v6 09/23] zinc: Poly1305 generic C implementations and selftest Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` Jason A. Donenfeld
` (13 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Thomas Gleixner,
Ingo Molnar, x86
This provides AVX, AVX-2, and AVX-512F implementations for Poly1305.
The AVX-512F implementation is disabled on Skylake-X, since using zmm
registers there causes severe downclocking.
These come from Andy Polyakov's implementation, with the following
modifications from Samuel Neves:
- Some cosmetic changes, like renaming labels to .Lname and adapting
constants and other details to Linux conventions.
- CPU feature checking is done in C by the glue code, so that has been
removed from the assembly.
- poly1305_blocks_avx512 jumped into the middle of poly1305_blocks_avx2
for the final blocks. To appease objtool, the relevant AVX2 tail code
was duplicated for the AVX-512 function.
- The original uses %rbp as a scratch register. However, the kernel
expects %rbp to be a valid frame pointer at any given time in order
to do proper unwinding, so the code had to be altered to preserve it.
The most straightforward way to accomplish this was to replace $d3,
formerly %r10, with %rdi, and to replace %rbp with %r10. Because
%rdi, a pointer to the context structure, does not change and is not
used by poly1305_iteration, it is safe to use it here, and the
overhead of saving and restoring it should be minimal; see the short
excerpt after this list.
- The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret".
We replace this by "ret". "rep ret" was meant to help with AMD K8
chips, cf. http://repzret.org/p/repzret. It makes no sense to
continue to use this kludge for code that won't even run on ancient
AMD chips.
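To make the frame-pointer change concrete, the following excerpt (taken
from the routines below, not a standalone listing) shows the pattern that
wraps every use of the __poly1305_block macro: the context pointer is
spilled to the stack for the duration of the macro, so that %rdi can stand
in for the old $d3 scratch register while %rbp is left alone for the
unwinder.

	movq	%rdi,0(%rsp)	# spill the ctx pointer; %rdi fills the old %r10 ($d3) role
	__poly1305_block	# %r10 in turn fills the role the original gave to %rbp
	movq	0(%rsp),%rdi	# reload the ctx pointer for the next block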
While this is CRYPTOGAMS code, the originating code for this happens to
be the same as OpenSSL's commit 4dfe4310c31c4483705991d9a798ce9be1ed1c68.
Cycle counts on a Core i7 6700HQ using the AVX-2 codepath:
size old new
---- ---- ----
0 70 68
16 92 90
32 134 104
48 172 120
64 218 136
80 254 158
96 298 174
112 342 192
128 388 212
144 428 228
160 466 246
176 510 264
192 550 282
208 594 302
224 628 316
240 676 334
256 716 354
272 764 374
288 802 352
304 420 366
320 428 360
336 484 378
352 426 384
368 478 400
384 488 394
400 542 408
416 486 416
432 534 430
448 544 422
464 600 438
480 540 448
496 594 464
512 602 456
528 656 476
544 600 480
560 650 494
576 664 490
592 714 508
608 656 514
624 708 532
640 716 524
656 770 536
672 716 548
688 770 562
704 774 552
720 826 568
736 768 574
752 822 592
768 830 584
784 884 602
800 828 610
816 884 628
832 888 618
848 942 632
864 884 644
880 936 660
896 948 652
912 1000 664
928 942 676
944 994 690
960 1002 680
976 1054 694
992 1002 706
1008 1052 720
Cycle counts on a Xeon Gold 5120 using the AVX-512 codepath:
size old new
---- ---- ----
0 74 70
16 96 92
32 136 106
48 184 124
64 218 138
80 260 160
96 300 176
112 342 194
128 384 212
144 420 226
160 464 248
176 504 264
192 544 282
208 582 300
224 624 318
240 662 338
256 708 358
272 748 372
288 788 358
304 422 370
320 432 364
336 486 380
352 434 390
368 480 408
384 490 398
400 542 412
416 492 426
432 538 436
448 546 432
464 600 448
480 548 456
496 594 476
512 606 470
528 656 480
544 606 498
560 652 512
576 662 508
592 716 522
608 664 538
624 710 552
640 720 516
656 772 526
672 722 544
688 768 556
704 778 556
720 832 568
736 780 584
752 826 600
768 836 560
784 888 572
800 838 588
816 884 604
832 894 598
848 946 612
864 896 628
880 942 644
896 952 608
912 1004 616
928 954 634
944 1000 646
960 1008 646
976 1062 658
992 1012 674
1008 1058 690
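(At the largest measured size, 1008 bytes, this works out to roughly 1.04
versus 0.71 cycles per byte on the Core i7 6700HQ, and roughly 1.05 versus
0.68 cycles per byte on the Xeon Gold 5120.)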
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
lib/zinc/Makefile | 1 +
lib/zinc/poly1305/poly1305-x86_64-glue.h | 157 ++
lib/zinc/poly1305/poly1305-x86_64.S | 2792 ++++++++++++++++++++++
lib/zinc/poly1305/poly1305.c | 4 +-
4 files changed, 2953 insertions(+), 1 deletion(-)
create mode 100644 lib/zinc/poly1305/poly1305-x86_64-glue.h
create mode 100644 lib/zinc/poly1305/poly1305-x86_64.S
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 6fc9626c55fa..a8943d960b6a 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -11,4 +11,5 @@ AFLAGS_chacha20-mips.o += -O2 # This is required to fill the branch delay slots
obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
zinc_poly1305-y := poly1305/poly1305.o
+zinc_poly1305-$(CONFIG_ZINC_ARCH_X86_64) += poly1305/poly1305-x86_64.o
obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o
diff --git a/lib/zinc/poly1305/poly1305-x86_64-glue.h b/lib/zinc/poly1305/poly1305-x86_64-glue.h
new file mode 100644
index 000000000000..3ca1bdd02c6b
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-x86_64-glue.h
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <asm/intel-family.h>
+
+asmlinkage void poly1305_init_x86_64(void *ctx,
+ const u8 key[POLY1305_KEY_SIZE]);
+asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
+ const size_t len, const u32 padbit);
+asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4]);
+#ifdef CONFIG_AS_AVX
+asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4]);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
+ const u32 padbit);
+#endif
+#ifdef CONFIG_AS_AVX2
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
+ const u32 padbit);
+#endif
+#ifdef CONFIG_AS_AVX512
+asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
+ const size_t len, const u32 padbit);
+#endif
+
+static bool poly1305_use_avx __ro_after_init;
+static bool poly1305_use_avx2 __ro_after_init;
+static bool poly1305_use_avx512 __ro_after_init;
+
+static void __init poly1305_fpu_init(void)
+{
+ poly1305_use_avx =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+ poly1305_use_avx2 =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ boot_cpu_has(X86_FEATURE_AVX2) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+ poly1305_use_avx512 =
+ boot_cpu_has(X86_FEATURE_AVX) &&
+ boot_cpu_has(X86_FEATURE_AVX2) &&
+ boot_cpu_has(X86_FEATURE_AVX512F) &&
+ cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+ XFEATURE_MASK_AVX512, NULL) &&
+ /* Skylake downclocks unacceptably much when using zmm. */
+ boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X;
+}
+
+static inline bool poly1305_init_arch(void *ctx,
+ const u8 key[POLY1305_KEY_SIZE])
+{
+ poly1305_init_x86_64(ctx, key);
+ return true;
+}
+
+struct poly1305_arch_internal {
+ union {
+ struct {
+ u32 h[5];
+ u32 is_base2_26;
+ };
+ u64 hs[3];
+ };
+ u64 r[2];
+ u64 pad;
+ struct { u32 r2, r1, r4, r3; } rn[9];
+};
+
+static void convert_to_base2_64(void *ctx)
+{
+ struct poly1305_arch_internal *state = ctx;
+ u32 cy;
+
+ if (!state->is_base2_26)
+ return;
+
+ cy = state->h[0] >> 26; state->h[0] &= 0x3ffffff; state->h[1] += cy;
+ cy = state->h[1] >> 26; state->h[1] &= 0x3ffffff; state->h[2] += cy;
+ cy = state->h[2] >> 26; state->h[2] &= 0x3ffffff; state->h[3] += cy;
+ cy = state->h[3] >> 26; state->h[3] &= 0x3ffffff; state->h[4] += cy;
+ state->hs[0] = ((u64)state->h[2] << 52) | ((u64)state->h[1] << 26) | state->h[0];
+ state->hs[1] = ((u64)state->h[4] << 40) | ((u64)state->h[3] << 14) | (state->h[2] >> 12);
+ state->hs[2] = state->h[4] >> 24;
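+ /* ULT(a, b) evaluates the unsigned comparison a < b, yielding 1 or 0
+  * without branching; it is used here to propagate the carry out of
+  * each 64-bit addition. */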
+#define ULT(a, b) ((a ^ ((a ^ b) | ((a - b) ^ b))) >> (sizeof(a) * 8 - 1))
+ cy = (state->hs[2] >> 2) + (state->hs[2] & ~3ULL);
+ state->hs[2] &= 3;
+ state->hs[0] += cy;
+ state->hs[1] += (cy = ULT(state->hs[0], cy));
+ state->hs[2] += ULT(state->hs[1], cy);
+#undef ULT
+ state->is_base2_26 = 0;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+ const size_t len, const u32 padbit,
+ simd_context_t *simd_context)
+{
+ struct poly1305_arch_internal *state = ctx;
+
+ if (!poly1305_use_avx ||
+ (len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
+ !simd_use(simd_context))
+ goto scalar;
+
+#ifdef CONFIG_AS_AVX512
+ if (poly1305_use_avx512) {
+ poly1305_blocks_avx512(ctx, inp, len, padbit);
+ return true;
+ }
+#endif
+
+#ifdef CONFIG_AS_AVX2
+ if (poly1305_use_avx2) {
+ poly1305_blocks_avx2(ctx, inp, len, padbit);
+ return true;
+ }
+#endif
+
+#ifdef CONFIG_AS_AVX
+ if (poly1305_use_avx) {
+ poly1305_blocks_avx(ctx, inp, len, padbit);
+ return true;
+ }
+#endif
+
+scalar:
+ convert_to_base2_64(ctx);
+ poly1305_blocks_x86_64(ctx, inp, len, padbit);
+ return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4],
+ simd_context_t *simd_context)
+{
+ struct poly1305_arch_internal *state = ctx;
+
+ if (!poly1305_use_avx || !state->is_base2_26 || !simd_use(simd_context))
+ goto scalar;
+
+#ifdef CONFIG_AS_AVX
+ if (poly1305_use_avx || poly1305_use_avx2 || poly1305_use_avx512) {
+ poly1305_emit_avx(ctx, mac, nonce);
+ return true;
+ }
+#endif
+
+scalar:
+ convert_to_base2_64(ctx);
+ poly1305_emit_x86_64(ctx, mac, nonce);
+ return true;
+}
diff --git a/lib/zinc/poly1305/poly1305-x86_64.S b/lib/zinc/poly1305/poly1305-x86_64.S
new file mode 100644
index 000000000000..3c3f2b4d880b
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-x86_64.S
@@ -0,0 +1,2792 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2017 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.section .rodata.cst192.Lconst, "aM", @progbits, 192
+.align 64
+.Lconst:
+.long 0x0ffffff,0,0x0ffffff,0,0x0ffffff,0,0x0ffffff,0
+.long 16777216,0,16777216,0,16777216,0,16777216,0
+.long 0x3ffffff,0,0x3ffffff,0,0x3ffffff,0,0x3ffffff,0
+.long 2,2,2,3,2,0,2,1
+.long 0,0,0,1, 0,2,0,3, 0,4,0,5, 0,6,0,7
+
+.text
+
+.align 32
+ENTRY(poly1305_init_x86_64)
+ xorq %rax,%rax
+ movq %rax,0(%rdi)
+ movq %rax,8(%rdi)
+ movq %rax,16(%rdi)
+
+ cmpq $0,%rsi
+ je .Lno_key
+
+ movq $0x0ffffffc0fffffff,%rax
+ movq $0x0ffffffc0ffffffc,%rcx
+ andq 0(%rsi),%rax
+ andq 8(%rsi),%rcx
+ movq %rax,24(%rdi)
+ movq %rcx,32(%rdi)
+ movl $1,%eax
+.Lno_key:
+ ret
+ENDPROC(poly1305_init_x86_64)
+
+.align 32
+ENTRY(poly1305_blocks_x86_64)
+.Lblocks:
+ shrq $4,%rdx
+ jz .Lno_data
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lblocks_body:
+
+ movq %rdx,%r15
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+ movq 0(%rdi),%r14
+ movq 8(%rdi),%rbx
+ movq 16(%rdi),%r10
+
+ movq %r13,%r12
+ shrq $2,%r13
+ movq %r12,%rax
+ addq %r12,%r13
+ jmp .Loop
+
+.align 32
+.Loop:
+
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ mulq %r14
+ movq %rax,%r9
+ movq %r11,%rax
+ movq %rdx,%rdi
+
+ mulq %r14
+ movq %rax,%r14
+ movq %r11,%rax
+ movq %rdx,%r8
+
+ mulq %rbx
+ addq %rax,%r9
+ movq %r13,%rax
+ adcq %rdx,%rdi
+
+ mulq %rbx
+ movq %r10,%rbx
+ addq %rax,%r14
+ adcq %rdx,%r8
+
+ imulq %r13,%rbx
+ addq %rbx,%r9
+ movq %r8,%rbx
+ adcq $0,%rdi
+
+ imulq %r11,%r10
+ addq %r9,%rbx
+ movq $-4,%rax
+ adcq %r10,%rdi
+
+ andq %rdi,%rax
+ movq %rdi,%r10
+ shrq $2,%rdi
+ andq $3,%r10
+ addq %rdi,%rax
+ addq %rax,%r14
+ adcq $0,%rbx
+ adcq $0,%r10
+
+ movq %r12,%rax
+ decq %r15
+ jnz .Loop
+
+ movq 0(%rsp),%rdi
+
+ movq %r14,0(%rdi)
+ movq %rbx,8(%rdi)
+ movq %r10,16(%rdi)
+
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rsp
+.Lno_data:
+.Lblocks_epilogue:
+ ret
+ENDPROC(poly1305_blocks_x86_64)
+
+.align 32
+ENTRY(poly1305_emit_x86_64)
+.Lemit:
+ movq 0(%rdi),%r8
+ movq 8(%rdi),%r9
+ movq 16(%rdi),%r10
+
+ movq %r8,%rax
+ addq $5,%r8
+ movq %r9,%rcx
+ adcq $0,%r9
+ adcq $0,%r10
+ shrq $2,%r10
+ cmovnzq %r8,%rax
+ cmovnzq %r9,%rcx
+
+ addq 0(%rdx),%rax
+ adcq 8(%rdx),%rcx
+ movq %rax,0(%rsi)
+ movq %rcx,8(%rsi)
+
+ ret
+ENDPROC(poly1305_emit_x86_64)
+
+.macro __poly1305_block
+ mulq %r14
+ movq %rax,%r9
+ movq %r11,%rax
+ movq %rdx,%rdi
+
+ mulq %r14
+ movq %rax,%r14
+ movq %r11,%rax
+ movq %rdx,%r8
+
+ mulq %rbx
+ addq %rax,%r9
+ movq %r13,%rax
+ adcq %rdx,%rdi
+
+ mulq %rbx
+ movq %r10,%rbx
+ addq %rax,%r14
+ adcq %rdx,%r8
+
+ imulq %r13,%rbx
+ addq %rbx,%r9
+ movq %r8,%rbx
+ adcq $0,%rdi
+
+ imulq %r11,%r10
+ addq %r9,%rbx
+ movq $-4,%rax
+ adcq %r10,%rdi
+
+ andq %rdi,%rax
+ movq %rdi,%r10
+ shrq $2,%rdi
+ andq $3,%r10
+ addq %rdi,%rax
+ addq %rax,%r14
+ adcq $0,%rbx
+ adcq $0,%r10
+.endm
+
+.macro __poly1305_init_avx
+ movq %r11,%r14
+ movq %r12,%rbx
+ xorq %r10,%r10
+
+ leaq 48+64(%rdi),%rdi
+
+ movq %r12,%rax
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+
+ movl $0x3ffffff,%eax
+ movl $0x3ffffff,%edx
+ movq %r14,%r8
+ andl %r14d,%eax
+ movq %r11,%r9
+ andl %r11d,%edx
+ movl %eax,-64(%rdi)
+ shrq $26,%r8
+ movl %edx,-60(%rdi)
+ shrq $26,%r9
+
+ movl $0x3ffffff,%eax
+ movl $0x3ffffff,%edx
+ andl %r8d,%eax
+ andl %r9d,%edx
+ movl %eax,-48(%rdi)
+ leal (%rax,%rax,4),%eax
+ movl %edx,-44(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ movl %eax,-32(%rdi)
+ shrq $26,%r8
+ movl %edx,-28(%rdi)
+ shrq $26,%r9
+
+ movq %rbx,%rax
+ movq %r12,%rdx
+ shlq $12,%rax
+ shlq $12,%rdx
+ orq %r8,%rax
+ orq %r9,%rdx
+ andl $0x3ffffff,%eax
+ andl $0x3ffffff,%edx
+ movl %eax,-16(%rdi)
+ leal (%rax,%rax,4),%eax
+ movl %edx,-12(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ movl %eax,0(%rdi)
+ movq %rbx,%r8
+ movl %edx,4(%rdi)
+ movq %r12,%r9
+
+ movl $0x3ffffff,%eax
+ movl $0x3ffffff,%edx
+ shrq $14,%r8
+ shrq $14,%r9
+ andl %r8d,%eax
+ andl %r9d,%edx
+ movl %eax,16(%rdi)
+ leal (%rax,%rax,4),%eax
+ movl %edx,20(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ movl %eax,32(%rdi)
+ shrq $26,%r8
+ movl %edx,36(%rdi)
+ shrq $26,%r9
+
+ movq %r10,%rax
+ shlq $24,%rax
+ orq %rax,%r8
+ movl %r8d,48(%rdi)
+ leaq (%r8,%r8,4),%r8
+ movl %r9d,52(%rdi)
+ leaq (%r9,%r9,4),%r9
+ movl %r8d,64(%rdi)
+ movl %r9d,68(%rdi)
+
+ movq %r12,%rax
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+
+ movl $0x3ffffff,%eax
+ movq %r14,%r8
+ andl %r14d,%eax
+ shrq $26,%r8
+ movl %eax,-52(%rdi)
+
+ movl $0x3ffffff,%edx
+ andl %r8d,%edx
+ movl %edx,-36(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ shrq $26,%r8
+ movl %edx,-20(%rdi)
+
+ movq %rbx,%rax
+ shlq $12,%rax
+ orq %r8,%rax
+ andl $0x3ffffff,%eax
+ movl %eax,-4(%rdi)
+ leal (%rax,%rax,4),%eax
+ movq %rbx,%r8
+ movl %eax,12(%rdi)
+
+ movl $0x3ffffff,%edx
+ shrq $14,%r8
+ andl %r8d,%edx
+ movl %edx,28(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ shrq $26,%r8
+ movl %edx,44(%rdi)
+
+ movq %r10,%rax
+ shlq $24,%rax
+ orq %rax,%r8
+ movl %r8d,60(%rdi)
+ leaq (%r8,%r8,4),%r8
+ movl %r8d,76(%rdi)
+
+ movq %r12,%rax
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+
+ movl $0x3ffffff,%eax
+ movq %r14,%r8
+ andl %r14d,%eax
+ shrq $26,%r8
+ movl %eax,-56(%rdi)
+
+ movl $0x3ffffff,%edx
+ andl %r8d,%edx
+ movl %edx,-40(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ shrq $26,%r8
+ movl %edx,-24(%rdi)
+
+ movq %rbx,%rax
+ shlq $12,%rax
+ orq %r8,%rax
+ andl $0x3ffffff,%eax
+ movl %eax,-8(%rdi)
+ leal (%rax,%rax,4),%eax
+ movq %rbx,%r8
+ movl %eax,8(%rdi)
+
+ movl $0x3ffffff,%edx
+ shrq $14,%r8
+ andl %r8d,%edx
+ movl %edx,24(%rdi)
+ leal (%rdx,%rdx,4),%edx
+ shrq $26,%r8
+ movl %edx,40(%rdi)
+
+ movq %r10,%rax
+ shlq $24,%rax
+ orq %rax,%r8
+ movl %r8d,56(%rdi)
+ leaq (%r8,%r8,4),%r8
+ movl %r8d,72(%rdi)
+
+ leaq -48-64(%rdi),%rdi
+.endm
+
+#ifdef CONFIG_AS_AVX
+.align 32
+ENTRY(poly1305_blocks_avx)
+
+ movl 20(%rdi),%r8d
+ cmpq $128,%rdx
+ jae .Lblocks_avx
+ testl %r8d,%r8d
+ jz .Lblocks
+
+.Lblocks_avx:
+ andq $-16,%rdx
+ jz .Lno_data_avx
+
+ vzeroupper
+
+ testl %r8d,%r8d
+ jz .Lbase2_64_avx
+
+ testq $31,%rdx
+ jz .Leven_avx
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lblocks_avx_body:
+
+ movq %rdx,%r15
+
+ movq 0(%rdi),%r8
+ movq 8(%rdi),%r9
+ movl 16(%rdi),%r10d
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+
+ movl %r8d,%r14d
+ andq $-2147483648,%r8
+ movq %r9,%r12
+ movl %r9d,%ebx
+ andq $-2147483648,%r9
+
+ shrq $6,%r8
+ shlq $52,%r12
+ addq %r8,%r14
+ shrq $12,%rbx
+ shrq $18,%r9
+ addq %r12,%r14
+ adcq %r9,%rbx
+
+ movq %r10,%r8
+ shlq $40,%r8
+ shrq $24,%r10
+ addq %r8,%rbx
+ adcq $0,%r10
+
+ movq $-4,%r9
+ movq %r10,%r8
+ andq %r10,%r9
+ shrq $2,%r8
+ andq $3,%r10
+ addq %r9,%r8
+ addq %r8,%r14
+ adcq $0,%rbx
+ adcq $0,%r10
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+
+ testq %rcx,%rcx
+ jz .Lstore_base2_64_avx
+
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r11
+ movq %rbx,%r12
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r11
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r11,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r12
+ andq $0x3ffffff,%rbx
+ orq %r12,%r10
+
+ subq $16,%r15
+ jz .Lstore_base2_26_avx
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ jmp .Lproceed_avx
+
+.align 32
+.Lstore_base2_64_avx:
+ movq %r14,0(%rdi)
+ movq %rbx,8(%rdi)
+ movq %r10,16(%rdi)
+ jmp .Ldone_avx
+
+.align 16
+.Lstore_base2_26_avx:
+ movl %eax,0(%rdi)
+ movl %edx,4(%rdi)
+ movl %r14d,8(%rdi)
+ movl %ebx,12(%rdi)
+ movl %r10d,16(%rdi)
+.align 16
+.Ldone_avx:
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rsp
+
+.Lno_data_avx:
+.Lblocks_avx_epilogue:
+ ret
+
+.align 32
+.Lbase2_64_avx:
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lbase2_64_avx_body:
+
+ movq %rdx,%r15
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+ movq 0(%rdi),%r14
+ movq 8(%rdi),%rbx
+ movl 16(%rdi),%r10d
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+ testq $31,%rdx
+ jz .Linit_avx
+
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ subq $16,%r15
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+
+.Linit_avx:
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r8
+ movq %rbx,%r9
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r8
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r8,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r9
+ andq $0x3ffffff,%rbx
+ orq %r9,%r10
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ movl $1,20(%rdi)
+
+ __poly1305_init_avx
+
+.Lproceed_avx:
+ movq %r15,%rdx
+
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rax
+ leaq 48(%rsp),%rsp
+
+.Lbase2_64_avx_epilogue:
+ jmp .Ldo_avx
+
+
+.align 32
+.Leven_avx:
+ vmovd 0(%rdi),%xmm0
+ vmovd 4(%rdi),%xmm1
+ vmovd 8(%rdi),%xmm2
+ vmovd 12(%rdi),%xmm3
+ vmovd 16(%rdi),%xmm4
+
+.Ldo_avx:
+ leaq 8(%rsp),%r10
+ andq $-32,%rsp
+ subq $8,%rsp
+ leaq -88(%rsp),%r11
+ subq $0x178,%rsp
+ subq $64,%rdx
+ leaq -32(%rsi),%rax
+ cmovcq %rax,%rsi
+
+ vmovdqu 48(%rdi),%xmm14
+ leaq 112(%rdi),%rdi
+ leaq .Lconst(%rip),%rcx
+
+ vmovdqu 32(%rsi),%xmm5
+ vmovdqu 48(%rsi),%xmm6
+ vmovdqa 64(%rcx),%xmm15
+
+ vpsrldq $6,%xmm5,%xmm7
+ vpsrldq $6,%xmm6,%xmm8
+ vpunpckhqdq %xmm6,%xmm5,%xmm9
+ vpunpcklqdq %xmm6,%xmm5,%xmm5
+ vpunpcklqdq %xmm8,%xmm7,%xmm8
+
+ vpsrlq $40,%xmm9,%xmm9
+ vpsrlq $26,%xmm5,%xmm6
+ vpand %xmm15,%xmm5,%xmm5
+ vpsrlq $4,%xmm8,%xmm7
+ vpand %xmm15,%xmm6,%xmm6
+ vpsrlq $30,%xmm8,%xmm8
+ vpand %xmm15,%xmm7,%xmm7
+ vpand %xmm15,%xmm8,%xmm8
+ vpor 32(%rcx),%xmm9,%xmm9
+
+ jbe .Lskip_loop_avx
+
+
+ vmovdqu -48(%rdi),%xmm11
+ vmovdqu -32(%rdi),%xmm12
+ vpshufd $0xEE,%xmm14,%xmm13
+ vpshufd $0x44,%xmm14,%xmm10
+ vmovdqa %xmm13,-144(%r11)
+ vmovdqa %xmm10,0(%rsp)
+ vpshufd $0xEE,%xmm11,%xmm14
+ vmovdqu -16(%rdi),%xmm10
+ vpshufd $0x44,%xmm11,%xmm11
+ vmovdqa %xmm14,-128(%r11)
+ vmovdqa %xmm11,16(%rsp)
+ vpshufd $0xEE,%xmm12,%xmm13
+ vmovdqu 0(%rdi),%xmm11
+ vpshufd $0x44,%xmm12,%xmm12
+ vmovdqa %xmm13,-112(%r11)
+ vmovdqa %xmm12,32(%rsp)
+ vpshufd $0xEE,%xmm10,%xmm14
+ vmovdqu 16(%rdi),%xmm12
+ vpshufd $0x44,%xmm10,%xmm10
+ vmovdqa %xmm14,-96(%r11)
+ vmovdqa %xmm10,48(%rsp)
+ vpshufd $0xEE,%xmm11,%xmm13
+ vmovdqu 32(%rdi),%xmm10
+ vpshufd $0x44,%xmm11,%xmm11
+ vmovdqa %xmm13,-80(%r11)
+ vmovdqa %xmm11,64(%rsp)
+ vpshufd $0xEE,%xmm12,%xmm14
+ vmovdqu 48(%rdi),%xmm11
+ vpshufd $0x44,%xmm12,%xmm12
+ vmovdqa %xmm14,-64(%r11)
+ vmovdqa %xmm12,80(%rsp)
+ vpshufd $0xEE,%xmm10,%xmm13
+ vmovdqu 64(%rdi),%xmm12
+ vpshufd $0x44,%xmm10,%xmm10
+ vmovdqa %xmm13,-48(%r11)
+ vmovdqa %xmm10,96(%rsp)
+ vpshufd $0xEE,%xmm11,%xmm14
+ vpshufd $0x44,%xmm11,%xmm11
+ vmovdqa %xmm14,-32(%r11)
+ vmovdqa %xmm11,112(%rsp)
+ vpshufd $0xEE,%xmm12,%xmm13
+ vmovdqa 0(%rsp),%xmm14
+ vpshufd $0x44,%xmm12,%xmm12
+ vmovdqa %xmm13,-16(%r11)
+ vmovdqa %xmm12,128(%rsp)
+
+ jmp .Loop_avx
+
+.align 32
+.Loop_avx:
+
+ vpmuludq %xmm5,%xmm14,%xmm10
+ vpmuludq %xmm6,%xmm14,%xmm11
+ vmovdqa %xmm2,32(%r11)
+ vpmuludq %xmm7,%xmm14,%xmm12
+ vmovdqa 16(%rsp),%xmm2
+ vpmuludq %xmm8,%xmm14,%xmm13
+ vpmuludq %xmm9,%xmm14,%xmm14
+
+ vmovdqa %xmm0,0(%r11)
+ vpmuludq 32(%rsp),%xmm9,%xmm0
+ vmovdqa %xmm1,16(%r11)
+ vpmuludq %xmm8,%xmm2,%xmm1
+ vpaddq %xmm0,%xmm10,%xmm10
+ vpaddq %xmm1,%xmm14,%xmm14
+ vmovdqa %xmm3,48(%r11)
+ vpmuludq %xmm7,%xmm2,%xmm0
+ vpmuludq %xmm6,%xmm2,%xmm1
+ vpaddq %xmm0,%xmm13,%xmm13
+ vmovdqa 48(%rsp),%xmm3
+ vpaddq %xmm1,%xmm12,%xmm12
+ vmovdqa %xmm4,64(%r11)
+ vpmuludq %xmm5,%xmm2,%xmm2
+ vpmuludq %xmm7,%xmm3,%xmm0
+ vpaddq %xmm2,%xmm11,%xmm11
+
+ vmovdqa 64(%rsp),%xmm4
+ vpaddq %xmm0,%xmm14,%xmm14
+ vpmuludq %xmm6,%xmm3,%xmm1
+ vpmuludq %xmm5,%xmm3,%xmm3
+ vpaddq %xmm1,%xmm13,%xmm13
+ vmovdqa 80(%rsp),%xmm2
+ vpaddq %xmm3,%xmm12,%xmm12
+ vpmuludq %xmm9,%xmm4,%xmm0
+ vpmuludq %xmm8,%xmm4,%xmm4
+ vpaddq %xmm0,%xmm11,%xmm11
+ vmovdqa 96(%rsp),%xmm3
+ vpaddq %xmm4,%xmm10,%xmm10
+
+ vmovdqa 128(%rsp),%xmm4
+ vpmuludq %xmm6,%xmm2,%xmm1
+ vpmuludq %xmm5,%xmm2,%xmm2
+ vpaddq %xmm1,%xmm14,%xmm14
+ vpaddq %xmm2,%xmm13,%xmm13
+ vpmuludq %xmm9,%xmm3,%xmm0
+ vpmuludq %xmm8,%xmm3,%xmm1
+ vpaddq %xmm0,%xmm12,%xmm12
+ vmovdqu 0(%rsi),%xmm0
+ vpaddq %xmm1,%xmm11,%xmm11
+ vpmuludq %xmm7,%xmm3,%xmm3
+ vpmuludq %xmm7,%xmm4,%xmm7
+ vpaddq %xmm3,%xmm10,%xmm10
+
+ vmovdqu 16(%rsi),%xmm1
+ vpaddq %xmm7,%xmm11,%xmm11
+ vpmuludq %xmm8,%xmm4,%xmm8
+ vpmuludq %xmm9,%xmm4,%xmm9
+ vpsrldq $6,%xmm0,%xmm2
+ vpaddq %xmm8,%xmm12,%xmm12
+ vpaddq %xmm9,%xmm13,%xmm13
+ vpsrldq $6,%xmm1,%xmm3
+ vpmuludq 112(%rsp),%xmm5,%xmm9
+ vpmuludq %xmm6,%xmm4,%xmm5
+ vpunpckhqdq %xmm1,%xmm0,%xmm4
+ vpaddq %xmm9,%xmm14,%xmm14
+ vmovdqa -144(%r11),%xmm9
+ vpaddq %xmm5,%xmm10,%xmm10
+
+ vpunpcklqdq %xmm1,%xmm0,%xmm0
+ vpunpcklqdq %xmm3,%xmm2,%xmm3
+
+
+ vpsrldq $5,%xmm4,%xmm4
+ vpsrlq $26,%xmm0,%xmm1
+ vpand %xmm15,%xmm0,%xmm0
+ vpsrlq $4,%xmm3,%xmm2
+ vpand %xmm15,%xmm1,%xmm1
+ vpand 0(%rcx),%xmm4,%xmm4
+ vpsrlq $30,%xmm3,%xmm3
+ vpand %xmm15,%xmm2,%xmm2
+ vpand %xmm15,%xmm3,%xmm3
+ vpor 32(%rcx),%xmm4,%xmm4
+
+ vpaddq 0(%r11),%xmm0,%xmm0
+ vpaddq 16(%r11),%xmm1,%xmm1
+ vpaddq 32(%r11),%xmm2,%xmm2
+ vpaddq 48(%r11),%xmm3,%xmm3
+ vpaddq 64(%r11),%xmm4,%xmm4
+
+ leaq 32(%rsi),%rax
+ leaq 64(%rsi),%rsi
+ subq $64,%rdx
+ cmovcq %rax,%rsi
+
+ vpmuludq %xmm0,%xmm9,%xmm5
+ vpmuludq %xmm1,%xmm9,%xmm6
+ vpaddq %xmm5,%xmm10,%xmm10
+ vpaddq %xmm6,%xmm11,%xmm11
+ vmovdqa -128(%r11),%xmm7
+ vpmuludq %xmm2,%xmm9,%xmm5
+ vpmuludq %xmm3,%xmm9,%xmm6
+ vpaddq %xmm5,%xmm12,%xmm12
+ vpaddq %xmm6,%xmm13,%xmm13
+ vpmuludq %xmm4,%xmm9,%xmm9
+ vpmuludq -112(%r11),%xmm4,%xmm5
+ vpaddq %xmm9,%xmm14,%xmm14
+
+ vpaddq %xmm5,%xmm10,%xmm10
+ vpmuludq %xmm2,%xmm7,%xmm6
+ vpmuludq %xmm3,%xmm7,%xmm5
+ vpaddq %xmm6,%xmm13,%xmm13
+ vmovdqa -96(%r11),%xmm8
+ vpaddq %xmm5,%xmm14,%xmm14
+ vpmuludq %xmm1,%xmm7,%xmm6
+ vpmuludq %xmm0,%xmm7,%xmm7
+ vpaddq %xmm6,%xmm12,%xmm12
+ vpaddq %xmm7,%xmm11,%xmm11
+
+ vmovdqa -80(%r11),%xmm9
+ vpmuludq %xmm2,%xmm8,%xmm5
+ vpmuludq %xmm1,%xmm8,%xmm6
+ vpaddq %xmm5,%xmm14,%xmm14
+ vpaddq %xmm6,%xmm13,%xmm13
+ vmovdqa -64(%r11),%xmm7
+ vpmuludq %xmm0,%xmm8,%xmm8
+ vpmuludq %xmm4,%xmm9,%xmm5
+ vpaddq %xmm8,%xmm12,%xmm12
+ vpaddq %xmm5,%xmm11,%xmm11
+ vmovdqa -48(%r11),%xmm8
+ vpmuludq %xmm3,%xmm9,%xmm9
+ vpmuludq %xmm1,%xmm7,%xmm6
+ vpaddq %xmm9,%xmm10,%xmm10
+
+ vmovdqa -16(%r11),%xmm9
+ vpaddq %xmm6,%xmm14,%xmm14
+ vpmuludq %xmm0,%xmm7,%xmm7
+ vpmuludq %xmm4,%xmm8,%xmm5
+ vpaddq %xmm7,%xmm13,%xmm13
+ vpaddq %xmm5,%xmm12,%xmm12
+ vmovdqu 32(%rsi),%xmm5
+ vpmuludq %xmm3,%xmm8,%xmm7
+ vpmuludq %xmm2,%xmm8,%xmm8
+ vpaddq %xmm7,%xmm11,%xmm11
+ vmovdqu 48(%rsi),%xmm6
+ vpaddq %xmm8,%xmm10,%xmm10
+
+ vpmuludq %xmm2,%xmm9,%xmm2
+ vpmuludq %xmm3,%xmm9,%xmm3
+ vpsrldq $6,%xmm5,%xmm7
+ vpaddq %xmm2,%xmm11,%xmm11
+ vpmuludq %xmm4,%xmm9,%xmm4
+ vpsrldq $6,%xmm6,%xmm8
+ vpaddq %xmm3,%xmm12,%xmm2
+ vpaddq %xmm4,%xmm13,%xmm3
+ vpmuludq -32(%r11),%xmm0,%xmm4
+ vpmuludq %xmm1,%xmm9,%xmm0
+ vpunpckhqdq %xmm6,%xmm5,%xmm9
+ vpaddq %xmm4,%xmm14,%xmm4
+ vpaddq %xmm0,%xmm10,%xmm0
+
+ vpunpcklqdq %xmm6,%xmm5,%xmm5
+ vpunpcklqdq %xmm8,%xmm7,%xmm8
+
+
+ vpsrldq $5,%xmm9,%xmm9
+ vpsrlq $26,%xmm5,%xmm6
+ vmovdqa 0(%rsp),%xmm14
+ vpand %xmm15,%xmm5,%xmm5
+ vpsrlq $4,%xmm8,%xmm7
+ vpand %xmm15,%xmm6,%xmm6
+ vpand 0(%rcx),%xmm9,%xmm9
+ vpsrlq $30,%xmm8,%xmm8
+ vpand %xmm15,%xmm7,%xmm7
+ vpand %xmm15,%xmm8,%xmm8
+ vpor 32(%rcx),%xmm9,%xmm9
+
+ vpsrlq $26,%xmm3,%xmm13
+ vpand %xmm15,%xmm3,%xmm3
+ vpaddq %xmm13,%xmm4,%xmm4
+
+ vpsrlq $26,%xmm0,%xmm10
+ vpand %xmm15,%xmm0,%xmm0
+ vpaddq %xmm10,%xmm11,%xmm1
+
+ vpsrlq $26,%xmm4,%xmm10
+ vpand %xmm15,%xmm4,%xmm4
+
+ vpsrlq $26,%xmm1,%xmm11
+ vpand %xmm15,%xmm1,%xmm1
+ vpaddq %xmm11,%xmm2,%xmm2
+
+ vpaddq %xmm10,%xmm0,%xmm0
+ vpsllq $2,%xmm10,%xmm10
+ vpaddq %xmm10,%xmm0,%xmm0
+
+ vpsrlq $26,%xmm2,%xmm12
+ vpand %xmm15,%xmm2,%xmm2
+ vpaddq %xmm12,%xmm3,%xmm3
+
+ vpsrlq $26,%xmm0,%xmm10
+ vpand %xmm15,%xmm0,%xmm0
+ vpaddq %xmm10,%xmm1,%xmm1
+
+ vpsrlq $26,%xmm3,%xmm13
+ vpand %xmm15,%xmm3,%xmm3
+ vpaddq %xmm13,%xmm4,%xmm4
+
+ ja .Loop_avx
+
+.Lskip_loop_avx:
+ vpshufd $0x10,%xmm14,%xmm14
+ addq $32,%rdx
+ jnz .Long_tail_avx
+
+ vpaddq %xmm2,%xmm7,%xmm7
+ vpaddq %xmm0,%xmm5,%xmm5
+ vpaddq %xmm1,%xmm6,%xmm6
+ vpaddq %xmm3,%xmm8,%xmm8
+ vpaddq %xmm4,%xmm9,%xmm9
+
+.Long_tail_avx:
+ vmovdqa %xmm2,32(%r11)
+ vmovdqa %xmm0,0(%r11)
+ vmovdqa %xmm1,16(%r11)
+ vmovdqa %xmm3,48(%r11)
+ vmovdqa %xmm4,64(%r11)
+
+ vpmuludq %xmm7,%xmm14,%xmm12
+ vpmuludq %xmm5,%xmm14,%xmm10
+ vpshufd $0x10,-48(%rdi),%xmm2
+ vpmuludq %xmm6,%xmm14,%xmm11
+ vpmuludq %xmm8,%xmm14,%xmm13
+ vpmuludq %xmm9,%xmm14,%xmm14
+
+ vpmuludq %xmm8,%xmm2,%xmm0
+ vpaddq %xmm0,%xmm14,%xmm14
+ vpshufd $0x10,-32(%rdi),%xmm3
+ vpmuludq %xmm7,%xmm2,%xmm1
+ vpaddq %xmm1,%xmm13,%xmm13
+ vpshufd $0x10,-16(%rdi),%xmm4
+ vpmuludq %xmm6,%xmm2,%xmm0
+ vpaddq %xmm0,%xmm12,%xmm12
+ vpmuludq %xmm5,%xmm2,%xmm2
+ vpaddq %xmm2,%xmm11,%xmm11
+ vpmuludq %xmm9,%xmm3,%xmm3
+ vpaddq %xmm3,%xmm10,%xmm10
+
+ vpshufd $0x10,0(%rdi),%xmm2
+ vpmuludq %xmm7,%xmm4,%xmm1
+ vpaddq %xmm1,%xmm14,%xmm14
+ vpmuludq %xmm6,%xmm4,%xmm0
+ vpaddq %xmm0,%xmm13,%xmm13
+ vpshufd $0x10,16(%rdi),%xmm3
+ vpmuludq %xmm5,%xmm4,%xmm4
+ vpaddq %xmm4,%xmm12,%xmm12
+ vpmuludq %xmm9,%xmm2,%xmm1
+ vpaddq %xmm1,%xmm11,%xmm11
+ vpshufd $0x10,32(%rdi),%xmm4
+ vpmuludq %xmm8,%xmm2,%xmm2
+ vpaddq %xmm2,%xmm10,%xmm10
+
+ vpmuludq %xmm6,%xmm3,%xmm0
+ vpaddq %xmm0,%xmm14,%xmm14
+ vpmuludq %xmm5,%xmm3,%xmm3
+ vpaddq %xmm3,%xmm13,%xmm13
+ vpshufd $0x10,48(%rdi),%xmm2
+ vpmuludq %xmm9,%xmm4,%xmm1
+ vpaddq %xmm1,%xmm12,%xmm12
+ vpshufd $0x10,64(%rdi),%xmm3
+ vpmuludq %xmm8,%xmm4,%xmm0
+ vpaddq %xmm0,%xmm11,%xmm11
+ vpmuludq %xmm7,%xmm4,%xmm4
+ vpaddq %xmm4,%xmm10,%xmm10
+
+ vpmuludq %xmm5,%xmm2,%xmm2
+ vpaddq %xmm2,%xmm14,%xmm14
+ vpmuludq %xmm9,%xmm3,%xmm1
+ vpaddq %xmm1,%xmm13,%xmm13
+ vpmuludq %xmm8,%xmm3,%xmm0
+ vpaddq %xmm0,%xmm12,%xmm12
+ vpmuludq %xmm7,%xmm3,%xmm1
+ vpaddq %xmm1,%xmm11,%xmm11
+ vpmuludq %xmm6,%xmm3,%xmm3
+ vpaddq %xmm3,%xmm10,%xmm10
+
+ jz .Lshort_tail_avx
+
+ vmovdqu 0(%rsi),%xmm0
+ vmovdqu 16(%rsi),%xmm1
+
+ vpsrldq $6,%xmm0,%xmm2
+ vpsrldq $6,%xmm1,%xmm3
+ vpunpckhqdq %xmm1,%xmm0,%xmm4
+ vpunpcklqdq %xmm1,%xmm0,%xmm0
+ vpunpcklqdq %xmm3,%xmm2,%xmm3
+
+ vpsrlq $40,%xmm4,%xmm4
+ vpsrlq $26,%xmm0,%xmm1
+ vpand %xmm15,%xmm0,%xmm0
+ vpsrlq $4,%xmm3,%xmm2
+ vpand %xmm15,%xmm1,%xmm1
+ vpsrlq $30,%xmm3,%xmm3
+ vpand %xmm15,%xmm2,%xmm2
+ vpand %xmm15,%xmm3,%xmm3
+ vpor 32(%rcx),%xmm4,%xmm4
+
+ vpshufd $0x32,-64(%rdi),%xmm9
+ vpaddq 0(%r11),%xmm0,%xmm0
+ vpaddq 16(%r11),%xmm1,%xmm1
+ vpaddq 32(%r11),%xmm2,%xmm2
+ vpaddq 48(%r11),%xmm3,%xmm3
+ vpaddq 64(%r11),%xmm4,%xmm4
+
+ vpmuludq %xmm0,%xmm9,%xmm5
+ vpaddq %xmm5,%xmm10,%xmm10
+ vpmuludq %xmm1,%xmm9,%xmm6
+ vpaddq %xmm6,%xmm11,%xmm11
+ vpmuludq %xmm2,%xmm9,%xmm5
+ vpaddq %xmm5,%xmm12,%xmm12
+ vpshufd $0x32,-48(%rdi),%xmm7
+ vpmuludq %xmm3,%xmm9,%xmm6
+ vpaddq %xmm6,%xmm13,%xmm13
+ vpmuludq %xmm4,%xmm9,%xmm9
+ vpaddq %xmm9,%xmm14,%xmm14
+
+ vpmuludq %xmm3,%xmm7,%xmm5
+ vpaddq %xmm5,%xmm14,%xmm14
+ vpshufd $0x32,-32(%rdi),%xmm8
+ vpmuludq %xmm2,%xmm7,%xmm6
+ vpaddq %xmm6,%xmm13,%xmm13
+ vpshufd $0x32,-16(%rdi),%xmm9
+ vpmuludq %xmm1,%xmm7,%xmm5
+ vpaddq %xmm5,%xmm12,%xmm12
+ vpmuludq %xmm0,%xmm7,%xmm7
+ vpaddq %xmm7,%xmm11,%xmm11
+ vpmuludq %xmm4,%xmm8,%xmm8
+ vpaddq %xmm8,%xmm10,%xmm10
+
+ vpshufd $0x32,0(%rdi),%xmm7
+ vpmuludq %xmm2,%xmm9,%xmm6
+ vpaddq %xmm6,%xmm14,%xmm14
+ vpmuludq %xmm1,%xmm9,%xmm5
+ vpaddq %xmm5,%xmm13,%xmm13
+ vpshufd $0x32,16(%rdi),%xmm8
+ vpmuludq %xmm0,%xmm9,%xmm9
+ vpaddq %xmm9,%xmm12,%xmm12
+ vpmuludq %xmm4,%xmm7,%xmm6
+ vpaddq %xmm6,%xmm11,%xmm11
+ vpshufd $0x32,32(%rdi),%xmm9
+ vpmuludq %xmm3,%xmm7,%xmm7
+ vpaddq %xmm7,%xmm10,%xmm10
+
+ vpmuludq %xmm1,%xmm8,%xmm5
+ vpaddq %xmm5,%xmm14,%xmm14
+ vpmuludq %xmm0,%xmm8,%xmm8
+ vpaddq %xmm8,%xmm13,%xmm13
+ vpshufd $0x32,48(%rdi),%xmm7
+ vpmuludq %xmm4,%xmm9,%xmm6
+ vpaddq %xmm6,%xmm12,%xmm12
+ vpshufd $0x32,64(%rdi),%xmm8
+ vpmuludq %xmm3,%xmm9,%xmm5
+ vpaddq %xmm5,%xmm11,%xmm11
+ vpmuludq %xmm2,%xmm9,%xmm9
+ vpaddq %xmm9,%xmm10,%xmm10
+
+ vpmuludq %xmm0,%xmm7,%xmm7
+ vpaddq %xmm7,%xmm14,%xmm14
+ vpmuludq %xmm4,%xmm8,%xmm6
+ vpaddq %xmm6,%xmm13,%xmm13
+ vpmuludq %xmm3,%xmm8,%xmm5
+ vpaddq %xmm5,%xmm12,%xmm12
+ vpmuludq %xmm2,%xmm8,%xmm6
+ vpaddq %xmm6,%xmm11,%xmm11
+ vpmuludq %xmm1,%xmm8,%xmm8
+ vpaddq %xmm8,%xmm10,%xmm10
+
+.Lshort_tail_avx:
+
+ vpsrldq $8,%xmm14,%xmm9
+ vpsrldq $8,%xmm13,%xmm8
+ vpsrldq $8,%xmm11,%xmm6
+ vpsrldq $8,%xmm10,%xmm5
+ vpsrldq $8,%xmm12,%xmm7
+ vpaddq %xmm8,%xmm13,%xmm13
+ vpaddq %xmm9,%xmm14,%xmm14
+ vpaddq %xmm5,%xmm10,%xmm10
+ vpaddq %xmm6,%xmm11,%xmm11
+ vpaddq %xmm7,%xmm12,%xmm12
+
+ vpsrlq $26,%xmm13,%xmm3
+ vpand %xmm15,%xmm13,%xmm13
+ vpaddq %xmm3,%xmm14,%xmm14
+
+ vpsrlq $26,%xmm10,%xmm0
+ vpand %xmm15,%xmm10,%xmm10
+ vpaddq %xmm0,%xmm11,%xmm11
+
+ vpsrlq $26,%xmm14,%xmm4
+ vpand %xmm15,%xmm14,%xmm14
+
+ vpsrlq $26,%xmm11,%xmm1
+ vpand %xmm15,%xmm11,%xmm11
+ vpaddq %xmm1,%xmm12,%xmm12
+
+ vpaddq %xmm4,%xmm10,%xmm10
+ vpsllq $2,%xmm4,%xmm4
+ vpaddq %xmm4,%xmm10,%xmm10
+
+ vpsrlq $26,%xmm12,%xmm2
+ vpand %xmm15,%xmm12,%xmm12
+ vpaddq %xmm2,%xmm13,%xmm13
+
+ vpsrlq $26,%xmm10,%xmm0
+ vpand %xmm15,%xmm10,%xmm10
+ vpaddq %xmm0,%xmm11,%xmm11
+
+ vpsrlq $26,%xmm13,%xmm3
+ vpand %xmm15,%xmm13,%xmm13
+ vpaddq %xmm3,%xmm14,%xmm14
+
+ vmovd %xmm10,-112(%rdi)
+ vmovd %xmm11,-108(%rdi)
+ vmovd %xmm12,-104(%rdi)
+ vmovd %xmm13,-100(%rdi)
+ vmovd %xmm14,-96(%rdi)
+ leaq -8(%r10),%rsp
+
+ vzeroupper
+ ret
+ENDPROC(poly1305_blocks_avx)
+
+.align 32
+ENTRY(poly1305_emit_avx)
+ cmpl $0,20(%rdi)
+ je .Lemit
+
+ movl 0(%rdi),%eax
+ movl 4(%rdi),%ecx
+ movl 8(%rdi),%r8d
+ movl 12(%rdi),%r11d
+ movl 16(%rdi),%r10d
+
+ shlq $26,%rcx
+ movq %r8,%r9
+ shlq $52,%r8
+ addq %rcx,%rax
+ shrq $12,%r9
+ addq %rax,%r8
+ adcq $0,%r9
+
+ shlq $14,%r11
+ movq %r10,%rax
+ shrq $24,%r10
+ addq %r11,%r9
+ shlq $40,%rax
+ addq %rax,%r9
+ adcq $0,%r10
+
+ movq %r10,%rax
+ movq %r10,%rcx
+ andq $3,%r10
+ shrq $2,%rax
+ andq $-4,%rcx
+ addq %rcx,%rax
+ addq %rax,%r8
+ adcq $0,%r9
+ adcq $0,%r10
+
+ movq %r8,%rax
+ addq $5,%r8
+ movq %r9,%rcx
+ adcq $0,%r9
+ adcq $0,%r10
+ shrq $2,%r10
+ cmovnzq %r8,%rax
+ cmovnzq %r9,%rcx
+
+ addq 0(%rdx),%rax
+ adcq 8(%rdx),%rcx
+ movq %rax,0(%rsi)
+ movq %rcx,8(%rsi)
+
+ ret
+ENDPROC(poly1305_emit_avx)
+#endif /* CONFIG_AS_AVX */
+
+#ifdef CONFIG_AS_AVX2
+.align 32
+ENTRY(poly1305_blocks_avx2)
+
+ movl 20(%rdi),%r8d
+ cmpq $128,%rdx
+ jae .Lblocks_avx2
+ testl %r8d,%r8d
+ jz .Lblocks
+
+.Lblocks_avx2:
+ andq $-16,%rdx
+ jz .Lno_data_avx2
+
+ vzeroupper
+
+ testl %r8d,%r8d
+ jz .Lbase2_64_avx2
+
+ testq $63,%rdx
+ jz .Leven_avx2
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lblocks_avx2_body:
+
+ movq %rdx,%r15
+
+ movq 0(%rdi),%r8
+ movq 8(%rdi),%r9
+ movl 16(%rdi),%r10d
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+
+ movl %r8d,%r14d
+ andq $-2147483648,%r8
+ movq %r9,%r12
+ movl %r9d,%ebx
+ andq $-2147483648,%r9
+
+ shrq $6,%r8
+ shlq $52,%r12
+ addq %r8,%r14
+ shrq $12,%rbx
+ shrq $18,%r9
+ addq %r12,%r14
+ adcq %r9,%rbx
+
+ movq %r10,%r8
+ shlq $40,%r8
+ shrq $24,%r10
+ addq %r8,%rbx
+ adcq $0,%r10
+
+ movq $-4,%r9
+ movq %r10,%r8
+ andq %r10,%r9
+ shrq $2,%r8
+ andq $3,%r10
+ addq %r9,%r8
+ addq %r8,%r14
+ adcq $0,%rbx
+ adcq $0,%r10
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+.Lbase2_26_pre_avx2:
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ subq $16,%r15
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+ movq %r12,%rax
+
+ testq $63,%r15
+ jnz .Lbase2_26_pre_avx2
+
+ testq %rcx,%rcx
+ jz .Lstore_base2_64_avx2
+
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r11
+ movq %rbx,%r12
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r11
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r11,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r12
+ andq $0x3ffffff,%rbx
+ orq %r12,%r10
+
+ testq %r15,%r15
+ jz .Lstore_base2_26_avx2
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ jmp .Lproceed_avx2
+
+.align 32
+.Lstore_base2_64_avx2:
+ movq %r14,0(%rdi)
+ movq %rbx,8(%rdi)
+ movq %r10,16(%rdi)
+ jmp .Ldone_avx2
+
+.align 16
+.Lstore_base2_26_avx2:
+ movl %eax,0(%rdi)
+ movl %edx,4(%rdi)
+ movl %r14d,8(%rdi)
+ movl %ebx,12(%rdi)
+ movl %r10d,16(%rdi)
+.align 16
+.Ldone_avx2:
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rsp
+
+.Lno_data_avx2:
+.Lblocks_avx2_epilogue:
+ ret
+
+
+.align 32
+.Lbase2_64_avx2:
+
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lbase2_64_avx2_body:
+
+ movq %rdx,%r15
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+ movq 0(%rdi),%r14
+ movq 8(%rdi),%rbx
+ movl 16(%rdi),%r10d
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+ testq $63,%rdx
+ jz .Linit_avx2
+
+.Lbase2_64_pre_avx2:
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ subq $16,%r15
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+ movq %r12,%rax
+
+ testq $63,%r15
+ jnz .Lbase2_64_pre_avx2
+
+.Linit_avx2:
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r8
+ movq %rbx,%r9
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r8
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r8,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r9
+ andq $0x3ffffff,%rbx
+ orq %r9,%r10
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ movl $1,20(%rdi)
+
+ __poly1305_init_avx
+
+.Lproceed_avx2:
+ movq %r15,%rdx
+
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rax
+ leaq 48(%rsp),%rsp
+
+.Lbase2_64_avx2_epilogue:
+ jmp .Ldo_avx2
+
+
+.align 32
+.Leven_avx2:
+
+ vmovd 0(%rdi),%xmm0
+ vmovd 4(%rdi),%xmm1
+ vmovd 8(%rdi),%xmm2
+ vmovd 12(%rdi),%xmm3
+ vmovd 16(%rdi),%xmm4
+
+.Ldo_avx2:
+ leaq 8(%rsp),%r10
+ subq $0x128,%rsp
+ leaq .Lconst(%rip),%rcx
+ leaq 48+64(%rdi),%rdi
+ vmovdqa 96(%rcx),%ymm7
+
+
+ vmovdqu -64(%rdi),%xmm9
+ andq $-512,%rsp
+ vmovdqu -48(%rdi),%xmm10
+ vmovdqu -32(%rdi),%xmm6
+ vmovdqu -16(%rdi),%xmm11
+ vmovdqu 0(%rdi),%xmm12
+ vmovdqu 16(%rdi),%xmm13
+ leaq 144(%rsp),%rax
+ vmovdqu 32(%rdi),%xmm14
+ vpermd %ymm9,%ymm7,%ymm9
+ vmovdqu 48(%rdi),%xmm15
+ vpermd %ymm10,%ymm7,%ymm10
+ vmovdqu 64(%rdi),%xmm5
+ vpermd %ymm6,%ymm7,%ymm6
+ vmovdqa %ymm9,0(%rsp)
+ vpermd %ymm11,%ymm7,%ymm11
+ vmovdqa %ymm10,32-144(%rax)
+ vpermd %ymm12,%ymm7,%ymm12
+ vmovdqa %ymm6,64-144(%rax)
+ vpermd %ymm13,%ymm7,%ymm13
+ vmovdqa %ymm11,96-144(%rax)
+ vpermd %ymm14,%ymm7,%ymm14
+ vmovdqa %ymm12,128-144(%rax)
+ vpermd %ymm15,%ymm7,%ymm15
+ vmovdqa %ymm13,160-144(%rax)
+ vpermd %ymm5,%ymm7,%ymm5
+ vmovdqa %ymm14,192-144(%rax)
+ vmovdqa %ymm15,224-144(%rax)
+ vmovdqa %ymm5,256-144(%rax)
+ vmovdqa 64(%rcx),%ymm5
+
+
+
+ vmovdqu 0(%rsi),%xmm7
+ vmovdqu 16(%rsi),%xmm8
+ vinserti128 $1,32(%rsi),%ymm7,%ymm7
+ vinserti128 $1,48(%rsi),%ymm8,%ymm8
+ leaq 64(%rsi),%rsi
+
+ vpsrldq $6,%ymm7,%ymm9
+ vpsrldq $6,%ymm8,%ymm10
+ vpunpckhqdq %ymm8,%ymm7,%ymm6
+ vpunpcklqdq %ymm10,%ymm9,%ymm9
+ vpunpcklqdq %ymm8,%ymm7,%ymm7
+
+ vpsrlq $30,%ymm9,%ymm10
+ vpsrlq $4,%ymm9,%ymm9
+ vpsrlq $26,%ymm7,%ymm8
+ vpsrlq $40,%ymm6,%ymm6
+ vpand %ymm5,%ymm9,%ymm9
+ vpand %ymm5,%ymm7,%ymm7
+ vpand %ymm5,%ymm8,%ymm8
+ vpand %ymm5,%ymm10,%ymm10
+ vpor 32(%rcx),%ymm6,%ymm6
+
+ vpaddq %ymm2,%ymm9,%ymm2
+ subq $64,%rdx
+ jz .Ltail_avx2
+ jmp .Loop_avx2
+
+.align 32
+.Loop_avx2:
+
+ vpaddq %ymm0,%ymm7,%ymm0
+ vmovdqa 0(%rsp),%ymm7
+ vpaddq %ymm1,%ymm8,%ymm1
+ vmovdqa 32(%rsp),%ymm8
+ vpaddq %ymm3,%ymm10,%ymm3
+ vmovdqa 96(%rsp),%ymm9
+ vpaddq %ymm4,%ymm6,%ymm4
+ vmovdqa 48(%rax),%ymm10
+ vmovdqa 112(%rax),%ymm5
+
+ vpmuludq %ymm2,%ymm7,%ymm13
+ vpmuludq %ymm2,%ymm8,%ymm14
+ vpmuludq %ymm2,%ymm9,%ymm15
+ vpmuludq %ymm2,%ymm10,%ymm11
+ vpmuludq %ymm2,%ymm5,%ymm12
+
+ vpmuludq %ymm0,%ymm8,%ymm6
+ vpmuludq %ymm1,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq 64(%rsp),%ymm4,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm11,%ymm11
+ vmovdqa -16(%rax),%ymm8
+
+ vpmuludq %ymm0,%ymm7,%ymm6
+ vpmuludq %ymm1,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vpmuludq %ymm3,%ymm7,%ymm6
+ vpmuludq %ymm4,%ymm7,%ymm2
+ vmovdqu 0(%rsi),%xmm7
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm2,%ymm15,%ymm15
+ vinserti128 $1,32(%rsi),%ymm7,%ymm7
+
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq %ymm4,%ymm8,%ymm2
+ vmovdqu 16(%rsi),%xmm8
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vmovdqa 16(%rax),%ymm2
+ vpmuludq %ymm1,%ymm9,%ymm6
+ vpmuludq %ymm0,%ymm9,%ymm9
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm9,%ymm13,%ymm13
+ vinserti128 $1,48(%rsi),%ymm8,%ymm8
+ leaq 64(%rsi),%rsi
+
+ vpmuludq %ymm1,%ymm2,%ymm6
+ vpmuludq %ymm0,%ymm2,%ymm2
+ vpsrldq $6,%ymm7,%ymm9
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm14,%ymm14
+ vpmuludq %ymm3,%ymm10,%ymm6
+ vpmuludq %ymm4,%ymm10,%ymm2
+ vpsrldq $6,%ymm8,%ymm10
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpunpckhqdq %ymm8,%ymm7,%ymm6
+
+ vpmuludq %ymm3,%ymm5,%ymm3
+ vpmuludq %ymm4,%ymm5,%ymm4
+ vpunpcklqdq %ymm8,%ymm7,%ymm7
+ vpaddq %ymm3,%ymm13,%ymm2
+ vpaddq %ymm4,%ymm14,%ymm3
+ vpunpcklqdq %ymm10,%ymm9,%ymm10
+ vpmuludq 80(%rax),%ymm0,%ymm4
+ vpmuludq %ymm1,%ymm5,%ymm0
+ vmovdqa 64(%rcx),%ymm5
+ vpaddq %ymm4,%ymm15,%ymm4
+ vpaddq %ymm0,%ymm11,%ymm0
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm12,%ymm1
+
+ vpsrlq $26,%ymm4,%ymm15
+ vpand %ymm5,%ymm4,%ymm4
+
+ vpsrlq $4,%ymm10,%ymm9
+
+ vpsrlq $26,%ymm1,%ymm12
+ vpand %ymm5,%ymm1,%ymm1
+ vpaddq %ymm12,%ymm2,%ymm2
+
+ vpaddq %ymm15,%ymm0,%ymm0
+ vpsllq $2,%ymm15,%ymm15
+ vpaddq %ymm15,%ymm0,%ymm0
+
+ vpand %ymm5,%ymm9,%ymm9
+ vpsrlq $26,%ymm7,%ymm8
+
+ vpsrlq $26,%ymm2,%ymm13
+ vpand %ymm5,%ymm2,%ymm2
+ vpaddq %ymm13,%ymm3,%ymm3
+
+ vpaddq %ymm9,%ymm2,%ymm2
+ vpsrlq $30,%ymm10,%ymm10
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $40,%ymm6,%ymm6
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpand %ymm5,%ymm7,%ymm7
+ vpand %ymm5,%ymm8,%ymm8
+ vpand %ymm5,%ymm10,%ymm10
+ vpor 32(%rcx),%ymm6,%ymm6
+
+ subq $64,%rdx
+ jnz .Loop_avx2
+
+.byte 0x66,0x90
+.Ltail_avx2:
+
+ vpaddq %ymm0,%ymm7,%ymm0
+ vmovdqu 4(%rsp),%ymm7
+ vpaddq %ymm1,%ymm8,%ymm1
+ vmovdqu 36(%rsp),%ymm8
+ vpaddq %ymm3,%ymm10,%ymm3
+ vmovdqu 100(%rsp),%ymm9
+ vpaddq %ymm4,%ymm6,%ymm4
+ vmovdqu 52(%rax),%ymm10
+ vmovdqu 116(%rax),%ymm5
+
+ vpmuludq %ymm2,%ymm7,%ymm13
+ vpmuludq %ymm2,%ymm8,%ymm14
+ vpmuludq %ymm2,%ymm9,%ymm15
+ vpmuludq %ymm2,%ymm10,%ymm11
+ vpmuludq %ymm2,%ymm5,%ymm12
+
+ vpmuludq %ymm0,%ymm8,%ymm6
+ vpmuludq %ymm1,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq 68(%rsp),%ymm4,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm11,%ymm11
+
+ vpmuludq %ymm0,%ymm7,%ymm6
+ vpmuludq %ymm1,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vmovdqu -12(%rax),%ymm8
+ vpaddq %ymm2,%ymm12,%ymm12
+ vpmuludq %ymm3,%ymm7,%ymm6
+ vpmuludq %ymm4,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm2,%ymm15,%ymm15
+
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq %ymm4,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vmovdqu 20(%rax),%ymm2
+ vpmuludq %ymm1,%ymm9,%ymm6
+ vpmuludq %ymm0,%ymm9,%ymm9
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm9,%ymm13,%ymm13
+
+ vpmuludq %ymm1,%ymm2,%ymm6
+ vpmuludq %ymm0,%ymm2,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm14,%ymm14
+ vpmuludq %ymm3,%ymm10,%ymm6
+ vpmuludq %ymm4,%ymm10,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+
+ vpmuludq %ymm3,%ymm5,%ymm3
+ vpmuludq %ymm4,%ymm5,%ymm4
+ vpaddq %ymm3,%ymm13,%ymm2
+ vpaddq %ymm4,%ymm14,%ymm3
+ vpmuludq 84(%rax),%ymm0,%ymm4
+ vpmuludq %ymm1,%ymm5,%ymm0
+ vmovdqa 64(%rcx),%ymm5
+ vpaddq %ymm4,%ymm15,%ymm4
+ vpaddq %ymm0,%ymm11,%ymm0
+
+ vpsrldq $8,%ymm12,%ymm8
+ vpsrldq $8,%ymm2,%ymm9
+ vpsrldq $8,%ymm3,%ymm10
+ vpsrldq $8,%ymm4,%ymm6
+ vpsrldq $8,%ymm0,%ymm7
+ vpaddq %ymm8,%ymm12,%ymm12
+ vpaddq %ymm9,%ymm2,%ymm2
+ vpaddq %ymm10,%ymm3,%ymm3
+ vpaddq %ymm6,%ymm4,%ymm4
+ vpaddq %ymm7,%ymm0,%ymm0
+
+ vpermq $0x2,%ymm3,%ymm10
+ vpermq $0x2,%ymm4,%ymm6
+ vpermq $0x2,%ymm0,%ymm7
+ vpermq $0x2,%ymm12,%ymm8
+ vpermq $0x2,%ymm2,%ymm9
+ vpaddq %ymm10,%ymm3,%ymm3
+ vpaddq %ymm6,%ymm4,%ymm4
+ vpaddq %ymm7,%ymm0,%ymm0
+ vpaddq %ymm8,%ymm12,%ymm12
+ vpaddq %ymm9,%ymm2,%ymm2
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm12,%ymm1
+
+ vpsrlq $26,%ymm4,%ymm15
+ vpand %ymm5,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm1,%ymm12
+ vpand %ymm5,%ymm1,%ymm1
+ vpaddq %ymm12,%ymm2,%ymm2
+
+ vpaddq %ymm15,%ymm0,%ymm0
+ vpsllq $2,%ymm15,%ymm15
+ vpaddq %ymm15,%ymm0,%ymm0
+
+ vpsrlq $26,%ymm2,%ymm13
+ vpand %ymm5,%ymm2,%ymm2
+ vpaddq %ymm13,%ymm3,%ymm3
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vmovd %xmm0,-112(%rdi)
+ vmovd %xmm1,-108(%rdi)
+ vmovd %xmm2,-104(%rdi)
+ vmovd %xmm3,-100(%rdi)
+ vmovd %xmm4,-96(%rdi)
+ leaq -8(%r10),%rsp
+
+ vzeroupper
+ ret
+
+ENDPROC(poly1305_blocks_avx2)
+#endif /* CONFIG_AS_AVX2 */
+
+#ifdef CONFIG_AS_AVX512
+.align 32
+ENTRY(poly1305_blocks_avx512)
+
+ movl 20(%rdi),%r8d
+ cmpq $128,%rdx
+ jae .Lblocks_avx2_512
+ testl %r8d,%r8d
+ jz .Lblocks
+
+.Lblocks_avx2_512:
+ andq $-16,%rdx
+ jz .Lno_data_avx2_512
+
+ vzeroupper
+
+ testl %r8d,%r8d
+ jz .Lbase2_64_avx2_512
+
+ testq $63,%rdx
+ jz .Leven_avx2_512
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lblocks_avx2_body_512:
+
+ movq %rdx,%r15
+
+ movq 0(%rdi),%r8
+ movq 8(%rdi),%r9
+ movl 16(%rdi),%r10d
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+
+ movl %r8d,%r14d
+ andq $-2147483648,%r8
+ movq %r9,%r12
+ movl %r9d,%ebx
+ andq $-2147483648,%r9
+
+ shrq $6,%r8
+ shlq $52,%r12
+ addq %r8,%r14
+ shrq $12,%rbx
+ shrq $18,%r9
+ addq %r12,%r14
+ adcq %r9,%rbx
+
+ movq %r10,%r8
+ shlq $40,%r8
+ shrq $24,%r10
+ addq %r8,%rbx
+ adcq $0,%r10
+
+ movq $-4,%r9
+ movq %r10,%r8
+ andq %r10,%r9
+ shrq $2,%r8
+ andq $3,%r10
+ addq %r9,%r8
+ addq %r8,%r14
+ adcq $0,%rbx
+ adcq $0,%r10
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+.Lbase2_26_pre_avx2_512:
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ subq $16,%r15
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+ movq %r12,%rax
+
+ testq $63,%r15
+ jnz .Lbase2_26_pre_avx2_512
+
+ testq %rcx,%rcx
+ jz .Lstore_base2_64_avx2_512
+
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r11
+ movq %rbx,%r12
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r11
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r11,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r12
+ andq $0x3ffffff,%rbx
+ orq %r12,%r10
+
+ testq %r15,%r15
+ jz .Lstore_base2_26_avx2_512
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ jmp .Lproceed_avx2_512
+
+.align 32
+.Lstore_base2_64_avx2_512:
+ movq %r14,0(%rdi)
+ movq %rbx,8(%rdi)
+ movq %r10,16(%rdi)
+ jmp .Ldone_avx2_512
+
+.align 16
+.Lstore_base2_26_avx2_512:
+ movl %eax,0(%rdi)
+ movl %edx,4(%rdi)
+ movl %r14d,8(%rdi)
+ movl %ebx,12(%rdi)
+ movl %r10d,16(%rdi)
+.align 16
+.Ldone_avx2_512:
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rsp
+
+.Lno_data_avx2_512:
+.Lblocks_avx2_epilogue_512:
+ ret
+
+
+.align 32
+.Lbase2_64_avx2_512:
+
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+ pushq %rdi
+
+.Lbase2_64_avx2_body_512:
+
+ movq %rdx,%r15
+
+ movq 24(%rdi),%r11
+ movq 32(%rdi),%r13
+
+ movq 0(%rdi),%r14
+ movq 8(%rdi),%rbx
+ movl 16(%rdi),%r10d
+
+ movq %r13,%r12
+ movq %r13,%rax
+ shrq $2,%r13
+ addq %r12,%r13
+
+ testq $63,%rdx
+ jz .Linit_avx2_512
+
+.Lbase2_64_pre_avx2_512:
+ addq 0(%rsi),%r14
+ adcq 8(%rsi),%rbx
+ leaq 16(%rsi),%rsi
+ adcq %rcx,%r10
+ subq $16,%r15
+
+ movq %rdi,0(%rsp)
+ __poly1305_block
+ movq 0(%rsp),%rdi
+ movq %r12,%rax
+
+ testq $63,%r15
+ jnz .Lbase2_64_pre_avx2_512
+
+.Linit_avx2_512:
+
+ movq %r14,%rax
+ movq %r14,%rdx
+ shrq $52,%r14
+ movq %rbx,%r8
+ movq %rbx,%r9
+ shrq $26,%rdx
+ andq $0x3ffffff,%rax
+ shlq $12,%r8
+ andq $0x3ffffff,%rdx
+ shrq $14,%rbx
+ orq %r8,%r14
+ shlq $24,%r10
+ andq $0x3ffffff,%r14
+ shrq $40,%r9
+ andq $0x3ffffff,%rbx
+ orq %r9,%r10
+
+ vmovd %eax,%xmm0
+ vmovd %edx,%xmm1
+ vmovd %r14d,%xmm2
+ vmovd %ebx,%xmm3
+ vmovd %r10d,%xmm4
+ movl $1,20(%rdi)
+
+ __poly1305_init_avx
+
+.Lproceed_avx2_512:
+ movq %r15,%rdx
+
+ movq 8(%rsp),%r15
+ movq 16(%rsp),%r14
+ movq 24(%rsp),%r13
+ movq 32(%rsp),%r12
+ movq 40(%rsp),%rbx
+ leaq 48(%rsp),%rax
+ leaq 48(%rsp),%rsp
+
+.Lbase2_64_avx2_epilogue_512:
+ jmp .Ldo_avx2_512
+
+
+.align 32
+.Leven_avx2_512:
+
+ vmovd 0(%rdi),%xmm0
+ vmovd 4(%rdi),%xmm1
+ vmovd 8(%rdi),%xmm2
+ vmovd 12(%rdi),%xmm3
+ vmovd 16(%rdi),%xmm4
+
+.Ldo_avx2_512:
+ cmpq $512,%rdx
+ jae .Lblocks_avx512
+.Lskip_avx512:
+ leaq 8(%rsp),%r10
+
+ subq $0x128,%rsp
+ leaq .Lconst(%rip),%rcx
+ leaq 48+64(%rdi),%rdi
+ vmovdqa 96(%rcx),%ymm7
+
+
+ vmovdqu -64(%rdi),%xmm9
+ andq $-512,%rsp
+ vmovdqu -48(%rdi),%xmm10
+ vmovdqu -32(%rdi),%xmm6
+ vmovdqu -16(%rdi),%xmm11
+ vmovdqu 0(%rdi),%xmm12
+ vmovdqu 16(%rdi),%xmm13
+ leaq 144(%rsp),%rax
+ vmovdqu 32(%rdi),%xmm14
+ vpermd %ymm9,%ymm7,%ymm9
+ vmovdqu 48(%rdi),%xmm15
+ vpermd %ymm10,%ymm7,%ymm10
+ vmovdqu 64(%rdi),%xmm5
+ vpermd %ymm6,%ymm7,%ymm6
+ vmovdqa %ymm9,0(%rsp)
+ vpermd %ymm11,%ymm7,%ymm11
+ vmovdqa %ymm10,32-144(%rax)
+ vpermd %ymm12,%ymm7,%ymm12
+ vmovdqa %ymm6,64-144(%rax)
+ vpermd %ymm13,%ymm7,%ymm13
+ vmovdqa %ymm11,96-144(%rax)
+ vpermd %ymm14,%ymm7,%ymm14
+ vmovdqa %ymm12,128-144(%rax)
+ vpermd %ymm15,%ymm7,%ymm15
+ vmovdqa %ymm13,160-144(%rax)
+ vpermd %ymm5,%ymm7,%ymm5
+ vmovdqa %ymm14,192-144(%rax)
+ vmovdqa %ymm15,224-144(%rax)
+ vmovdqa %ymm5,256-144(%rax)
+ vmovdqa 64(%rcx),%ymm5
+
+
+
+ vmovdqu 0(%rsi),%xmm7
+ vmovdqu 16(%rsi),%xmm8
+ vinserti128 $1,32(%rsi),%ymm7,%ymm7
+ vinserti128 $1,48(%rsi),%ymm8,%ymm8
+ leaq 64(%rsi),%rsi
+
+ vpsrldq $6,%ymm7,%ymm9
+ vpsrldq $6,%ymm8,%ymm10
+ vpunpckhqdq %ymm8,%ymm7,%ymm6
+ vpunpcklqdq %ymm10,%ymm9,%ymm9
+ vpunpcklqdq %ymm8,%ymm7,%ymm7
+
+ vpsrlq $30,%ymm9,%ymm10
+ vpsrlq $4,%ymm9,%ymm9
+ vpsrlq $26,%ymm7,%ymm8
+ vpsrlq $40,%ymm6,%ymm6
+ vpand %ymm5,%ymm9,%ymm9
+ vpand %ymm5,%ymm7,%ymm7
+ vpand %ymm5,%ymm8,%ymm8
+ vpand %ymm5,%ymm10,%ymm10
+ vpor 32(%rcx),%ymm6,%ymm6
+
+ vpaddq %ymm2,%ymm9,%ymm2
+ subq $64,%rdx
+ jz .Ltail_avx2_512
+ jmp .Loop_avx2_512
+
+.align 32
+.Loop_avx2_512:
+
+ vpaddq %ymm0,%ymm7,%ymm0
+ vmovdqa 0(%rsp),%ymm7
+ vpaddq %ymm1,%ymm8,%ymm1
+ vmovdqa 32(%rsp),%ymm8
+ vpaddq %ymm3,%ymm10,%ymm3
+ vmovdqa 96(%rsp),%ymm9
+ vpaddq %ymm4,%ymm6,%ymm4
+ vmovdqa 48(%rax),%ymm10
+ vmovdqa 112(%rax),%ymm5
+
+ vpmuludq %ymm2,%ymm7,%ymm13
+ vpmuludq %ymm2,%ymm8,%ymm14
+ vpmuludq %ymm2,%ymm9,%ymm15
+ vpmuludq %ymm2,%ymm10,%ymm11
+ vpmuludq %ymm2,%ymm5,%ymm12
+
+ vpmuludq %ymm0,%ymm8,%ymm6
+ vpmuludq %ymm1,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq 64(%rsp),%ymm4,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm11,%ymm11
+ vmovdqa -16(%rax),%ymm8
+
+ vpmuludq %ymm0,%ymm7,%ymm6
+ vpmuludq %ymm1,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vpmuludq %ymm3,%ymm7,%ymm6
+ vpmuludq %ymm4,%ymm7,%ymm2
+ vmovdqu 0(%rsi),%xmm7
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm2,%ymm15,%ymm15
+ vinserti128 $1,32(%rsi),%ymm7,%ymm7
+
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq %ymm4,%ymm8,%ymm2
+ vmovdqu 16(%rsi),%xmm8
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vmovdqa 16(%rax),%ymm2
+ vpmuludq %ymm1,%ymm9,%ymm6
+ vpmuludq %ymm0,%ymm9,%ymm9
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm9,%ymm13,%ymm13
+ vinserti128 $1,48(%rsi),%ymm8,%ymm8
+ leaq 64(%rsi),%rsi
+
+ vpmuludq %ymm1,%ymm2,%ymm6
+ vpmuludq %ymm0,%ymm2,%ymm2
+ vpsrldq $6,%ymm7,%ymm9
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm14,%ymm14
+ vpmuludq %ymm3,%ymm10,%ymm6
+ vpmuludq %ymm4,%ymm10,%ymm2
+ vpsrldq $6,%ymm8,%ymm10
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpunpckhqdq %ymm8,%ymm7,%ymm6
+
+ vpmuludq %ymm3,%ymm5,%ymm3
+ vpmuludq %ymm4,%ymm5,%ymm4
+ vpunpcklqdq %ymm8,%ymm7,%ymm7
+ vpaddq %ymm3,%ymm13,%ymm2
+ vpaddq %ymm4,%ymm14,%ymm3
+ vpunpcklqdq %ymm10,%ymm9,%ymm10
+ vpmuludq 80(%rax),%ymm0,%ymm4
+ vpmuludq %ymm1,%ymm5,%ymm0
+ vmovdqa 64(%rcx),%ymm5
+ vpaddq %ymm4,%ymm15,%ymm4
+ vpaddq %ymm0,%ymm11,%ymm0
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm12,%ymm1
+
+ vpsrlq $26,%ymm4,%ymm15
+ vpand %ymm5,%ymm4,%ymm4
+
+ vpsrlq $4,%ymm10,%ymm9
+
+ vpsrlq $26,%ymm1,%ymm12
+ vpand %ymm5,%ymm1,%ymm1
+ vpaddq %ymm12,%ymm2,%ymm2
+
+ vpaddq %ymm15,%ymm0,%ymm0
+ vpsllq $2,%ymm15,%ymm15
+ vpaddq %ymm15,%ymm0,%ymm0
+
+ vpand %ymm5,%ymm9,%ymm9
+ vpsrlq $26,%ymm7,%ymm8
+
+ vpsrlq $26,%ymm2,%ymm13
+ vpand %ymm5,%ymm2,%ymm2
+ vpaddq %ymm13,%ymm3,%ymm3
+
+ vpaddq %ymm9,%ymm2,%ymm2
+ vpsrlq $30,%ymm10,%ymm10
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $40,%ymm6,%ymm6
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpand %ymm5,%ymm7,%ymm7
+ vpand %ymm5,%ymm8,%ymm8
+ vpand %ymm5,%ymm10,%ymm10
+ vpor 32(%rcx),%ymm6,%ymm6
+
+ subq $64,%rdx
+ jnz .Loop_avx2_512
+
+.byte 0x66,0x90
+.Ltail_avx2_512:
+
+ vpaddq %ymm0,%ymm7,%ymm0
+ vmovdqu 4(%rsp),%ymm7
+ vpaddq %ymm1,%ymm8,%ymm1
+ vmovdqu 36(%rsp),%ymm8
+ vpaddq %ymm3,%ymm10,%ymm3
+ vmovdqu 100(%rsp),%ymm9
+ vpaddq %ymm4,%ymm6,%ymm4
+ vmovdqu 52(%rax),%ymm10
+ vmovdqu 116(%rax),%ymm5
+
+ vpmuludq %ymm2,%ymm7,%ymm13
+ vpmuludq %ymm2,%ymm8,%ymm14
+ vpmuludq %ymm2,%ymm9,%ymm15
+ vpmuludq %ymm2,%ymm10,%ymm11
+ vpmuludq %ymm2,%ymm5,%ymm12
+
+ vpmuludq %ymm0,%ymm8,%ymm6
+ vpmuludq %ymm1,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq 68(%rsp),%ymm4,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm11,%ymm11
+
+ vpmuludq %ymm0,%ymm7,%ymm6
+ vpmuludq %ymm1,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vmovdqu -12(%rax),%ymm8
+ vpaddq %ymm2,%ymm12,%ymm12
+ vpmuludq %ymm3,%ymm7,%ymm6
+ vpmuludq %ymm4,%ymm7,%ymm2
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm2,%ymm15,%ymm15
+
+ vpmuludq %ymm3,%ymm8,%ymm6
+ vpmuludq %ymm4,%ymm8,%ymm2
+ vpaddq %ymm6,%ymm11,%ymm11
+ vpaddq %ymm2,%ymm12,%ymm12
+ vmovdqu 20(%rax),%ymm2
+ vpmuludq %ymm1,%ymm9,%ymm6
+ vpmuludq %ymm0,%ymm9,%ymm9
+ vpaddq %ymm6,%ymm14,%ymm14
+ vpaddq %ymm9,%ymm13,%ymm13
+
+ vpmuludq %ymm1,%ymm2,%ymm6
+ vpmuludq %ymm0,%ymm2,%ymm2
+ vpaddq %ymm6,%ymm15,%ymm15
+ vpaddq %ymm2,%ymm14,%ymm14
+ vpmuludq %ymm3,%ymm10,%ymm6
+ vpmuludq %ymm4,%ymm10,%ymm2
+ vpaddq %ymm6,%ymm12,%ymm12
+ vpaddq %ymm2,%ymm13,%ymm13
+
+ vpmuludq %ymm3,%ymm5,%ymm3
+ vpmuludq %ymm4,%ymm5,%ymm4
+ vpaddq %ymm3,%ymm13,%ymm2
+ vpaddq %ymm4,%ymm14,%ymm3
+ vpmuludq 84(%rax),%ymm0,%ymm4
+ vpmuludq %ymm1,%ymm5,%ymm0
+ vmovdqa 64(%rcx),%ymm5
+ vpaddq %ymm4,%ymm15,%ymm4
+ vpaddq %ymm0,%ymm11,%ymm0
+
+ vpsrldq $8,%ymm12,%ymm8
+ vpsrldq $8,%ymm2,%ymm9
+ vpsrldq $8,%ymm3,%ymm10
+ vpsrldq $8,%ymm4,%ymm6
+ vpsrldq $8,%ymm0,%ymm7
+ vpaddq %ymm8,%ymm12,%ymm12
+ vpaddq %ymm9,%ymm2,%ymm2
+ vpaddq %ymm10,%ymm3,%ymm3
+ vpaddq %ymm6,%ymm4,%ymm4
+ vpaddq %ymm7,%ymm0,%ymm0
+
+ vpermq $0x2,%ymm3,%ymm10
+ vpermq $0x2,%ymm4,%ymm6
+ vpermq $0x2,%ymm0,%ymm7
+ vpermq $0x2,%ymm12,%ymm8
+ vpermq $0x2,%ymm2,%ymm9
+ vpaddq %ymm10,%ymm3,%ymm3
+ vpaddq %ymm6,%ymm4,%ymm4
+ vpaddq %ymm7,%ymm0,%ymm0
+ vpaddq %ymm8,%ymm12,%ymm12
+ vpaddq %ymm9,%ymm2,%ymm2
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm12,%ymm1
+
+ vpsrlq $26,%ymm4,%ymm15
+ vpand %ymm5,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm1,%ymm12
+ vpand %ymm5,%ymm1,%ymm1
+ vpaddq %ymm12,%ymm2,%ymm2
+
+ vpaddq %ymm15,%ymm0,%ymm0
+ vpsllq $2,%ymm15,%ymm15
+ vpaddq %ymm15,%ymm0,%ymm0
+
+ vpsrlq $26,%ymm2,%ymm13
+ vpand %ymm5,%ymm2,%ymm2
+ vpaddq %ymm13,%ymm3,%ymm3
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vmovd %xmm0,-112(%rdi)
+ vmovd %xmm1,-108(%rdi)
+ vmovd %xmm2,-104(%rdi)
+ vmovd %xmm3,-100(%rdi)
+ vmovd %xmm4,-96(%rdi)
+ leaq -8(%r10),%rsp
+
+ vzeroupper
+ ret
+
+.Lblocks_avx512:
+
+ movl $15,%eax
+ kmovw %eax,%k2
+ leaq 8(%rsp),%r10
+
+ subq $0x128,%rsp
+ leaq .Lconst(%rip),%rcx
+ leaq 48+64(%rdi),%rdi
+ vmovdqa 96(%rcx),%ymm9
+
+ vmovdqu32 -64(%rdi),%zmm16{%k2}{z}
+ andq $-512,%rsp
+ vmovdqu32 -48(%rdi),%zmm17{%k2}{z}
+ movq $0x20,%rax
+ vmovdqu32 -32(%rdi),%zmm21{%k2}{z}
+ vmovdqu32 -16(%rdi),%zmm18{%k2}{z}
+ vmovdqu32 0(%rdi),%zmm22{%k2}{z}
+ vmovdqu32 16(%rdi),%zmm19{%k2}{z}
+ vmovdqu32 32(%rdi),%zmm23{%k2}{z}
+ vmovdqu32 48(%rdi),%zmm20{%k2}{z}
+ vmovdqu32 64(%rdi),%zmm24{%k2}{z}
+ vpermd %zmm16,%zmm9,%zmm16
+ vpbroadcastq 64(%rcx),%zmm5
+ vpermd %zmm17,%zmm9,%zmm17
+ vpermd %zmm21,%zmm9,%zmm21
+ vpermd %zmm18,%zmm9,%zmm18
+ vmovdqa64 %zmm16,0(%rsp){%k2}
+ vpsrlq $32,%zmm16,%zmm7
+ vpermd %zmm22,%zmm9,%zmm22
+ vmovdqu64 %zmm17,0(%rsp,%rax,1){%k2}
+ vpsrlq $32,%zmm17,%zmm8
+ vpermd %zmm19,%zmm9,%zmm19
+ vmovdqa64 %zmm21,64(%rsp){%k2}
+ vpermd %zmm23,%zmm9,%zmm23
+ vpermd %zmm20,%zmm9,%zmm20
+ vmovdqu64 %zmm18,64(%rsp,%rax,1){%k2}
+ vpermd %zmm24,%zmm9,%zmm24
+ vmovdqa64 %zmm22,128(%rsp){%k2}
+ vmovdqu64 %zmm19,128(%rsp,%rax,1){%k2}
+ vmovdqa64 %zmm23,192(%rsp){%k2}
+ vmovdqu64 %zmm20,192(%rsp,%rax,1){%k2}
+ vmovdqa64 %zmm24,256(%rsp){%k2}
+
+ vpmuludq %zmm7,%zmm16,%zmm11
+ vpmuludq %zmm7,%zmm17,%zmm12
+ vpmuludq %zmm7,%zmm18,%zmm13
+ vpmuludq %zmm7,%zmm19,%zmm14
+ vpmuludq %zmm7,%zmm20,%zmm15
+ vpsrlq $32,%zmm18,%zmm9
+
+ vpmuludq %zmm8,%zmm24,%zmm25
+ vpmuludq %zmm8,%zmm16,%zmm26
+ vpmuludq %zmm8,%zmm17,%zmm27
+ vpmuludq %zmm8,%zmm18,%zmm28
+ vpmuludq %zmm8,%zmm19,%zmm29
+ vpsrlq $32,%zmm19,%zmm10
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+
+ vpmuludq %zmm9,%zmm23,%zmm25
+ vpmuludq %zmm9,%zmm24,%zmm26
+ vpmuludq %zmm9,%zmm17,%zmm28
+ vpmuludq %zmm9,%zmm18,%zmm29
+ vpmuludq %zmm9,%zmm16,%zmm27
+ vpsrlq $32,%zmm20,%zmm6
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpmuludq %zmm10,%zmm22,%zmm25
+ vpmuludq %zmm10,%zmm16,%zmm28
+ vpmuludq %zmm10,%zmm17,%zmm29
+ vpmuludq %zmm10,%zmm23,%zmm26
+ vpmuludq %zmm10,%zmm24,%zmm27
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpmuludq %zmm6,%zmm24,%zmm28
+ vpmuludq %zmm6,%zmm16,%zmm29
+ vpmuludq %zmm6,%zmm21,%zmm25
+ vpmuludq %zmm6,%zmm22,%zmm26
+ vpmuludq %zmm6,%zmm23,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vmovdqu64 0(%rsi),%zmm10
+ vmovdqu64 64(%rsi),%zmm6
+ leaq 128(%rsi),%rsi
+
+ vpsrlq $26,%zmm14,%zmm28
+ vpandq %zmm5,%zmm14,%zmm14
+ vpaddq %zmm28,%zmm15,%zmm15
+
+ vpsrlq $26,%zmm11,%zmm25
+ vpandq %zmm5,%zmm11,%zmm11
+ vpaddq %zmm25,%zmm12,%zmm12
+
+ vpsrlq $26,%zmm15,%zmm29
+ vpandq %zmm5,%zmm15,%zmm15
+
+ vpsrlq $26,%zmm12,%zmm26
+ vpandq %zmm5,%zmm12,%zmm12
+ vpaddq %zmm26,%zmm13,%zmm13
+
+ vpaddq %zmm29,%zmm11,%zmm11
+ vpsllq $2,%zmm29,%zmm29
+ vpaddq %zmm29,%zmm11,%zmm11
+
+ vpsrlq $26,%zmm13,%zmm27
+ vpandq %zmm5,%zmm13,%zmm13
+ vpaddq %zmm27,%zmm14,%zmm14
+
+ vpsrlq $26,%zmm11,%zmm25
+ vpandq %zmm5,%zmm11,%zmm11
+ vpaddq %zmm25,%zmm12,%zmm12
+
+ vpsrlq $26,%zmm14,%zmm28
+ vpandq %zmm5,%zmm14,%zmm14
+ vpaddq %zmm28,%zmm15,%zmm15
+
+ vpunpcklqdq %zmm6,%zmm10,%zmm7
+ vpunpckhqdq %zmm6,%zmm10,%zmm6
+
+ vmovdqa32 128(%rcx),%zmm25
+ movl $0x7777,%eax
+ kmovw %eax,%k1
+
+ vpermd %zmm16,%zmm25,%zmm16
+ vpermd %zmm17,%zmm25,%zmm17
+ vpermd %zmm18,%zmm25,%zmm18
+ vpermd %zmm19,%zmm25,%zmm19
+ vpermd %zmm20,%zmm25,%zmm20
+
+ vpermd %zmm11,%zmm25,%zmm16{%k1}
+ vpermd %zmm12,%zmm25,%zmm17{%k1}
+ vpermd %zmm13,%zmm25,%zmm18{%k1}
+ vpermd %zmm14,%zmm25,%zmm19{%k1}
+ vpermd %zmm15,%zmm25,%zmm20{%k1}
+
+ vpslld $2,%zmm17,%zmm21
+ vpslld $2,%zmm18,%zmm22
+ vpslld $2,%zmm19,%zmm23
+ vpslld $2,%zmm20,%zmm24
+ vpaddd %zmm17,%zmm21,%zmm21
+ vpaddd %zmm18,%zmm22,%zmm22
+ vpaddd %zmm19,%zmm23,%zmm23
+ vpaddd %zmm20,%zmm24,%zmm24
+
+ vpbroadcastq 32(%rcx),%zmm30
+
+ vpsrlq $52,%zmm7,%zmm9
+ vpsllq $12,%zmm6,%zmm10
+ vporq %zmm10,%zmm9,%zmm9
+ vpsrlq $26,%zmm7,%zmm8
+ vpsrlq $14,%zmm6,%zmm10
+ vpsrlq $40,%zmm6,%zmm6
+ vpandq %zmm5,%zmm9,%zmm9
+ vpandq %zmm5,%zmm7,%zmm7
+
+ vpaddq %zmm2,%zmm9,%zmm2
+ subq $192,%rdx
+ jbe .Ltail_avx512
+ jmp .Loop_avx512
+
+.align 32
+.Loop_avx512:
+
+ vpmuludq %zmm2,%zmm17,%zmm14
+ vpaddq %zmm0,%zmm7,%zmm0
+ vpmuludq %zmm2,%zmm18,%zmm15
+ vpandq %zmm5,%zmm8,%zmm8
+ vpmuludq %zmm2,%zmm23,%zmm11
+ vpandq %zmm5,%zmm10,%zmm10
+ vpmuludq %zmm2,%zmm24,%zmm12
+ vporq %zmm30,%zmm6,%zmm6
+ vpmuludq %zmm2,%zmm16,%zmm13
+ vpaddq %zmm1,%zmm8,%zmm1
+ vpaddq %zmm3,%zmm10,%zmm3
+ vpaddq %zmm4,%zmm6,%zmm4
+
+ vmovdqu64 0(%rsi),%zmm10
+ vmovdqu64 64(%rsi),%zmm6
+ leaq 128(%rsi),%rsi
+ vpmuludq %zmm0,%zmm19,%zmm28
+ vpmuludq %zmm0,%zmm20,%zmm29
+ vpmuludq %zmm0,%zmm16,%zmm25
+ vpmuludq %zmm0,%zmm17,%zmm26
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+
+ vpmuludq %zmm1,%zmm18,%zmm28
+ vpmuludq %zmm1,%zmm19,%zmm29
+ vpmuludq %zmm1,%zmm24,%zmm25
+ vpmuludq %zmm0,%zmm18,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpunpcklqdq %zmm6,%zmm10,%zmm7
+ vpunpckhqdq %zmm6,%zmm10,%zmm6
+
+ vpmuludq %zmm3,%zmm16,%zmm28
+ vpmuludq %zmm3,%zmm17,%zmm29
+ vpmuludq %zmm1,%zmm16,%zmm26
+ vpmuludq %zmm1,%zmm17,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpmuludq %zmm4,%zmm24,%zmm28
+ vpmuludq %zmm4,%zmm16,%zmm29
+ vpmuludq %zmm3,%zmm22,%zmm25
+ vpmuludq %zmm3,%zmm23,%zmm26
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpmuludq %zmm3,%zmm24,%zmm27
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpmuludq %zmm4,%zmm21,%zmm25
+ vpmuludq %zmm4,%zmm22,%zmm26
+ vpmuludq %zmm4,%zmm23,%zmm27
+ vpaddq %zmm25,%zmm11,%zmm0
+ vpaddq %zmm26,%zmm12,%zmm1
+ vpaddq %zmm27,%zmm13,%zmm2
+
+ vpsrlq $52,%zmm7,%zmm9
+ vpsllq $12,%zmm6,%zmm10
+
+ vpsrlq $26,%zmm14,%zmm3
+ vpandq %zmm5,%zmm14,%zmm14
+ vpaddq %zmm3,%zmm15,%zmm4
+
+ vporq %zmm10,%zmm9,%zmm9
+
+ vpsrlq $26,%zmm0,%zmm11
+ vpandq %zmm5,%zmm0,%zmm0
+ vpaddq %zmm11,%zmm1,%zmm1
+
+ vpandq %zmm5,%zmm9,%zmm9
+
+ vpsrlq $26,%zmm4,%zmm15
+ vpandq %zmm5,%zmm4,%zmm4
+
+ vpsrlq $26,%zmm1,%zmm12
+ vpandq %zmm5,%zmm1,%zmm1
+ vpaddq %zmm12,%zmm2,%zmm2
+
+ vpaddq %zmm15,%zmm0,%zmm0
+ vpsllq $2,%zmm15,%zmm15
+ vpaddq %zmm15,%zmm0,%zmm0
+
+ vpaddq %zmm9,%zmm2,%zmm2
+ vpsrlq $26,%zmm7,%zmm8
+
+ vpsrlq $26,%zmm2,%zmm13
+ vpandq %zmm5,%zmm2,%zmm2
+ vpaddq %zmm13,%zmm14,%zmm3
+
+ vpsrlq $14,%zmm6,%zmm10
+
+ vpsrlq $26,%zmm0,%zmm11
+ vpandq %zmm5,%zmm0,%zmm0
+ vpaddq %zmm11,%zmm1,%zmm1
+
+ vpsrlq $40,%zmm6,%zmm6
+
+ vpsrlq $26,%zmm3,%zmm14
+ vpandq %zmm5,%zmm3,%zmm3
+ vpaddq %zmm14,%zmm4,%zmm4
+
+ vpandq %zmm5,%zmm7,%zmm7
+
+ subq $128,%rdx
+ ja .Loop_avx512
+
+.Ltail_avx512:
+
+ vpsrlq $32,%zmm16,%zmm16
+ vpsrlq $32,%zmm17,%zmm17
+ vpsrlq $32,%zmm18,%zmm18
+ vpsrlq $32,%zmm23,%zmm23
+ vpsrlq $32,%zmm24,%zmm24
+ vpsrlq $32,%zmm19,%zmm19
+ vpsrlq $32,%zmm20,%zmm20
+ vpsrlq $32,%zmm21,%zmm21
+ vpsrlq $32,%zmm22,%zmm22
+
+ leaq (%rsi,%rdx,1),%rsi
+
+ vpaddq %zmm0,%zmm7,%zmm0
+
+ vpmuludq %zmm2,%zmm17,%zmm14
+ vpmuludq %zmm2,%zmm18,%zmm15
+ vpmuludq %zmm2,%zmm23,%zmm11
+ vpandq %zmm5,%zmm8,%zmm8
+ vpmuludq %zmm2,%zmm24,%zmm12
+ vpandq %zmm5,%zmm10,%zmm10
+ vpmuludq %zmm2,%zmm16,%zmm13
+ vporq %zmm30,%zmm6,%zmm6
+ vpaddq %zmm1,%zmm8,%zmm1
+ vpaddq %zmm3,%zmm10,%zmm3
+ vpaddq %zmm4,%zmm6,%zmm4
+
+ vmovdqu 0(%rsi),%xmm7
+ vpmuludq %zmm0,%zmm19,%zmm28
+ vpmuludq %zmm0,%zmm20,%zmm29
+ vpmuludq %zmm0,%zmm16,%zmm25
+ vpmuludq %zmm0,%zmm17,%zmm26
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+
+ vmovdqu 16(%rsi),%xmm8
+ vpmuludq %zmm1,%zmm18,%zmm28
+ vpmuludq %zmm1,%zmm19,%zmm29
+ vpmuludq %zmm1,%zmm24,%zmm25
+ vpmuludq %zmm0,%zmm18,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vinserti128 $1,32(%rsi),%ymm7,%ymm7
+ vpmuludq %zmm3,%zmm16,%zmm28
+ vpmuludq %zmm3,%zmm17,%zmm29
+ vpmuludq %zmm1,%zmm16,%zmm26
+ vpmuludq %zmm1,%zmm17,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm14
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vinserti128 $1,48(%rsi),%ymm8,%ymm8
+ vpmuludq %zmm4,%zmm24,%zmm28
+ vpmuludq %zmm4,%zmm16,%zmm29
+ vpmuludq %zmm3,%zmm22,%zmm25
+ vpmuludq %zmm3,%zmm23,%zmm26
+ vpmuludq %zmm3,%zmm24,%zmm27
+ vpaddq %zmm28,%zmm14,%zmm3
+ vpaddq %zmm29,%zmm15,%zmm15
+ vpaddq %zmm25,%zmm11,%zmm11
+ vpaddq %zmm26,%zmm12,%zmm12
+ vpaddq %zmm27,%zmm13,%zmm13
+
+ vpmuludq %zmm4,%zmm21,%zmm25
+ vpmuludq %zmm4,%zmm22,%zmm26
+ vpmuludq %zmm4,%zmm23,%zmm27
+ vpaddq %zmm25,%zmm11,%zmm0
+ vpaddq %zmm26,%zmm12,%zmm1
+ vpaddq %zmm27,%zmm13,%zmm2
+
+ movl $1,%eax
+ vpermq $0xb1,%zmm3,%zmm14
+ vpermq $0xb1,%zmm15,%zmm4
+ vpermq $0xb1,%zmm0,%zmm11
+ vpermq $0xb1,%zmm1,%zmm12
+ vpermq $0xb1,%zmm2,%zmm13
+ vpaddq %zmm14,%zmm3,%zmm3
+ vpaddq %zmm15,%zmm4,%zmm4
+ vpaddq %zmm11,%zmm0,%zmm0
+ vpaddq %zmm12,%zmm1,%zmm1
+ vpaddq %zmm13,%zmm2,%zmm2
+
+ kmovw %eax,%k3
+ vpermq $0x2,%zmm3,%zmm14
+ vpermq $0x2,%zmm4,%zmm15
+ vpermq $0x2,%zmm0,%zmm11
+ vpermq $0x2,%zmm1,%zmm12
+ vpermq $0x2,%zmm2,%zmm13
+ vpaddq %zmm14,%zmm3,%zmm3
+ vpaddq %zmm15,%zmm4,%zmm4
+ vpaddq %zmm11,%zmm0,%zmm0
+ vpaddq %zmm12,%zmm1,%zmm1
+ vpaddq %zmm13,%zmm2,%zmm2
+
+ vextracti64x4 $0x1,%zmm3,%ymm14
+ vextracti64x4 $0x1,%zmm4,%ymm15
+ vextracti64x4 $0x1,%zmm0,%ymm11
+ vextracti64x4 $0x1,%zmm1,%ymm12
+ vextracti64x4 $0x1,%zmm2,%ymm13
+ vpaddq %zmm14,%zmm3,%zmm3{%k3}{z}
+ vpaddq %zmm15,%zmm4,%zmm4{%k3}{z}
+ vpaddq %zmm11,%zmm0,%zmm0{%k3}{z}
+ vpaddq %zmm12,%zmm1,%zmm1{%k3}{z}
+ vpaddq %zmm13,%zmm2,%zmm2{%k3}{z}
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpsrldq $6,%ymm7,%ymm9
+ vpsrldq $6,%ymm8,%ymm10
+ vpunpckhqdq %ymm8,%ymm7,%ymm6
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpunpcklqdq %ymm10,%ymm9,%ymm9
+ vpunpcklqdq %ymm8,%ymm7,%ymm7
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $26,%ymm4,%ymm15
+ vpand %ymm5,%ymm4,%ymm4
+
+ vpsrlq $26,%ymm1,%ymm12
+ vpand %ymm5,%ymm1,%ymm1
+ vpsrlq $30,%ymm9,%ymm10
+ vpsrlq $4,%ymm9,%ymm9
+ vpaddq %ymm12,%ymm2,%ymm2
+
+ vpaddq %ymm15,%ymm0,%ymm0
+ vpsllq $2,%ymm15,%ymm15
+ vpsrlq $26,%ymm7,%ymm8
+ vpsrlq $40,%ymm6,%ymm6
+ vpaddq %ymm15,%ymm0,%ymm0
+
+ vpsrlq $26,%ymm2,%ymm13
+ vpand %ymm5,%ymm2,%ymm2
+ vpand %ymm5,%ymm9,%ymm9
+ vpand %ymm5,%ymm7,%ymm7
+ vpaddq %ymm13,%ymm3,%ymm3
+
+ vpsrlq $26,%ymm0,%ymm11
+ vpand %ymm5,%ymm0,%ymm0
+ vpaddq %ymm2,%ymm9,%ymm2
+ vpand %ymm5,%ymm8,%ymm8
+ vpaddq %ymm11,%ymm1,%ymm1
+
+ vpsrlq $26,%ymm3,%ymm14
+ vpand %ymm5,%ymm3,%ymm3
+ vpand %ymm5,%ymm10,%ymm10
+ vpor 32(%rcx),%ymm6,%ymm6
+ vpaddq %ymm14,%ymm4,%ymm4
+
+ leaq 144(%rsp),%rax
+ addq $64,%rdx
+ jnz .Ltail_avx2_512
+
+ vpsubq %ymm9,%ymm2,%ymm2
+ vmovd %xmm0,-112(%rdi)
+ vmovd %xmm1,-108(%rdi)
+ vmovd %xmm2,-104(%rdi)
+ vmovd %xmm3,-100(%rdi)
+ vmovd %xmm4,-96(%rdi)
+ vzeroall
+ leaq -8(%r10),%rsp
+
+ ret
+
+ENDPROC(poly1305_blocks_avx512)
+#endif /* CONFIG_AS_AVX512 */
diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c
index 538abc359d1d..2ae1b3cb66cd 100644
--- a/lib/zinc/poly1305/poly1305.c
+++ b/lib/zinc/poly1305/poly1305.c
@@ -15,7 +15,9 @@
#include <linux/module.h>
#include <linux/init.h>
-#ifndef HAVE_POLY1305_ARCH_IMPLEMENTATION
+#if defined(CONFIG_ZINC_ARCH_X86_64)
+#include "poly1305-x86_64-glue.h"
+#else
static inline bool poly1305_init_arch(void *ctx,
const u8 key[POLY1305_KEY_SIZE])
{
--
2.19.0
* [PATCH net-next v6 11/23] zinc: import Andy Polyakov's Poly1305 ARM and ARM64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Andy Polyakov, Russell King,
linux-arm-kernel
These NEON and non-NEON implementations come from Andy Polyakov's
CRYPTOGAMS implementation, and are included here in raw form without
modification, so that the changes made by subsequent commits, which fix
them up for the kernel, are easy to see. This awkward commit splitting
has been requested for the ARM[64] implementations in particular.
While this is CRYPTOGAMS code, the originating code happens to be the
same as OpenSSL's commit 5bb1cd2292b388263a0cc05392bb99141212aa53
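For orientation, the arithmetic these routines implement is ordinary
Poly1305 evaluated in base 2^26: the 130-bit accumulator and the key r
are split into five 26-bit limbs so that limb products fit in 64-bit
(or paired 32-bit) lanes, the wrap-around at 2^130 is folded back in as
a multiplication by 5, and carries are propagated lazily. As a rough,
non-vectorised C sketch of a single block step in that representation
(illustrative only, not part of the imported files, helper name made
up):

#include <stdint.h>

/*
 * Illustrative sketch: h = (h + m) * r mod 2^130 - 5, with h, m, r as
 * five 26-bit limbs each. m holds the next 16-byte block interpreted
 * little-endian, with the 2^128 pad bit already folded into m[4]. Real
 * implementations precompute the 5*r[i] products (the "s" limbs).
 */
static void poly1305_block_base2_26(uint32_t h[5], const uint32_t m[5],
				    const uint32_t r[5])
{
	uint64_t d[5];
	uint32_t c;
	int i;

	for (i = 0; i < 5; i++)
		h[i] += m[i];

	d[0] = (uint64_t)h[0]*r[0] + (uint64_t)h[4]*5*r[1] +
	       (uint64_t)h[3]*5*r[2] + (uint64_t)h[2]*5*r[3] +
	       (uint64_t)h[1]*5*r[4];
	d[1] = (uint64_t)h[1]*r[0] + (uint64_t)h[0]*r[1] +
	       (uint64_t)h[4]*5*r[2] + (uint64_t)h[3]*5*r[3] +
	       (uint64_t)h[2]*5*r[4];
	d[2] = (uint64_t)h[2]*r[0] + (uint64_t)h[1]*r[1] +
	       (uint64_t)h[0]*r[2] + (uint64_t)h[4]*5*r[3] +
	       (uint64_t)h[3]*5*r[4];
	d[3] = (uint64_t)h[3]*r[0] + (uint64_t)h[2]*r[1] +
	       (uint64_t)h[1]*r[2] + (uint64_t)h[0]*r[3] +
	       (uint64_t)h[4]*5*r[4];
	d[4] = (uint64_t)h[4]*r[0] + (uint64_t)h[3]*r[1] +
	       (uint64_t)h[2]*r[2] + (uint64_t)h[1]*r[3] +
	       (uint64_t)h[0]*r[4];

	/*
	 * Lazy reduction: a single carry pass, with the carry out of the
	 * top limb folded back into the bottom one as *5, brings every
	 * limb back to roughly 26 bits, which is enough for the next
	 * round.
	 */
	for (i = 0; i < 4; i++) {
		d[i + 1] += d[i] >> 26;
		h[i] = (uint32_t)d[i] & 0x3ffffff;
	}
	h[4] = (uint32_t)d[4] & 0x3ffffff;
	c = (uint32_t)(d[4] >> 26);
	h[0] += c * 5;
	h[1] += h[0] >> 26;
	h[0] &= 0x3ffffff;
}

The vectorised paths amortise this across several blocks at a time by
precomputing small powers of r and interleaving the reduction with the
multiplications, but the per-limb formulas are the same ones spelled out
in the comments of the NEON code below.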
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/poly1305/poly1305-arm-cryptogams.S | 1172 +++++++++++++++++
lib/zinc/poly1305/poly1305-arm64-cryptogams.S | 869 ++++++++++++
2 files changed, 2041 insertions(+)
create mode 100644 lib/zinc/poly1305/poly1305-arm-cryptogams.S
create mode 100644 lib/zinc/poly1305/poly1305-arm64-cryptogams.S
diff --git a/lib/zinc/poly1305/poly1305-arm-cryptogams.S b/lib/zinc/poly1305/poly1305-arm-cryptogams.S
new file mode 100644
index 000000000000..884b465030e4
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm-cryptogams.S
@@ -0,0 +1,1172 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+#if defined(__thumb2__)
+.syntax unified
+.thumb
+#else
+.code 32
+#endif
+
+.globl poly1305_emit
+.globl poly1305_blocks
+.globl poly1305_init
+.type poly1305_init,%function
+.align 5
+poly1305_init:
+.Lpoly1305_init:
+ stmdb sp!,{r4-r11}
+
+ eor r3,r3,r3
+ cmp r1,#0
+ str r3,[r0,#0] @ zero hash value
+ str r3,[r0,#4]
+ str r3,[r0,#8]
+ str r3,[r0,#12]
+ str r3,[r0,#16]
+ str r3,[r0,#36] @ is_base2_26
+ add r0,r0,#20
+
+#ifdef __thumb2__
+ it eq
+#endif
+ moveq r0,#0
+ beq .Lno_key
+
+#if __ARM_MAX_ARCH__>=7
+ adr r11,.Lpoly1305_init
+ ldr r12,.LOPENSSL_armcap
+#endif
+ ldrb r4,[r1,#0]
+ mov r10,#0x0fffffff
+ ldrb r5,[r1,#1]
+ and r3,r10,#-4 @ 0x0ffffffc
+ ldrb r6,[r1,#2]
+ ldrb r7,[r1,#3]
+ orr r4,r4,r5,lsl#8
+ ldrb r5,[r1,#4]
+ orr r4,r4,r6,lsl#16
+ ldrb r6,[r1,#5]
+ orr r4,r4,r7,lsl#24
+ ldrb r7,[r1,#6]
+ and r4,r4,r10
+
+#if __ARM_MAX_ARCH__>=7
+ ldr r12,[r11,r12] @ OPENSSL_armcap_P
+# ifdef __APPLE__
+ ldr r12,[r12]
+# endif
+#endif
+ ldrb r8,[r1,#7]
+ orr r5,r5,r6,lsl#8
+ ldrb r6,[r1,#8]
+ orr r5,r5,r7,lsl#16
+ ldrb r7,[r1,#9]
+ orr r5,r5,r8,lsl#24
+ ldrb r8,[r1,#10]
+ and r5,r5,r3
+
+#if __ARM_MAX_ARCH__>=7
+ tst r12,#ARMV7_NEON @ check for NEON
+# ifdef __APPLE__
+ adr r9,poly1305_blocks_neon
+ adr r11,poly1305_blocks
+# ifdef __thumb2__
+ it ne
+# endif
+ movne r11,r9
+ adr r12,poly1305_emit
+ adr r10,poly1305_emit_neon
+# ifdef __thumb2__
+ it ne
+# endif
+ movne r12,r10
+# else
+# ifdef __thumb2__
+ itete eq
+# endif
+ addeq r12,r11,#(poly1305_emit-.Lpoly1305_init)
+ addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
+ addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init)
+ addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
+# endif
+# ifdef __thumb2__
+ orr r12,r12,#1 @ thumb-ify address
+ orr r11,r11,#1
+# endif
+#endif
+ ldrb r9,[r1,#11]
+ orr r6,r6,r7,lsl#8
+ ldrb r7,[r1,#12]
+ orr r6,r6,r8,lsl#16
+ ldrb r8,[r1,#13]
+ orr r6,r6,r9,lsl#24
+ ldrb r9,[r1,#14]
+ and r6,r6,r3
+
+ ldrb r10,[r1,#15]
+ orr r7,r7,r8,lsl#8
+ str r4,[r0,#0]
+ orr r7,r7,r9,lsl#16
+ str r5,[r0,#4]
+ orr r7,r7,r10,lsl#24
+ str r6,[r0,#8]
+ and r7,r7,r3
+ str r7,[r0,#12]
+#if __ARM_MAX_ARCH__>=7
+ stmia r2,{r11,r12} @ fill functions table
+ mov r0,#1
+#else
+ mov r0,#0
+#endif
+.Lno_key:
+ ldmia sp!,{r4-r11}
+#if __ARM_ARCH__>=5
+ bx lr @ bx lr
+#else
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_init,.-poly1305_init
+.type poly1305_blocks,%function
+.align 5
+poly1305_blocks:
+.Lpoly1305_blocks:
+ stmdb sp!,{r3-r11,lr}
+
+ ands r2,r2,#-16
+ beq .Lno_data
+
+ cmp r3,#0
+ add r2,r2,r1 @ end pointer
+ sub sp,sp,#32
+
+ ldmia r0,{r4-r12} @ load context
+
+ str r0,[sp,#12] @ offload stuff
+ mov lr,r1
+ str r2,[sp,#16]
+ str r10,[sp,#20]
+ str r11,[sp,#24]
+ str r12,[sp,#28]
+ b .Loop
+
+.Loop:
+#if __ARM_ARCH__<7
+ ldrb r0,[lr],#16 @ load input
+# ifdef __thumb2__
+ it hi
+# endif
+ addhi r8,r8,#1 @ 1<<128
+ ldrb r1,[lr,#-15]
+ ldrb r2,[lr,#-14]
+ ldrb r3,[lr,#-13]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-12]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-11]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-10]
+ adds r4,r4,r3 @ accumulate input
+
+ ldrb r3,[lr,#-9]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-8]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-7]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-6]
+ adcs r5,r5,r3
+
+ ldrb r3,[lr,#-5]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-4]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-3]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-2]
+ adcs r6,r6,r3
+
+ ldrb r3,[lr,#-1]
+ orr r1,r0,r1,lsl#8
+ str lr,[sp,#8] @ offload input pointer
+ orr r2,r1,r2,lsl#16
+ add r10,r10,r10,lsr#2
+ orr r3,r2,r3,lsl#24
+#else
+ ldr r0,[lr],#16 @ load input
+# ifdef __thumb2__
+ it hi
+# endif
+ addhi r8,r8,#1 @ padbit
+ ldr r1,[lr,#-12]
+ ldr r2,[lr,#-8]
+ ldr r3,[lr,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ adds r4,r4,r0 @ accumulate input
+ str lr,[sp,#8] @ offload input pointer
+ adcs r5,r5,r1
+ add r10,r10,r10,lsr#2
+ adcs r6,r6,r2
+#endif
+ add r11,r11,r11,lsr#2
+ adcs r7,r7,r3
+ add r12,r12,r12,lsr#2
+
+ umull r2,r3,r5,r9
+ adc r8,r8,#0
+ umull r0,r1,r4,r9
+ umlal r2,r3,r8,r10
+ umlal r0,r1,r7,r10
+ ldr r10,[sp,#20] @ reload r10
+ umlal r2,r3,r6,r12
+ umlal r0,r1,r5,r12
+ umlal r2,r3,r7,r11
+ umlal r0,r1,r6,r11
+ umlal r2,r3,r4,r10
+ str r0,[sp,#0] @ future r4
+ mul r0,r11,r8
+ ldr r11,[sp,#24] @ reload r11
+ adds r2,r2,r1 @ d1+=d0>>32
+ eor r1,r1,r1
+ adc lr,r3,#0 @ future r6
+ str r2,[sp,#4] @ future r5
+
+ mul r2,r12,r8
+ eor r3,r3,r3
+ umlal r0,r1,r7,r12
+ ldr r12,[sp,#28] @ reload r12
+ umlal r2,r3,r7,r9
+ umlal r0,r1,r6,r9
+ umlal r2,r3,r6,r10
+ umlal r0,r1,r5,r10
+ umlal r2,r3,r5,r11
+ umlal r0,r1,r4,r11
+ umlal r2,r3,r4,r12
+ ldr r4,[sp,#0]
+ mul r8,r9,r8
+ ldr r5,[sp,#4]
+
+ adds r6,lr,r0 @ d2+=d1>>32
+ ldr lr,[sp,#8] @ reload input pointer
+ adc r1,r1,#0
+ adds r7,r2,r1 @ d3+=d2>>32
+ ldr r0,[sp,#16] @ reload end pointer
+ adc r3,r3,#0
+ add r8,r8,r3 @ h4+=d3>>32
+
+ and r1,r8,#-4
+ and r8,r8,#3
+ add r1,r1,r1,lsr#2 @ *=5
+ adds r4,r4,r1
+ adcs r5,r5,#0
+ adcs r6,r6,#0
+ adcs r7,r7,#0
+ adc r8,r8,#0
+
+ cmp r0,lr @ done yet?
+ bhi .Loop
+
+ ldr r0,[sp,#12]
+ add sp,sp,#32
+ stmia r0,{r4-r8} @ store the result
+
+.Lno_data:
+#if __ARM_ARCH__>=5
+ ldmia sp!,{r3-r11,pc}
+#else
+ ldmia sp!,{r3-r11,lr}
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_blocks,.-poly1305_blocks
+.type poly1305_emit,%function
+.align 5
+poly1305_emit:
+ stmdb sp!,{r4-r11}
+.Lpoly1305_emit_enter:
+
+ ldmia r0,{r3-r7}
+ adds r8,r3,#5 @ compare to modulus
+ adcs r9,r4,#0
+ adcs r10,r5,#0
+ adcs r11,r6,#0
+ adc r7,r7,#0
+ tst r7,#4 @ did it carry/borrow?
+
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r3,r8
+ ldr r8,[r2,#0]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r4,r9
+ ldr r9,[r2,#4]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r5,r10
+ ldr r10,[r2,#8]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r6,r11
+ ldr r11,[r2,#12]
+
+ adds r3,r3,r8
+ adcs r4,r4,r9
+ adcs r5,r5,r10
+ adc r6,r6,r11
+
+#if __ARM_ARCH__>=7
+# ifdef __ARMEB__
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+# endif
+ str r3,[r1,#0]
+ str r4,[r1,#4]
+ str r5,[r1,#8]
+ str r6,[r1,#12]
+#else
+ strb r3,[r1,#0]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#4]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#8]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#12]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#1]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#5]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#9]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#13]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#2]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#6]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#10]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#14]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#3]
+ strb r4,[r1,#7]
+ strb r5,[r1,#11]
+ strb r6,[r1,#15]
+#endif
+ ldmia sp!,{r4-r11}
+#if __ARM_ARCH__>=5
+ bx lr @ bx lr
+#else
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_emit,.-poly1305_emit
+#if __ARM_MAX_ARCH__>=7
+.fpu neon
+
+.type poly1305_init_neon,%function
+.align 5
+poly1305_init_neon:
+ ldr r4,[r0,#20] @ load key base 2^32
+ ldr r5,[r0,#24]
+ ldr r6,[r0,#28]
+ ldr r7,[r0,#32]
+
+ and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
+ mov r3,r4,lsr#26
+ mov r4,r5,lsr#20
+ orr r3,r3,r5,lsl#6
+ mov r5,r6,lsr#14
+ orr r4,r4,r6,lsl#12
+ mov r6,r7,lsr#8
+ orr r5,r5,r7,lsl#18
+ and r3,r3,#0x03ffffff
+ and r4,r4,#0x03ffffff
+ and r5,r5,#0x03ffffff
+
+ vdup.32 d0,r2 @ r^1 in both lanes
+ add r2,r3,r3,lsl#2 @ *5
+ vdup.32 d1,r3
+ add r3,r4,r4,lsl#2
+ vdup.32 d2,r2
+ vdup.32 d3,r4
+ add r4,r5,r5,lsl#2
+ vdup.32 d4,r3
+ vdup.32 d5,r5
+ add r5,r6,r6,lsl#2
+ vdup.32 d6,r4
+ vdup.32 d7,r6
+ vdup.32 d8,r5
+
+ mov r5,#2 @ counter
+
+.Lsquare_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+ @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
+ @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
+ @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
+ @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
+
+ vmull.u32 q5,d0,d0[1]
+ vmull.u32 q6,d1,d0[1]
+ vmull.u32 q7,d3,d0[1]
+ vmull.u32 q8,d5,d0[1]
+ vmull.u32 q9,d7,d0[1]
+
+ vmlal.u32 q5,d7,d2[1]
+ vmlal.u32 q6,d0,d1[1]
+ vmlal.u32 q7,d1,d1[1]
+ vmlal.u32 q8,d3,d1[1]
+ vmlal.u32 q9,d5,d1[1]
+
+ vmlal.u32 q5,d5,d4[1]
+ vmlal.u32 q6,d7,d4[1]
+ vmlal.u32 q8,d1,d3[1]
+ vmlal.u32 q7,d0,d3[1]
+ vmlal.u32 q9,d3,d3[1]
+
+ vmlal.u32 q5,d3,d6[1]
+ vmlal.u32 q8,d0,d5[1]
+ vmlal.u32 q6,d5,d6[1]
+ vmlal.u32 q7,d7,d6[1]
+ vmlal.u32 q9,d1,d5[1]
+
+ vmlal.u32 q8,d7,d8[1]
+ vmlal.u32 q5,d1,d8[1]
+ vmlal.u32 q6,d3,d8[1]
+ vmlal.u32 q7,d5,d8[1]
+ vmlal.u32 q9,d0,d7[1]
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+ @ and P. Schwabe
+ @
+ @ H0>>+H1>>+H2>>+H3>>+H4
+ @ H3>>+H4>>*5+H0>>+H1
+ @
+ @ Trivia.
+ @
+ @ Result of multiplication of n-bit number by m-bit number is
+ @ n+m bits wide. However! Even though 2^n is a n+1-bit number,
+ @ m-bit number multiplied by 2^n is still n+m bits wide.
+ @
+ @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2,
+ @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit
+ @ one is n+1 bits wide.
+ @
+ @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that
+ @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4
+ @ can be 27. However! In cases when their width exceeds 26 bits
+ @ they are limited by 2^26+2^6. This in turn means that *sum*
+ @ of the products with these values can still be viewed as sum
+ @ of 52-bit numbers as long as the amount of addends is not a
+ @ power of 2. For example,
+ @
+ @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4,
+ @
+ @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or
+ @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than
+ @ 8 * (2^52) or 2^55. However, the value is then multiplied by
+ @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12),
+ @ which is less than 32 * (2^52) or 2^57. And when processing
+ @ data we are looking at triple as many addends...
+ @
+ @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and
+ @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the
+ @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while
+ @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32
+ @ instruction accepts 2x32-bit input and writes 2x64-bit result.
+ @ This means that result of reduction have to be compressed upon
+ @ loop wrap-around. This can be done in the process of reduction
+ @ to minimize amount of instructions [as well as amount of
+ @ 128-bit instructions, which benefits low-end processors], but
+ @ one has to watch for H2 (which is narrower than H0) and 5*H4
+ @ not being wider than 58 bits, so that result of right shift
+ @ by 26 bits fits in 32 bits. This is also useful on x86,
+ @ because it allows to use paddd in place for paddq, which
+ @ benefits Atom, where paddq is ridiculously slow.
+
+ vshr.u64 q15,q8,#26
+ vmovn.i64 d16,q8
+ vshr.u64 q4,q5,#26
+ vmovn.i64 d10,q5
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vbic.i32 d16,#0xfc000000 @ &=0x03ffffff
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+ vbic.i32 d10,#0xfc000000
+
+ vshrn.u64 d30,q9,#26
+ vmovn.i64 d18,q9
+ vshr.u64 q4,q6,#26
+ vmovn.i64 d12,q6
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+ vbic.i32 d18,#0xfc000000
+ vbic.i32 d12,#0xfc000000
+
+ vadd.i32 d10,d10,d30
+ vshl.u32 d30,d30,#2
+ vshrn.u64 d8,q7,#26
+ vmovn.i64 d14,q7
+ vadd.i32 d10,d10,d30 @ h4 -> h0
+ vadd.i32 d16,d16,d8 @ h2 -> h3
+ vbic.i32 d14,#0xfc000000
+
+ vshr.u32 d30,d10,#26
+ vbic.i32 d10,#0xfc000000
+ vshr.u32 d8,d16,#26
+ vbic.i32 d16,#0xfc000000
+ vadd.i32 d12,d12,d30 @ h0 -> h1
+ vadd.i32 d18,d18,d8 @ h3 -> h4
+
+ subs r5,r5,#1
+ beq .Lsquare_break_neon
+
+ add r6,r0,#(48+0*9*4)
+ add r7,r0,#(48+1*9*4)
+
+ vtrn.32 d0,d10 @ r^2:r^1
+ vtrn.32 d3,d14
+ vtrn.32 d5,d16
+ vtrn.32 d1,d12
+ vtrn.32 d7,d18
+
+ vshl.u32 d4,d3,#2 @ *5
+ vshl.u32 d6,d5,#2
+ vshl.u32 d2,d1,#2
+ vshl.u32 d8,d7,#2
+ vadd.i32 d4,d4,d3
+ vadd.i32 d2,d2,d1
+ vadd.i32 d6,d6,d5
+ vadd.i32 d8,d8,d7
+
+ vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
+ vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
+ vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vst1.32 {d8[0]},[r6,:32]
+ vst1.32 {d8[1]},[r7,:32]
+
+ b .Lsquare_neon
+
+.align 4
+.Lsquare_break_neon:
+ add r6,r0,#(48+2*4*9)
+ add r7,r0,#(48+3*4*9)
+
+ vmov d0,d10 @ r^4:r^3
+ vshl.u32 d2,d12,#2 @ *5
+ vmov d1,d12
+ vshl.u32 d4,d14,#2
+ vmov d3,d14
+ vshl.u32 d6,d16,#2
+ vmov d5,d16
+ vshl.u32 d8,d18,#2
+ vmov d7,d18
+ vadd.i32 d2,d2,d12
+ vadd.i32 d4,d4,d14
+ vadd.i32 d6,d6,d16
+ vadd.i32 d8,d8,d18
+
+ vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
+ vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
+ vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vst1.32 {d8[0]},[r6]
+ vst1.32 {d8[1]},[r7]
+
+ bx lr @ bx lr
+.size poly1305_init_neon,.-poly1305_init_neon
+
+.type poly1305_blocks_neon,%function
+.align 5
+poly1305_blocks_neon:
+ ldr ip,[r0,#36] @ is_base2_26
+ ands r2,r2,#-16
+ beq .Lno_data_neon
+
+ cmp r2,#64
+ bhs .Lenter_neon
+ tst ip,ip @ is_base2_26?
+ beq .Lpoly1305_blocks
+
+.Lenter_neon:
+ stmdb sp!,{r4-r7}
+ vstmdb sp!,{d8-d15} @ ABI specification says so
+
+ tst ip,ip @ is_base2_26?
+ bne .Lbase2_26_neon
+
+ stmdb sp!,{r1-r3,lr}
+ bl poly1305_init_neon
+
+ ldr r4,[r0,#0] @ load hash value base 2^32
+ ldr r5,[r0,#4]
+ ldr r6,[r0,#8]
+ ldr r7,[r0,#12]
+ ldr ip,[r0,#16]
+
+ and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
+ mov r3,r4,lsr#26
+ veor d10,d10,d10
+ mov r4,r5,lsr#20
+ orr r3,r3,r5,lsl#6
+ veor d12,d12,d12
+ mov r5,r6,lsr#14
+ orr r4,r4,r6,lsl#12
+ veor d14,d14,d14
+ mov r6,r7,lsr#8
+ orr r5,r5,r7,lsl#18
+ veor d16,d16,d16
+ and r3,r3,#0x03ffffff
+ orr r6,r6,ip,lsl#24
+ veor d18,d18,d18
+ and r4,r4,#0x03ffffff
+ mov r1,#1
+ and r5,r5,#0x03ffffff
+ str r1,[r0,#36] @ is_base2_26
+
+ vmov.32 d10[0],r2
+ vmov.32 d12[0],r3
+ vmov.32 d14[0],r4
+ vmov.32 d16[0],r5
+ vmov.32 d18[0],r6
+ adr r5,.Lzeros
+
+ ldmia sp!,{r1-r3,lr}
+ b .Lbase2_32_neon
+
+.align 4
+.Lbase2_26_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ load hash value
+
+ veor d10,d10,d10
+ veor d12,d12,d12
+ veor d14,d14,d14
+ veor d16,d16,d16
+ veor d18,d18,d18
+ vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
+ adr r5,.Lzeros
+ vld1.32 {d18[0]},[r0]
+ sub r0,r0,#16 @ rewind
+
+.Lbase2_32_neon:
+ add r4,r1,#32
+ mov r3,r3,lsl#24
+ tst r2,#31
+ beq .Leven
+
+ vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]!
+ vmov.32 d28[0],r3
+ sub r2,r2,#16
+ add r4,r1,#32
+
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q13,q13
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+# endif
+ vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26
+ vshl.u32 d26,d26,#18
+
+ vsri.u32 d26,d24,#14
+ vshl.u32 d24,d24,#12
+ vadd.i32 d29,d28,d18 @ add hash value and move to #hi
+
+ vbic.i32 d26,#0xfc000000
+ vsri.u32 d24,d22,#20
+ vshl.u32 d22,d22,#6
+
+ vbic.i32 d24,#0xfc000000
+ vsri.u32 d22,d20,#26
+ vadd.i32 d27,d26,d16
+
+ vbic.i32 d20,#0xfc000000
+ vbic.i32 d22,#0xfc000000
+ vadd.i32 d25,d24,d14
+
+ vadd.i32 d21,d20,d10
+ vadd.i32 d23,d22,d12
+
+ mov r7,r5
+ add r6,r0,#48
+
+ cmp r2,r2
+ b .Long_tail
+
+.align 4
+.Leven:
+ subs r2,r2,#64
+ it lo
+ movlo r4,r5
+
+ vmov.i32 q14,#1<<24 @ padbit, yes, always
+ vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
+ add r1,r1,#64
+ vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
+ add r4,r4,#64
+ itt hi
+ addhi r7,r0,#(48+1*9*4)
+ addhi r6,r0,#(48+3*9*4)
+
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q13,q13
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+# endif
+ vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
+ vshl.u32 q13,q13,#18
+
+ vsri.u32 q13,q12,#14
+ vshl.u32 q12,q12,#12
+
+ vbic.i32 q13,#0xfc000000
+ vsri.u32 q12,q11,#20
+ vshl.u32 q11,q11,#6
+
+ vbic.i32 q12,#0xfc000000
+ vsri.u32 q11,q10,#26
+
+ vbic.i32 q10,#0xfc000000
+ vbic.i32 q11,#0xfc000000
+
+ bls .Lskip_loop
+
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ b .Loop_neon
+
+.align 5
+.Loop_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+ @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+ @ ___________________/
+ @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+ @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+ @ ___________________/ ____________________/
+ @
+ @ Note that we start with inp[2:3]*r^2. This is because it
+ @ doesn't depend on reduction in previous iteration.
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
+ @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
+ @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
+ @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
+ @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ inp[2:3]*r^2
+
+ vadd.i32 d24,d24,d14 @ accumulate inp[0:1]
+ vmull.u32 q7,d25,d0[1]
+ vadd.i32 d20,d20,d10
+ vmull.u32 q5,d21,d0[1]
+ vadd.i32 d26,d26,d16
+ vmull.u32 q8,d27,d0[1]
+ vmlal.u32 q7,d23,d1[1]
+ vadd.i32 d22,d22,d12
+ vmull.u32 q6,d23,d0[1]
+
+ vadd.i32 d28,d28,d18
+ vmull.u32 q9,d29,d0[1]
+ subs r2,r2,#64
+ vmlal.u32 q5,d29,d2[1]
+ it lo
+ movlo r4,r5
+ vmlal.u32 q8,d25,d1[1]
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q6,d21,d1[1]
+ vmlal.u32 q9,d27,d1[1]
+
+ vmlal.u32 q5,d27,d4[1]
+ vmlal.u32 q8,d23,d3[1]
+ vmlal.u32 q9,d25,d3[1]
+ vmlal.u32 q6,d29,d4[1]
+ vmlal.u32 q7,d21,d3[1]
+
+ vmlal.u32 q8,d21,d5[1]
+ vmlal.u32 q5,d25,d6[1]
+ vmlal.u32 q9,d23,d5[1]
+ vmlal.u32 q6,d27,d6[1]
+ vmlal.u32 q7,d29,d6[1]
+
+ vmlal.u32 q8,d29,d8[1]
+ vmlal.u32 q5,d23,d8[1]
+ vmlal.u32 q9,d21,d7[1]
+ vmlal.u32 q6,d25,d8[1]
+ vmlal.u32 q7,d27,d8[1]
+
+ vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
+ add r4,r4,#64
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ (hash+inp[0:1])*r^4 and accumulate
+
+ vmlal.u32 q8,d26,d0[0]
+ vmlal.u32 q5,d20,d0[0]
+ vmlal.u32 q9,d28,d0[0]
+ vmlal.u32 q6,d22,d0[0]
+ vmlal.u32 q7,d24,d0[0]
+ vld1.32 d8[0],[r6,:32]
+
+ vmlal.u32 q8,d24,d1[0]
+ vmlal.u32 q5,d28,d2[0]
+ vmlal.u32 q9,d26,d1[0]
+ vmlal.u32 q6,d20,d1[0]
+ vmlal.u32 q7,d22,d1[0]
+
+ vmlal.u32 q8,d22,d3[0]
+ vmlal.u32 q5,d26,d4[0]
+ vmlal.u32 q9,d24,d3[0]
+ vmlal.u32 q6,d28,d4[0]
+ vmlal.u32 q7,d20,d3[0]
+
+ vmlal.u32 q8,d20,d5[0]
+ vmlal.u32 q5,d24,d6[0]
+ vmlal.u32 q9,d22,d5[0]
+ vmlal.u32 q6,d26,d6[0]
+ vmlal.u32 q8,d28,d8[0]
+
+ vmlal.u32 q7,d28,d6[0]
+ vmlal.u32 q5,d22,d8[0]
+ vmlal.u32 q9,d20,d7[0]
+ vmov.i32 q14,#1<<24 @ padbit, yes, always
+ vmlal.u32 q6,d24,d8[0]
+ vmlal.u32 q7,d26,d8[0]
+
+ vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
+ add r1,r1,#64
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+ vrev32.8 q13,q13
+# endif
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction interleaved with base 2^32 -> base 2^26 of
+ @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14.
+
+ vshr.u64 q15,q8,#26
+ vmovn.i64 d16,q8
+ vshr.u64 q4,q5,#26
+ vmovn.i64 d10,q5
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vbic.i32 d16,#0xfc000000
+ vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+ vshl.u32 q13,q13,#18
+ vbic.i32 d10,#0xfc000000
+
+ vshrn.u64 d30,q9,#26
+ vmovn.i64 d18,q9
+ vshr.u64 q4,q6,#26
+ vmovn.i64 d12,q6
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+ vsri.u32 q13,q12,#14
+ vbic.i32 d18,#0xfc000000
+ vshl.u32 q12,q12,#12
+ vbic.i32 d12,#0xfc000000
+
+ vadd.i32 d10,d10,d30
+ vshl.u32 d30,d30,#2
+ vbic.i32 q13,#0xfc000000
+ vshrn.u64 d8,q7,#26
+ vmovn.i64 d14,q7
+ vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec]
+ vsri.u32 q12,q11,#20
+ vadd.i32 d16,d16,d8 @ h2 -> h3
+ vshl.u32 q11,q11,#6
+ vbic.i32 d14,#0xfc000000
+ vbic.i32 q12,#0xfc000000
+
+ vshrn.u64 d30,q5,#26 @ re-narrow
+ vmovn.i64 d10,q5
+ vsri.u32 q11,q10,#26
+ vbic.i32 q10,#0xfc000000
+ vshr.u32 d8,d16,#26
+ vbic.i32 d16,#0xfc000000
+ vbic.i32 d10,#0xfc000000
+ vadd.i32 d12,d12,d30 @ h0 -> h1
+ vadd.i32 d18,d18,d8 @ h3 -> h4
+ vbic.i32 q11,#0xfc000000
+
+ bhi .Loop_neon
+
+.Lskip_loop:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+ add r7,r0,#(48+0*9*4)
+ add r6,r0,#(48+1*9*4)
+ adds r2,r2,#32
+ it ne
+ movne r2,#0
+ bne .Long_tail
+
+ vadd.i32 d25,d24,d14 @ add hash value and move to #hi
+ vadd.i32 d21,d20,d10
+ vadd.i32 d27,d26,d16
+ vadd.i32 d23,d22,d12
+ vadd.i32 d29,d28,d18
+
+.Long_tail:
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2
+
+ vadd.i32 d24,d24,d14 @ can be redundant
+ vmull.u32 q7,d25,d0
+ vadd.i32 d20,d20,d10
+ vmull.u32 q5,d21,d0
+ vadd.i32 d26,d26,d16
+ vmull.u32 q8,d27,d0
+ vadd.i32 d22,d22,d12
+ vmull.u32 q6,d23,d0
+ vadd.i32 d28,d28,d18
+ vmull.u32 q9,d29,d0
+
+ vmlal.u32 q5,d29,d2
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vmlal.u32 q8,d25,d1
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vmlal.u32 q6,d21,d1
+ vmlal.u32 q9,d27,d1
+ vmlal.u32 q7,d23,d1
+
+ vmlal.u32 q8,d23,d3
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q5,d27,d4
+ vld1.32 d8[0],[r6,:32]
+ vmlal.u32 q9,d25,d3
+ vmlal.u32 q6,d29,d4
+ vmlal.u32 q7,d21,d3
+
+ vmlal.u32 q8,d21,d5
+ it ne
+ addne r7,r0,#(48+2*9*4)
+ vmlal.u32 q5,d25,d6
+ it ne
+ addne r6,r0,#(48+3*9*4)
+ vmlal.u32 q9,d23,d5
+ vmlal.u32 q6,d27,d6
+ vmlal.u32 q7,d29,d6
+
+ vmlal.u32 q8,d29,d8
+ vorn q0,q0,q0 @ all-ones, can be redundant
+ vmlal.u32 q5,d23,d8
+ vshr.u64 q0,q0,#38
+ vmlal.u32 q9,d21,d7
+ vmlal.u32 q6,d25,d8
+ vmlal.u32 q7,d27,d8
+
+ beq .Lshort_tail
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ (hash+inp[0:1])*r^4:r^3 and accumulate
+
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
+
+ vmlal.u32 q7,d24,d0
+ vmlal.u32 q5,d20,d0
+ vmlal.u32 q8,d26,d0
+ vmlal.u32 q6,d22,d0
+ vmlal.u32 q9,d28,d0
+
+ vmlal.u32 q5,d28,d2
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vmlal.u32 q8,d24,d1
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vmlal.u32 q6,d20,d1
+ vmlal.u32 q9,d26,d1
+ vmlal.u32 q7,d22,d1
+
+ vmlal.u32 q8,d22,d3
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q5,d26,d4
+ vld1.32 d8[0],[r6,:32]
+ vmlal.u32 q9,d24,d3
+ vmlal.u32 q6,d28,d4
+ vmlal.u32 q7,d20,d3
+
+ vmlal.u32 q8,d20,d5
+ vmlal.u32 q5,d24,d6
+ vmlal.u32 q9,d22,d5
+ vmlal.u32 q6,d26,d6
+ vmlal.u32 q7,d28,d6
+
+ vmlal.u32 q8,d28,d8
+ vorn q0,q0,q0 @ all-ones
+ vmlal.u32 q5,d22,d8
+ vshr.u64 q0,q0,#38
+ vmlal.u32 q9,d20,d7
+ vmlal.u32 q6,d24,d8
+ vmlal.u32 q7,d26,d8
+
+.Lshort_tail:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ horizontal addition
+
+ vadd.i64 d16,d16,d17
+ vadd.i64 d10,d10,d11
+ vadd.i64 d18,d18,d19
+ vadd.i64 d12,d12,d13
+ vadd.i64 d14,d14,d15
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction, but without narrowing
+
+ vshr.u64 q15,q8,#26
+ vand.i64 q8,q8,q0
+ vshr.u64 q4,q5,#26
+ vand.i64 q5,q5,q0
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+
+ vshr.u64 q15,q9,#26
+ vand.i64 q9,q9,q0
+ vshr.u64 q4,q6,#26
+ vand.i64 q6,q6,q0
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+
+ vadd.i64 q5,q5,q15
+ vshl.u64 q15,q15,#2
+ vshr.u64 q4,q7,#26
+ vand.i64 q7,q7,q0
+ vadd.i64 q5,q5,q15 @ h4 -> h0
+ vadd.i64 q8,q8,q4 @ h2 -> h3
+
+ vshr.u64 q15,q5,#26
+ vand.i64 q5,q5,q0
+ vshr.u64 q4,q8,#26
+ vand.i64 q8,q8,q0
+ vadd.i64 q6,q6,q15 @ h0 -> h1
+ vadd.i64 q9,q9,q4 @ h3 -> h4
+
+ cmp r2,#0
+ bne .Leven
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ store hash value
+
+ vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
+ vst1.32 {d18[0]},[r0]
+
+ vldmia sp!,{d8-d15} @ epilogue
+ ldmia sp!,{r4-r7}
+.Lno_data_neon:
+ bx lr @ bx lr
+.size poly1305_blocks_neon,.-poly1305_blocks_neon
+
+.type poly1305_emit_neon,%function
+.align 5
+poly1305_emit_neon:
+ ldr ip,[r0,#36] @ is_base2_26
+
+ stmdb sp!,{r4-r11}
+
+ tst ip,ip
+ beq .Lpoly1305_emit_enter
+
+ ldmia r0,{r3-r7}
+ eor r8,r8,r8
+
+ adds r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32
+ mov r4,r4,lsr#6
+ adcs r4,r4,r5,lsl#20
+ mov r5,r5,lsr#12
+ adcs r5,r5,r6,lsl#14
+ mov r6,r6,lsr#18
+ adcs r6,r6,r7,lsl#8
+ adc r7,r8,r7,lsr#24 @ can be partially reduced ...
+
+ and r8,r7,#-4 @ ... so reduce
+ and r7,r6,#3
+ add r8,r8,r8,lsr#2 @ *= 5
+ adds r3,r3,r8
+ adcs r4,r4,#0
+ adcs r5,r5,#0
+ adcs r6,r6,#0
+ adc r7,r7,#0
+
+ adds r8,r3,#5 @ compare to modulus
+ adcs r9,r4,#0
+ adcs r10,r5,#0
+ adcs r11,r6,#0
+ adc r7,r7,#0
+ tst r7,#4 @ did it carry/borrow?
+
+ it ne
+ movne r3,r8
+ ldr r8,[r2,#0]
+ it ne
+ movne r4,r9
+ ldr r9,[r2,#4]
+ it ne
+ movne r5,r10
+ ldr r10,[r2,#8]
+ it ne
+ movne r6,r11
+ ldr r11,[r2,#12]
+
+ adds r3,r3,r8 @ accumulate nonce
+ adcs r4,r4,r9
+ adcs r5,r5,r10
+ adc r6,r6,r11
+
+# ifdef __ARMEB__
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+# endif
+ str r3,[r1,#0] @ store the result
+ str r4,[r1,#4]
+ str r5,[r1,#8]
+ str r6,[r1,#12]
+
+ ldmia sp!,{r4-r11}
+ bx lr @ bx lr
+.size poly1305_emit_neon,.-poly1305_emit_neon
+
+.align 5
+.Lzeros:
+.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+.LOPENSSL_armcap:
+.word OPENSSL_armcap_P-.Lpoly1305_init
+#endif
+.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by <appro@openssl.org>"
+.align 2
+#if __ARM_MAX_ARCH__>=7
+.comm OPENSSL_armcap_P,4,4
+#endif
diff --git a/lib/zinc/poly1305/poly1305-arm64-cryptogams.S b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S
new file mode 100644
index 000000000000..0ecb50a83ec0
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S
@@ -0,0 +1,869 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+
+// forward "declarations" are required for Apple
+
+.globl poly1305_blocks
+.globl poly1305_emit
+
+.globl poly1305_init
+.type poly1305_init,%function
+.align 5
+poly1305_init:
+ cmp x1,xzr
+ stp xzr,xzr,[x0] // zero hash value
+ stp xzr,xzr,[x0,#16] // [along with is_base2_26]
+
+ csel x0,xzr,x0,eq
+ b.eq .Lno_key
+
+#ifdef __ILP32__
+ ldrsw x11,.LOPENSSL_armcap_P
+#else
+ ldr x11,.LOPENSSL_armcap_P
+#endif
+ adr x10,.LOPENSSL_armcap_P
+
+ ldp x7,x8,[x1] // load key
+ mov x9,#0xfffffffc0fffffff
+ movk x9,#0x0fff,lsl#48
+ ldr w17,[x10,x11]
+#ifdef __ARMEB__
+ rev x7,x7 // flip bytes
+ rev x8,x8
+#endif
+ and x7,x7,x9 // &=0ffffffc0fffffff
+ and x9,x9,#-4
+ and x8,x8,x9 // &=0ffffffc0ffffffc
+ stp x7,x8,[x0,#32] // save key value
+
+ tst w17,#ARMV7_NEON
+
+ adr x12,poly1305_blocks
+ adr x7,poly1305_blocks_neon
+ adr x13,poly1305_emit
+ adr x8,poly1305_emit_neon
+
+ csel x12,x12,x7,eq
+ csel x13,x13,x8,eq
+
+#ifdef __ILP32__
+ stp w12,w13,[x2]
+#else
+ stp x12,x13,[x2]
+#endif
+
+ mov x0,#1
+.Lno_key:
+ ret
+.size poly1305_init,.-poly1305_init
+
+.type poly1305_blocks,%function
+.align 5
+poly1305_blocks:
+ ands x2,x2,#-16
+ b.eq .Lno_data
+
+ ldp x4,x5,[x0] // load hash value
+ ldp x7,x8,[x0,#32] // load key value
+ ldr x6,[x0,#16]
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+ b .Loop
+
+.align 5
+.Loop:
+ ldp x10,x11,[x1],#16 // load input
+ sub x2,x2,#16
+#ifdef __ARMEB__
+ rev x10,x10
+ rev x11,x11
+#endif
+ adds x4,x4,x10 // accumulate input
+ adcs x5,x5,x11
+
+ mul x12,x4,x7 // h0*r0
+ adc x6,x6,x3
+ umulh x13,x4,x7
+
+ mul x10,x5,x9 // h1*5*r1
+ umulh x11,x5,x9
+
+ adds x12,x12,x10
+ mul x10,x4,x8 // h0*r1
+ adc x13,x13,x11
+ umulh x14,x4,x8
+
+ adds x13,x13,x10
+ mul x10,x5,x7 // h1*r0
+ adc x14,x14,xzr
+ umulh x11,x5,x7
+
+ adds x13,x13,x10
+ mul x10,x6,x9 // h2*5*r1
+ adc x14,x14,x11
+ mul x11,x6,x7 // h2*r0
+
+ adds x13,x13,x10
+ adc x14,x14,x11
+
+ and x10,x14,#-4 // final reduction
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x12,x10
+ adcs x5,x13,xzr
+ adc x6,x6,xzr
+
+ cbnz x2,.Loop
+
+ stp x4,x5,[x0] // store hash value
+ str x6,[x0,#16]
+
+.Lno_data:
+ ret
+.size poly1305_blocks,.-poly1305_blocks
+
+.type poly1305_emit,%function
+.align 5
+poly1305_emit:
+ ldp x4,x5,[x0] // load hash base 2^64
+ ldr x6,[x0,#16]
+ ldp x10,x11,[x2] // load nonce
+
+ adds x12,x4,#5 // compare to modulus
+ adcs x13,x5,xzr
+ adc x14,x6,xzr
+
+ tst x14,#-4 // see if it's carried/borrowed
+
+ csel x4,x4,x12,eq
+ csel x5,x5,x13,eq
+
+#ifdef __ARMEB__
+ ror x10,x10,#32 // flip nonce words
+ ror x11,x11,#32
+#endif
+ adds x4,x4,x10 // accumulate nonce
+ adc x5,x5,x11
+#ifdef __ARMEB__
+ rev x4,x4 // flip output bytes
+ rev x5,x5
+#endif
+ stp x4,x5,[x1] // write result
+
+ ret
+.size poly1305_emit,.-poly1305_emit
+.type poly1305_mult,%function
+.align 5
+poly1305_mult:
+ mul x12,x4,x7 // h0*r0
+ umulh x13,x4,x7
+
+ mul x10,x5,x9 // h1*5*r1
+ umulh x11,x5,x9
+
+ adds x12,x12,x10
+ mul x10,x4,x8 // h0*r1
+ adc x13,x13,x11
+ umulh x14,x4,x8
+
+ adds x13,x13,x10
+ mul x10,x5,x7 // h1*r0
+ adc x14,x14,xzr
+ umulh x11,x5,x7
+
+ adds x13,x13,x10
+ mul x10,x6,x9 // h2*5*r1
+ adc x14,x14,x11
+ mul x11,x6,x7 // h2*r0
+
+ adds x13,x13,x10
+ adc x14,x14,x11
+
+ and x10,x14,#-4 // final reduction
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x12,x10
+ adcs x5,x13,xzr
+ adc x6,x6,xzr
+
+ ret
+.size poly1305_mult,.-poly1305_mult
+
+.type poly1305_splat,%function
+.align 5
+poly1305_splat:
+ and x12,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x13,x4,#26,#26
+ extr x14,x5,x4,#52
+ and x14,x14,#0x03ffffff
+ ubfx x15,x5,#14,#26
+ extr x16,x6,x5,#40
+
+ str w12,[x0,#16*0] // r0
+ add w12,w13,w13,lsl#2 // r1*5
+ str w13,[x0,#16*1] // r1
+ add w13,w14,w14,lsl#2 // r2*5
+ str w12,[x0,#16*2] // s1
+ str w14,[x0,#16*3] // r2
+ add w14,w15,w15,lsl#2 // r3*5
+ str w13,[x0,#16*4] // s2
+ str w15,[x0,#16*5] // r3
+ add w15,w16,w16,lsl#2 // r4*5
+ str w14,[x0,#16*6] // s3
+ str w16,[x0,#16*7] // r4
+ str w15,[x0,#16*8] // s4
+
+ ret
+.size poly1305_splat,.-poly1305_splat
+
+.type poly1305_blocks_neon,%function
+.align 5
+poly1305_blocks_neon:
+ ldr x17,[x0,#24]
+ cmp x2,#128
+ b.hs .Lblocks_neon
+ cbz x17,poly1305_blocks
+
+.Lblocks_neon:
+ stp x29,x30,[sp,#-80]!
+ add x29,sp,#0
+
+ ands x2,x2,#-16
+ b.eq .Lno_data_neon
+
+ cbz x17,.Lbase2_64_neon
+
+ ldp w10,w11,[x0] // load hash value base 2^26
+ ldp w12,w13,[x0,#8]
+ ldr w14,[x0,#16]
+
+ tst x2,#31
+ b.eq .Leven_neon
+
+ ldp x7,x8,[x0,#32] // load key value
+
+ add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
+ lsr x5,x12,#12
+ adds x4,x4,x12,lsl#52
+ add x5,x5,x13,lsl#14
+ adc x5,x5,xzr
+ lsr x6,x14,#24
+ adds x5,x5,x14,lsl#40
+ adc x14,x6,xzr // can be partially reduced...
+
+ ldp x12,x13,[x1],#16 // load input
+ sub x2,x2,#16
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+
+ and x10,x14,#-4 // ... so reduce
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x4,x10
+ adcs x5,x5,xzr
+ adc x6,x6,xzr
+
+#ifdef __ARMEB__
+ rev x12,x12
+ rev x13,x13
+#endif
+ adds x4,x4,x12 // accumulate input
+ adcs x5,x5,x13
+ adc x6,x6,x3
+
+ bl poly1305_mult
+ ldr x30,[sp,#8]
+
+ cbz x3,.Lstore_base2_64_neon
+
+ and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x11,x4,#26,#26
+ extr x12,x5,x4,#52
+ and x12,x12,#0x03ffffff
+ ubfx x13,x5,#14,#26
+ extr x14,x6,x5,#40
+
+ cbnz x2,.Leven_neon
+
+ stp w10,w11,[x0] // store hash value base 2^26
+ stp w12,w13,[x0,#8]
+ str w14,[x0,#16]
+ b .Lno_data_neon
+
+.align 4
+.Lstore_base2_64_neon:
+ stp x4,x5,[x0] // store hash value base 2^64
+ stp x6,xzr,[x0,#16] // note that is_base2_26 is zeroed
+ b .Lno_data_neon
+
+.align 4
+.Lbase2_64_neon:
+ ldp x7,x8,[x0,#32] // load key value
+
+ ldp x4,x5,[x0] // load hash value base 2^64
+ ldr x6,[x0,#16]
+
+ tst x2,#31
+ b.eq .Linit_neon
+
+ ldp x12,x13,[x1],#16 // load input
+ sub x2,x2,#16
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+#ifdef __ARMEB__
+ rev x12,x12
+ rev x13,x13
+#endif
+ adds x4,x4,x12 // accumulate input
+ adcs x5,x5,x13
+ adc x6,x6,x3
+
+ bl poly1305_mult
+
+.Linit_neon:
+ and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x11,x4,#26,#26
+ extr x12,x5,x4,#52
+ and x12,x12,#0x03ffffff
+ ubfx x13,x5,#14,#26
+ extr x14,x6,x5,#40
+
+ stp d8,d9,[sp,#16] // meet ABI requirements
+ stp d10,d11,[sp,#32]
+ stp d12,d13,[sp,#48]
+ stp d14,d15,[sp,#64]
+
+ fmov d24,x10
+ fmov d25,x11
+ fmov d26,x12
+ fmov d27,x13
+ fmov d28,x14
+
+ ////////////////////////////////// initialize r^n table
+ mov x4,x7 // r^1
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+ mov x5,x8
+ mov x6,xzr
+ add x0,x0,#48+12
+ bl poly1305_splat
+
+ bl poly1305_mult // r^2
+ sub x0,x0,#4
+ bl poly1305_splat
+
+ bl poly1305_mult // r^3
+ sub x0,x0,#4
+ bl poly1305_splat
+
+ bl poly1305_mult // r^4
+ sub x0,x0,#4
+ bl poly1305_splat
+ ldr x30,[sp,#8]
+
+ add x16,x1,#32
+ adr x17,.Lzeros
+ subs x2,x2,#64
+ csel x16,x17,x16,lo
+
+ mov x4,#1
+ str x4,[x0,#-24] // set is_base2_26
+ sub x0,x0,#48 // restore original x0
+ b .Ldo_neon
+
+.align 4
+.Leven_neon:
+ add x16,x1,#32
+ adr x17,.Lzeros
+ subs x2,x2,#64
+ csel x16,x17,x16,lo
+
+ stp d8,d9,[sp,#16] // meet ABI requirements
+ stp d10,d11,[sp,#32]
+ stp d12,d13,[sp,#48]
+ stp d14,d15,[sp,#64]
+
+ fmov d24,x10
+ fmov d25,x11
+ fmov d26,x12
+ fmov d27,x13
+ fmov d28,x14
+
+.Ldo_neon:
+ ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
+ ldp x9,x13,[x16],#48
+
+ lsl x3,x3,#24
+ add x15,x0,#48
+
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ and x5,x9,#0x03ffffff
+ ubfx x6,x8,#26,#26
+ ubfx x7,x9,#26,#26
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ extr x8,x12,x8,#52
+ extr x9,x13,x9,#52
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ fmov d14,x4
+ and x8,x8,#0x03ffffff
+ and x9,x9,#0x03ffffff
+ ubfx x10,x12,#14,#26
+ ubfx x11,x13,#14,#26
+ add x12,x3,x12,lsr#40
+ add x13,x3,x13,lsr#40
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ fmov d15,x6
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ fmov d16,x8
+ fmov d17,x10
+ fmov d18,x12
+
+ ldp x8,x12,[x1],#16 // inp[0:1]
+ ldp x9,x13,[x1],#48
+
+ ld1 {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64
+ ld1 {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64
+ ld1 {v8.4s},[x15]
+
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ and x5,x9,#0x03ffffff
+ ubfx x6,x8,#26,#26
+ ubfx x7,x9,#26,#26
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ extr x8,x12,x8,#52
+ extr x9,x13,x9,#52
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ fmov d9,x4
+ and x8,x8,#0x03ffffff
+ and x9,x9,#0x03ffffff
+ ubfx x10,x12,#14,#26
+ ubfx x11,x13,#14,#26
+ add x12,x3,x12,lsr#40
+ add x13,x3,x13,lsr#40
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ fmov d10,x6
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ movi v31.2d,#-1
+ fmov d11,x8
+ fmov d12,x10
+ fmov d13,x12
+ ushr v31.2d,v31.2d,#38
+
+ b.ls .Lskip_loop
+
+.align 4
+.Loop_neon:
+ ////////////////////////////////////////////////////////////////
+ // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+ // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+ // ___________________/
+ // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+ // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+ // ___________________/ ____________________/
+ //
+ // Note that we start with inp[2:3]*r^2. This is because it
+ // doesn't depend on reduction in previous iteration.
+ ////////////////////////////////////////////////////////////////
+ // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0
+ // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4
+ // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3
+ // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2
+ // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1
+
+ subs x2,x2,#64
+ umull v23.2d,v14.2s,v7.s[2]
+ csel x16,x17,x16,lo
+ umull v22.2d,v14.2s,v5.s[2]
+ umull v21.2d,v14.2s,v3.s[2]
+ ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
+ umull v20.2d,v14.2s,v1.s[2]
+ ldp x9,x13,[x16],#48
+ umull v19.2d,v14.2s,v0.s[2]
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+
+ umlal v23.2d,v15.2s,v5.s[2]
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ umlal v22.2d,v15.2s,v3.s[2]
+ and x5,x9,#0x03ffffff
+ umlal v21.2d,v15.2s,v1.s[2]
+ ubfx x6,x8,#26,#26
+ umlal v20.2d,v15.2s,v0.s[2]
+ ubfx x7,x9,#26,#26
+ umlal v19.2d,v15.2s,v8.s[2]
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+
+ umlal v23.2d,v16.2s,v3.s[2]
+ extr x8,x12,x8,#52
+ umlal v22.2d,v16.2s,v1.s[2]
+ extr x9,x13,x9,#52
+ umlal v21.2d,v16.2s,v0.s[2]
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ umlal v20.2d,v16.2s,v8.s[2]
+ fmov d14,x4
+ umlal v19.2d,v16.2s,v6.s[2]
+ and x8,x8,#0x03ffffff
+
+ umlal v23.2d,v17.2s,v1.s[2]
+ and x9,x9,#0x03ffffff
+ umlal v22.2d,v17.2s,v0.s[2]
+ ubfx x10,x12,#14,#26
+ umlal v21.2d,v17.2s,v8.s[2]
+ ubfx x11,x13,#14,#26
+ umlal v20.2d,v17.2s,v6.s[2]
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ umlal v19.2d,v17.2s,v4.s[2]
+ fmov d15,x6
+
+ add v11.2s,v11.2s,v26.2s
+ add x12,x3,x12,lsr#40
+ umlal v23.2d,v18.2s,v0.s[2]
+ add x13,x3,x13,lsr#40
+ umlal v22.2d,v18.2s,v8.s[2]
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ umlal v21.2d,v18.2s,v6.s[2]
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ umlal v20.2d,v18.2s,v4.s[2]
+ fmov d16,x8
+ umlal v19.2d,v18.2s,v2.s[2]
+ fmov d17,x10
+
+ ////////////////////////////////////////////////////////////////
+ // (hash+inp[0:1])*r^4 and accumulate
+
+ add v9.2s,v9.2s,v24.2s
+ fmov d18,x12
+ umlal v22.2d,v11.2s,v1.s[0]
+ ldp x8,x12,[x1],#16 // inp[0:1]
+ umlal v19.2d,v11.2s,v6.s[0]
+ ldp x9,x13,[x1],#48
+ umlal v23.2d,v11.2s,v3.s[0]
+ umlal v20.2d,v11.2s,v8.s[0]
+ umlal v21.2d,v11.2s,v0.s[0]
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+
+ add v10.2s,v10.2s,v25.2s
+ umlal v22.2d,v9.2s,v5.s[0]
+ umlal v23.2d,v9.2s,v7.s[0]
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ umlal v21.2d,v9.2s,v3.s[0]
+ and x5,x9,#0x03ffffff
+ umlal v19.2d,v9.2s,v0.s[0]
+ ubfx x6,x8,#26,#26
+ umlal v20.2d,v9.2s,v1.s[0]
+ ubfx x7,x9,#26,#26
+
+ add v12.2s,v12.2s,v27.2s
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ umlal v22.2d,v10.2s,v3.s[0]
+ extr x8,x12,x8,#52
+ umlal v23.2d,v10.2s,v5.s[0]
+ extr x9,x13,x9,#52
+ umlal v19.2d,v10.2s,v8.s[0]
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ umlal v21.2d,v10.2s,v1.s[0]
+ fmov d9,x4
+ umlal v20.2d,v10.2s,v0.s[0]
+ and x8,x8,#0x03ffffff
+
+ add v13.2s,v13.2s,v28.2s
+ and x9,x9,#0x03ffffff
+ umlal v22.2d,v12.2s,v0.s[0]
+ ubfx x10,x12,#14,#26
+ umlal v19.2d,v12.2s,v4.s[0]
+ ubfx x11,x13,#14,#26
+ umlal v23.2d,v12.2s,v1.s[0]
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ umlal v20.2d,v12.2s,v6.s[0]
+ fmov d10,x6
+ umlal v21.2d,v12.2s,v8.s[0]
+ add x12,x3,x12,lsr#40
+
+ umlal v22.2d,v13.2s,v8.s[0]
+ add x13,x3,x13,lsr#40
+ umlal v19.2d,v13.2s,v2.s[0]
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ umlal v23.2d,v13.2s,v0.s[0]
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ umlal v20.2d,v13.2s,v4.s[0]
+ fmov d11,x8
+ umlal v21.2d,v13.2s,v6.s[0]
+ fmov d12,x10
+ fmov d13,x12
+
+ /////////////////////////////////////////////////////////////////
+ // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+ // and P. Schwabe
+ //
+ // [see discussion in poly1305-armv4 module]
+
+ ushr v29.2d,v22.2d,#26
+ xtn v27.2s,v22.2d
+ ushr v30.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+ add v23.2d,v23.2d,v29.2d // h3 -> h4
+ bic v27.2s,#0xfc,lsl#24 // &=0x03ffffff
+ add v20.2d,v20.2d,v30.2d // h0 -> h1
+
+ ushr v29.2d,v23.2d,#26
+ xtn v28.2s,v23.2d
+ ushr v30.2d,v20.2d,#26
+ xtn v25.2s,v20.2d
+ bic v28.2s,#0xfc,lsl#24
+ add v21.2d,v21.2d,v30.2d // h1 -> h2
+
+ add v19.2d,v19.2d,v29.2d
+ shl v29.2d,v29.2d,#2
+ shrn v30.2s,v21.2d,#26
+ xtn v26.2s,v21.2d
+ add v19.2d,v19.2d,v29.2d // h4 -> h0
+ bic v25.2s,#0xfc,lsl#24
+ add v27.2s,v27.2s,v30.2s // h2 -> h3
+ bic v26.2s,#0xfc,lsl#24
+
+ shrn v29.2s,v19.2d,#26
+ xtn v24.2s,v19.2d
+ ushr v30.2s,v27.2s,#26
+ bic v27.2s,#0xfc,lsl#24
+ bic v24.2s,#0xfc,lsl#24
+ add v25.2s,v25.2s,v29.2s // h0 -> h1
+ add v28.2s,v28.2s,v30.2s // h3 -> h4
+
+ b.hi .Loop_neon
+
+.Lskip_loop:
+ dup v16.2d,v16.d[0]
+ add v11.2s,v11.2s,v26.2s
+
+ ////////////////////////////////////////////////////////////////
+ // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+ adds x2,x2,#32
+ b.ne .Long_tail
+
+ dup v16.2d,v11.d[0]
+ add v14.2s,v9.2s,v24.2s
+ add v17.2s,v12.2s,v27.2s
+ add v15.2s,v10.2s,v25.2s
+ add v18.2s,v13.2s,v28.2s
+
+.Long_tail:
+ dup v14.2d,v14.d[0]
+ umull2 v19.2d,v16.4s,v6.4s
+ umull2 v22.2d,v16.4s,v1.4s
+ umull2 v23.2d,v16.4s,v3.4s
+ umull2 v21.2d,v16.4s,v0.4s
+ umull2 v20.2d,v16.4s,v8.4s
+
+ dup v15.2d,v15.d[0]
+ umlal2 v19.2d,v14.4s,v0.4s
+ umlal2 v21.2d,v14.4s,v3.4s
+ umlal2 v22.2d,v14.4s,v5.4s
+ umlal2 v23.2d,v14.4s,v7.4s
+ umlal2 v20.2d,v14.4s,v1.4s
+
+ dup v17.2d,v17.d[0]
+ umlal2 v19.2d,v15.4s,v8.4s
+ umlal2 v22.2d,v15.4s,v3.4s
+ umlal2 v21.2d,v15.4s,v1.4s
+ umlal2 v23.2d,v15.4s,v5.4s
+ umlal2 v20.2d,v15.4s,v0.4s
+
+ dup v18.2d,v18.d[0]
+ umlal2 v22.2d,v17.4s,v0.4s
+ umlal2 v23.2d,v17.4s,v1.4s
+ umlal2 v19.2d,v17.4s,v4.4s
+ umlal2 v20.2d,v17.4s,v6.4s
+ umlal2 v21.2d,v17.4s,v8.4s
+
+ umlal2 v22.2d,v18.4s,v8.4s
+ umlal2 v19.2d,v18.4s,v2.4s
+ umlal2 v23.2d,v18.4s,v0.4s
+ umlal2 v20.2d,v18.4s,v4.4s
+ umlal2 v21.2d,v18.4s,v6.4s
+
+ b.eq .Lshort_tail
+
+ ////////////////////////////////////////////////////////////////
+ // (hash+inp[0:1])*r^4:r^3 and accumulate
+
+ add v9.2s,v9.2s,v24.2s
+ umlal v22.2d,v11.2s,v1.2s
+ umlal v19.2d,v11.2s,v6.2s
+ umlal v23.2d,v11.2s,v3.2s
+ umlal v20.2d,v11.2s,v8.2s
+ umlal v21.2d,v11.2s,v0.2s
+
+ add v10.2s,v10.2s,v25.2s
+ umlal v22.2d,v9.2s,v5.2s
+ umlal v19.2d,v9.2s,v0.2s
+ umlal v23.2d,v9.2s,v7.2s
+ umlal v20.2d,v9.2s,v1.2s
+ umlal v21.2d,v9.2s,v3.2s
+
+ add v12.2s,v12.2s,v27.2s
+ umlal v22.2d,v10.2s,v3.2s
+ umlal v19.2d,v10.2s,v8.2s
+ umlal v23.2d,v10.2s,v5.2s
+ umlal v20.2d,v10.2s,v0.2s
+ umlal v21.2d,v10.2s,v1.2s
+
+ add v13.2s,v13.2s,v28.2s
+ umlal v22.2d,v12.2s,v0.2s
+ umlal v19.2d,v12.2s,v4.2s
+ umlal v23.2d,v12.2s,v1.2s
+ umlal v20.2d,v12.2s,v6.2s
+ umlal v21.2d,v12.2s,v8.2s
+
+ umlal v22.2d,v13.2s,v8.2s
+ umlal v19.2d,v13.2s,v2.2s
+ umlal v23.2d,v13.2s,v0.2s
+ umlal v20.2d,v13.2s,v4.2s
+ umlal v21.2d,v13.2s,v6.2s
+
+.Lshort_tail:
+ ////////////////////////////////////////////////////////////////
+ // horizontal add
+
+ addp v22.2d,v22.2d,v22.2d
+ ldp d8,d9,[sp,#16] // meet ABI requirements
+ addp v19.2d,v19.2d,v19.2d
+ ldp d10,d11,[sp,#32]
+ addp v23.2d,v23.2d,v23.2d
+ ldp d12,d13,[sp,#48]
+ addp v20.2d,v20.2d,v20.2d
+ ldp d14,d15,[sp,#64]
+ addp v21.2d,v21.2d,v21.2d
+
+ ////////////////////////////////////////////////////////////////
+ // lazy reduction, but without narrowing
+
+ ushr v29.2d,v22.2d,#26
+ and v22.16b,v22.16b,v31.16b
+ ushr v30.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+
+ add v23.2d,v23.2d,v29.2d // h3 -> h4
+ add v20.2d,v20.2d,v30.2d // h0 -> h1
+
+ ushr v29.2d,v23.2d,#26
+ and v23.16b,v23.16b,v31.16b
+ ushr v30.2d,v20.2d,#26
+ and v20.16b,v20.16b,v31.16b
+ add v21.2d,v21.2d,v30.2d // h1 -> h2
+
+ add v19.2d,v19.2d,v29.2d
+ shl v29.2d,v29.2d,#2
+ ushr v30.2d,v21.2d,#26
+ and v21.16b,v21.16b,v31.16b
+ add v19.2d,v19.2d,v29.2d // h4 -> h0
+ add v22.2d,v22.2d,v30.2d // h2 -> h3
+
+ ushr v29.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+ ushr v30.2d,v22.2d,#26
+ and v22.16b,v22.16b,v31.16b
+ add v20.2d,v20.2d,v29.2d // h0 -> h1
+ add v23.2d,v23.2d,v30.2d // h3 -> h4
+
+ ////////////////////////////////////////////////////////////////
+ // write the result, can be partially reduced
+
+ st4 {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16
+ st1 {v23.s}[0],[x0]
+
+.Lno_data_neon:
+ ldr x29,[sp],#80
+ ret
+.size poly1305_blocks_neon,.-poly1305_blocks_neon
+
+.type poly1305_emit_neon,%function
+.align 5
+poly1305_emit_neon:
+ ldr x17,[x0,#24]
+ cbz x17,poly1305_emit
+
+ ldp w10,w11,[x0] // load hash value base 2^26
+ ldp w12,w13,[x0,#8]
+ ldr w14,[x0,#16]
+
+ add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
+ lsr x5,x12,#12
+ adds x4,x4,x12,lsl#52
+ add x5,x5,x13,lsl#14
+ adc x5,x5,xzr
+ lsr x6,x14,#24
+ adds x5,x5,x14,lsl#40
+ adc x6,x6,xzr // can be partially reduced...
+
+ ldp x10,x11,[x2] // load nonce
+
+ and x12,x6,#-4 // ... so reduce
+ add x12,x12,x6,lsr#2
+ and x6,x6,#3
+ adds x4,x4,x12
+ adcs x5,x5,xzr
+ adc x6,x6,xzr
+
+ adds x12,x4,#5 // compare to modulus
+ adcs x13,x5,xzr
+ adc x14,x6,xzr
+
+ tst x14,#-4 // see if it's carried/borrowed
+
+ csel x4,x4,x12,eq
+ csel x5,x5,x13,eq
+
+#ifdef __ARMEB__
+ ror x10,x10,#32 // flip nonce words
+ ror x11,x11,#32
+#endif
+ adds x4,x4,x10 // accumulate nonce
+ adc x5,x5,x11
+#ifdef __ARMEB__
+ rev x4,x4 // flip output bytes
+ rev x5,x5
+#endif
+ stp x4,x5,[x1] // write result
+
+ ret
+.size poly1305_emit_neon,.-poly1305_emit_neon
+
+.align 5
+.Lzeros:
+.long 0,0,0,0,0,0,0,0
+.LOPENSSL_armcap_P:
+#ifdef __ILP32__
+.long OPENSSL_armcap_P-.
+#else
+.quad OPENSSL_armcap_P-.
+#endif
+.byte 80,111,108,121,49,51,48,53,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+.align 2
+.align 2
--
2.19.0
^ permalink raw reply related [flat|nested] 213+ messages in thread
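The emit routines in the two files above implement Poly1305's final step: add 5 to the 130-bit accumulator, test whether bit 130 carries (i.e. whether h >= 2^130 - 5), conditionally keep the reduced value, then add the 128-bit nonce modulo 2^128 and store the tag little-endian. As a reading aid only (not part of the patch), here is a minimal C sketch of that step, assuming the accumulator is kept in three 64-bit limbs as in the AArch64 code; the helper name is made up, and the code is illustrative rather than constant-time or hardened:

#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the final reduction done by poly1305_emit. */
static void poly1305_emit_sketch(const uint64_t h[3], const uint64_t nonce[2],
                                 uint8_t mac[16])
{
        uint64_t h0 = h[0], h1 = h[1], h2 = h[2];

        /* g = h + 5; only whether bit 130 of the sum is set matters. */
        uint64_t g0 = h0 + 5, c = (g0 < h0);
        uint64_t g1 = h1 + c; c = (g1 < c);
        uint64_t g2 = h2 + c;

        /* Mirrors "tst x14,#-4": if h + 5 >= 2^130, take the reduced g. */
        uint64_t mask = (g2 >> 2) ? ~(uint64_t)0 : 0;   /* not constant-time here */
        h0 = (h0 & ~mask) | (g0 & mask);
        h1 = (h1 & ~mask) | (g1 & mask);

        /* Accumulate the encrypted nonce modulo 2^128. */
        h0 += nonce[0]; c = (h0 < nonce[0]);
        h1 += nonce[1] + c;

        memcpy(mac, &h0, 8);            /* little-endian host assumed;  */
        memcpy(mac + 8, &h1, 8);        /* the asm handles __ARMEB__.   */
}

The masked select corresponds to the csel (AArch64) and it/movne (ARM) sequences in the assembly.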
* [PATCH net-next v6 11/23] zinc: import Andy Polyakov's Poly1305 ARM and ARM64 implementations
@ 2018-09-25 14:56 ` Jason A. Donenfeld
0 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-arm-kernel
These NEON and non-NEON implementations come from Andy Polyakov's
implementation, and are included here in raw form without modification,
so that subsequent commits that fix them up for the kernel can show how
they have changed. This awkward commit splitting has been requested for
the ARM[64] implementations in particular.
While this is CRYPTOGAMS code, the code imported here is the same as
OpenSSL's commit 5bb1cd2292b388263a0cc05392bb99141212aa53.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/poly1305/poly1305-arm-cryptogams.S | 1172 +++++++++++++++++
lib/zinc/poly1305/poly1305-arm64-cryptogams.S | 869 ++++++++++++
2 files changed, 2041 insertions(+)
create mode 100644 lib/zinc/poly1305/poly1305-arm-cryptogams.S
create mode 100644 lib/zinc/poly1305/poly1305-arm64-cryptogams.S
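Both implementations switch to five 26-bit limbs on the vector paths, so that limb products fit comfortably in 64-bit lanes and carries can be deferred (the "lazy reduction" the comments below refer to). In the AArch64 file the split from the base 2^64 representation is done by poly1305_splat and the inline "base 2^64 -> base 2^26" sequences; the 32-bit file does the analogous split from four 32-bit words. A hedged C sketch of the 64-bit split, with an illustrative helper name that is not in the patch:

#include <stdint.h>

/* Illustrative sketch: split a value held in three 64-bit limbs into the
 * five 26-bit limbs used on the NEON paths (cf. poly1305_splat below). */
static void to_base2_26_sketch(uint32_t out[5], uint64_t h0, uint64_t h1, uint64_t h2)
{
        const uint64_t mask = 0x03ffffff;                       /* 2^26 - 1 */

        out[0] = (uint32_t)(h0 & mask);                         /* bits   0..25  */
        out[1] = (uint32_t)((h0 >> 26) & mask);                 /* bits  26..51  */
        out[2] = (uint32_t)(((h0 >> 52) | (h1 << 12)) & mask);  /* bits  52..77  */
        out[3] = (uint32_t)((h1 >> 14) & mask);                 /* bits  78..103 */
        out[4] = (uint32_t)((h1 >> 40) | (h2 << 24));           /* bits 104..129+ */
}

The key-power tables built below additionally store 5*r1..5*r4 next to the corresponding limbs, so the folding mod 2^130 - 5 can be absorbed into the multiply-accumulate chain.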
diff --git a/lib/zinc/poly1305/poly1305-arm-cryptogams.S b/lib/zinc/poly1305/poly1305-arm-cryptogams.S
new file mode 100644
index 000000000000..884b465030e4
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm-cryptogams.S
@@ -0,0 +1,1172 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+#if defined(__thumb2__)
+.syntax unified
+.thumb
+#else
+.code 32
+#endif
+
+.globl poly1305_emit
+.globl poly1305_blocks
+.globl poly1305_init
+.type poly1305_init,%function
+.align 5
+poly1305_init:
+.Lpoly1305_init:
+ stmdb sp!,{r4-r11}
+
+ eor r3,r3,r3
+ cmp r1,#0
+ str r3,[r0,#0] @ zero hash value
+ str r3,[r0,#4]
+ str r3,[r0,#8]
+ str r3,[r0,#12]
+ str r3,[r0,#16]
+ str r3,[r0,#36] @ is_base2_26
+ add r0,r0,#20
+
+#ifdef __thumb2__
+ it eq
+#endif
+ moveq r0,#0
+ beq .Lno_key
+
+#if __ARM_MAX_ARCH__>=7
+ adr r11,.Lpoly1305_init
+ ldr r12,.LOPENSSL_armcap
+#endif
+ ldrb r4,[r1,#0]
+ mov r10,#0x0fffffff
+ ldrb r5,[r1,#1]
+ and r3,r10,#-4 @ 0x0ffffffc
+ ldrb r6,[r1,#2]
+ ldrb r7,[r1,#3]
+ orr r4,r4,r5,lsl#8
+ ldrb r5,[r1,#4]
+ orr r4,r4,r6,lsl#16
+ ldrb r6,[r1,#5]
+ orr r4,r4,r7,lsl#24
+ ldrb r7,[r1,#6]
+ and r4,r4,r10
+
+#if __ARM_MAX_ARCH__>=7
+ ldr r12,[r11,r12] @ OPENSSL_armcap_P
+# ifdef __APPLE__
+ ldr r12,[r12]
+# endif
+#endif
+ ldrb r8,[r1,#7]
+ orr r5,r5,r6,lsl#8
+ ldrb r6,[r1,#8]
+ orr r5,r5,r7,lsl#16
+ ldrb r7,[r1,#9]
+ orr r5,r5,r8,lsl#24
+ ldrb r8,[r1,#10]
+ and r5,r5,r3
+
+#if __ARM_MAX_ARCH__>=7
+ tst r12,#ARMV7_NEON @ check for NEON
+# ifdef __APPLE__
+ adr r9,poly1305_blocks_neon
+ adr r11,poly1305_blocks
+# ifdef __thumb2__
+ it ne
+# endif
+ movne r11,r9
+ adr r12,poly1305_emit
+ adr r10,poly1305_emit_neon
+# ifdef __thumb2__
+ it ne
+# endif
+ movne r12,r10
+# else
+# ifdef __thumb2__
+ itete eq
+# endif
+ addeq r12,r11,#(poly1305_emit-.Lpoly1305_init)
+ addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
+ addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init)
+ addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
+# endif
+# ifdef __thumb2__
+ orr r12,r12,#1 @ thumb-ify address
+ orr r11,r11,#1
+# endif
+#endif
+ ldrb r9,[r1,#11]
+ orr r6,r6,r7,lsl#8
+ ldrb r7,[r1,#12]
+ orr r6,r6,r8,lsl#16
+ ldrb r8,[r1,#13]
+ orr r6,r6,r9,lsl#24
+ ldrb r9,[r1,#14]
+ and r6,r6,r3
+
+ ldrb r10,[r1,#15]
+ orr r7,r7,r8,lsl#8
+ str r4,[r0,#0]
+ orr r7,r7,r9,lsl#16
+ str r5,[r0,#4]
+ orr r7,r7,r10,lsl#24
+ str r6,[r0,#8]
+ and r7,r7,r3
+ str r7,[r0,#12]
+#if __ARM_MAX_ARCH__>=7
+ stmia r2,{r11,r12} @ fill functions table
+ mov r0,#1
+#else
+ mov r0,#0
+#endif
+.Lno_key:
+ ldmia sp!,{r4-r11}
+#if __ARM_ARCH__>=5
+ bx lr @ bx lr
+#else
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_init,.-poly1305_init
+.type poly1305_blocks,%function
+.align 5
+poly1305_blocks:
+.Lpoly1305_blocks:
+ stmdb sp!,{r3-r11,lr}
+
+ ands r2,r2,#-16
+ beq .Lno_data
+
+ cmp r3,#0
+ add r2,r2,r1 @ end pointer
+ sub sp,sp,#32
+
+ ldmia r0,{r4-r12} @ load context
+
+ str r0,[sp,#12] @ offload stuff
+ mov lr,r1
+ str r2,[sp,#16]
+ str r10,[sp,#20]
+ str r11,[sp,#24]
+ str r12,[sp,#28]
+ b .Loop
+
+.Loop:
+#if __ARM_ARCH__<7
+ ldrb r0,[lr],#16 @ load input
+# ifdef __thumb2__
+ it hi
+# endif
+ addhi r8,r8,#1 @ 1<<128
+ ldrb r1,[lr,#-15]
+ ldrb r2,[lr,#-14]
+ ldrb r3,[lr,#-13]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-12]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-11]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-10]
+ adds r4,r4,r3 @ accumulate input
+
+ ldrb r3,[lr,#-9]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-8]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-7]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-6]
+ adcs r5,r5,r3
+
+ ldrb r3,[lr,#-5]
+ orr r1,r0,r1,lsl#8
+ ldrb r0,[lr,#-4]
+ orr r2,r1,r2,lsl#16
+ ldrb r1,[lr,#-3]
+ orr r3,r2,r3,lsl#24
+ ldrb r2,[lr,#-2]
+ adcs r6,r6,r3
+
+ ldrb r3,[lr,#-1]
+ orr r1,r0,r1,lsl#8
+ str lr,[sp,#8] @ offload input pointer
+ orr r2,r1,r2,lsl#16
+ add r10,r10,r10,lsr#2
+ orr r3,r2,r3,lsl#24
+#else
+ ldr r0,[lr],#16 @ load input
+# ifdef __thumb2__
+ it hi
+# endif
+ addhi r8,r8,#1 @ padbit
+ ldr r1,[lr,#-12]
+ ldr r2,[lr,#-8]
+ ldr r3,[lr,#-4]
+# ifdef __ARMEB__
+ rev r0,r0
+ rev r1,r1
+ rev r2,r2
+ rev r3,r3
+# endif
+ adds r4,r4,r0 @ accumulate input
+ str lr,[sp,#8] @ offload input pointer
+ adcs r5,r5,r1
+ add r10,r10,r10,lsr#2
+ adcs r6,r6,r2
+#endif
+ add r11,r11,r11,lsr#2
+ adcs r7,r7,r3
+ add r12,r12,r12,lsr#2
+
+ umull r2,r3,r5,r9
+ adc r8,r8,#0
+ umull r0,r1,r4,r9
+ umlal r2,r3,r8,r10
+ umlal r0,r1,r7,r10
+ ldr r10,[sp,#20] @ reload r10
+ umlal r2,r3,r6,r12
+ umlal r0,r1,r5,r12
+ umlal r2,r3,r7,r11
+ umlal r0,r1,r6,r11
+ umlal r2,r3,r4,r10
+ str r0,[sp,#0] @ future r4
+ mul r0,r11,r8
+ ldr r11,[sp,#24] @ reload r11
+ adds r2,r2,r1 @ d1+=d0>>32
+ eor r1,r1,r1
+ adc lr,r3,#0 @ future r6
+ str r2,[sp,#4] @ future r5
+
+ mul r2,r12,r8
+ eor r3,r3,r3
+ umlal r0,r1,r7,r12
+ ldr r12,[sp,#28] @ reload r12
+ umlal r2,r3,r7,r9
+ umlal r0,r1,r6,r9
+ umlal r2,r3,r6,r10
+ umlal r0,r1,r5,r10
+ umlal r2,r3,r5,r11
+ umlal r0,r1,r4,r11
+ umlal r2,r3,r4,r12
+ ldr r4,[sp,#0]
+ mul r8,r9,r8
+ ldr r5,[sp,#4]
+
+ adds r6,lr,r0 @ d2+=d1>>32
+ ldr lr,[sp,#8] @ reload input pointer
+ adc r1,r1,#0
+ adds r7,r2,r1 @ d3+=d2>>32
+ ldr r0,[sp,#16] @ reload end pointer
+ adc r3,r3,#0
+ add r8,r8,r3 @ h4+=d3>>32
+
+ and r1,r8,#-4
+ and r8,r8,#3
+ add r1,r1,r1,lsr#2 @ *=5
+ adds r4,r4,r1
+ adcs r5,r5,#0
+ adcs r6,r6,#0
+ adcs r7,r7,#0
+ adc r8,r8,#0
+
+ cmp r0,lr @ done yet?
+ bhi .Loop
+
+ ldr r0,[sp,#12]
+ add sp,sp,#32
+ stmia r0,{r4-r8} @ store the result
+
+.Lno_data:
+#if __ARM_ARCH__>=5
+ ldmia sp!,{r3-r11,pc}
+#else
+ ldmia sp!,{r3-r11,lr}
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_blocks,.-poly1305_blocks
+.type poly1305_emit,%function
+.align 5
+poly1305_emit:
+ stmdb sp!,{r4-r11}
+.Lpoly1305_emit_enter:
+
+ ldmia r0,{r3-r7}
+ adds r8,r3,#5 @ compare to modulus
+ adcs r9,r4,#0
+ adcs r10,r5,#0
+ adcs r11,r6,#0
+ adc r7,r7,#0
+ tst r7,#4 @ did it carry/borrow?
+
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r3,r8
+ ldr r8,[r2,#0]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r4,r9
+ ldr r9,[r2,#4]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r5,r10
+ ldr r10,[r2,#8]
+#ifdef __thumb2__
+ it ne
+#endif
+ movne r6,r11
+ ldr r11,[r2,#12]
+
+ adds r3,r3,r8
+ adcs r4,r4,r9
+ adcs r5,r5,r10
+ adc r6,r6,r11
+
+#if __ARM_ARCH__>=7
+# ifdef __ARMEB__
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+# endif
+ str r3,[r1,#0]
+ str r4,[r1,#4]
+ str r5,[r1,#8]
+ str r6,[r1,#12]
+#else
+ strb r3,[r1,#0]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#4]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#8]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#12]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#1]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#5]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#9]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#13]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#2]
+ mov r3,r3,lsr#8
+ strb r4,[r1,#6]
+ mov r4,r4,lsr#8
+ strb r5,[r1,#10]
+ mov r5,r5,lsr#8
+ strb r6,[r1,#14]
+ mov r6,r6,lsr#8
+
+ strb r3,[r1,#3]
+ strb r4,[r1,#7]
+ strb r5,[r1,#11]
+ strb r6,[r1,#15]
+#endif
+ ldmia sp!,{r4-r11}
+#if __ARM_ARCH__>=5
+ bx lr @ bx lr
+#else
+ tst lr,#1
+ moveq pc,lr @ be binary compatible with V4, yet
+ .word 0xe12fff1e @ interoperable with Thumb ISA:-)
+#endif
+.size poly1305_emit,.-poly1305_emit
+#if __ARM_MAX_ARCH__>=7
+.fpu neon
+
+.type poly1305_init_neon,%function
+.align 5
+poly1305_init_neon:
+ ldr r4,[r0,#20] @ load key base 2^32
+ ldr r5,[r0,#24]
+ ldr r6,[r0,#28]
+ ldr r7,[r0,#32]
+
+ and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
+ mov r3,r4,lsr#26
+ mov r4,r5,lsr#20
+ orr r3,r3,r5,lsl#6
+ mov r5,r6,lsr#14
+ orr r4,r4,r6,lsl#12
+ mov r6,r7,lsr#8
+ orr r5,r5,r7,lsl#18
+ and r3,r3,#0x03ffffff
+ and r4,r4,#0x03ffffff
+ and r5,r5,#0x03ffffff
+
+ vdup.32 d0,r2 @ r^1 in both lanes
+ add r2,r3,r3,lsl#2 @ *5
+ vdup.32 d1,r3
+ add r3,r4,r4,lsl#2
+ vdup.32 d2,r2
+ vdup.32 d3,r4
+ add r4,r5,r5,lsl#2
+ vdup.32 d4,r3
+ vdup.32 d5,r5
+ add r5,r6,r6,lsl#2
+ vdup.32 d6,r4
+ vdup.32 d7,r6
+ vdup.32 d8,r5
+
+ mov r5,#2 @ counter
+
+.Lsquare_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+ @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
+ @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
+ @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
+ @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
+
+ vmull.u32 q5,d0,d0[1]
+ vmull.u32 q6,d1,d0[1]
+ vmull.u32 q7,d3,d0[1]
+ vmull.u32 q8,d5,d0[1]
+ vmull.u32 q9,d7,d0[1]
+
+ vmlal.u32 q5,d7,d2[1]
+ vmlal.u32 q6,d0,d1[1]
+ vmlal.u32 q7,d1,d1[1]
+ vmlal.u32 q8,d3,d1[1]
+ vmlal.u32 q9,d5,d1[1]
+
+ vmlal.u32 q5,d5,d4[1]
+ vmlal.u32 q6,d7,d4[1]
+ vmlal.u32 q8,d1,d3[1]
+ vmlal.u32 q7,d0,d3[1]
+ vmlal.u32 q9,d3,d3[1]
+
+ vmlal.u32 q5,d3,d6[1]
+ vmlal.u32 q8,d0,d5[1]
+ vmlal.u32 q6,d5,d6[1]
+ vmlal.u32 q7,d7,d6[1]
+ vmlal.u32 q9,d1,d5[1]
+
+ vmlal.u32 q8,d7,d8[1]
+ vmlal.u32 q5,d1,d8[1]
+ vmlal.u32 q6,d3,d8[1]
+ vmlal.u32 q7,d5,d8[1]
+ vmlal.u32 q9,d0,d7[1]
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+ @ and P. Schwabe
+ @
+ @ H0>>+H1>>+H2>>+H3>>+H4
+ @ H3>>+H4>>*5+H0>>+H1
+ @
+ @ Trivia.
+ @
+ @ Result of multiplication of n-bit number by m-bit number is
+ @ n+m bits wide. However! Even though 2^n is a n+1-bit number,
+ @ m-bit number multiplied by 2^n is still n+m bits wide.
+ @
+ @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2,
+ @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit
+ @ one is n+1 bits wide.
+ @
+ @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that
+ @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4
+ @ can be 27. However! In cases when their width exceeds 26 bits
+ @ they are limited by 2^26+2^6. This in turn means that *sum*
+ @ of the products with these values can still be viewed as sum
+ @ of 52-bit numbers as long as the amount of addends is not a
+ @ power of 2. For example,
+ @
+ @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4,
+ @
+ @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or
+ @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than
+ @ 8 * (2^52) or 2^55. However, the value is then multiplied by
+ @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12),
+ @ which is less than 32 * (2^52) or 2^57. And when processing
+ @ data we are looking at triple as many addends...
+ @
+ @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and
+ @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the
+ @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while
+ @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32
+ @ instruction accepts 2x32-bit input and writes 2x64-bit result.
+ @ This means that result of reduction have to be compressed upon
+ @ loop wrap-around. This can be done in the process of reduction
+ @ to minimize amount of instructions [as well as amount of
+ @ 128-bit instructions, which benefits low-end processors], but
+ @ one has to watch for H2 (which is narrower than H0) and 5*H4
+ @ not being wider than 58 bits, so that result of right shift
+ @ by 26 bits fits in 32 bits. This is also useful on x86,
+ @ because it allows to use paddd in place for paddq, which
+ @ benefits Atom, where paddq is ridiculously slow.
+
+ vshr.u64 q15,q8,#26
+ vmovn.i64 d16,q8
+ vshr.u64 q4,q5,#26
+ vmovn.i64 d10,q5
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vbic.i32 d16,#0xfc000000 @ &=0x03ffffff
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+ vbic.i32 d10,#0xfc000000
+
+ vshrn.u64 d30,q9,#26
+ vmovn.i64 d18,q9
+ vshr.u64 q4,q6,#26
+ vmovn.i64 d12,q6
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+ vbic.i32 d18,#0xfc000000
+ vbic.i32 d12,#0xfc000000
+
+ vadd.i32 d10,d10,d30
+ vshl.u32 d30,d30,#2
+ vshrn.u64 d8,q7,#26
+ vmovn.i64 d14,q7
+ vadd.i32 d10,d10,d30 @ h4 -> h0
+ vadd.i32 d16,d16,d8 @ h2 -> h3
+ vbic.i32 d14,#0xfc000000
+
+ vshr.u32 d30,d10,#26
+ vbic.i32 d10,#0xfc000000
+ vshr.u32 d8,d16,#26
+ vbic.i32 d16,#0xfc000000
+ vadd.i32 d12,d12,d30 @ h0 -> h1
+ vadd.i32 d18,d18,d8 @ h3 -> h4
+
+ subs r5,r5,#1
+ beq .Lsquare_break_neon
+
+ add r6,r0,#(48+0*9*4)
+ add r7,r0,#(48+1*9*4)
+
+ vtrn.32 d0,d10 @ r^2:r^1
+ vtrn.32 d3,d14
+ vtrn.32 d5,d16
+ vtrn.32 d1,d12
+ vtrn.32 d7,d18
+
+ vshl.u32 d4,d3,#2 @ *5
+ vshl.u32 d6,d5,#2
+ vshl.u32 d2,d1,#2
+ vshl.u32 d8,d7,#2
+ vadd.i32 d4,d4,d3
+ vadd.i32 d2,d2,d1
+ vadd.i32 d6,d6,d5
+ vadd.i32 d8,d8,d7
+
+ vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
+ vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
+ vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vst1.32 {d8[0]},[r6,:32]
+ vst1.32 {d8[1]},[r7,:32]
+
+ b .Lsquare_neon
+
+.align 4
+.Lsquare_break_neon:
+ add r6,r0,#(48+2*4*9)
+ add r7,r0,#(48+3*4*9)
+
+ vmov d0,d10 @ r^4:r^3
+ vshl.u32 d2,d12,#2 @ *5
+ vmov d1,d12
+ vshl.u32 d4,d14,#2
+ vmov d3,d14
+ vshl.u32 d6,d16,#2
+ vmov d5,d16
+ vshl.u32 d8,d18,#2
+ vmov d7,d18
+ vadd.i32 d2,d2,d12
+ vadd.i32 d4,d4,d14
+ vadd.i32 d6,d6,d16
+ vadd.i32 d8,d8,d18
+
+ vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]!
+ vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]!
+ vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vst1.32 {d8[0]},[r6]
+ vst1.32 {d8[1]},[r7]
+
+ bx lr @ bx lr
+.size poly1305_init_neon,.-poly1305_init_neon
+
+.type poly1305_blocks_neon,%function
+.align 5
+poly1305_blocks_neon:
+ ldr ip,[r0,#36] @ is_base2_26
+ ands r2,r2,#-16
+ beq .Lno_data_neon
+
+ cmp r2,#64
+ bhs .Lenter_neon
+ tst ip,ip @ is_base2_26?
+ beq .Lpoly1305_blocks
+
+.Lenter_neon:
+ stmdb sp!,{r4-r7}
+ vstmdb sp!,{d8-d15} @ ABI specification says so
+
+ tst ip,ip @ is_base2_26?
+ bne .Lbase2_26_neon
+
+ stmdb sp!,{r1-r3,lr}
+ bl poly1305_init_neon
+
+ ldr r4,[r0,#0] @ load hash value base 2^32
+ ldr r5,[r0,#4]
+ ldr r6,[r0,#8]
+ ldr r7,[r0,#12]
+ ldr ip,[r0,#16]
+
+ and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26
+ mov r3,r4,lsr#26
+ veor d10,d10,d10
+ mov r4,r5,lsr#20
+ orr r3,r3,r5,lsl#6
+ veor d12,d12,d12
+ mov r5,r6,lsr#14
+ orr r4,r4,r6,lsl#12
+ veor d14,d14,d14
+ mov r6,r7,lsr#8
+ orr r5,r5,r7,lsl#18
+ veor d16,d16,d16
+ and r3,r3,#0x03ffffff
+ orr r6,r6,ip,lsl#24
+ veor d18,d18,d18
+ and r4,r4,#0x03ffffff
+ mov r1,#1
+ and r5,r5,#0x03ffffff
+ str r1,[r0,#36] @ is_base2_26
+
+ vmov.32 d10[0],r2
+ vmov.32 d12[0],r3
+ vmov.32 d14[0],r4
+ vmov.32 d16[0],r5
+ vmov.32 d18[0],r6
+ adr r5,.Lzeros
+
+ ldmia sp!,{r1-r3,lr}
+ b .Lbase2_32_neon
+
+.align 4
+.Lbase2_26_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ load hash value
+
+ veor d10,d10,d10
+ veor d12,d12,d12
+ veor d14,d14,d14
+ veor d16,d16,d16
+ veor d18,d18,d18
+ vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
+ adr r5,.Lzeros
+ vld1.32 {d18[0]},[r0]
+ sub r0,r0,#16 @ rewind
+
+.Lbase2_32_neon:
+ add r4,r1,#32
+ mov r3,r3,lsl#24
+ tst r2,#31
+ beq .Leven
+
+ vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]!
+ vmov.32 d28[0],r3
+ sub r2,r2,#16
+ add r4,r1,#32
+
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q13,q13
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+# endif
+ vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26
+ vshl.u32 d26,d26,#18
+
+ vsri.u32 d26,d24,#14
+ vshl.u32 d24,d24,#12
+ vadd.i32 d29,d28,d18 @ add hash value and move to #hi
+
+ vbic.i32 d26,#0xfc000000
+ vsri.u32 d24,d22,#20
+ vshl.u32 d22,d22,#6
+
+ vbic.i32 d24,#0xfc000000
+ vsri.u32 d22,d20,#26
+ vadd.i32 d27,d26,d16
+
+ vbic.i32 d20,#0xfc000000
+ vbic.i32 d22,#0xfc000000
+ vadd.i32 d25,d24,d14
+
+ vadd.i32 d21,d20,d10
+ vadd.i32 d23,d22,d12
+
+ mov r7,r5
+ add r6,r0,#48
+
+ cmp r2,r2
+ b .Long_tail
+
+.align 4
+.Leven:
+ subs r2,r2,#64
+ it lo
+ movlo r4,r5
+
+ vmov.i32 q14,#1<<24 @ padbit, yes, always
+ vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
+ add r1,r1,#64
+ vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
+ add r4,r4,#64
+ itt hi
+ addhi r7,r0,#(48+1*9*4)
+ addhi r6,r0,#(48+3*9*4)
+
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q13,q13
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+# endif
+ vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
+ vshl.u32 q13,q13,#18
+
+ vsri.u32 q13,q12,#14
+ vshl.u32 q12,q12,#12
+
+ vbic.i32 q13,#0xfc000000
+ vsri.u32 q12,q11,#20
+ vshl.u32 q11,q11,#6
+
+ vbic.i32 q12,#0xfc000000
+ vsri.u32 q11,q10,#26
+
+ vbic.i32 q10,#0xfc000000
+ vbic.i32 q11,#0xfc000000
+
+ bls .Lskip_loop
+
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ b .Loop_neon
+
+.align 5
+.Loop_neon:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+ @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+ @ ___________________/
+ @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+ @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+ @ ___________________/ ____________________/
+ @
+ @ Note that we start with inp[2:3]*r^2. This is because it
+ @ doesn't depend on reduction in previous iteration.
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4
+ @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4
+ @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4
+ @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4
+ @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ inp[2:3]*r^2
+
+ vadd.i32 d24,d24,d14 @ accumulate inp[0:1]
+ vmull.u32 q7,d25,d0[1]
+ vadd.i32 d20,d20,d10
+ vmull.u32 q5,d21,d0[1]
+ vadd.i32 d26,d26,d16
+ vmull.u32 q8,d27,d0[1]
+ vmlal.u32 q7,d23,d1[1]
+ vadd.i32 d22,d22,d12
+ vmull.u32 q6,d23,d0[1]
+
+ vadd.i32 d28,d28,d18
+ vmull.u32 q9,d29,d0[1]
+ subs r2,r2,#64
+ vmlal.u32 q5,d29,d2[1]
+ it lo
+ movlo r4,r5
+ vmlal.u32 q8,d25,d1[1]
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q6,d21,d1[1]
+ vmlal.u32 q9,d27,d1[1]
+
+ vmlal.u32 q5,d27,d4[1]
+ vmlal.u32 q8,d23,d3[1]
+ vmlal.u32 q9,d25,d3[1]
+ vmlal.u32 q6,d29,d4[1]
+ vmlal.u32 q7,d21,d3[1]
+
+ vmlal.u32 q8,d21,d5[1]
+ vmlal.u32 q5,d25,d6[1]
+ vmlal.u32 q9,d23,d5[1]
+ vmlal.u32 q6,d27,d6[1]
+ vmlal.u32 q7,d29,d6[1]
+
+ vmlal.u32 q8,d29,d8[1]
+ vmlal.u32 q5,d23,d8[1]
+ vmlal.u32 q9,d21,d7[1]
+ vmlal.u32 q6,d25,d8[1]
+ vmlal.u32 q7,d27,d8[1]
+
+ vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0)
+ add r4,r4,#64
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ (hash+inp[0:1])*r^4 and accumulate
+
+ vmlal.u32 q8,d26,d0[0]
+ vmlal.u32 q5,d20,d0[0]
+ vmlal.u32 q9,d28,d0[0]
+ vmlal.u32 q6,d22,d0[0]
+ vmlal.u32 q7,d24,d0[0]
+ vld1.32 d8[0],[r6,:32]
+
+ vmlal.u32 q8,d24,d1[0]
+ vmlal.u32 q5,d28,d2[0]
+ vmlal.u32 q9,d26,d1[0]
+ vmlal.u32 q6,d20,d1[0]
+ vmlal.u32 q7,d22,d1[0]
+
+ vmlal.u32 q8,d22,d3[0]
+ vmlal.u32 q5,d26,d4[0]
+ vmlal.u32 q9,d24,d3[0]
+ vmlal.u32 q6,d28,d4[0]
+ vmlal.u32 q7,d20,d3[0]
+
+ vmlal.u32 q8,d20,d5[0]
+ vmlal.u32 q5,d24,d6[0]
+ vmlal.u32 q9,d22,d5[0]
+ vmlal.u32 q6,d26,d6[0]
+ vmlal.u32 q8,d28,d8[0]
+
+ vmlal.u32 q7,d28,d6[0]
+ vmlal.u32 q5,d22,d8[0]
+ vmlal.u32 q9,d20,d7[0]
+ vmov.i32 q14,#1<<24 @ padbit, yes, always
+ vmlal.u32 q6,d24,d8[0]
+ vmlal.u32 q7,d26,d8[0]
+
+ vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
+ add r1,r1,#64
+# ifdef __ARMEB__
+ vrev32.8 q10,q10
+ vrev32.8 q11,q11
+ vrev32.8 q12,q12
+ vrev32.8 q13,q13
+# endif
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction interleaved with base 2^32 -> base 2^26 of
+ @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14.
+
+ vshr.u64 q15,q8,#26
+ vmovn.i64 d16,q8
+ vshr.u64 q4,q5,#26
+ vmovn.i64 d10,q5
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vbic.i32 d16,#0xfc000000
+ vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+ vshl.u32 q13,q13,#18
+ vbic.i32 d10,#0xfc000000
+
+ vshrn.u64 d30,q9,#26
+ vmovn.i64 d18,q9
+ vshr.u64 q4,q6,#26
+ vmovn.i64 d12,q6
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+ vsri.u32 q13,q12,#14
+ vbic.i32 d18,#0xfc000000
+ vshl.u32 q12,q12,#12
+ vbic.i32 d12,#0xfc000000
+
+ vadd.i32 d10,d10,d30
+ vshl.u32 d30,d30,#2
+ vbic.i32 q13,#0xfc000000
+ vshrn.u64 d8,q7,#26
+ vmovn.i64 d14,q7
+ vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec]
+ vsri.u32 q12,q11,#20
+ vadd.i32 d16,d16,d8 @ h2 -> h3
+ vshl.u32 q11,q11,#6
+ vbic.i32 d14,#0xfc000000
+ vbic.i32 q12,#0xfc000000
+
+ vshrn.u64 d30,q5,#26 @ re-narrow
+ vmovn.i64 d10,q5
+ vsri.u32 q11,q10,#26
+ vbic.i32 q10,#0xfc000000
+ vshr.u32 d8,d16,#26
+ vbic.i32 d16,#0xfc000000
+ vbic.i32 d10,#0xfc000000
+ vadd.i32 d12,d12,d30 @ h0 -> h1
+ vadd.i32 d18,d18,d8 @ h3 -> h4
+ vbic.i32 q11,#0xfc000000
+
+ bhi .Loop_neon
+
+.Lskip_loop:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+ add r7,r0,#(48+0*9*4)
+ add r6,r0,#(48+1*9*4)
+ adds r2,r2,#32
+ it ne
+ movne r2,#0
+ bne .Long_tail
+
+ vadd.i32 d25,d24,d14 @ add hash value and move to #hi
+ vadd.i32 d21,d20,d10
+ vadd.i32 d27,d26,d16
+ vadd.i32 d23,d22,d12
+ vadd.i32 d29,d28,d18
+
+.Long_tail:
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2
+
+ vadd.i32 d24,d24,d14 @ can be redundant
+ vmull.u32 q7,d25,d0
+ vadd.i32 d20,d20,d10
+ vmull.u32 q5,d21,d0
+ vadd.i32 d26,d26,d16
+ vmull.u32 q8,d27,d0
+ vadd.i32 d22,d22,d12
+ vmull.u32 q6,d23,d0
+ vadd.i32 d28,d28,d18
+ vmull.u32 q9,d29,d0
+
+ vmlal.u32 q5,d29,d2
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vmlal.u32 q8,d25,d1
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vmlal.u32 q6,d21,d1
+ vmlal.u32 q9,d27,d1
+ vmlal.u32 q7,d23,d1
+
+ vmlal.u32 q8,d23,d3
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q5,d27,d4
+ vld1.32 d8[0],[r6,:32]
+ vmlal.u32 q9,d25,d3
+ vmlal.u32 q6,d29,d4
+ vmlal.u32 q7,d21,d3
+
+ vmlal.u32 q8,d21,d5
+ it ne
+ addne r7,r0,#(48+2*9*4)
+ vmlal.u32 q5,d25,d6
+ it ne
+ addne r6,r0,#(48+3*9*4)
+ vmlal.u32 q9,d23,d5
+ vmlal.u32 q6,d27,d6
+ vmlal.u32 q7,d29,d6
+
+ vmlal.u32 q8,d29,d8
+ vorn q0,q0,q0 @ all-ones, can be redundant
+ vmlal.u32 q5,d23,d8
+ vshr.u64 q0,q0,#38
+ vmlal.u32 q9,d21,d7
+ vmlal.u32 q6,d25,d8
+ vmlal.u32 q7,d27,d8
+
+ beq .Lshort_tail
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ (hash+inp[0:1])*r^4:r^3 and accumulate
+
+ vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3
+ vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
+
+ vmlal.u32 q7,d24,d0
+ vmlal.u32 q5,d20,d0
+ vmlal.u32 q8,d26,d0
+ vmlal.u32 q6,d22,d0
+ vmlal.u32 q9,d28,d0
+
+ vmlal.u32 q5,d28,d2
+ vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]!
+ vmlal.u32 q8,d24,d1
+ vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]!
+ vmlal.u32 q6,d20,d1
+ vmlal.u32 q9,d26,d1
+ vmlal.u32 q7,d22,d1
+
+ vmlal.u32 q8,d22,d3
+ vld1.32 d8[1],[r7,:32]
+ vmlal.u32 q5,d26,d4
+ vld1.32 d8[0],[r6,:32]
+ vmlal.u32 q9,d24,d3
+ vmlal.u32 q6,d28,d4
+ vmlal.u32 q7,d20,d3
+
+ vmlal.u32 q8,d20,d5
+ vmlal.u32 q5,d24,d6
+ vmlal.u32 q9,d22,d5
+ vmlal.u32 q6,d26,d6
+ vmlal.u32 q7,d28,d6
+
+ vmlal.u32 q8,d28,d8
+ vorn q0,q0,q0 @ all-ones
+ vmlal.u32 q5,d22,d8
+ vshr.u64 q0,q0,#38
+ vmlal.u32 q9,d20,d7
+ vmlal.u32 q6,d24,d8
+ vmlal.u32 q7,d26,d8
+
+.Lshort_tail:
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ horizontal addition
+
+ vadd.i64 d16,d16,d17
+ vadd.i64 d10,d10,d11
+ vadd.i64 d18,d18,d19
+ vadd.i64 d12,d12,d13
+ vadd.i64 d14,d14,d15
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ lazy reduction, but without narrowing
+
+ vshr.u64 q15,q8,#26
+ vand.i64 q8,q8,q0
+ vshr.u64 q4,q5,#26
+ vand.i64 q5,q5,q0
+ vadd.i64 q9,q9,q15 @ h3 -> h4
+ vadd.i64 q6,q6,q4 @ h0 -> h1
+
+ vshr.u64 q15,q9,#26
+ vand.i64 q9,q9,q0
+ vshr.u64 q4,q6,#26
+ vand.i64 q6,q6,q0
+ vadd.i64 q7,q7,q4 @ h1 -> h2
+
+ vadd.i64 q5,q5,q15
+ vshl.u64 q15,q15,#2
+ vshr.u64 q4,q7,#26
+ vand.i64 q7,q7,q0
+ vadd.i64 q5,q5,q15 @ h4 -> h0
+ vadd.i64 q8,q8,q4 @ h2 -> h3
+
+ vshr.u64 q15,q5,#26
+ vand.i64 q5,q5,q0
+ vshr.u64 q4,q8,#26
+ vand.i64 q8,q8,q0
+ vadd.i64 q6,q6,q15 @ h0 -> h1
+ vadd.i64 q9,q9,q4 @ h3 -> h4
+
+ cmp r2,#0
+ bne .Leven
+
+ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+ @ store hash value
+
+ vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]!
+ vst1.32 {d18[0]},[r0]
+
+ vldmia sp!,{d8-d15} @ epilogue
+ ldmia sp!,{r4-r7}
+.Lno_data_neon:
+ bx lr @ bx lr
+.size poly1305_blocks_neon,.-poly1305_blocks_neon
+
+.type poly1305_emit_neon,%function
+.align 5
+poly1305_emit_neon:
+ ldr ip,[r0,#36] @ is_base2_26
+
+ stmdb sp!,{r4-r11}
+
+ tst ip,ip
+ beq .Lpoly1305_emit_enter
+
+ ldmia r0,{r3-r7}
+ eor r8,r8,r8
+
+ adds r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32
+ mov r4,r4,lsr#6
+ adcs r4,r4,r5,lsl#20
+ mov r5,r5,lsr#12
+ adcs r5,r5,r6,lsl#14
+ mov r6,r6,lsr#18
+ adcs r6,r6,r7,lsl#8
+ adc r7,r8,r7,lsr#24 @ can be partially reduced ...
+
+ and r8,r7,#-4 @ ... so reduce
+ and r7,r6,#3
+ add r8,r8,r8,lsr#2 @ *= 5
+ adds r3,r3,r8
+ adcs r4,r4,#0
+ adcs r5,r5,#0
+ adcs r6,r6,#0
+ adc r7,r7,#0
+
+ adds r8,r3,#5 @ compare to modulus
+ adcs r9,r4,#0
+ adcs r10,r5,#0
+ adcs r11,r6,#0
+ adc r7,r7,#0
+ tst r7,#4 @ did it carry/borrow?
+
+ it ne
+ movne r3,r8
+ ldr r8,[r2,#0]
+ it ne
+ movne r4,r9
+ ldr r9,[r2,#4]
+ it ne
+ movne r5,r10
+ ldr r10,[r2,#8]
+ it ne
+ movne r6,r11
+ ldr r11,[r2,#12]
+
+ adds r3,r3,r8 @ accumulate nonce
+ adcs r4,r4,r9
+ adcs r5,r5,r10
+ adc r6,r6,r11
+
+# ifdef __ARMEB__
+ rev r3,r3
+ rev r4,r4
+ rev r5,r5
+ rev r6,r6
+# endif
+ str r3,[r1,#0] @ store the result
+ str r4,[r1,#4]
+ str r5,[r1,#8]
+ str r6,[r1,#12]
+
+ ldmia sp!,{r4-r11}
+ bx lr @ bx lr
+.size poly1305_emit_neon,.-poly1305_emit_neon
+
+.align 5
+.Lzeros:
+.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+.LOPENSSL_armcap:
+.word OPENSSL_armcap_P-.Lpoly1305_init
+#endif
+.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by <appro@openssl.org>"
+.align 2
+#if __ARM_MAX_ARCH__>=7
+.comm OPENSSL_armcap_P,4,4
+#endif
diff --git a/lib/zinc/poly1305/poly1305-arm64-cryptogams.S b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S
new file mode 100644
index 000000000000..0ecb50a83ec0
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S
@@ -0,0 +1,869 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ */
+
+#include "arm_arch.h"
+
+.text
+
+// forward "declarations" are required for Apple
+
+.globl poly1305_blocks
+.globl poly1305_emit
+
+.globl poly1305_init
+.type poly1305_init,%function
+.align 5
+poly1305_init:
+ cmp x1,xzr
+ stp xzr,xzr,[x0] // zero hash value
+ stp xzr,xzr,[x0,#16] // [along with is_base2_26]
+
+ csel x0,xzr,x0,eq
+ b.eq .Lno_key
+
+#ifdef __ILP32__
+ ldrsw x11,.LOPENSSL_armcap_P
+#else
+ ldr x11,.LOPENSSL_armcap_P
+#endif
+ adr x10,.LOPENSSL_armcap_P
+
+ ldp x7,x8,[x1] // load key
+ mov x9,#0xfffffffc0fffffff
+ movk x9,#0x0fff,lsl#48
+ ldr w17,[x10,x11]
+#ifdef __ARMEB__
+ rev x7,x7 // flip bytes
+ rev x8,x8
+#endif
+ and x7,x7,x9 // &=0ffffffc0fffffff
+ and x9,x9,#-4
+ and x8,x8,x9 // &=0ffffffc0ffffffc
+ stp x7,x8,[x0,#32] // save key value
+
+ tst w17,#ARMV7_NEON
+
+ adr x12,poly1305_blocks
+ adr x7,poly1305_blocks_neon
+ adr x13,poly1305_emit
+ adr x8,poly1305_emit_neon
+
+ csel x12,x12,x7,eq
+ csel x13,x13,x8,eq
+
+#ifdef __ILP32__
+ stp w12,w13,[x2]
+#else
+ stp x12,x13,[x2]
+#endif
+
+ mov x0,#1
+.Lno_key:
+ ret
+.size poly1305_init,.-poly1305_init
+
+.type poly1305_blocks,%function
+.align 5
+poly1305_blocks:
+ ands x2,x2,#-16
+ b.eq .Lno_data
+
+ ldp x4,x5,[x0] // load hash value
+ ldp x7,x8,[x0,#32] // load key value
+ ldr x6,[x0,#16]
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+ b .Loop
+
+.align 5
+.Loop:
+ ldp x10,x11,[x1],#16 // load input
+ sub x2,x2,#16
+#ifdef __ARMEB__
+ rev x10,x10
+ rev x11,x11
+#endif
+ adds x4,x4,x10 // accumulate input
+ adcs x5,x5,x11
+
+ mul x12,x4,x7 // h0*r0
+ adc x6,x6,x3
+ umulh x13,x4,x7
+
+ mul x10,x5,x9 // h1*5*r1
+ umulh x11,x5,x9
+
+ adds x12,x12,x10
+ mul x10,x4,x8 // h0*r1
+ adc x13,x13,x11
+ umulh x14,x4,x8
+
+ adds x13,x13,x10
+ mul x10,x5,x7 // h1*r0
+ adc x14,x14,xzr
+ umulh x11,x5,x7
+
+ adds x13,x13,x10
+ mul x10,x6,x9 // h2*5*r1
+ adc x14,x14,x11
+ mul x11,x6,x7 // h2*r0
+
+ adds x13,x13,x10
+ adc x14,x14,x11
+
+ and x10,x14,#-4 // final reduction
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x12,x10
+ adcs x5,x13,xzr
+ adc x6,x6,xzr
+
+ cbnz x2,.Loop
+
+ stp x4,x5,[x0] // store hash value
+ str x6,[x0,#16]
+
+.Lno_data:
+ ret
+.size poly1305_blocks,.-poly1305_blocks
+
+.type poly1305_emit,%function
+.align 5
+poly1305_emit:
+ ldp x4,x5,[x0] // load hash base 2^64
+ ldr x6,[x0,#16]
+ ldp x10,x11,[x2] // load nonce
+
+ adds x12,x4,#5 // compare to modulus
+ adcs x13,x5,xzr
+ adc x14,x6,xzr
+
+ tst x14,#-4 // see if it's carried/borrowed
+
+ csel x4,x4,x12,eq
+ csel x5,x5,x13,eq
+
+#ifdef __ARMEB__
+ ror x10,x10,#32 // flip nonce words
+ ror x11,x11,#32
+#endif
+ adds x4,x4,x10 // accumulate nonce
+ adc x5,x5,x11
+#ifdef __ARMEB__
+ rev x4,x4 // flip output bytes
+ rev x5,x5
+#endif
+ stp x4,x5,[x1] // write result
+
+ ret
+.size poly1305_emit,.-poly1305_emit
+.type poly1305_mult,%function
+.align 5
+poly1305_mult:
+ mul x12,x4,x7 // h0*r0
+ umulh x13,x4,x7
+
+ mul x10,x5,x9 // h1*5*r1
+ umulh x11,x5,x9
+
+ adds x12,x12,x10
+ mul x10,x4,x8 // h0*r1
+ adc x13,x13,x11
+ umulh x14,x4,x8
+
+ adds x13,x13,x10
+ mul x10,x5,x7 // h1*r0
+ adc x14,x14,xzr
+ umulh x11,x5,x7
+
+ adds x13,x13,x10
+ mul x10,x6,x9 // h2*5*r1
+ adc x14,x14,x11
+ mul x11,x6,x7 // h2*r0
+
+ adds x13,x13,x10
+ adc x14,x14,x11
+
+ and x10,x14,#-4 // final reduction
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x12,x10
+ adcs x5,x13,xzr
+ adc x6,x6,xzr
+
+ ret
+.size poly1305_mult,.-poly1305_mult
+
+.type poly1305_splat,%function
+.align 5
+poly1305_splat:
+ and x12,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x13,x4,#26,#26
+ extr x14,x5,x4,#52
+ and x14,x14,#0x03ffffff
+ ubfx x15,x5,#14,#26
+ extr x16,x6,x5,#40
+
+ str w12,[x0,#16*0] // r0
+ add w12,w13,w13,lsl#2 // r1*5
+ str w13,[x0,#16*1] // r1
+ add w13,w14,w14,lsl#2 // r2*5
+ str w12,[x0,#16*2] // s1
+ str w14,[x0,#16*3] // r2
+ add w14,w15,w15,lsl#2 // r3*5
+ str w13,[x0,#16*4] // s2
+ str w15,[x0,#16*5] // r3
+ add w15,w16,w16,lsl#2 // r4*5
+ str w14,[x0,#16*6] // s3
+ str w16,[x0,#16*7] // r4
+ str w15,[x0,#16*8] // s4
+
+ ret
+.size poly1305_splat,.-poly1305_splat
+
+.type poly1305_blocks_neon,%function
+.align 5
+poly1305_blocks_neon:
+ ldr x17,[x0,#24]
+ cmp x2,#128
+ b.hs .Lblocks_neon
+ cbz x17,poly1305_blocks
+
+.Lblocks_neon:
+ stp x29,x30,[sp,#-80]!
+ add x29,sp,#0
+
+ ands x2,x2,#-16
+ b.eq .Lno_data_neon
+
+ cbz x17,.Lbase2_64_neon
+
+ ldp w10,w11,[x0] // load hash value base 2^26
+ ldp w12,w13,[x0,#8]
+ ldr w14,[x0,#16]
+
+ tst x2,#31
+ b.eq .Leven_neon
+
+ ldp x7,x8,[x0,#32] // load key value
+
+ add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
+ lsr x5,x12,#12
+ adds x4,x4,x12,lsl#52
+ add x5,x5,x13,lsl#14
+ adc x5,x5,xzr
+ lsr x6,x14,#24
+ adds x5,x5,x14,lsl#40
+ adc x14,x6,xzr // can be partially reduced...
+
+ ldp x12,x13,[x1],#16 // load input
+ sub x2,x2,#16
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+
+ and x10,x14,#-4 // ... so reduce
+ and x6,x14,#3
+ add x10,x10,x14,lsr#2
+ adds x4,x4,x10
+ adcs x5,x5,xzr
+ adc x6,x6,xzr
+
+#ifdef __ARMEB__
+ rev x12,x12
+ rev x13,x13
+#endif
+ adds x4,x4,x12 // accumulate input
+ adcs x5,x5,x13
+ adc x6,x6,x3
+
+ bl poly1305_mult
+ ldr x30,[sp,#8]
+
+ cbz x3,.Lstore_base2_64_neon
+
+ and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x11,x4,#26,#26
+ extr x12,x5,x4,#52
+ and x12,x12,#0x03ffffff
+ ubfx x13,x5,#14,#26
+ extr x14,x6,x5,#40
+
+ cbnz x2,.Leven_neon
+
+ stp w10,w11,[x0] // store hash value base 2^26
+ stp w12,w13,[x0,#8]
+ str w14,[x0,#16]
+ b .Lno_data_neon
+
+.align 4
+.Lstore_base2_64_neon:
+ stp x4,x5,[x0] // store hash value base 2^64
+ stp x6,xzr,[x0,#16] // note that is_base2_26 is zeroed
+ b .Lno_data_neon
+
+.align 4
+.Lbase2_64_neon:
+ ldp x7,x8,[x0,#32] // load key value
+
+ ldp x4,x5,[x0] // load hash value base 2^64
+ ldr x6,[x0,#16]
+
+ tst x2,#31
+ b.eq .Linit_neon
+
+ ldp x12,x13,[x1],#16 // load input
+ sub x2,x2,#16
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+#ifdef __ARMEB__
+ rev x12,x12
+ rev x13,x13
+#endif
+ adds x4,x4,x12 // accumulate input
+ adcs x5,x5,x13
+ adc x6,x6,x3
+
+ bl poly1305_mult
+
+.Linit_neon:
+ and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
+ ubfx x11,x4,#26,#26
+ extr x12,x5,x4,#52
+ and x12,x12,#0x03ffffff
+ ubfx x13,x5,#14,#26
+ extr x14,x6,x5,#40
+
+ stp d8,d9,[sp,#16] // meet ABI requirements
+ stp d10,d11,[sp,#32]
+ stp d12,d13,[sp,#48]
+ stp d14,d15,[sp,#64]
+
+ fmov d24,x10
+ fmov d25,x11
+ fmov d26,x12
+ fmov d27,x13
+ fmov d28,x14
+
+ ////////////////////////////////// initialize r^n table
+ mov x4,x7 // r^1
+ add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2)
+ mov x5,x8
+ mov x6,xzr
+ add x0,x0,#48+12
+ bl poly1305_splat
+
+ bl poly1305_mult // r^2
+ sub x0,x0,#4
+ bl poly1305_splat
+
+ bl poly1305_mult // r^3
+ sub x0,x0,#4
+ bl poly1305_splat
+
+ bl poly1305_mult // r^4
+ sub x0,x0,#4
+ bl poly1305_splat
+ ldr x30,[sp,#8]
+
+ add x16,x1,#32
+ adr x17,.Lzeros
+ subs x2,x2,#64
+ csel x16,x17,x16,lo
+
+ mov x4,#1
+ str x4,[x0,#-24] // set is_base2_26
+ sub x0,x0,#48 // restore original x0
+ b .Ldo_neon
+
+.align 4
+.Leven_neon:
+ add x16,x1,#32
+ adr x17,.Lzeros
+ subs x2,x2,#64
+ csel x16,x17,x16,lo
+
+ stp d8,d9,[sp,#16] // meet ABI requirements
+ stp d10,d11,[sp,#32]
+ stp d12,d13,[sp,#48]
+ stp d14,d15,[sp,#64]
+
+ fmov d24,x10
+ fmov d25,x11
+ fmov d26,x12
+ fmov d27,x13
+ fmov d28,x14
+
+.Ldo_neon:
+ ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
+ ldp x9,x13,[x16],#48
+
+ lsl x3,x3,#24
+ add x15,x0,#48
+
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ and x5,x9,#0x03ffffff
+ ubfx x6,x8,#26,#26
+ ubfx x7,x9,#26,#26
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ extr x8,x12,x8,#52
+ extr x9,x13,x9,#52
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ fmov d14,x4
+ and x8,x8,#0x03ffffff
+ and x9,x9,#0x03ffffff
+ ubfx x10,x12,#14,#26
+ ubfx x11,x13,#14,#26
+ add x12,x3,x12,lsr#40
+ add x13,x3,x13,lsr#40
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ fmov d15,x6
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ fmov d16,x8
+ fmov d17,x10
+ fmov d18,x12
+
+ ldp x8,x12,[x1],#16 // inp[0:1]
+ ldp x9,x13,[x1],#48
+
+ ld1 {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64
+ ld1 {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64
+ ld1 {v8.4s},[x15]
+
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ and x5,x9,#0x03ffffff
+ ubfx x6,x8,#26,#26
+ ubfx x7,x9,#26,#26
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ extr x8,x12,x8,#52
+ extr x9,x13,x9,#52
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ fmov d9,x4
+ and x8,x8,#0x03ffffff
+ and x9,x9,#0x03ffffff
+ ubfx x10,x12,#14,#26
+ ubfx x11,x13,#14,#26
+ add x12,x3,x12,lsr#40
+ add x13,x3,x13,lsr#40
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ fmov d10,x6
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ movi v31.2d,#-1
+ fmov d11,x8
+ fmov d12,x10
+ fmov d13,x12
+ ushr v31.2d,v31.2d,#38
+
+ b.ls .Lskip_loop
+
+.align 4
+.Loop_neon:
+ ////////////////////////////////////////////////////////////////
+ // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+ // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+ // ___________________/
+ // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+ // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+ // ___________________/ ____________________/
+ //
+ // Note that we start with inp[2:3]*r^2. This is because it
+ // doesn't depend on reduction in previous iteration.
+ ////////////////////////////////////////////////////////////////
+ // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0
+ // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4
+ // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3
+ // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2
+ // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1
+
+ subs x2,x2,#64
+ umull v23.2d,v14.2s,v7.s[2]
+ csel x16,x17,x16,lo
+ umull v22.2d,v14.2s,v5.s[2]
+ umull v21.2d,v14.2s,v3.s[2]
+ ldp x8,x12,[x16],#16 // inp[2:3] (or zero)
+ umull v20.2d,v14.2s,v1.s[2]
+ ldp x9,x13,[x16],#48
+ umull v19.2d,v14.2s,v0.s[2]
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+
+ umlal v23.2d,v15.2s,v5.s[2]
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ umlal v22.2d,v15.2s,v3.s[2]
+ and x5,x9,#0x03ffffff
+ umlal v21.2d,v15.2s,v1.s[2]
+ ubfx x6,x8,#26,#26
+ umlal v20.2d,v15.2s,v0.s[2]
+ ubfx x7,x9,#26,#26
+ umlal v19.2d,v15.2s,v8.s[2]
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+
+ umlal v23.2d,v16.2s,v3.s[2]
+ extr x8,x12,x8,#52
+ umlal v22.2d,v16.2s,v1.s[2]
+ extr x9,x13,x9,#52
+ umlal v21.2d,v16.2s,v0.s[2]
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ umlal v20.2d,v16.2s,v8.s[2]
+ fmov d14,x4
+ umlal v19.2d,v16.2s,v6.s[2]
+ and x8,x8,#0x03ffffff
+
+ umlal v23.2d,v17.2s,v1.s[2]
+ and x9,x9,#0x03ffffff
+ umlal v22.2d,v17.2s,v0.s[2]
+ ubfx x10,x12,#14,#26
+ umlal v21.2d,v17.2s,v8.s[2]
+ ubfx x11,x13,#14,#26
+ umlal v20.2d,v17.2s,v6.s[2]
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ umlal v19.2d,v17.2s,v4.s[2]
+ fmov d15,x6
+
+ add v11.2s,v11.2s,v26.2s
+ add x12,x3,x12,lsr#40
+ umlal v23.2d,v18.2s,v0.s[2]
+ add x13,x3,x13,lsr#40
+ umlal v22.2d,v18.2s,v8.s[2]
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ umlal v21.2d,v18.2s,v6.s[2]
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ umlal v20.2d,v18.2s,v4.s[2]
+ fmov d16,x8
+ umlal v19.2d,v18.2s,v2.s[2]
+ fmov d17,x10
+
+ ////////////////////////////////////////////////////////////////
+ // (hash+inp[0:1])*r^4 and accumulate
+
+ add v9.2s,v9.2s,v24.2s
+ fmov d18,x12
+ umlal v22.2d,v11.2s,v1.s[0]
+ ldp x8,x12,[x1],#16 // inp[0:1]
+ umlal v19.2d,v11.2s,v6.s[0]
+ ldp x9,x13,[x1],#48
+ umlal v23.2d,v11.2s,v3.s[0]
+ umlal v20.2d,v11.2s,v8.s[0]
+ umlal v21.2d,v11.2s,v0.s[0]
+#ifdef __ARMEB__
+ rev x8,x8
+ rev x12,x12
+ rev x9,x9
+ rev x13,x13
+#endif
+
+ add v10.2s,v10.2s,v25.2s
+ umlal v22.2d,v9.2s,v5.s[0]
+ umlal v23.2d,v9.2s,v7.s[0]
+ and x4,x8,#0x03ffffff // base 2^64 -> base 2^26
+ umlal v21.2d,v9.2s,v3.s[0]
+ and x5,x9,#0x03ffffff
+ umlal v19.2d,v9.2s,v0.s[0]
+ ubfx x6,x8,#26,#26
+ umlal v20.2d,v9.2s,v1.s[0]
+ ubfx x7,x9,#26,#26
+
+ add v12.2s,v12.2s,v27.2s
+ add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32
+ umlal v22.2d,v10.2s,v3.s[0]
+ extr x8,x12,x8,#52
+ umlal v23.2d,v10.2s,v5.s[0]
+ extr x9,x13,x9,#52
+ umlal v19.2d,v10.2s,v8.s[0]
+ add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32
+ umlal v21.2d,v10.2s,v1.s[0]
+ fmov d9,x4
+ umlal v20.2d,v10.2s,v0.s[0]
+ and x8,x8,#0x03ffffff
+
+ add v13.2s,v13.2s,v28.2s
+ and x9,x9,#0x03ffffff
+ umlal v22.2d,v12.2s,v0.s[0]
+ ubfx x10,x12,#14,#26
+ umlal v19.2d,v12.2s,v4.s[0]
+ ubfx x11,x13,#14,#26
+ umlal v23.2d,v12.2s,v1.s[0]
+ add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32
+ umlal v20.2d,v12.2s,v6.s[0]
+ fmov d10,x6
+ umlal v21.2d,v12.2s,v8.s[0]
+ add x12,x3,x12,lsr#40
+
+ umlal v22.2d,v13.2s,v8.s[0]
+ add x13,x3,x13,lsr#40
+ umlal v19.2d,v13.2s,v2.s[0]
+ add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32
+ umlal v23.2d,v13.2s,v0.s[0]
+ add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32
+ umlal v20.2d,v13.2s,v4.s[0]
+ fmov d11,x8
+ umlal v21.2d,v13.2s,v6.s[0]
+ fmov d12,x10
+ fmov d13,x12
+
+ /////////////////////////////////////////////////////////////////
+ // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+ // and P. Schwabe
+ //
+ // [see discussion in poly1305-armv4 module]
+
+ ushr v29.2d,v22.2d,#26
+ xtn v27.2s,v22.2d
+ ushr v30.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+ add v23.2d,v23.2d,v29.2d // h3 -> h4
+ bic v27.2s,#0xfc,lsl#24 // &=0x03ffffff
+ add v20.2d,v20.2d,v30.2d // h0 -> h1
+
+ ushr v29.2d,v23.2d,#26
+ xtn v28.2s,v23.2d
+ ushr v30.2d,v20.2d,#26
+ xtn v25.2s,v20.2d
+ bic v28.2s,#0xfc,lsl#24
+ add v21.2d,v21.2d,v30.2d // h1 -> h2
+
+ add v19.2d,v19.2d,v29.2d
+ shl v29.2d,v29.2d,#2
+ shrn v30.2s,v21.2d,#26
+ xtn v26.2s,v21.2d
+ add v19.2d,v19.2d,v29.2d // h4 -> h0
+ bic v25.2s,#0xfc,lsl#24
+ add v27.2s,v27.2s,v30.2s // h2 -> h3
+ bic v26.2s,#0xfc,lsl#24
+
+ shrn v29.2s,v19.2d,#26
+ xtn v24.2s,v19.2d
+ ushr v30.2s,v27.2s,#26
+ bic v27.2s,#0xfc,lsl#24
+ bic v24.2s,#0xfc,lsl#24
+ add v25.2s,v25.2s,v29.2s // h0 -> h1
+ add v28.2s,v28.2s,v30.2s // h3 -> h4
+
+ b.hi .Loop_neon
+
+.Lskip_loop:
+ dup v16.2d,v16.d[0]
+ add v11.2s,v11.2s,v26.2s
+
+ ////////////////////////////////////////////////////////////////
+ // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+ adds x2,x2,#32
+ b.ne .Long_tail
+
+ dup v16.2d,v11.d[0]
+ add v14.2s,v9.2s,v24.2s
+ add v17.2s,v12.2s,v27.2s
+ add v15.2s,v10.2s,v25.2s
+ add v18.2s,v13.2s,v28.2s
+
+.Long_tail:
+ dup v14.2d,v14.d[0]
+ umull2 v19.2d,v16.4s,v6.4s
+ umull2 v22.2d,v16.4s,v1.4s
+ umull2 v23.2d,v16.4s,v3.4s
+ umull2 v21.2d,v16.4s,v0.4s
+ umull2 v20.2d,v16.4s,v8.4s
+
+ dup v15.2d,v15.d[0]
+ umlal2 v19.2d,v14.4s,v0.4s
+ umlal2 v21.2d,v14.4s,v3.4s
+ umlal2 v22.2d,v14.4s,v5.4s
+ umlal2 v23.2d,v14.4s,v7.4s
+ umlal2 v20.2d,v14.4s,v1.4s
+
+ dup v17.2d,v17.d[0]
+ umlal2 v19.2d,v15.4s,v8.4s
+ umlal2 v22.2d,v15.4s,v3.4s
+ umlal2 v21.2d,v15.4s,v1.4s
+ umlal2 v23.2d,v15.4s,v5.4s
+ umlal2 v20.2d,v15.4s,v0.4s
+
+ dup v18.2d,v18.d[0]
+ umlal2 v22.2d,v17.4s,v0.4s
+ umlal2 v23.2d,v17.4s,v1.4s
+ umlal2 v19.2d,v17.4s,v4.4s
+ umlal2 v20.2d,v17.4s,v6.4s
+ umlal2 v21.2d,v17.4s,v8.4s
+
+ umlal2 v22.2d,v18.4s,v8.4s
+ umlal2 v19.2d,v18.4s,v2.4s
+ umlal2 v23.2d,v18.4s,v0.4s
+ umlal2 v20.2d,v18.4s,v4.4s
+ umlal2 v21.2d,v18.4s,v6.4s
+
+ b.eq .Lshort_tail
+
+ ////////////////////////////////////////////////////////////////
+ // (hash+inp[0:1])*r^4:r^3 and accumulate
+
+ add v9.2s,v9.2s,v24.2s
+ umlal v22.2d,v11.2s,v1.2s
+ umlal v19.2d,v11.2s,v6.2s
+ umlal v23.2d,v11.2s,v3.2s
+ umlal v20.2d,v11.2s,v8.2s
+ umlal v21.2d,v11.2s,v0.2s
+
+ add v10.2s,v10.2s,v25.2s
+ umlal v22.2d,v9.2s,v5.2s
+ umlal v19.2d,v9.2s,v0.2s
+ umlal v23.2d,v9.2s,v7.2s
+ umlal v20.2d,v9.2s,v1.2s
+ umlal v21.2d,v9.2s,v3.2s
+
+ add v12.2s,v12.2s,v27.2s
+ umlal v22.2d,v10.2s,v3.2s
+ umlal v19.2d,v10.2s,v8.2s
+ umlal v23.2d,v10.2s,v5.2s
+ umlal v20.2d,v10.2s,v0.2s
+ umlal v21.2d,v10.2s,v1.2s
+
+ add v13.2s,v13.2s,v28.2s
+ umlal v22.2d,v12.2s,v0.2s
+ umlal v19.2d,v12.2s,v4.2s
+ umlal v23.2d,v12.2s,v1.2s
+ umlal v20.2d,v12.2s,v6.2s
+ umlal v21.2d,v12.2s,v8.2s
+
+ umlal v22.2d,v13.2s,v8.2s
+ umlal v19.2d,v13.2s,v2.2s
+ umlal v23.2d,v13.2s,v0.2s
+ umlal v20.2d,v13.2s,v4.2s
+ umlal v21.2d,v13.2s,v6.2s
+
+.Lshort_tail:
+ ////////////////////////////////////////////////////////////////
+ // horizontal add
+
+ addp v22.2d,v22.2d,v22.2d
+ ldp d8,d9,[sp,#16] // meet ABI requirements
+ addp v19.2d,v19.2d,v19.2d
+ ldp d10,d11,[sp,#32]
+ addp v23.2d,v23.2d,v23.2d
+ ldp d12,d13,[sp,#48]
+ addp v20.2d,v20.2d,v20.2d
+ ldp d14,d15,[sp,#64]
+ addp v21.2d,v21.2d,v21.2d
+
+ ////////////////////////////////////////////////////////////////
+ // lazy reduction, but without narrowing
+
+ ushr v29.2d,v22.2d,#26
+ and v22.16b,v22.16b,v31.16b
+ ushr v30.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+
+ add v23.2d,v23.2d,v29.2d // h3 -> h4
+ add v20.2d,v20.2d,v30.2d // h0 -> h1
+
+ ushr v29.2d,v23.2d,#26
+ and v23.16b,v23.16b,v31.16b
+ ushr v30.2d,v20.2d,#26
+ and v20.16b,v20.16b,v31.16b
+ add v21.2d,v21.2d,v30.2d // h1 -> h2
+
+ add v19.2d,v19.2d,v29.2d
+ shl v29.2d,v29.2d,#2
+ ushr v30.2d,v21.2d,#26
+ and v21.16b,v21.16b,v31.16b
+ add v19.2d,v19.2d,v29.2d // h4 -> h0
+ add v22.2d,v22.2d,v30.2d // h2 -> h3
+
+ ushr v29.2d,v19.2d,#26
+ and v19.16b,v19.16b,v31.16b
+ ushr v30.2d,v22.2d,#26
+ and v22.16b,v22.16b,v31.16b
+ add v20.2d,v20.2d,v29.2d // h0 -> h1
+ add v23.2d,v23.2d,v30.2d // h3 -> h4
+
+ ////////////////////////////////////////////////////////////////
+ // write the result, can be partially reduced
+
+ st4 {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16
+ st1 {v23.s}[0],[x0]
+
+.Lno_data_neon:
+ ldr x29,[sp],#80
+ ret
+.size poly1305_blocks_neon,.-poly1305_blocks_neon
+
+.type poly1305_emit_neon,%function
+.align 5
+poly1305_emit_neon:
+ ldr x17,[x0,#24]
+ cbz x17,poly1305_emit
+
+ ldp w10,w11,[x0] // load hash value base 2^26
+ ldp w12,w13,[x0,#8]
+ ldr w14,[x0,#16]
+
+ add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64
+ lsr x5,x12,#12
+ adds x4,x4,x12,lsl#52
+ add x5,x5,x13,lsl#14
+ adc x5,x5,xzr
+ lsr x6,x14,#24
+ adds x5,x5,x14,lsl#40
+ adc x6,x6,xzr // can be partially reduced...
+
+ ldp x10,x11,[x2] // load nonce
+
+ and x12,x6,#-4 // ... so reduce
+ add x12,x12,x6,lsr#2
+ and x6,x6,#3
+ adds x4,x4,x12
+ adcs x5,x5,xzr
+ adc x6,x6,xzr
+
+ adds x12,x4,#5 // compare to modulus
+ adcs x13,x5,xzr
+ adc x14,x6,xzr
+
+ tst x14,#-4 // see if it's carried/borrowed
+
+ csel x4,x4,x12,eq
+ csel x5,x5,x13,eq
+
+#ifdef __ARMEB__
+ ror x10,x10,#32 // flip nonce words
+ ror x11,x11,#32
+#endif
+ adds x4,x4,x10 // accumulate nonce
+ adc x5,x5,x11
+#ifdef __ARMEB__
+ rev x4,x4 // flip output bytes
+ rev x5,x5
+#endif
+ stp x4,x5,[x1] // write result
+
+ ret
+.size poly1305_emit_neon,.-poly1305_emit_neon
+
+.align 5
+.Lzeros:
+.long 0,0,0,0,0,0,0,0
+.LOPENSSL_armcap_P:
+#ifdef __ILP32__
+.long OPENSSL_armcap_P-.
+#else
+.quad OPENSSL_armcap_P-.
+#endif
+.byte 80,111,108,121,49,51,48,53,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+.align 2
+.align 2
--
2.19.0
* [PATCH net-next v6 12/23] zinc: Poly1305 ARM and ARM64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Jason A. Donenfeld
` (22 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson, Russell King, linux-arm-kernel
These wire Andy Polyakov's implementations up to the kernel. We make a
few small changes to the assembly:
- Entries and exits use the proper kernel convention macro.
- CPU feature checking is done in C by the glue code, so that has been
removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed to fit kernel conventions.
- The NEON code can jump to the scalar code when it makes sense to do
so (see the C sketch below).
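For the CPU-feature and scalar-fallback points above, a minimal sketch of
the C-side dispatch is shown here; it mirrors the poly1305-arm-glue.h
header added below (all names come from that header, and nothing in this
sketch is additional patch content):

    static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
                                            const size_t len, const u32 padbit,
                                            simd_context_t *simd_context)
    {
    #if defined(CONFIG_KERNEL_MODE_NEON)
            /* Use NEON only if the CPU advertises it and the current
             * context permits SIMD; otherwise fall back to scalar code. */
            if (poly1305_use_neon && simd_use(simd_context)) {
                    poly1305_blocks_neon(ctx, inp, len, padbit);
                    return true;
            }
            /* The NEON routines keep the accumulator in base 2^26, so
             * convert back to base 2^64 before the scalar routine runs. */
            convert_to_base2_64(ctx);
    #endif
            poly1305_blocks_arm(ctx, inp, len, padbit);
            return true;
    }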
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
lib/zinc/Makefile | 2 +
lib/zinc/poly1305/poly1305-arm-glue.h | 119 ++++++++++++++
...ly1305-arm-cryptogams.S => poly1305-arm.S} | 147 ++++++------------
...05-arm64-cryptogams.S => poly1305-arm64.S} | 103 ++++--------
lib/zinc/poly1305/poly1305.c | 2 +
5 files changed, 198 insertions(+), 175 deletions(-)
create mode 100644 lib/zinc/poly1305/poly1305-arm-glue.h
rename lib/zinc/poly1305/{poly1305-arm-cryptogams.S => poly1305-arm.S} (91%)
rename lib/zinc/poly1305/{poly1305-arm64-cryptogams.S => poly1305-arm64.S} (90%)
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index a8943d960b6a..c09fd3de60f9 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -12,4 +12,6 @@ obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o
zinc_poly1305-y := poly1305/poly1305.o
zinc_poly1305-$(CONFIG_ZINC_ARCH_X86_64) += poly1305/poly1305-x86_64.o
+zinc_poly1305-$(CONFIG_ZINC_ARCH_ARM) += poly1305/poly1305-arm.o
+zinc_poly1305-$(CONFIG_ZINC_ARCH_ARM64) += poly1305/poly1305-arm64.o
obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o
diff --git a/lib/zinc/poly1305/poly1305-arm-glue.h b/lib/zinc/poly1305/poly1305-arm-glue.h
new file mode 100644
index 000000000000..ddeb58a2b547
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm-glue.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+
+asmlinkage void poly1305_init_arm(void *ctx, const u8 key[16]);
+asmlinkage void poly1305_blocks_arm(void *ctx, const u8 *inp, const size_t len,
+ const u32 padbit);
+asmlinkage void poly1305_emit_arm(void *ctx, u8 mac[16], const u32 nonce[4]);
+#if defined(CONFIG_KERNEL_MODE_NEON)
+asmlinkage void poly1305_blocks_neon(void *ctx, const u8 *inp, const size_t len,
+ const u32 padbit);
+asmlinkage void poly1305_emit_neon(void *ctx, u8 mac[16], const u32 nonce[4]);
+#endif
+
+static bool poly1305_use_neon __ro_after_init;
+
+static void __init poly1305_fpu_init(void)
+{
+#if defined(CONFIG_ARM64)
+ poly1305_use_neon = elf_hwcap & HWCAP_ASIMD;
+#elif defined(CONFIG_ARM)
+ poly1305_use_neon = elf_hwcap & HWCAP_NEON;
+#endif
+}
+
+#if defined(CONFIG_ARM64)
+struct poly1305_arch_internal {
+ union {
+ u32 h[5];
+ struct {
+ u64 h0, h1, h2;
+ };
+ };
+ u32 is_base2_26;
+ u64 r[2];
+};
+#elif defined(CONFIG_ARM)
+struct poly1305_arch_internal {
+ union {
+ u32 h[5];
+ struct {
+ u64 h0, h1;
+ u32 h2;
+ } __packed;
+ };
+ u32 r[4];
+ u32 is_base2_26;
+};
+#endif
+
+#if defined(CONFIG_KERNEL_MODE_NEON)
+static void convert_to_base2_64(void *ctx)
+{
+ struct poly1305_arch_internal *state = ctx;
+ u32 cy;
+
+ if (!state->is_base2_26)
+ return;
+
+ cy = state->h[0] >> 26; state->h[0] &= 0x3ffffff; state->h[1] += cy;
+ cy = state->h[1] >> 26; state->h[1] &= 0x3ffffff; state->h[2] += cy;
+ cy = state->h[2] >> 26; state->h[2] &= 0x3ffffff; state->h[3] += cy;
+ cy = state->h[3] >> 26; state->h[3] &= 0x3ffffff; state->h[4] += cy;
+ state->h0 = ((u64)state->h[2] << 52) | ((u64)state->h[1] << 26) | state->h[0];
+ state->h1 = ((u64)state->h[4] << 40) | ((u64)state->h[3] << 14) | (state->h[2] >> 12);
+ state->h2 = state->h[4] >> 24;
+#define ULT(a, b) ((a ^ ((a ^ b) | ((a - b) ^ b))) >> (sizeof(a) * 8 - 1))
+ cy = (state->h2 >> 2) + (state->h2 & ~3ULL);
+ state->h2 &= 3;
+ state->h0 += cy;
+ state->h1 += (cy = ULT(state->h0, cy));
+ state->h2 += ULT(state->h1, cy);
+#undef ULT
+ state->is_base2_26 = 0;
+}
+#endif
+
+static inline bool poly1305_init_arch(void *ctx,
+ const u8 key[POLY1305_KEY_SIZE])
+{
+ poly1305_init_arm(ctx, key);
+ return true;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+ const size_t len, const u32 padbit,
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_KERNEL_MODE_NEON)
+ if (poly1305_use_neon && simd_use(simd_context)) {
+ poly1305_blocks_neon(ctx, inp, len, padbit);
+ return true;
+ }
+ convert_to_base2_64(ctx);
+#endif
+
+ poly1305_blocks_arm(ctx, inp, len, padbit);
+ return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4],
+ simd_context_t *simd_context)
+{
+#if defined(CONFIG_KERNEL_MODE_NEON)
+ if (poly1305_use_neon && simd_use(simd_context)) {
+ poly1305_emit_neon(ctx, mac, nonce);
+ return true;
+ }
+ convert_to_base2_64(ctx);
+#endif
+
+ poly1305_emit_arm(ctx, mac, nonce);
+ return true;
+}
diff --git a/lib/zinc/poly1305/poly1305-arm-cryptogams.S b/lib/zinc/poly1305/poly1305-arm.S
similarity index 91%
rename from lib/zinc/poly1305/poly1305-arm-cryptogams.S
rename to lib/zinc/poly1305/poly1305-arm.S
index 884b465030e4..4a0e9d451119 100644
--- a/lib/zinc/poly1305/poly1305-arm-cryptogams.S
+++ b/lib/zinc/poly1305/poly1305-arm.S
@@ -1,9 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
+#include <linux/linkage.h>
.text
#if defined(__thumb2__)
@@ -13,13 +16,8 @@
.code 32
#endif
-.globl poly1305_emit
-.globl poly1305_blocks
-.globl poly1305_init
-.type poly1305_init,%function
.align 5
-poly1305_init:
-.Lpoly1305_init:
+ENTRY(poly1305_init_arm)
stmdb sp!,{r4-r11}
eor r3,r3,r3
@@ -38,10 +36,6 @@ poly1305_init:
moveq r0,#0
beq .Lno_key
-#if __ARM_MAX_ARCH__>=7
- adr r11,.Lpoly1305_init
- ldr r12,.LOPENSSL_armcap
-#endif
ldrb r4,[r1,#0]
mov r10,#0x0fffffff
ldrb r5,[r1,#1]
@@ -56,12 +50,6 @@ poly1305_init:
ldrb r7,[r1,#6]
and r4,r4,r10
-#if __ARM_MAX_ARCH__>=7
- ldr r12,[r11,r12] @ OPENSSL_armcap_P
-# ifdef __APPLE__
- ldr r12,[r12]
-# endif
-#endif
ldrb r8,[r1,#7]
orr r5,r5,r6,lsl#8
ldrb r6,[r1,#8]
@@ -71,35 +59,6 @@ poly1305_init:
ldrb r8,[r1,#10]
and r5,r5,r3
-#if __ARM_MAX_ARCH__>=7
- tst r12,#ARMV7_NEON @ check for NEON
-# ifdef __APPLE__
- adr r9,poly1305_blocks_neon
- adr r11,poly1305_blocks
-# ifdef __thumb2__
- it ne
-# endif
- movne r11,r9
- adr r12,poly1305_emit
- adr r10,poly1305_emit_neon
-# ifdef __thumb2__
- it ne
-# endif
- movne r12,r10
-# else
-# ifdef __thumb2__
- itete eq
-# endif
- addeq r12,r11,#(poly1305_emit-.Lpoly1305_init)
- addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
- addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init)
- addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
-# endif
-# ifdef __thumb2__
- orr r12,r12,#1 @ thumb-ify address
- orr r11,r11,#1
-# endif
-#endif
ldrb r9,[r1,#11]
orr r6,r6,r7,lsl#8
ldrb r7,[r1,#12]
@@ -118,26 +77,20 @@ poly1305_init:
str r6,[r0,#8]
and r7,r7,r3
str r7,[r0,#12]
-#if __ARM_MAX_ARCH__>=7
- stmia r2,{r11,r12} @ fill functions table
- mov r0,#1
-#else
- mov r0,#0
-#endif
.Lno_key:
ldmia sp!,{r4-r11}
-#if __ARM_ARCH__>=5
+#if __LINUX_ARM_ARCH__ >= 5
bx lr @ bx lr
#else
tst lr,#1
moveq pc,lr @ be binary compatible with V4, yet
.word 0xe12fff1e @ interoperable with Thumb ISA:-)
#endif
-.size poly1305_init,.-poly1305_init
-.type poly1305_blocks,%function
+ENDPROC(poly1305_init_arm)
+
.align 5
-poly1305_blocks:
-.Lpoly1305_blocks:
+ENTRY(poly1305_blocks_arm)
+.Lpoly1305_blocks_arm:
stmdb sp!,{r3-r11,lr}
ands r2,r2,#-16
@@ -158,11 +111,11 @@ poly1305_blocks:
b .Loop
.Loop:
-#if __ARM_ARCH__<7
+#if __LINUX_ARM_ARCH__ < 7
ldrb r0,[lr],#16 @ load input
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
addhi r8,r8,#1 @ 1<<128
ldrb r1,[lr,#-15]
ldrb r2,[lr,#-14]
@@ -201,19 +154,19 @@ poly1305_blocks:
orr r3,r2,r3,lsl#24
#else
ldr r0,[lr],#16 @ load input
-# ifdef __thumb2__
+#ifdef __thumb2__
it hi
-# endif
+#endif
addhi r8,r8,#1 @ padbit
ldr r1,[lr,#-12]
ldr r2,[lr,#-8]
ldr r3,[lr,#-4]
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r0,r0
rev r1,r1
rev r2,r2
rev r3,r3
-# endif
+#endif
adds r4,r4,r0 @ accumulate input
str lr,[sp,#8] @ offload input pointer
adcs r5,r5,r1
@@ -283,7 +236,7 @@ poly1305_blocks:
stmia r0,{r4-r8} @ store the result
.Lno_data:
-#if __ARM_ARCH__>=5
+#if __LINUX_ARM_ARCH__ >= 5
ldmia sp!,{r3-r11,pc}
#else
ldmia sp!,{r3-r11,lr}
@@ -291,13 +244,12 @@ poly1305_blocks:
moveq pc,lr @ be binary compatible with V4, yet
.word 0xe12fff1e @ interoperable with Thumb ISA:-)
#endif
-.size poly1305_blocks,.-poly1305_blocks
-.type poly1305_emit,%function
+ENDPROC(poly1305_blocks_arm)
+
.align 5
-poly1305_emit:
+ENTRY(poly1305_emit_arm)
stmdb sp!,{r4-r11}
.Lpoly1305_emit_enter:
-
ldmia r0,{r3-r7}
adds r8,r3,#5 @ compare to modulus
adcs r9,r4,#0
@@ -332,13 +284,13 @@ poly1305_emit:
adcs r5,r5,r10
adc r6,r6,r11
-#if __ARM_ARCH__>=7
-# ifdef __ARMEB__
+#if __LINUX_ARM_ARCH__ >= 7
+#ifdef __ARMEB__
rev r3,r3
rev r4,r4
rev r5,r5
rev r6,r6
-# endif
+#endif
str r3,[r1,#0]
str r4,[r1,#4]
str r5,[r1,#8]
@@ -377,20 +329,22 @@ poly1305_emit:
strb r6,[r1,#15]
#endif
ldmia sp!,{r4-r11}
-#if __ARM_ARCH__>=5
+#if __LINUX_ARM_ARCH__ >= 5
bx lr @ bx lr
#else
tst lr,#1
moveq pc,lr @ be binary compatible with V4, yet
.word 0xe12fff1e @ interoperable with Thumb ISA:-)
#endif
-.size poly1305_emit,.-poly1305_emit
-#if __ARM_MAX_ARCH__>=7
+ENDPROC(poly1305_emit_arm)
+
+
+#ifdef CONFIG_KERNEL_MODE_NEON
.fpu neon
-.type poly1305_init_neon,%function
.align 5
-poly1305_init_neon:
+ENTRY(poly1305_init_neon)
+.Lpoly1305_init_neon:
ldr r4,[r0,#20] @ load key base 2^32
ldr r5,[r0,#24]
ldr r6,[r0,#28]
@@ -600,11 +554,10 @@ poly1305_init_neon:
vst1.32 {d8[1]},[r7]
bx lr @ bx lr
-.size poly1305_init_neon,.-poly1305_init_neon
+ENDPROC(poly1305_init_neon)
-.type poly1305_blocks_neon,%function
.align 5
-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
ldr ip,[r0,#36] @ is_base2_26
ands r2,r2,#-16
beq .Lno_data_neon
@@ -612,7 +565,7 @@ poly1305_blocks_neon:
cmp r2,#64
bhs .Lenter_neon
tst ip,ip @ is_base2_26?
- beq .Lpoly1305_blocks
+ beq .Lpoly1305_blocks_arm
.Lenter_neon:
stmdb sp!,{r4-r7}
@@ -622,7 +575,7 @@ poly1305_blocks_neon:
bne .Lbase2_26_neon
stmdb sp!,{r1-r3,lr}
- bl poly1305_init_neon
+ bl .Lpoly1305_init_neon
ldr r4,[r0,#0] @ load hash value base 2^32
ldr r5,[r0,#4]
@@ -686,12 +639,12 @@ poly1305_blocks_neon:
sub r2,r2,#16
add r4,r1,#32
-# ifdef __ARMEB__
+#ifdef __ARMEB__
vrev32.8 q10,q10
vrev32.8 q13,q13
vrev32.8 q11,q11
vrev32.8 q12,q12
-# endif
+#endif
vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26
vshl.u32 d26,d26,#18
@@ -735,12 +688,12 @@ poly1305_blocks_neon:
addhi r7,r0,#(48+1*9*4)
addhi r6,r0,#(48+3*9*4)
-# ifdef __ARMEB__
+#ifdef __ARMEB__
vrev32.8 q10,q10
vrev32.8 q13,q13
vrev32.8 q11,q11
vrev32.8 q12,q12
-# endif
+#endif
vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
vshl.u32 q13,q13,#18
@@ -866,12 +819,12 @@ poly1305_blocks_neon:
vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1]
add r1,r1,#64
-# ifdef __ARMEB__
+#ifdef __ARMEB__
vrev32.8 q10,q10
vrev32.8 q11,q11
vrev32.8 q12,q12
vrev32.8 q13,q13
-# endif
+#endif
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ lazy reduction interleaved with base 2^32 -> base 2^26 of
@@ -1086,11 +1039,10 @@ poly1305_blocks_neon:
ldmia sp!,{r4-r7}
.Lno_data_neon:
bx lr @ bx lr
-.size poly1305_blocks_neon,.-poly1305_blocks_neon
+ENDPROC(poly1305_blocks_neon)
-.type poly1305_emit_neon,%function
.align 5
-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
ldr ip,[r0,#36] @ is_base2_26
stmdb sp!,{r4-r11}
@@ -1144,12 +1096,12 @@ poly1305_emit_neon:
adcs r5,r5,r10
adc r6,r6,r11
-# ifdef __ARMEB__
+#ifdef __ARMEB__
rev r3,r3
rev r4,r4
rev r5,r5
rev r6,r6
-# endif
+#endif
str r3,[r1,#0] @ store the result
str r4,[r1,#4]
str r5,[r1,#8]
@@ -1157,16 +1109,9 @@ poly1305_emit_neon:
ldmia sp!,{r4-r11}
bx lr @ bx lr
-.size poly1305_emit_neon,.-poly1305_emit_neon
+ENDPROC(poly1305_emit_neon)
.align 5
.Lzeros:
.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
-.LOPENSSL_armcap:
-.word OPENSSL_armcap_P-.Lpoly1305_init
-#endif
-.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by <appro@openssl.org>"
-.align 2
-#if __ARM_MAX_ARCH__>=7
-.comm OPENSSL_armcap_P,4,4
#endif
diff --git a/lib/zinc/poly1305/poly1305-arm64-cryptogams.S b/lib/zinc/poly1305/poly1305-arm64.S
similarity index 90%
rename from lib/zinc/poly1305/poly1305-arm64-cryptogams.S
rename to lib/zinc/poly1305/poly1305-arm64.S
index 0ecb50a83ec0..84a654479cac 100644
--- a/lib/zinc/poly1305/poly1305-arm64-cryptogams.S
+++ b/lib/zinc/poly1305/poly1305-arm64.S
@@ -1,21 +1,16 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
*/
-#include "arm_arch.h"
-
+#include <linux/linkage.h>
.text
-// forward "declarations" are required for Apple
-
-.globl poly1305_blocks
-.globl poly1305_emit
-
-.globl poly1305_init
-.type poly1305_init,%function
.align 5
-poly1305_init:
+ENTRY(poly1305_init_arm)
cmp x1,xzr
stp xzr,xzr,[x0] // zero hash value
stp xzr,xzr,[x0,#16] // [along with is_base2_26]
@@ -23,17 +18,9 @@ poly1305_init:
csel x0,xzr,x0,eq
b.eq .Lno_key
-#ifdef __ILP32__
- ldrsw x11,.LOPENSSL_armcap_P
-#else
- ldr x11,.LOPENSSL_armcap_P
-#endif
- adr x10,.LOPENSSL_armcap_P
-
ldp x7,x8,[x1] // load key
mov x9,#0xfffffffc0fffffff
movk x9,#0x0fff,lsl#48
- ldr w17,[x10,x11]
#ifdef __ARMEB__
rev x7,x7 // flip bytes
rev x8,x8
@@ -43,30 +30,12 @@ poly1305_init:
and x8,x8,x9 // &=0ffffffc0ffffffc
stp x7,x8,[x0,#32] // save key value
- tst w17,#ARMV7_NEON
-
- adr x12,poly1305_blocks
- adr x7,poly1305_blocks_neon
- adr x13,poly1305_emit
- adr x8,poly1305_emit_neon
-
- csel x12,x12,x7,eq
- csel x13,x13,x8,eq
-
-#ifdef __ILP32__
- stp w12,w13,[x2]
-#else
- stp x12,x13,[x2]
-#endif
-
- mov x0,#1
.Lno_key:
ret
-.size poly1305_init,.-poly1305_init
+ENDPROC(poly1305_init_arm)
-.type poly1305_blocks,%function
.align 5
-poly1305_blocks:
+ENTRY(poly1305_blocks_arm)
ands x2,x2,#-16
b.eq .Lno_data
@@ -126,11 +95,10 @@ poly1305_blocks:
.Lno_data:
ret
-.size poly1305_blocks,.-poly1305_blocks
+ENDPROC(poly1305_blocks_arm)
-.type poly1305_emit,%function
.align 5
-poly1305_emit:
+ENTRY(poly1305_emit_arm)
ldp x4,x5,[x0] // load hash base 2^64
ldr x6,[x0,#16]
ldp x10,x11,[x2] // load nonce
@@ -157,10 +125,10 @@ poly1305_emit:
stp x4,x5,[x1] // write result
ret
-.size poly1305_emit,.-poly1305_emit
-.type poly1305_mult,%function
+ENDPROC(poly1305_emit_arm)
+
.align 5
-poly1305_mult:
+__poly1305_mult:
mul x12,x4,x7 // h0*r0
umulh x13,x4,x7
@@ -193,11 +161,8 @@ poly1305_mult:
adc x6,x6,xzr
ret
-.size poly1305_mult,.-poly1305_mult
-.type poly1305_splat,%function
-.align 5
-poly1305_splat:
+__poly1305_splat:
and x12,x4,#0x03ffffff // base 2^64 -> base 2^26
ubfx x13,x4,#26,#26
extr x14,x5,x4,#52
@@ -220,15 +185,14 @@ poly1305_splat:
str w15,[x0,#16*8] // s4
ret
-.size poly1305_splat,.-poly1305_splat
-.type poly1305_blocks_neon,%function
+#ifdef CONFIG_KERNEL_MODE_NEON
.align 5
-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
ldr x17,[x0,#24]
cmp x2,#128
b.hs .Lblocks_neon
- cbz x17,poly1305_blocks
+ cbz x17,poly1305_blocks_arm
.Lblocks_neon:
stp x29,x30,[sp,#-80]!
@@ -276,7 +240,7 @@ poly1305_blocks_neon:
adcs x5,x5,x13
adc x6,x6,x3
- bl poly1305_mult
+ bl __poly1305_mult
ldr x30,[sp,#8]
cbz x3,.Lstore_base2_64_neon
@@ -322,7 +286,7 @@ poly1305_blocks_neon:
adcs x5,x5,x13
adc x6,x6,x3
- bl poly1305_mult
+ bl __poly1305_mult
.Linit_neon:
and x10,x4,#0x03ffffff // base 2^64 -> base 2^26
@@ -349,19 +313,19 @@ poly1305_blocks_neon:
mov x5,x8
mov x6,xzr
add x0,x0,#48+12
- bl poly1305_splat
+ bl __poly1305_splat
- bl poly1305_mult // r^2
+ bl __poly1305_mult // r^2
sub x0,x0,#4
- bl poly1305_splat
+ bl __poly1305_splat
- bl poly1305_mult // r^3
+ bl __poly1305_mult // r^3
sub x0,x0,#4
- bl poly1305_splat
+ bl __poly1305_splat
- bl poly1305_mult // r^4
+ bl __poly1305_mult // r^4
sub x0,x0,#4
- bl poly1305_splat
+ bl __poly1305_splat
ldr x30,[sp,#8]
add x16,x1,#32
@@ -801,13 +765,12 @@ poly1305_blocks_neon:
.Lno_data_neon:
ldr x29,[sp],#80
ret
-.size poly1305_blocks_neon,.-poly1305_blocks_neon
+ENDPROC(poly1305_blocks_neon)
-.type poly1305_emit_neon,%function
.align 5
-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
ldr x17,[x0,#24]
- cbz x17,poly1305_emit
+ cbz x17,poly1305_emit_arm
ldp w10,w11,[x0] // load hash value base 2^26
ldp w12,w13,[x0,#8]
@@ -853,17 +816,9 @@ poly1305_emit_neon:
stp x4,x5,[x1] // write result
ret
-.size poly1305_emit_neon,.-poly1305_emit_neon
+ENDPROC(poly1305_emit_neon)
.align 5
.Lzeros:
.long 0,0,0,0,0,0,0,0
-.LOPENSSL_armcap_P:
-#ifdef __ILP32__
-.long OPENSSL_armcap_P-.
-#else
-.quad OPENSSL_armcap_P-.
#endif
-.byte 80,111,108,121,49,51,48,53,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
-.align 2
-.align 2
diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c
index 2ae1b3cb66cd..647aa3354d38 100644
--- a/lib/zinc/poly1305/poly1305.c
+++ b/lib/zinc/poly1305/poly1305.c
@@ -17,6 +17,8 @@
#if defined(CONFIG_ZINC_ARCH_X86_64)
#include "poly1305-x86_64-glue.h"
+#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64)
+#include "poly1305-arm-glue.h"
#else
static inline bool poly1305_init_arch(void *ctx,
const u8 key[POLY1305_KEY_SIZE])
--
2.19.0
* [PATCH net-next v6 13/23] zinc: Poly1305 MIPS32r2 and MIPS64 implementations
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (11 preceding siblings ...)
2018-09-25 14:56 ` Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 14/23] zinc: ChaCha20Poly1305 construction and selftest Jason A. Donenfeld
` (10 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, René van Dorst, Samuel Neves,
Andy Lutomirski, Jean-Philippe Aumasson, Andy Polyakov,
Ralf Baechle, Paul Burton, James Hogan, linux-mips
This MIPS32r2 implementation comes from René van Dorst and me and
results in a nice speedup on the usual OpenWRT targets. The MIPS64
implementation comes from Andy Polyakov and results in a nice speedup on
commodity Octeon hardware, and has been modified slightly from the
original:
- The function names have been renamed to fit kernel conventions.
- A comment has been added.
No changes have been made to the actual instructions.
While the MIPS64 implementation is CRYPTOGAMS code, its originating code happens to
be the same as OpenSSL's commit 947716c1872d210828122212d076d503ae68b928.
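For readers following the assembly below, the per-block computation it
performs can be sketched in plain C, mirroring the formulas that appear as
comments in poly1305-mips.S. The function name, the le32() helper, and the
explicit carry parameter are illustrative only and not part of the patch;
the assembly keeps that carry in a register across blocks instead:

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t u32;
    typedef uint64_t u64;

    static u32 le32(const unsigned char *p)
    {
            u32 v;

            memcpy(&v, p, sizeof(v)); /* assumes a little-endian host */
            return v;
    }

    /* One 16-byte block of the Poly1305 update, with 32-bit limbs and
     * 64-bit intermediates. h[] is the 130-bit accumulator, r[] the
     * clamped key, s[i] = r[i] + (r[i] >> 2) (s[0] is unused), and the
     * returned carry is added into h[0] at the start of the next block.
     */
    static u32 poly1305_block_sketch(u32 h[5], const u32 r[4], const u32 s[4],
                                     const unsigned char in[16], u32 padbit,
                                     u32 carry)
    {
            u64 d0, d1, d2, d3, d4;

            /* h += input, plus the carry left over from the last block */
            d0 = (u64)h[0] + le32(in + 0) + carry;       h[0] = (u32)d0;
            d1 = (u64)h[1] + (d0 >> 32) + le32(in + 4);  h[1] = (u32)d1;
            d2 = (u64)h[2] + (d1 >> 32) + le32(in + 8);  h[2] = (u32)d2;
            d3 = (u64)h[3] + (d2 >> 32) + le32(in + 12); h[3] = (u32)d3;
            h[4] += (u32)(d3 >> 32) + padbit;

            /* h = (h * r) mod 2^130 - 5, partially reduced */
            d0 = (u64)h[0] * r[0] + (u64)h[1] * s[3] + (u64)h[2] * s[2] +
                 (u64)h[3] * s[1];
            d1 = (u64)h[0] * r[1] + (u64)h[1] * r[0] + (u64)h[2] * s[3] +
                 (u64)h[3] * s[2] + (u64)h[4] * s[1] + (d0 >> 32);
            d2 = (u64)h[0] * r[2] + (u64)h[1] * r[1] + (u64)h[2] * r[0] +
                 (u64)h[3] * s[3] + (u64)h[4] * s[2] + (d1 >> 32);
            d3 = (u64)h[0] * r[3] + (u64)h[1] * r[2] + (u64)h[2] * r[1] +
                 (u64)h[3] * r[0] + (u64)h[4] * s[3] + (d2 >> 32);
            d4 = (u64)h[4] * r[0] + (d3 >> 32);

            h[0] = (u32)d0;
            h[1] = (u32)d1;
            h[2] = (u32)d2;
            h[3] = (u32)d3;
            h[4] = (u32)d4 & 3;

            /* anything at or above 2^130 folds back in as 5 * (x >> 130) */
            return (u32)((d4 >> 2) + (d4 & ~3ULL));
    }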
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Co-developed-by: René van Dorst <opensource@vdorst.com>
Based-on-code-from: Andy Polyakov <appro@openssl.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
---
lib/zinc/Makefile | 3 +
lib/zinc/poly1305/poly1305-mips-glue.h | 35 +++
lib/zinc/poly1305/poly1305-mips.S | 407 +++++++++++++++++++++++++
lib/zinc/poly1305/poly1305-mips64.S | 355 +++++++++++++++++++++
lib/zinc/poly1305/poly1305.c | 2 +
5 files changed, 802 insertions(+)
create mode 100644 lib/zinc/poly1305/poly1305-mips-glue.h
create mode 100644 lib/zinc/poly1305/poly1305-mips.S
create mode 100644 lib/zinc/poly1305/poly1305-mips64.S
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index c09fd3de60f9..5c4b1d51cb03 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -14,4 +14,7 @@ zinc_poly1305-y := poly1305/poly1305.o
zinc_poly1305-$(CONFIG_ZINC_ARCH_X86_64) += poly1305/poly1305-x86_64.o
zinc_poly1305-$(CONFIG_ZINC_ARCH_ARM) += poly1305/poly1305-arm.o
zinc_poly1305-$(CONFIG_ZINC_ARCH_ARM64) += poly1305/poly1305-arm64.o
+zinc_poly1305-$(CONFIG_ZINC_ARCH_MIPS) += poly1305/poly1305-mips.o
+AFLAGS_poly1305-mips.o += -O2 # This is required to fill the branch delay slots
+zinc_poly1305-$(CONFIG_ZINC_ARCH_MIPS64) += poly1305/poly1305-mips64.o
obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o
diff --git a/lib/zinc/poly1305/poly1305-mips-glue.h b/lib/zinc/poly1305/poly1305-mips-glue.h
new file mode 100644
index 000000000000..eb38cd5a75ae
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips-glue.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+asmlinkage void poly1305_init_mips(void *ctx, const u8 key[16]);
+asmlinkage void poly1305_blocks_mips(void *ctx, const u8 *inp, const size_t len,
+ const u32 padbit);
+asmlinkage void poly1305_emit_mips(void *ctx, u8 mac[16], const u32 nonce[4]);
+static void __init poly1305_fpu_init(void)
+{
+}
+
+static inline bool poly1305_init_arch(void *ctx,
+ const u8 key[POLY1305_KEY_SIZE])
+{
+ poly1305_init_mips(ctx, key);
+ return true;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+ const size_t len, const u32 padbit,
+ simd_context_t *simd_context)
+{
+ poly1305_blocks_mips(ctx, inp, len, padbit);
+ return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+ const u32 nonce[4],
+ simd_context_t *simd_context)
+{
+ poly1305_emit_mips(ctx, mac, nonce);
+ return true;
+}
diff --git a/lib/zinc/poly1305/poly1305-mips.S b/lib/zinc/poly1305/poly1305-mips.S
new file mode 100644
index 000000000000..4d695eef1091
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips.S
@@ -0,0 +1,407 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2016-2018 René van Dorst <opensource@vdorst.com> All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define MSB 0
+#define LSB 3
+#else
+#define MSB 3
+#define LSB 0
+#endif
+
+#define POLY1305_BLOCK_SIZE 16
+.text
+#define H0 $t0
+#define H1 $t1
+#define H2 $t2
+#define H3 $t3
+#define H4 $t4
+
+#define R0 $t5
+#define R1 $t6
+#define R2 $t7
+#define R3 $t8
+
+#define O0 $s0
+#define O1 $s4
+#define O2 $v1
+#define O3 $t9
+#define O4 $s5
+
+#define S1 $s1
+#define S2 $s2
+#define S3 $s3
+
+#define SC $at
+#define CA $v0
+
+/* Input arguments */
+#define poly $a0
+#define src $a1
+#define srclen $a2
+#define hibit $a3
+
+/* Location in the opaque buffer
+ * R[0..3], CA, H[0..4]
+ */
+#define PTR_POLY1305_R(n) ( 0 + (n*4)) ## ($a0)
+#define PTR_POLY1305_CA (16 ) ## ($a0)
+#define PTR_POLY1305_H(n) (20 + (n*4)) ## ($a0)
+
+#define POLY1305_BLOCK_SIZE 16
+#define POLY1305_STACK_SIZE 32
+
+.set noat
+.align 4
+.globl poly1305_blocks_mips
+.ent poly1305_blocks_mips
+poly1305_blocks_mips:
+ .frame $sp, POLY1305_STACK_SIZE, $ra
+ /* srclen &= 0xFFFFFFF0 */
+ ins srclen, $zero, 0, 4
+
+ addiu $sp, -(POLY1305_STACK_SIZE)
+
+ /* check srclen >= 16 bytes */
+ beqz srclen, .Lpoly1305_blocks_mips_end
+
+ /* Calculate last round based on src address pointer.
+ * last round src ptr (srclen) = src + (srclen & 0xFFFFFFF0)
+ */
+ addu srclen, src
+
+ lw R0, PTR_POLY1305_R(0)
+ lw R1, PTR_POLY1305_R(1)
+ lw R2, PTR_POLY1305_R(2)
+ lw R3, PTR_POLY1305_R(3)
+
+ /* store the used save registers. */
+ sw $s0, 0($sp)
+ sw $s1, 4($sp)
+ sw $s2, 8($sp)
+ sw $s3, 12($sp)
+ sw $s4, 16($sp)
+ sw $s5, 20($sp)
+
+ /* load Hx and Carry */
+ lw CA, PTR_POLY1305_CA
+ lw H0, PTR_POLY1305_H(0)
+ lw H1, PTR_POLY1305_H(1)
+ lw H2, PTR_POLY1305_H(2)
+ lw H3, PTR_POLY1305_H(3)
+ lw H4, PTR_POLY1305_H(4)
+
+ /* Sx = Rx + (Rx >> 2) */
+ srl S1, R1, 2
+ srl S2, R2, 2
+ srl S3, R3, 2
+ addu S1, R1
+ addu S2, R2
+ addu S3, R3
+
+ addiu SC, $zero, 1
+
+.Lpoly1305_loop:
+ lwl O0, 0+MSB(src)
+ lwl O1, 4+MSB(src)
+ lwl O2, 8+MSB(src)
+ lwl O3,12+MSB(src)
+ lwr O0, 0+LSB(src)
+ lwr O1, 4+LSB(src)
+ lwr O2, 8+LSB(src)
+ lwr O3,12+LSB(src)
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ wsbh O0
+ wsbh O1
+ wsbh O2
+ wsbh O3
+ rotr O0, 16
+ rotr O1, 16
+ rotr O2, 16
+ rotr O3, 16
+#endif
+
+ /* h0 = (u32)(d0 = (u64)h0 + inp[0] + c 'Carry_previous cycle'); */
+ addu H0, CA
+ sltu CA, H0, CA
+ addu O0, H0
+ sltu H0, O0, H0
+ addu CA, H0
+
+ /* h1 = (u32)(d1 = (u64)h1 + (d0 >> 32) + inp[4]); */
+ addu H1, CA
+ sltu CA, H1, CA
+ addu O1, H1
+ sltu H1, O1, H1
+ addu CA, H1
+
+ /* h2 = (u32)(d2 = (u64)h2 + (d1 >> 32) + inp[8]); */
+ addu H2, CA
+ sltu CA, H2, CA
+ addu O2, H2
+ sltu H2, O2, H2
+ addu CA, H2
+
+ /* h3 = (u32)(d3 = (u64)h3 + (d2 >> 32) + inp[12]); */
+ addu H3, CA
+ sltu CA, H3, CA
+ addu O3, H3
+ sltu H3, O3, H3
+ addu CA, H3
+
+ /* h4 += (u32)(d3 >> 32) + padbit; */
+ addu H4, hibit
+ addu O4, H4, CA
+
+ /* D0 */
+ multu O0, R0
+ maddu O1, S3
+ maddu O2, S2
+ maddu O3, S1
+ mfhi CA
+ mflo H0
+
+ /* D1 */
+ multu O0, R1
+ maddu O1, R0
+ maddu O2, S3
+ maddu O3, S2
+ maddu O4, S1
+ maddu CA, SC
+ mfhi CA
+ mflo H1
+
+ /* D2 */
+ multu O0, R2
+ maddu O1, R1
+ maddu O2, R0
+ maddu O3, S3
+ maddu O4, S2
+ maddu CA, SC
+ mfhi CA
+ mflo H2
+
+ /* D4 */
+ mul H4, O4, R0
+
+ /* D3 */
+ multu O0, R3
+ maddu O1, R2
+ maddu O2, R1
+ maddu O3, R0
+ maddu O4, S3
+ maddu CA, SC
+ mfhi CA
+ mflo H3
+
+ addiu src, POLY1305_BLOCK_SIZE
+
+ /* h4 += (u32)(d3 >> 32); */
+ addu O4, H4, CA
+ /* h4 &= 3 */
+ andi H4, O4, 3
+ /* c = (h4 >> 2) + (h4 & ~3U); */
+ srl CA, O4, 2
+ ins O4, $zero, 0, 2
+
+ addu CA, O4
+
+ /* able to do a 16 byte block. */
+ bne src, srclen, .Lpoly1305_loop
+
+ /* restore the used save registers. */
+ lw $s0, 0($sp)
+ lw $s1, 4($sp)
+ lw $s2, 8($sp)
+ lw $s3, 12($sp)
+ lw $s4, 16($sp)
+ lw $s5, 20($sp)
+
+ /* store Hx and Carry */
+ sw CA, PTR_POLY1305_CA
+ sw H0, PTR_POLY1305_H(0)
+ sw H1, PTR_POLY1305_H(1)
+ sw H2, PTR_POLY1305_H(2)
+ sw H3, PTR_POLY1305_H(3)
+ sw H4, PTR_POLY1305_H(4)
+
+.Lpoly1305_blocks_mips_end:
+ addiu $sp, POLY1305_STACK_SIZE
+
+ /* Jump Back */
+ jr $ra
+.end poly1305_blocks_mips
+.set at
+
+/* Input arguments CTX=$a0, MAC=$a1, NONCE=$a2 */
+#define MAC $a1
+#define NONCE $a2
+
+#define G0 $t5
+#define G1 $t6
+#define G2 $t7
+#define G3 $t8
+#define G4 $t9
+
+.set noat
+.align 4
+.globl poly1305_emit_mips
+.ent poly1305_emit_mips
+poly1305_emit_mips:
+ /* load Hx and Carry */
+ lw CA, PTR_POLY1305_CA
+ lw H0, PTR_POLY1305_H(0)
+ lw H1, PTR_POLY1305_H(1)
+ lw H2, PTR_POLY1305_H(2)
+ lw H3, PTR_POLY1305_H(3)
+ lw H4, PTR_POLY1305_H(4)
+
+ /* Add left over carry */
+ addu H0, CA
+ sltu CA, H0, CA
+ addu H1, CA
+ sltu CA, H1, CA
+ addu H2, CA
+ sltu CA, H2, CA
+ addu H3, CA
+ sltu CA, H3, CA
+ addu H4, CA
+
+ /* compare to modulus by computing h + -p */
+ addiu G0, H0, 5
+ sltu CA, G0, H0
+ addu G1, H1, CA
+ sltu CA, G1, H1
+ addu G2, H2, CA
+ sltu CA, G2, H2
+ addu G3, H3, CA
+ sltu CA, G3, H3
+ addu G4, H4, CA
+
+ srl SC, G4, 2
+
+ /* if there was carry into 131st bit, h3:h0 = g3:g0 */
+ movn H0, G0, SC
+ movn H1, G1, SC
+ movn H2, G2, SC
+ movn H3, G3, SC
+
+ lwl G0, 0+MSB(NONCE)
+ lwl G1, 4+MSB(NONCE)
+ lwl G2, 8+MSB(NONCE)
+ lwl G3,12+MSB(NONCE)
+ lwr G0, 0+LSB(NONCE)
+ lwr G1, 4+LSB(NONCE)
+ lwr G2, 8+LSB(NONCE)
+ lwr G3,12+LSB(NONCE)
+
+ /* mac = (h + nonce) % (2^128) */
+ addu H0, G0
+ sltu CA, H0, G0
+
+ /* H1 */
+ addu H1, CA
+ sltu CA, H1, CA
+ addu H1, G1
+ sltu G1, H1, G1
+ addu CA, G1
+
+ /* H2 */
+ addu H2, CA
+ sltu CA, H2, CA
+ addu H2, G2
+ sltu G2, H2, G2
+ addu CA, G2
+
+ /* H3 */
+ addu H3, CA
+ addu H3, G3
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ wsbh H0
+ wsbh H1
+ wsbh H2
+ wsbh H3
+ rotr H0, 16
+ rotr H1, 16
+ rotr H2, 16
+ rotr H3, 16
+#endif
+
+ /* store MAC */
+ swl H0, 0+MSB(MAC)
+ swl H1, 4+MSB(MAC)
+ swl H2, 8+MSB(MAC)
+ swl H3,12+MSB(MAC)
+ swr H0, 0+LSB(MAC)
+ swr H1, 4+LSB(MAC)
+ swr H2, 8+LSB(MAC)
+ swr H3,12+LSB(MAC)
+
+ jr $ra
+.end poly1305_emit_mips
+
+#define PR0 $t0
+#define PR1 $t1
+#define PR2 $t2
+#define PR3 $t3
+#define PT0 $t4
+
+/* Input arguments CTX=$a0, KEY=$a1 */
+
+.align 4
+.globl poly1305_init_mips
+.ent poly1305_init_mips
+poly1305_init_mips:
+ lwl PR0, 0+MSB($a1)
+ lwl PR1, 4+MSB($a1)
+ lwl PR2, 8+MSB($a1)
+ lwl PR3,12+MSB($a1)
+ lwr PR0, 0+LSB($a1)
+ lwr PR1, 4+LSB($a1)
+ lwr PR2, 8+LSB($a1)
+ lwr PR3,12+LSB($a1)
+
+ /* store Hx and Carry */
+ sw $zero, PTR_POLY1305_CA
+ sw $zero, PTR_POLY1305_H(0)
+ sw $zero, PTR_POLY1305_H(1)
+ sw $zero, PTR_POLY1305_H(2)
+ sw $zero, PTR_POLY1305_H(3)
+ sw $zero, PTR_POLY1305_H(4)
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ wsbh PR0
+ wsbh PR1
+ wsbh PR2
+ wsbh PR3
+ rotr PR0, 16
+ rotr PR1, 16
+ rotr PR2, 16
+ rotr PR3, 16
+#endif
+
+ lui PT0, 0x0FFF
+ ori PT0, 0xFFFC
+
+ /* AND 0x0fffffff; */
+ ext PR0, PR0, 0, (32-4)
+
+ /* AND 0x0ffffffc; */
+ and PR1, PT0
+ and PR2, PT0
+ and PR3, PT0
+
+ /* store Rx */
+ sw PR0, PTR_POLY1305_R(0)
+ sw PR1, PTR_POLY1305_R(1)
+ sw PR2, PTR_POLY1305_R(2)
+ sw PR3, PTR_POLY1305_R(3)
+
+ /* Jump Back */
+ jr $ra
+.end poly1305_init_mips
diff --git a/lib/zinc/poly1305/poly1305-mips64.S b/lib/zinc/poly1305/poly1305-mips64.S
new file mode 100644
index 000000000000..d95d83f75b99
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips64.S
@@ -0,0 +1,355 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#ifdef __MIPSEB__
+#define MSB 0
+#define LSB 7
+#else
+#define MSB 7
+#define LSB 0
+#endif
+
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+#define dmultu(rs,rt)
+#define mflo(rd,rs,rt) dmulu rd,rs,rt
+#define mfhi(rd,rs,rt) dmuhu rd,rs,rt
+#else
+#define dmultu(rs,rt) dmultu rs,rt
+#define multu(rs,rt) multu rs,rt
+#define mflo(rd,rs,rt) mflo rd
+#define mfhi(rd,rs,rt) mfhi rd
+#endif
+
+.text
+.set noat
+.set noreorder
+
+/* While most of the assembly in the kernel prefers ENTRY() and ENDPROC(),
+ * there is no existing MIPS assembly that uses them, and the MIPS assembler
+ * prefers its own .ent/.end notation, which the MIPS include files don't
+ * wrap in a MIPS-specific ENTRY/ENDPROC definition. So, we skip these
+ * for now, until somebody complains. */
+
+.align 5
+.globl poly1305_init_mips
+.ent poly1305_init_mips
+poly1305_init_mips:
+ .frame $29,0,$31
+ .set reorder
+
+ sd $0,0($4)
+ sd $0,8($4)
+ sd $0,16($4)
+
+ beqz $5,.Lno_key
+
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+ ld $8,0($5)
+ ld $9,8($5)
+#else
+ ldl $8,0+MSB($5)
+ ldl $9,8+MSB($5)
+ ldr $8,0+LSB($5)
+ ldr $9,8+LSB($5)
+#endif
+#ifdef __MIPSEB__
+#if defined(CONFIG_CPU_MIPS64_R2) || defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+ dsbh $8,$8 # byte swap
+ dsbh $9,$9
+ dshd $8,$8
+ dshd $9,$9
+#else
+ ori $10,$0,0xFF
+ dsll $1,$10,32
+ or $10,$1 # 0x000000FF000000FF
+
+ and $11,$8,$10 # byte swap
+ and $2,$9,$10
+ dsrl $1,$8,24
+ dsrl $24,$9,24
+ dsll $11,24
+ dsll $2,24
+ and $1,$10
+ and $24,$10
+ dsll $10,8 # 0x0000FF000000FF00
+ or $11,$1
+ or $2,$24
+ and $1,$8,$10
+ and $24,$9,$10
+ dsrl $8,8
+ dsrl $9,8
+ dsll $1,8
+ dsll $24,8
+ and $8,$10
+ and $9,$10
+ or $11,$1
+ or $2,$24
+ or $8,$11
+ or $9,$2
+ dsrl $11,$8,32
+ dsrl $2,$9,32
+ dsll $8,32
+ dsll $9,32
+ or $8,$11
+ or $9,$2
+#endif
+#endif
+ li $10,1
+ dsll $10,32
+ daddiu $10,-63
+ dsll $10,28
+ daddiu $10,-1 # 0ffffffc0fffffff
+
+ and $8,$10
+ daddiu $10,-3 # 0ffffffc0ffffffc
+ and $9,$10
+
+ sd $8,24($4)
+ dsrl $10,$9,2
+ sd $9,32($4)
+ daddu $10,$9 # s1 = r1 + (r1 >> 2)
+ sd $10,40($4)
+
+.Lno_key:
+ li $2,0 # return 0
+ jr $31
+.end poly1305_init_mips
+
+.align 5
+.globl poly1305_blocks_mips
+.ent poly1305_blocks_mips
+poly1305_blocks_mips:
+ .set noreorder
+ dsrl $6,4 # number of complete blocks
+ bnez $6,poly1305_blocks_internal
+ nop
+ jr $31
+ nop
+.end poly1305_blocks_mips
+
+.align 5
+.ent poly1305_blocks_internal
+poly1305_blocks_internal:
+ .frame $29,6*8,$31
+ .mask 0x00030000,-8
+ .set noreorder
+ dsubu $29,6*8
+ sd $17,40($29)
+ sd $16,32($29)
+ .set reorder
+
+ ld $12,0($4) # load hash value
+ ld $13,8($4)
+ ld $14,16($4)
+
+ ld $15,24($4) # load key
+ ld $16,32($4)
+ ld $17,40($4)
+
+.Loop:
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+ ld $8,0($5) # load input
+ ld $9,8($5)
+#else
+ ldl $8,0+MSB($5) # load input
+ ldl $9,8+MSB($5)
+ ldr $8,0+LSB($5)
+ ldr $9,8+LSB($5)
+#endif
+ daddiu $6,-1
+ daddiu $5,16
+#ifdef __MIPSEB__
+#if defined(CONFIG_CPU_MIPS64_R2) || defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+ dsbh $8,$8 # byte swap
+ dsbh $9,$9
+ dshd $8,$8
+ dshd $9,$9
+#else
+ ori $10,$0,0xFF
+ dsll $1,$10,32
+ or $10,$1 # 0x000000FF000000FF
+
+ and $11,$8,$10 # byte swap
+ and $2,$9,$10
+ dsrl $1,$8,24
+ dsrl $24,$9,24
+ dsll $11,24
+ dsll $2,24
+ and $1,$10
+ and $24,$10
+ dsll $10,8 # 0x0000FF000000FF00
+ or $11,$1
+ or $2,$24
+ and $1,$8,$10
+ and $24,$9,$10
+ dsrl $8,8
+ dsrl $9,8
+ dsll $1,8
+ dsll $24,8
+ and $8,$10
+ and $9,$10
+ or $11,$1
+ or $2,$24
+ or $8,$11
+ or $9,$2
+ dsrl $11,$8,32
+ dsrl $2,$9,32
+ dsll $8,32
+ dsll $9,32
+ or $8,$11
+ or $9,$2
+#endif
+#endif
+ daddu $12,$8 # accumulate input
+ daddu $13,$9
+ sltu $10,$12,$8
+ sltu $11,$13,$9
+ daddu $13,$10
+
+ dmultu ($15,$12) # h0*r0
+ daddu $14,$7
+ sltu $10,$13,$10
+ mflo ($8,$15,$12)
+ mfhi ($9,$15,$12)
+
+ dmultu ($17,$13) # h1*5*r1
+ daddu $10,$11
+ daddu $14,$10
+ mflo ($10,$17,$13)
+ mfhi ($11,$17,$13)
+
+ dmultu ($16,$12) # h0*r1
+ daddu $8,$10
+ daddu $9,$11
+ mflo ($1,$16,$12)
+ mfhi ($25,$16,$12)
+ sltu $10,$8,$10
+ daddu $9,$10
+
+ dmultu ($15,$13) # h1*r0
+ daddu $9,$1
+ sltu $1,$9,$1
+ mflo ($10,$15,$13)
+ mfhi ($11,$15,$13)
+ daddu $25,$1
+
+ dmultu ($17,$14) # h2*5*r1
+ daddu $9,$10
+ daddu $25,$11
+ mflo ($1,$17,$14)
+
+ dmultu ($15,$14) # h2*r0
+ sltu $10,$9,$10
+ daddu $25,$10
+ mflo ($2,$15,$14)
+
+ daddu $9,$1
+ daddu $25,$2
+ sltu $1,$9,$1
+ daddu $25,$1
+
+ li $10,-4 # final reduction
+ and $10,$25
+ dsrl $11,$25,2
+ andi $14,$25,3
+ daddu $10,$11
+ daddu $12,$8,$10
+ sltu $10,$12,$10
+ daddu $13,$9,$10
+ sltu $10,$13,$10
+ daddu $14,$14,$10
+
+ bnez $6,.Loop
+
+ sd $12,0($4) # store hash value
+ sd $13,8($4)
+ sd $14,16($4)
+
+ .set noreorder
+ ld $17,40($29) # epilogue
+ ld $16,32($29)
+ jr $31
+ daddu $29,6*8
+.end poly1305_blocks_internal
+
+.align 5
+.globl poly1305_emit_mips
+.ent poly1305_emit_mips
+poly1305_emit_mips:
+ .frame $29,0,$31
+ .set reorder
+
+ ld $10,0($4)
+ ld $11,8($4)
+ ld $1,16($4)
+
+ daddiu $8,$10,5 # compare to modulus
+ sltiu $2,$8,5
+ daddu $9,$11,$2
+ sltu $2,$9,$2
+ daddu $1,$1,$2
+
+ dsrl $1,2 # see if it carried/borrowed
+ dsubu $1,$0,$1
+ nor $2,$0,$1
+
+ and $8,$1
+ and $10,$2
+ and $9,$1
+ and $11,$2
+ or $8,$10
+ or $9,$11
+
+ lwu $10,0($6) # load nonce
+ lwu $11,4($6)
+ lwu $1,8($6)
+ lwu $2,12($6)
+ dsll $11,32
+ dsll $2,32
+ or $10,$11
+ or $1,$2
+
+ daddu $8,$10 # accumulate nonce
+ daddu $9,$1
+ sltu $10,$8,$10
+ daddu $9,$10
+
+ dsrl $10,$8,8 # write mac value
+ dsrl $11,$8,16
+ dsrl $1,$8,24
+ sb $8,0($5)
+ dsrl $2,$8,32
+ sb $10,1($5)
+ dsrl $10,$8,40
+ sb $11,2($5)
+ dsrl $11,$8,48
+ sb $1,3($5)
+ dsrl $1,$8,56
+ sb $2,4($5)
+ dsrl $2,$9,8
+ sb $10,5($5)
+ dsrl $10,$9,16
+ sb $11,6($5)
+ dsrl $11,$9,24
+ sb $1,7($5)
+
+ sb $9,8($5)
+ dsrl $1,$9,32
+ sb $2,9($5)
+ dsrl $2,$9,40
+ sb $10,10($5)
+ dsrl $10,$9,48
+ sb $11,11($5)
+ dsrl $11,$9,56
+ sb $1,12($5)
+ sb $2,13($5)
+ sb $10,14($5)
+ sb $11,15($5)
+
+ jr $31
+.end poly1305_emit_mips
diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c
index 647aa3354d38..361647054aff 100644
--- a/lib/zinc/poly1305/poly1305.c
+++ b/lib/zinc/poly1305/poly1305.c
@@ -19,6 +19,8 @@
#include "poly1305-x86_64-glue.h"
#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64)
#include "poly1305-arm-glue.h"
+#elif defined(CONFIG_ZINC_ARCH_MIPS) || defined(CONFIG_ZINC_ARCH_MIPS64)
+#include "poly1305-mips-glue.h"
#else
static inline bool poly1305_init_arch(void *ctx,
const u8 key[POLY1305_KEY_SIZE])
--
2.19.0
* [PATCH net-next v6 14/23] zinc: ChaCha20Poly1305 construction and selftest
2018-09-25 14:55 [PATCH net-next v6 00/23] WireGuard: Secure Network Tunnel Jason A. Donenfeld
` (12 preceding siblings ...)
2018-09-25 14:56 ` [PATCH net-next v6 13/23] zinc: Poly1305 MIPS32r2 and MIPS64 implementations Jason A. Donenfeld
@ 2018-09-25 14:56 ` Jason A. Donenfeld
2018-09-25 14:56 ` [PATCH net-next v6 15/23] zinc: BLAKE2s generic C implementation " Jason A. Donenfeld
` (9 subsequent siblings)
23 siblings, 0 replies; 213+ messages in thread
From: Jason A. Donenfeld @ 2018-09-25 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh
Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
Jean-Philippe Aumasson
This is an implementation of the ChaCha20Poly1305 AEAD, with an easy API
for encrypting either contiguous buffers or scatter gather lists (such
as those created from skb_to_sgvec).
Information: https://tools.ietf.org/html/rfc8439
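As a rough sketch of how the contiguous-buffer API is meant to be called
(not part of this patch; the key, nonce, associated data, buffer sizes, and
the helper name chapoly_roundtrip_example are invented for illustration):

    #include <zinc/chacha20poly1305.h>

    static bool chapoly_roundtrip_example(void)
    {
            u8 key[CHACHA20POLY1305_KEY_SIZE] = { 0 };
            u8 plain[64] = { 0 };
            /* Ciphertext is the plaintext followed by the 16-byte tag. */
            u8 cipher[sizeof(plain) + CHACHA20POLY1305_AUTHTAG_SIZE];
            u8 decrypted[sizeof(plain)];
            const u8 ad[] = { 0x01, 0x02 };
            const u64 nonce = 1; /* must never repeat under the same key */

            chacha20poly1305_encrypt(cipher, plain, sizeof(plain),
                                     ad, sizeof(ad), nonce, key);

            /* Fails and writes no plaintext if the tag does not verify. */
            return chacha20poly1305_decrypt(decrypted, cipher, sizeof(cipher),
                                            ad, sizeof(ad), nonce, key);
    }

The _sg variants take the same lengths but operate on scatterlists and an
explicit simd_context_t, as declared in include/zinc/chacha20poly1305.h below.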
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
include/zinc/chacha20poly1305.h | 50 +
lib/zinc/Kconfig | 6 +
lib/zinc/Makefile | 3 +
lib/zinc/chacha20poly1305.c | 361 +
lib/zinc/selftest/chacha20poly1305.h | 9038 ++++++++++++++++++++++++++
5 files changed, 9458 insertions(+)
create mode 100644 include/zinc/chacha20poly1305.h
create mode 100644 lib/zinc/chacha20poly1305.c
create mode 100644 lib/zinc/selftest/chacha20poly1305.h
diff --git a/include/zinc/chacha20poly1305.h b/include/zinc/chacha20poly1305.h
new file mode 100644
index 000000000000..d2753c5bada5
--- /dev/null
+++ b/include/zinc/chacha20poly1305.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CHACHA20POLY1305_H
+#define _ZINC_CHACHA20POLY1305_H
+
+#include <linux/simd.h>
+#include <linux/types.h>
+
+struct scatterlist;
+
+enum chacha20poly1305_lengths {
+ XCHACHA20POLY1305_NONCE_SIZE = 24,
+ CHACHA20POLY1305_KEY_SIZE = 32,
+ CHACHA20POLY1305_AUTHTAG_SIZE = 16
+};
+
+void chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE]);
+
+bool __must_check chacha20poly1305_encrypt_sg(
+ struct scatterlist *dst, struct scatterlist *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len, const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE], simd_context_t *simd_context);
+
+bool __must_check
+chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len, const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE]);
+
+bool __must_check chacha20poly1305_decrypt_sg(
+ struct scatterlist *dst, struct scatterlist *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len, const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE], simd_context_t *simd_context);
+
+void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u8 nonce[XCHACHA20POLY1305_NONCE_SIZE],
+ const u8 key[CHACHA20POLY1305_KEY_SIZE]);
+
+bool __must_check xchacha20poly1305_decrypt(
+ u8 *dst, const u8 *src, const size_t src_len, const u8 *ad,
+ const size_t ad_len, const u8 nonce[XCHACHA20POLY1305_NONCE_SIZE],
+ const u8 key[CHACHA20POLY1305_KEY_SIZE]);
+
+#endif /* _ZINC_CHACHA20POLY1305_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index f9ef8f7e3c25..dec4633cdfb4 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -5,6 +5,12 @@ config ZINC_CHACHA20
config ZINC_POLY1305
tristate
+config ZINC_CHACHA20POLY1305
+ tristate
+ select ZINC_CHACHA20
+ select ZINC_POLY1305
+ select CRYPTO_BLKCIPHER
+
config ZINC_DEBUG
bool "Zinc cryptography library debugging and self-tests"
help
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 5c4b1d51cb03..c31186b491e8 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -18,3 +18,6 @@ zinc_poly1305-$(CONFIG_ZINC_ARCH_MIPS) += poly1305/poly1305-mips.o
AFLAGS_poly1305-mips.o += -O2 # This is required to fill the branch delay slots
zinc_poly1305-$(CONFIG_ZINC_ARCH_MIPS64) += poly1305/poly1305-mips64.o
obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o
+
+zinc_chacha20poly1305-y := chacha20poly1305.o
+obj-$(CONFIG_ZINC_CHACHA20POLY1305) += zinc_chacha20poly1305.o
diff --git a/lib/zinc/chacha20poly1305.c b/lib/zinc/chacha20poly1305.c
new file mode 100644
index 000000000000..f91fbed7b9aa
--- /dev/null
+++ b/lib/zinc/chacha20poly1305.c
@@ -0,0 +1,361 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is an implementation of the ChaCha20Poly1305 AEAD construction.
+ *
+ * Information: https://tools.ietf.org/html/rfc8439
+ */
+
+#include <zinc/chacha20poly1305.h>
+#include <zinc/chacha20.h>
+#include <zinc/poly1305.h>
+#include <asm/unaligned.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <crypto/scatterwalk.h>
+
+static const u8 pad0[16] = { 0 };
+
+static struct crypto_alg chacha20_alg = {
+ .cra_blocksize = 1,
+ .cra_alignmask = sizeof(u32) - 1
+};
+static struct crypto_blkcipher chacha20_cipher = {
+ .base = {
+ .__crt_alg = &chacha20_alg
+ }
+};
+static struct blkcipher_desc chacha20_desc = {
+ .tfm = &chacha20_cipher
+};
+
+static inline void
+__chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len, const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ struct poly1305_ctx poly1305_state;
+ struct chacha20_ctx chacha20_state;
+ union {
+ u8 block0[POLY1305_KEY_SIZE];
+ __le64 lens[2];
+ } b = { { 0 } };
+
+ chacha20_init(&chacha20_state, key, nonce);
+ chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+ simd_context);
+ poly1305_init(&poly1305_state, b.block0);
+
+ poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+ simd_context);
+
+ chacha20(&chacha20_state, dst, src, src_len, simd_context);
+
+ poly1305_update(&poly1305_state, dst, src_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - src_len) & 0xf,
+ simd_context);
+
+ b.lens[0] = cpu_to_le64(ad_len);
+ b.lens[1] = cpu_to_le64(src_len);
+ poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+ simd_context);
+
+ poly1305_final(&poly1305_state, dst + src_len, simd_context);
+
+ memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+ memzero_explicit(&b, sizeof(b));
+}
+
+void chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE])
+{
+ simd_context_t simd_context;
+
+ simd_get(&simd_context);
+ __chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len, nonce, key,
+ &simd_context);
+ simd_put(&simd_context);
+}
+EXPORT_SYMBOL(chacha20poly1305_encrypt);
+
+bool chacha20poly1305_encrypt_sg(struct scatterlist *dst,
+ struct scatterlist *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ struct poly1305_ctx poly1305_state;
+ struct chacha20_ctx chacha20_state;
+ int ret = 0;
+ struct blkcipher_walk walk;
+ union {
+ u8 block0[POLY1305_KEY_SIZE];
+ u8 mac[POLY1305_MAC_SIZE];
+ __le64 lens[2];
+ } b = { { 0 } };
+
+ chacha20_init(&chacha20_state, key, nonce);
+ chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+ simd_context);
+ poly1305_init(&poly1305_state, b.block0);
+
+ poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+ simd_context);
+
+ if (likely(src_len)) {
+ blkcipher_walk_init(&walk, dst, src, src_len);
+ ret = blkcipher_walk_virt_block(&chacha20_desc, &walk,
+ CHACHA20_BLOCK_SIZE);
+ while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+ size_t chunk_len =
+ rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE);
+
+ chacha20(&chacha20_state, walk.dst.virt.addr,
+ walk.src.virt.addr, chunk_len, simd_context);
+ poly1305_update(&poly1305_state, walk.dst.virt.addr,
+ chunk_len, simd_context);
+ simd_relax(simd_context);
+ ret = blkcipher_walk_done(&chacha20_desc, &walk,
+ walk.nbytes % CHACHA20_BLOCK_SIZE);
+ }
+ if (walk.nbytes) {
+ chacha20(&chacha20_state, walk.dst.virt.addr,
+ walk.src.virt.addr, walk.nbytes, simd_context);
+ poly1305_update(&poly1305_state, walk.dst.virt.addr,
+ walk.nbytes, simd_context);
+ ret = blkcipher_walk_done(&chacha20_desc, &walk, 0);
+ }
+ }
+ if (unlikely(ret))
+ goto err;
+
+ poly1305_update(&poly1305_state, pad0, (0x10 - src_len) & 0xf,
+ simd_context);
+
+ b.lens[0] = cpu_to_le64(ad_len);
+ b.lens[1] = cpu_to_le64(src_len);
+ poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+ simd_context);
+
+ poly1305_final(&poly1305_state, b.mac, simd_context);
+ scatterwalk_map_and_copy(b.mac, dst, src_len, sizeof(b.mac), 1);
+err:
+ memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+ memzero_explicit(&b, sizeof(b));
+ return !ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_encrypt_sg);
+
+static inline bool
+__chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len, const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ struct poly1305_ctx poly1305_state;
+ struct chacha20_ctx chacha20_state;
+ int ret;
+ size_t dst_len;
+ union {
+ u8 block0[POLY1305_KEY_SIZE];
+ u8 mac[POLY1305_MAC_SIZE];
+ __le64 lens[2];
+ } b = { { 0 } };
+
+ if (unlikely(src_len < POLY1305_MAC_SIZE))
+ return false;
+
+ chacha20_init(&chacha20_state, key, nonce);
+ chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+ simd_context);
+ poly1305_init(&poly1305_state, b.block0);
+
+ poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+ simd_context);
+
+ dst_len = src_len - POLY1305_MAC_SIZE;
+ poly1305_update(&poly1305_state, src, dst_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - dst_len) & 0xf,
+ simd_context);
+
+ b.lens[0] = cpu_to_le64(ad_len);
+ b.lens[1] = cpu_to_le64(dst_len);
+ poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+ simd_context);
+
+ poly1305_final(&poly1305_state, b.mac, simd_context);
+
+ ret = crypto_memneq(b.mac, src + dst_len, POLY1305_MAC_SIZE);
+ if (likely(!ret))
+ chacha20(&chacha20_state, dst, src, dst_len, simd_context);
+
+ memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+ memzero_explicit(&b, sizeof(b));
+
+ return !ret;
+}
+
+bool chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE])
+{
+	simd_context_t simd_context; bool ret;
+
+ simd_get(&simd_context);
+ ret = __chacha20poly1305_decrypt(dst, src, src_len, ad, ad_len, nonce,
+ key, &simd_context);
+ simd_put(&simd_context);
+ return ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_decrypt);
+
+bool chacha20poly1305_decrypt_sg(struct scatterlist *dst,
+ struct scatterlist *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u64 nonce,
+ const u8 key[CHACHA20POLY1305_KEY_SIZE],
+ simd_context_t *simd_context)
+{
+ struct poly1305_ctx poly1305_state;
+ struct chacha20_ctx chacha20_state;
+ struct blkcipher_walk walk;
+ int ret = 0;
+ size_t dst_len;
+ union {
+ u8 block0[POLY1305_KEY_SIZE];
+ struct {
+ u8 read_mac[POLY1305_MAC_SIZE];
+ u8 computed_mac[POLY1305_MAC_SIZE];
+ };
+ __le64 lens[2];
+ } b = { { 0 } };
+
+ if (unlikely(src_len < POLY1305_MAC_SIZE))
+ return false;
+
+ chacha20_init(&chacha20_state, key, nonce);
+ chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+ simd_context);
+ poly1305_init(&poly1305_state, b.block0);
+
+ poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+ poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+ simd_context);
+
+ dst_len = src_len - POLY1305_MAC_SIZE;
+ if (likely(dst_len)) {
+ blkcipher_walk_init(&walk, dst, src, dst_len);
+ ret = blkcipher_walk_virt_block(&chacha20_desc, &walk,
+ CHACHA20_BLOCK_SIZE);
+ while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+ size_t chunk_len =
+ rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE);
+
+ poly1305_update(&poly1305_state, walk.src.virt.addr,
+ chunk_len, simd_context);
+ chacha20(&chacha20_state, walk.dst.virt.addr,
+ walk.src.virt.addr, chunk_len, simd_context);
+ simd_relax(simd_context);
+ ret = blkcipher_walk_done(&chacha20_desc, &walk,
+ walk.nbytes % CHACHA20_BLOCK_SIZE);
+ }
+ if (walk.nbytes) {
+ poly1305_update(&poly1305_state, walk.src.virt.addr,
+ walk.nbytes, simd_context);
+ chacha20(&chacha20_state, walk.dst.virt.addr,
+ walk.src.virt.addr, walk.nbytes, simd_context);
+ ret = blkcipher_walk_done(&chacha20_desc, &walk, 0);
+ }
+ }
+ if (unlikely(ret))
+ goto err;
+
+ poly1305_update(&poly1305_state, pad0, (0x10 - dst_len) & 0xf,
+ simd_context);
+
+ b.lens[0] = cpu_to_le64(ad_len);
+ b.lens[1] = cpu_to_le64(dst_len);
+ poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+ simd_context);
+
+ poly1305_final(&poly1305_state, b.computed_mac, simd_context);
+
+ scatterwalk_map_and_copy(b.read_mac, src, dst_len, POLY1305_MAC_SIZE, 0);
+ ret = crypto_memneq(b.read_mac, b.computed_mac, POLY1305_MAC_SIZE);
+err:
+ memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+ memzero_explicit(&b, sizeof(b));
+ return !ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_decrypt_sg);
+
+void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u8 nonce[XCHACHA20POLY1305_NONCE_SIZE],
+ const u8 key[CHACHA20POLY1305_KEY_SIZE])
+{
+ simd_context_t simd_context;
+ u32 derived_key[CHACHA20_KEY_WORDS] __aligned(16);
+
+ simd_get(&simd_context);
+ hchacha20(derived_key, nonce, key, &simd_context);
+ cpu_to_le32_array(derived_key, ARRAY_SIZE(derived_key));
+ __chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len,
+ get_unaligned_le64(nonce + 16),
+ (u8 *)derived_key, &simd_context);
+ memzero_explicit(derived_key, CHACHA20POLY1305_KEY_SIZE);
+ simd_put(&simd_context);
+}
+EXPORT_SYMBOL(xchacha20poly1305_encrypt);
+
+bool xchacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+ const u8 *ad, const size_t ad_len,
+ const u8 nonce[XCHACHA20POLY1305_NONCE_SIZE],
+ const u8 key[CHACHA20POLY1305_KEY_SIZE])
+{
+ bool ret;
+ simd_context_t simd_context;
+ u32 derived_key[CHACHA20_KEY_WORDS] __aligned(16);
+
+ simd_get(&simd_context);
+ hchacha20(derived_key, nonce, key, &simd_context);
+ cpu_to_le32_array(derived_key, ARRAY_SIZE(derived_key));
+ ret = __chacha20poly1305_decrypt(dst, src, src_len, ad, ad_len,
+ get_unaligned_le64(nonce + 16),
+ (u8 *)derived_key, &simd_context);
+ memzero_explicit(derived_key, CHACHA20POLY1305_KEY_SIZE);
+ simd_put(&simd_context);
+ return ret;
+}
+EXPORT_SYMBOL(xchacha20poly1305_decrypt);
+
+#include "selftest/chacha20poly1305.h"
+
+static int __init mod_init(void)
+{
+#ifdef DEBUG
+ if (!chacha20poly1305_selftest())
+ return -ENOTRECOVERABLE;
+#endif
+ return 0;
+}
+
+static void __exit mod_exit(void)
+{
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("ChaCha20Poly1305 AEAD construction");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
diff --git a/lib/zinc/selftest/chacha20poly1305.h b/lib/zinc/selftest/chacha20poly1305.h
new file mode 100644
index 000000000000..0510a8e966eb
--- /dev/null
+++ b/lib/zinc/selftest/chacha20poly1305.h
@@ -0,0 +1,9038 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+struct chacha20poly1305_testvec {
+ const u8 *input, *output, *assoc, *nonce, *key;
+ size_t ilen, alen, nlen;
+ bool failure;
+};
+
+/* The first of these are the ChaCha20-Poly1305 AEAD test vectors from RFC 7539
+ * section 2.8.2. After those, the vectors are generated by reference
+ * implementations. The final marked ones are taken from Wycheproof, but we only
+ * use those for the encrypt side, because mostly we're stressing the primitives
+ * rather than the actual chapoly construction. This also requires adding a
+ * 96-bit nonce construction, just for the purpose of the tests.
+ */
+
+static const u8 enc_input001[] __initconst = {
+ 0x49, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74,
+ 0x2d, 0x44, 0x72, 0x61, 0x66, 0x74, 0x73, 0x20,
+ 0x61, 0x72, 0x65, 0x20, 0x64, 0x72, 0x61, 0x66,
+ 0x74, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+ 0x6e, 0x74, 0x73, 0x20, 0x76, 0x61, 0x6c, 0x69,
+ 0x64, 0x20, 0x66, 0x6f, 0x72, 0x20, 0x61, 0x20,
+ 0x6d, 0x61, 0x78, 0x69, 0x6d, 0x75, 0x6d, 0x20,
+ 0x6f, 0x66, 0x20, 0x73, 0x69, 0x78, 0x20, 0x6d,
+ 0x6f, 0x6e, 0x74, 0x68, 0x73, 0x20, 0x61, 0x6e,
+ 0x64, 0x20, 0x6d, 0x61, 0x79, 0x20, 0x62, 0x65,
+ 0x20, 0x75, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64,
+ 0x2c, 0x20, 0x72, 0x65, 0x70, 0x6c, 0x61, 0x63,
+ 0x65, 0x64, 0x2c, 0x20, 0x6f, 0x72, 0x20, 0x6f,
+ 0x62, 0x73, 0x6f, 0x6c, 0x65, 0x74, 0x65, 0x64,
+ 0x20, 0x62, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65,
+ 0x72, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+ 0x6e, 0x74, 0x73, 0x20, 0x61, 0x74, 0x20, 0x61,
+ 0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x2e,
+ 0x20, 0x49, 0x74, 0x20, 0x69, 0x73, 0x20, 0x69,
+ 0x6e, 0x61, 0x70, 0x70, 0x72, 0x6f, 0x70, 0x72,
+ 0x69, 0x61, 0x74, 0x65, 0x20, 0x74, 0x6f, 0x20,
+ 0x75, 0x73, 0x65, 0x20, 0x49, 0x6e, 0x74, 0x65,
+ 0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61,
+ 0x66, 0x74, 0x73, 0x20, 0x61, 0x73, 0x20, 0x72,
+ 0x65, 0x66, 0x65, 0x72, 0x65, 0x6e, 0x63, 0x65,
+ 0x20, 0x6d, 0x61, 0x74, 0x65, 0x72, 0x69, 0x61,
+ 0x6c, 0x20, 0x6f, 0x72, 0x20, 0x74, 0x6f, 0x20,
+ 0x63, 0x69, 0x74, 0x65, 0x20, 0x74, 0x68, 0x65,
+ 0x6d, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20,
+ 0x74, 0x68, 0x61, 0x6e, 0x20, 0x61, 0x73, 0x20,
+ 0x2f, 0xe2, 0x80, 0x9c, 0x77, 0x6f, 0x72, 0x6b,
+ 0x20, 0x69, 0x6e, 0x20, 0x70, 0x72, 0x6f, 0x67,
+ 0x72, 0x65, 0x73, 0x73, 0x2e, 0x2f, 0xe2, 0x80,
+ 0x9d
+};
+static const u8 enc_output001[] __initconst = {
+ 0x64, 0xa0, 0x86, 0x15, 0x75, 0x86, 0x1a, 0xf4,
+ 0x60, 0xf0, 0x62, 0xc7, 0x9b, 0xe6, 0x43, 0xbd,
+ 0x5e, 0x80, 0x5c, 0xfd, 0x34, 0x5c, 0xf3, 0x89,
+ 0xf1, 0x08, 0x67, 0x0a, 0xc7, 0x6c, 0x8c, 0xb2,
+ 0x4c, 0x6c, 0xfc, 0x18, 0x75, 0x5d, 0x43, 0xee,
+ 0xa0, 0x9e, 0xe9, 0x4e, 0x38, 0x2d, 0x26, 0xb0,
+ 0xbd, 0xb7, 0xb7, 0x3c, 0x32, 0x1b, 0x01, 0x00,
+ 0xd4, 0xf0, 0x3b, 0x7f, 0x35, 0x58, 0x94, 0xcf,
+ 0x33, 0x2f, 0x83, 0x0e, 0x71, 0x0b, 0x97, 0xce,
+ 0x98, 0xc8, 0xa8, 0x4a, 0xbd, 0x0b, 0x94, 0x81,
+ 0x14, 0xad, 0x17, 0x6e, 0x00, 0x8d, 0x33, 0xbd,
+ 0x60, 0xf9, 0x82, 0xb1, 0xff, 0x37, 0xc8, 0x55,
+ 0x97, 0x97, 0xa0, 0x6e, 0xf4, 0xf0, 0xef, 0x61,
+ 0xc1, 0x86, 0x32, 0x4e, 0x2b, 0x35, 0x06, 0x38,
+ 0x36, 0x06, 0x90, 0x7b, 0x6a, 0x7c, 0x02, 0xb0,
+ 0xf9, 0xf6, 0x15, 0x7b, 0x53, 0xc8, 0x67, 0xe4,
+ 0xb9, 0x16, 0x6c, 0x76, 0x7b, 0x80, 0x4d, 0x46,
+ 0xa5, 0x9b, 0x52, 0x16, 0xcd, 0xe7, 0xa4, 0xe9,
+ 0x90, 0x40, 0xc5, 0xa4, 0x04, 0x33, 0x22, 0x5e,
+ 0xe2, 0x82, 0xa1, 0xb0, 0xa0, 0x6c, 0x52, 0x3e,
+ 0xaf, 0x45, 0x34, 0xd7, 0xf8, 0x3f, 0xa1, 0x15,
+ 0x5b, 0x00, 0x47, 0x71, 0x8c, 0xbc, 0x54, 0x6a,
+ 0x0d, 0x07, 0x2b, 0x04, 0xb3, 0x56, 0x4e, 0xea,
+ 0x1b, 0x42, 0x22, 0x73, 0xf5, 0x48, 0x27, 0x1a,
+ 0x0b, 0xb2, 0x31, 0x60, 0x53, 0xfa, 0x76, 0x99,
+ 0x19, 0x55, 0xeb, 0xd6, 0x31, 0x59, 0x43, 0x4e,
+ 0xce, 0xbb, 0x4e, 0x46, 0x6d, 0xae, 0x5a, 0x10,
+ 0x73, 0xa6, 0x72, 0x76, 0x27, 0x09, 0x7a, 0x10,
+ 0x49, 0xe6, 0x17, 0xd9, 0x1d, 0x36, 0x10, 0x94,
+ 0xfa, 0x68, 0xf0, 0xff, 0x77, 0x98, 0x71, 0x30,
+ 0x30, 0x5b, 0xea, 0xba, 0x2e, 0xda, 0x04, 0xdf,
+ 0x99, 0x7b, 0x71, 0x4d, 0x6c, 0x6f, 0x2c, 0x29,
+ 0xa6, 0xad, 0x5c, 0xb4, 0x02, 0x2b, 0x02, 0x70,
+ 0x9b, 0xee, 0xad, 0x9d, 0x67, 0x89, 0x0c, 0xbb,
+ 0x22, 0x39, 0x23, 0x36, 0xfe, 0xa1, 0x85, 0x1f,
+ 0x38
+};
+static const u8 enc_assoc001[] __initconst = {
+ 0xf3, 0x33, 0x88, 0x86, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x4e, 0x91
+};
+static const u8 enc_nonce001[] __initconst = {
+ 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08
+};
+static const u8 enc_key001[] __initconst = {
+ 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a,
+ 0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0,
+ 0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09,
+ 0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0
+};
+
+static const u8 enc_input002[] __initconst = { };
+static const u8 enc_output002[] __initconst = {
+ 0xea, 0xe0, 0x1e, 0x9e, 0x2c, 0x91, 0xaa, 0xe1,
+ 0xdb, 0x5d, 0x99, 0x3f, 0x8a, 0xf7, 0x69, 0x92
+};
+static const u8 enc_assoc002[] __initconst = { };
+static const u8 enc_nonce002[] __initconst = {
+ 0xca, 0xbf, 0x33, 0x71, 0x32, 0x45, 0x77, 0x8e
+};
+static const u8 enc_key002[] __initconst = {
+ 0x4c, 0xf5, 0x96, 0x83, 0x38, 0xe6, 0xae, 0x7f,
+ 0x2d, 0x29, 0x25, 0x76, 0xd5, 0x75, 0x27, 0x86,
+ 0x91, 0x9a, 0x27, 0x7a, 0xfb, 0x46, 0xc5, 0xef,
+ 0x94, 0x81, 0x79, 0x57, 0x14, 0x59, 0x40, 0x68
+};
+
+static const u8 enc_input003[] __initconst = { };
+static const u8 enc_output003[] __initconst = {
+ 0xdd, 0x6b, 0x3b, 0x82, 0xce, 0x5a, 0xbd, 0xd6,
+ 0xa9, 0x35, 0x83, 0xd8, 0x8c, 0x3d, 0x85, 0x77
+};
+static const u8 enc_assoc003[] __initconst = {
+ 0x33, 0x10, 0x41, 0x12, 0x1f, 0xf3, 0xd2, 0x6b
+};
+static const u8 enc_nonce003[] __initconst = {
+ 0x3d, 0x86, 0xb5, 0x6b, 0xc8, 0xa3, 0x1f, 0x1d
+};
+static const u8 enc_key003[] __initconst = {
+ 0x2d, 0xb0, 0x5d, 0x40, 0xc8, 0xed, 0x44, 0x88,
+ 0x34, 0xd1, 0x13, 0xaf, 0x57, 0xa1, 0xeb, 0x3a,
+ 0x2a, 0x80, 0x51, 0x36, 0xec, 0x5b, 0xbc, 0x08,
+ 0x93, 0x84, 0x21, 0xb5, 0x13, 0x88, 0x3c, 0x0d
+};
+
+static const u8 enc_input004[] __initconst = {
+ 0xa4
+};
+static const u8 enc_output004[] __initconst = {
+ 0xb7, 0x1b, 0xb0, 0x73, 0x59, 0xb0, 0x84, 0xb2,
+ 0x6d, 0x8e, 0xab, 0x94, 0x31, 0xa1, 0xae, 0xac,
+ 0x89
+};
+static const u8 enc_assoc004[] __initconst = {
+ 0x6a, 0xe2, 0xad, 0x3f, 0x88, 0x39, 0x5a, 0x40
+};
+static const u8 enc_nonce004[] __initconst = {
+ 0xd2, 0x32, 0x1f, 0x29, 0x28, 0xc6, 0xc4, 0xc4
+};
+static const u8 enc_key004[] __initconst = {
+ 0x4b, 0x28, 0x4b, 0xa3, 0x7b, 0xbe, 0xe9, 0xf8,
+ 0x31, 0x80, 0x82, 0xd7, 0xd8, 0xe8, 0xb5, 0xa1,
+ 0xe2, 0x18, 0x18, 0x8a, 0x9c, 0xfa, 0xa3, 0x3d,
+ 0x25, 0x71, 0x3e, 0x40, 0xbc, 0x54, 0x7a, 0x3e
+};
+
+static const u8 enc_input005[] __initconst = {
+ 0x2d
+};
+static const u8 enc_output005[] __initconst = {
+ 0xbf, 0xe1, 0x5b, 0x0b, 0xdb, 0x6b, 0xf5, 0x5e,
+ 0x6c, 0x5d, 0x84, 0x44, 0x39, 0x81, 0xc1, 0x9c,
+ 0xac
+};
+static const u8 enc_assoc005[] __initconst = { };
+static const u8 enc_nonce005[] __initconst = {
+ 0x20, 0x1c, 0xaa, 0x5f, 0x9c, 0xbf, 0x92, 0x30
+};
+static const u8 enc_key005[] __initconst = {
+ 0x66, 0xca, 0x9c, 0x23, 0x2a, 0x4b, 0x4b, 0x31,
+ 0x0e, 0x92, 0x89, 0x8b, 0xf4, 0x93, 0xc7, 0x87,
+ 0x98, 0xa3, 0xd8, 0x39, 0xf8, 0xf4, 0xa7, 0x01,
+ 0xc0, 0x2e, 0x0a, 0xa6, 0x7e, 0x5a, 0x78, 0x87
+};
+
+static const u8 enc_input006[] __initconst = {
+ 0x33, 0x2f, 0x94, 0xc1, 0xa4, 0xef, 0xcc, 0x2a,
+ 0x5b, 0xa6, 0xe5, 0x8f, 0x1d, 0x40, 0xf0, 0x92,
+ 0x3c, 0xd9, 0x24, 0x11, 0xa9, 0x71, 0xf9, 0x37,
+ 0x14, 0x99, 0xfa, 0xbe, 0xe6, 0x80, 0xde, 0x50,
+ 0xc9, 0x96, 0xd4, 0xb0, 0xec, 0x9e, 0x17, 0xec,
+ 0xd2, 0x5e, 0x72, 0x99, 0xfc, 0x0a, 0xe1, 0xcb,
+ 0x48, 0xd2, 0x85, 0xdd, 0x2f, 0x90, 0xe0, 0x66,
+ 0x3b, 0xe6, 0x20, 0x74, 0xbe, 0x23, 0x8f, 0xcb,
+ 0xb4, 0xe4, 0xda, 0x48, 0x40, 0xa6, 0xd1, 0x1b,
+ 0xc7, 0x42, 0xce, 0x2f, 0x0c, 0xa6, 0x85, 0x6e,
+ 0x87, 0x37, 0x03, 0xb1, 0x7c, 0x25, 0x96, 0xa3,
+ 0x05, 0xd8, 0xb0, 0xf4, 0xed, 0xea, 0xc2, 0xf0,
+ 0x31, 0x98, 0x6c, 0xd1, 0x14, 0x25, 0xc0, 0xcb,
+ 0x01, 0x74, 0xd0, 0x82, 0xf4, 0x36, 0xf5, 0x41,
+ 0xd5, 0xdc, 0xca, 0xc5, 0xbb, 0x98, 0xfe, 0xfc,
+ 0x69, 0x21, 0x70, 0xd8, 0xa4, 0x4b, 0xc8, 0xde,
+ 0x8f
+};
+static const u8 enc_output006[] __initconst = {
+ 0x8b, 0x06, 0xd3, 0x31, 0xb0, 0x93, 0x45, 0xb1,
+ 0x75, 0x6e, 0x26, 0xf9, 0x67, 0xbc, 0x90, 0x15,
+ 0x81, 0x2c, 0xb5, 0xf0, 0xc6, 0x2b, 0xc7, 0x8c,
+ 0x56, 0xd1, 0xbf, 0x69, 0x6c, 0x07, 0xa0, 0xda,
+ 0x65, 0x27, 0xc9, 0x90, 0x3d, 0xef, 0x4b, 0x11,
+ 0x0f, 0x19, 0x07, 0xfd, 0x29, 0x92, 0xd9, 0xc8,
+ 0xf7, 0x99, 0x2e, 0x4a, 0xd0, 0xb8, 0x2c, 0xdc,
+ 0x93, 0xf5, 0x9e, 0x33, 0x78, 0xd1, 0x37, 0xc3,
+ 0x66, 0xd7, 0x5e, 0xbc, 0x44, 0xbf, 0x53, 0xa5,
+ 0xbc, 0xc4, 0xcb, 0x7b, 0x3a, 0x8e, 0x7f, 0x02,
+ 0xbd, 0xbb, 0xe7, 0xca, 0xa6, 0x6c, 0x6b, 0x93,
+ 0x21, 0x93, 0x10, 0x61, 0xe7, 0x69, 0xd0, 0x78,
+ 0xf3, 0x07, 0x5a, 0x1a, 0x8f, 0x73, 0xaa, 0xb1,
+ 0x4e, 0xd3, 0xda, 0x4f, 0xf3, 0x32, 0xe1, 0x66,
+ 0x3e, 0x6c, 0xc6, 0x13, 0xba, 0x06, 0x5b, 0xfc,
+ 0x6a, 0xe5, 0x6f, 0x60, 0xfb, 0x07, 0x40, 0xb0,
+ 0x8c, 0x9d, 0x84, 0x43, 0x6b, 0xc1, 0xf7, 0x8d,
+ 0x8d, 0x31, 0xf7, 0x7a, 0x39, 0x4d, 0x8f, 0x9a,
+ 0xeb
+};
+static const u8 enc_assoc006[] __initconst = {
+ 0x70, 0xd3, 0x33, 0xf3, 0x8b, 0x18, 0x0b
+};
+static const u8 enc_nonce006[] __initconst = {
+ 0xdf, 0x51, 0x84, 0x82, 0x42, 0x0c, 0x75, 0x9c
+};
+static const u8 enc_key006[] __initconst = {
+ 0x68, 0x7b, 0x8d, 0x8e, 0xe3, 0xc4, 0xdd, 0xae,
+ 0xdf, 0x72, 0x7f, 0x53, 0x72, 0x25, 0x1e, 0x78,
+ 0x91, 0xcb, 0x69, 0x76, 0x1f, 0x49, 0x93, 0xf9,
+ 0x6f, 0x21, 0xcc, 0x39, 0x9c, 0xad, 0xb1, 0x01
+};
+
+static const u8 enc_input007[] __initconst = {
+ 0x9b, 0x18, 0xdb, 0xdd, 0x9a, 0x0f, 0x3e, 0xa5,
+ 0x15, 0x17, 0xde, 0xdf, 0x08, 0x9d, 0x65, 0x0a,
+ 0x67, 0x30, 0x12, 0xe2, 0x34, 0x77, 0x4b, 0xc1,
+ 0xd9, 0xc6, 0x1f, 0xab, 0xc6, 0x18, 0x50, 0x17,
+ 0xa7, 0x9d, 0x3c, 0xa6, 0xc5, 0x35, 0x8c, 0x1c,
+ 0xc0, 0xa1, 0x7c, 0x9f, 0x03, 0x89, 0xca, 0xe1,
+ 0xe6, 0xe9, 0xd4, 0xd3, 0x88, 0xdb, 0xb4, 0x51,
+ 0x9d, 0xec, 0xb4, 0xfc, 0x52, 0xee, 0x6d, 0xf1,
+ 0x75, 0x42, 0xc6, 0xfd, 0xbd, 0x7a, 0x8e, 0x86,
+ 0xfc, 0x44, 0xb3, 0x4f, 0xf3, 0xea, 0x67, 0x5a,
+ 0x41, 0x13, 0xba, 0xb0, 0xdc, 0xe1, 0xd3, 0x2a,
+ 0x7c, 0x22, 0xb3, 0xca, 0xac, 0x6a, 0x37, 0x98,
+ 0x3e, 0x1d, 0x40, 0x97, 0xf7, 0x9b, 0x1d, 0x36,
+ 0x6b, 0xb3, 0x28, 0xbd, 0x60, 0x82, 0x47, 0x34,
+ 0xaa, 0x2f, 0x7d, 0xe9, 0xa8, 0x70, 0x81, 0x57,
+ 0xd4, 0xb9, 0x77, 0x0a, 0x9d, 0x29, 0xa7, 0x84,
+ 0x52, 0x4f, 0xc2, 0x4a, 0x40, 0x3b, 0x3c, 0xd4,
+ 0xc9, 0x2a, 0xdb, 0x4a, 0x53, 0xc4, 0xbe, 0x80,
+ 0xe9, 0x51, 0x7f, 0x8f, 0xc7, 0xa2, 0xce, 0x82,
+ 0x5c, 0x91, 0x1e, 0x74, 0xd9, 0xd0, 0xbd, 0xd5,
+ 0xf3, 0xfd, 0xda, 0x4d, 0x25, 0xb4, 0xbb, 0x2d,
+ 0xac, 0x2f, 0x3d, 0x71, 0x85, 0x7b, 0xcf, 0x3c,
+ 0x7b, 0x3e, 0x0e, 0x22, 0x78, 0x0c, 0x29, 0xbf,
+ 0xe4, 0xf4, 0x57, 0xb3, 0xcb, 0x49, 0xa0, 0xfc,
+ 0x1e, 0x05, 0x4e, 0x16, 0xbc, 0xd5, 0xa8, 0xa3,
+ 0xee, 0x05, 0x35, 0xc6, 0x7c, 0xab, 0x60, 0x14,
+ 0x55, 0x1a, 0x8e, 0xc5, 0x88, 0x5d, 0xd5, 0x81,
+ 0xc2, 0x81, 0xa5, 0xc4, 0x60, 0xdb, 0xaf, 0x77,
+ 0x91, 0xe1, 0xce, 0xa2, 0x7e, 0x7f, 0x42, 0xe3,
+ 0xb0, 0x13, 0x1c, 0x1f, 0x25, 0x60, 0x21, 0xe2,
+ 0x40, 0x5f, 0x99, 0xb7, 0x73, 0xec, 0x9b, 0x2b,
+ 0xf0, 0x65, 0x11, 0xc8, 0xd0, 0x0a, 0x9f, 0xd3
+};
+static const u8 enc_output007[] __initconst = {
+ 0x85, 0x04, 0xc2, 0xed, 0x8d, 0xfd, 0x97, 0x5c,
+ 0xd2, 0xb7, 0xe2, 0xc1, 0x6b, 0xa3, 0xba, 0xf8,
+ 0xc9, 0x50, 0xc3, 0xc6, 0xa5, 0xe3, 0xa4, 0x7c,
+ 0xc3, 0x23, 0x49, 0x5e, 0xa9, 0xb9, 0x32, 0xeb,
+ 0x8a, 0x7c, 0xca, 0xe5, 0xec, 0xfb, 0x7c, 0xc0,
+ 0xcb, 0x7d, 0xdc, 0x2c, 0x9d, 0x92, 0x55, 0x21,
+ 0x0a, 0xc8, 0x43, 0x63, 0x59, 0x0a, 0x31, 0x70,
+ 0x82, 0x67, 0x41, 0x03, 0xf8, 0xdf, 0xf2, 0xac,
+ 0xa7, 0x02, 0xd4, 0xd5, 0x8a, 0x2d, 0xc8, 0x99,
+ 0x19, 0x66, 0xd0, 0xf6, 0x88, 0x2c, 0x77, 0xd9,
+ 0xd4, 0x0d, 0x6c, 0xbd, 0x98, 0xde, 0xe7, 0x7f,
+ 0xad, 0x7e, 0x8a, 0xfb, 0xe9, 0x4b, 0xe5, 0xf7,
+ 0xe5, 0x50, 0xa0, 0x90, 0x3f, 0xd6, 0x22, 0x53,
+ 0xe3, 0xfe, 0x1b, 0xcc, 0x79, 0x3b, 0xec, 0x12,
+ 0x47, 0x52, 0xa7, 0xd6, 0x04, 0xe3, 0x52, 0xe6,
+ 0x93, 0x90, 0x91, 0x32, 0x73, 0x79, 0xb8, 0xd0,
+ 0x31, 0xde, 0x1f, 0x9f, 0x2f, 0x05, 0x38, 0x54,
+ 0x2f, 0x35, 0x04, 0x39, 0xe0, 0xa7, 0xba, 0xc6,
+ 0x52, 0xf6, 0x37, 0x65, 0x4c, 0x07, 0xa9, 0x7e,
+ 0xb3, 0x21, 0x6f, 0x74, 0x8c, 0xc9, 0xde, 0xdb,
+ 0x65, 0x1b, 0x9b, 0xaa, 0x60, 0xb1, 0x03, 0x30,
+ 0x6b, 0xb2, 0x03, 0xc4, 0x1c, 0x04, 0xf8, 0x0f,
+ 0x64, 0xaf, 0x46, 0xe4, 0x65, 0x99, 0x49, 0xe2,
+ 0xea, 0xce, 0x78, 0x00, 0xd8, 0x8b, 0xd5, 0x2e,
+ 0xcf, 0xfc, 0x40, 0x49, 0xe8, 0x58, 0xdc, 0x34,
+ 0x9c, 0x8c, 0x61, 0xbf, 0x0a, 0x8e, 0xec, 0x39,
+ 0xa9, 0x30, 0x05, 0x5a, 0xd2, 0x56, 0x01, 0xc7,
+ 0xda, 0x8f, 0x4e, 0xbb, 0x43, 0xa3, 0x3a, 0xf9,
+ 0x15, 0x2a, 0xd0, 0xa0, 0x7a, 0x87, 0x34, 0x82,
+ 0xfe, 0x8a, 0xd1, 0x2d, 0x5e, 0xc7, 0xbf, 0x04,
+ 0x53, 0x5f, 0x3b, 0x36, 0xd4, 0x25, 0x5c, 0x34,
+ 0x7a, 0x8d, 0xd5, 0x05, 0xce, 0x72, 0xca, 0xef,
+ 0x7a, 0x4b, 0xbc, 0xb0, 0x10, 0x5c, 0x96, 0x42,
+ 0x3a, 0x00, 0x98, 0xcd, 0x15, 0xe8, 0xb7, 0x53
+};
+static const u8 enc_assoc007[] __initconst = { };
+static const u8 enc_nonce007[] __initconst = {
+ 0xde, 0x7b, 0xef, 0xc3, 0x65, 0x1b, 0x68, 0xb0
+};
+static const u8 enc_key007[] __initconst = {
+ 0x8d, 0xb8, 0x91, 0x48, 0xf0, 0xe7, 0x0a, 0xbd,
+ 0xf9, 0x3f, 0xcd, 0xd9, 0xa0, 0x1e, 0x42, 0x4c,
+ 0xe7, 0xde, 0x25, 0x3d, 0xa3, 0xd7, 0x05, 0x80,
+ 0x8d, 0xf2, 0x82, 0xac, 0x44, 0x16, 0x51, 0x01
+};
+
+static const u8 enc_input008[] __initconst = {
+ 0xc3, 0x09, 0x94, 0x62, 0xe6, 0x46, 0x2e, 0x10,
+ 0xbe, 0x00, 0xe4, 0xfc, 0xf3, 0x40, 0xa3, 0xe2,
+ 0x0f, 0xc2, 0x8b, 0x28, 0xdc, 0xba, 0xb4, 0x3c,
+ 0xe4, 0x21, 0x58, 0x61, 0xcd, 0x8b, 0xcd, 0xfb,
+ 0xac, 0x94, 0xa1, 0x45, 0xf5, 0x1c, 0xe1, 0x12,
+ 0xe0, 0x3b, 0x67, 0x21, 0x54, 0x5e, 0x8c, 0xaa,
+ 0xcf, 0xdb, 0xb4, 0x51, 0xd4, 0x13, 0xda, 0xe6,
+ 0x83, 0x89, 0xb6, 0x92, 0xe9, 0x21, 0x76, 0xa4,
+ 0x93, 0x7d, 0x0e, 0xfd, 0x96, 0x36, 0x03, 0x91,
+ 0x43, 0x5c, 0x92, 0x49, 0x62, 0x61, 0x7b, 0xeb,
+ 0x43, 0x89, 0xb8, 0x12, 0x20, 0x43, 0xd4, 0x47,
+ 0x06, 0x84, 0xee, 0x47, 0xe9, 0x8a, 0x73, 0x15,
+ 0x0f, 0x72, 0xcf, 0xed, 0xce, 0x96, 0xb2, 0x7f,
+ 0x21, 0x45, 0x76, 0xeb, 0x26, 0x28, 0x83, 0x6a,
+ 0xad, 0xaa, 0xa6, 0x81, 0xd8, 0x55, 0xb1, 0xa3,
+ 0x85, 0xb3, 0x0c, 0xdf, 0xf1, 0x69, 0x2d, 0x97,
+ 0x05, 0x2a, 0xbc, 0x7c, 0x7b, 0x25, 0xf8, 0x80,
+ 0x9d, 0x39, 0x25, 0xf3, 0x62, 0xf0, 0x66, 0x5e,
+ 0xf4, 0xa0, 0xcf, 0xd8, 0xfd, 0x4f, 0xb1, 0x1f,
+ 0x60, 0x3a, 0x08, 0x47, 0xaf, 0xe1, 0xf6, 0x10,
+ 0x77, 0x09, 0xa7, 0x27, 0x8f, 0x9a, 0x97, 0x5a,
+ 0x26, 0xfa, 0xfe, 0x41, 0x32, 0x83, 0x10, 0xe0,
+ 0x1d, 0xbf, 0x64, 0x0d, 0xf4, 0x1c, 0x32, 0x35,
+ 0xe5, 0x1b, 0x36, 0xef, 0xd4, 0x4a, 0x93, 0x4d,
+ 0x00, 0x7c, 0xec, 0x02, 0x07, 0x8b, 0x5d, 0x7d,
+ 0x1b, 0x0e, 0xd1, 0xa6, 0xa5, 0x5d, 0x7d, 0x57,
+ 0x88, 0xa8, 0xcc, 0x81, 0xb4, 0x86, 0x4e, 0xb4,
+ 0x40, 0xe9, 0x1d, 0xc3, 0xb1, 0x24, 0x3e, 0x7f,
+ 0xcc, 0x8a, 0x24, 0x9b, 0xdf, 0x6d, 0xf0, 0x39,
+ 0x69, 0x3e, 0x4c, 0xc0, 0x96, 0xe4, 0x13, 0xda,
+ 0x90, 0xda, 0xf4, 0x95, 0x66, 0x8b, 0x17, 0x17,
+ 0xfe, 0x39, 0x43, 0x25, 0xaa, 0xda, 0xa0, 0x43,
+ 0x3c, 0xb1, 0x41, 0x02, 0xa3, 0xf0, 0xa7, 0x19,
+ 0x59, 0xbc, 0x1d, 0x7d, 0x6c, 0x6d, 0x91, 0x09,
+ 0x5c, 0xb7, 0x5b, 0x01, 0xd1, 0x6f, 0x17, 0x21,
+ 0x97, 0xbf, 0x89, 0x71, 0xa5, 0xb0, 0x6e, 0x07,
+ 0x45, 0xfd, 0x9d, 0xea, 0x07, 0xf6, 0x7a, 0x9f,
+ 0x10, 0x18, 0x22, 0x30, 0x73, 0xac, 0xd4, 0x6b,
+ 0x72, 0x44, 0xed, 0xd9, 0x19, 0x9b, 0x2d, 0x4a,
+ 0x41, 0xdd, 0xd1, 0x85, 0x5e, 0x37, 0x19, 0xed,
+ 0xd2, 0x15, 0x8f, 0x5e, 0x91, 0xdb, 0x33, 0xf2,
+ 0xe4, 0xdb, 0xff, 0x98, 0xfb, 0xa3, 0xb5, 0xca,
+ 0x21, 0x69, 0x08, 0xe7, 0x8a, 0xdf, 0x90, 0xff,
+ 0x3e, 0xe9, 0x20, 0x86, 0x3c, 0xe9, 0xfc, 0x0b,
+ 0xfe, 0x5c, 0x61, 0xaa, 0x13, 0x92, 0x7f, 0x7b,
+ 0xec, 0xe0, 0x6d, 0xa8, 0x23, 0x22, 0xf6, 0x6b,
+ 0x77, 0xc4, 0xfe, 0x40, 0x07, 0x3b, 0xb6, 0xf6,
+ 0x8e, 0x5f, 0xd4, 0xb9, 0xb7, 0x0f, 0x21, 0x04,
+ 0xef, 0x83, 0x63, 0x91, 0x69, 0x40, 0xa3, 0x48,
+ 0x5c, 0xd2, 0x60, 0xf9, 0x4f, 0x6c, 0x47, 0x8b,
+ 0x3b, 0xb1, 0x9f, 0x8e, 0xee, 0x16, 0x8a, 0x13,
+ 0xfc, 0x46, 0x17, 0xc3, 0xc3, 0x32, 0x56, 0xf8,
+ 0x3c, 0x85, 0x3a, 0xb6, 0x3e, 0xaa, 0x89, 0x4f,
+ 0xb3, 0xdf, 0x38, 0xfd, 0xf1, 0xe4, 0x3a, 0xc0,
+ 0xe6, 0x58, 0xb5, 0x8f, 0xc5, 0x29, 0xa2, 0x92,
+ 0x4a, 0xb6, 0xa0, 0x34, 0x7f, 0xab, 0xb5, 0x8a,
+ 0x90, 0xa1, 0xdb, 0x4d, 0xca, 0xb6, 0x2c, 0x41,
+ 0x3c, 0xf7, 0x2b, 0x21, 0xc3, 0xfd, 0xf4, 0x17,
+ 0x5c, 0xb5, 0x33, 0x17, 0x68, 0x2b, 0x08, 0x30,
+ 0xf3, 0xf7, 0x30, 0x3c, 0x96, 0xe6, 0x6a, 0x20,
+ 0x97, 0xe7, 0x4d, 0x10, 0x5f, 0x47, 0x5f, 0x49,
+ 0x96, 0x09, 0xf0, 0x27, 0x91, 0xc8, 0xf8, 0x5a,
+ 0x2e, 0x79, 0xb5, 0xe2, 0xb8, 0xe8, 0xb9, 0x7b,
+ 0xd5, 0x10, 0xcb, 0xff, 0x5d, 0x14, 0x73, 0xf3
+};
+static const u8 enc_output008[] __initconst = {
+ 0x14, 0xf6, 0x41, 0x37, 0xa6, 0xd4, 0x27, 0xcd,
+ 0xdb, 0x06, 0x3e, 0x9a, 0x4e, 0xab, 0xd5, 0xb1,
+ 0x1e, 0x6b, 0xd2, 0xbc, 0x11, 0xf4, 0x28, 0x93,
+ 0x63, 0x54, 0xef, 0xbb, 0x5e, 0x1d, 0x3a, 0x1d,
+ 0x37, 0x3c, 0x0a, 0x6c, 0x1e, 0xc2, 0xd1, 0x2c,
+ 0xb5, 0xa3, 0xb5, 0x7b, 0xb8, 0x8f, 0x25, 0xa6,
+ 0x1b, 0x61, 0x1c, 0xec, 0x28, 0x58, 0x26, 0xa4,
+ 0xa8, 0x33, 0x28, 0x25, 0x5c, 0x45, 0x05, 0xe5,
+ 0x6c, 0x99, 0xe5, 0x45, 0xc4, 0xa2, 0x03, 0x84,
+ 0x03, 0x73, 0x1e, 0x8c, 0x49, 0xac, 0x20, 0xdd,
+ 0x8d, 0xb3, 0xc4, 0xf5, 0xe7, 0x4f, 0xf1, 0xed,
+ 0xa1, 0x98, 0xde, 0xa4, 0x96, 0xdd, 0x2f, 0xab,
+ 0xab, 0x97, 0xcf, 0x3e, 0xd2, 0x9e, 0xb8, 0x13,
+ 0x07, 0x28, 0x29, 0x19, 0xaf, 0xfd, 0xf2, 0x49,
+ 0x43, 0xea, 0x49, 0x26, 0x91, 0xc1, 0x07, 0xd6,
+ 0xbb, 0x81, 0x75, 0x35, 0x0d, 0x24, 0x7f, 0xc8,
+ 0xda, 0xd4, 0xb7, 0xeb, 0xe8, 0x5c, 0x09, 0xa2,
+ 0x2f, 0xdc, 0x28, 0x7d, 0x3a, 0x03, 0xfa, 0x94,
+ 0xb5, 0x1d, 0x17, 0x99, 0x36, 0xc3, 0x1c, 0x18,
+ 0x34, 0xe3, 0x9f, 0xf5, 0x55, 0x7c, 0xb0, 0x60,
+ 0x9d, 0xff, 0xac, 0xd4, 0x61, 0xf2, 0xad, 0xf8,
+ 0xce, 0xc7, 0xbe, 0x5c, 0xd2, 0x95, 0xa8, 0x4b,
+ 0x77, 0x13, 0x19, 0x59, 0x26, 0xc9, 0xb7, 0x8f,
+ 0x6a, 0xcb, 0x2d, 0x37, 0x91, 0xea, 0x92, 0x9c,
+ 0x94, 0x5b, 0xda, 0x0b, 0xce, 0xfe, 0x30, 0x20,
+ 0xf8, 0x51, 0xad, 0xf2, 0xbe, 0xe7, 0xc7, 0xff,
+ 0xb3, 0x33, 0x91, 0x6a, 0xc9, 0x1a, 0x41, 0xc9,
+ 0x0f, 0xf3, 0x10, 0x0e, 0xfd, 0x53, 0xff, 0x6c,
+ 0x16, 0x52, 0xd9, 0xf3, 0xf7, 0x98, 0x2e, 0xc9,
+ 0x07, 0x31, 0x2c, 0x0c, 0x72, 0xd7, 0xc5, 0xc6,
+ 0x08, 0x2a, 0x7b, 0xda, 0xbd, 0x7e, 0x02, 0xea,
+ 0x1a, 0xbb, 0xf2, 0x04, 0x27, 0x61, 0x28, 0x8e,
+ 0xf5, 0x04, 0x03, 0x1f, 0x4c, 0x07, 0x55, 0x82,
+ 0xec, 0x1e, 0xd7, 0x8b, 0x2f, 0x65, 0x56, 0xd1,
+ 0xd9, 0x1e, 0x3c, 0xe9, 0x1f, 0x5e, 0x98, 0x70,
+ 0x38, 0x4a, 0x8c, 0x49, 0xc5, 0x43, 0xa0, 0xa1,
+ 0x8b, 0x74, 0x9d, 0x4c, 0x62, 0x0d, 0x10, 0x0c,
+ 0xf4, 0x6c, 0x8f, 0xe0, 0xaa, 0x9a, 0x8d, 0xb7,
+ 0xe0, 0xbe, 0x4c, 0x87, 0xf1, 0x98, 0x2f, 0xcc,
+ 0xed, 0xc0, 0x52, 0x29, 0xdc, 0x83, 0xf8, 0xfc,
+ 0x2c, 0x0e, 0xa8, 0x51, 0x4d, 0x80, 0x0d, 0xa3,
+ 0xfe, 0xd8, 0x37, 0xe7, 0x41, 0x24, 0xfc, 0xfb,
+ 0x75, 0xe3, 0x71, 0x7b, 0x57, 0x45, 0xf5, 0x97,
+ 0x73, 0x65, 0x63, 0x14, 0x74, 0xb8, 0x82, 0x9f,
+ 0xf8, 0x60, 0x2f, 0x8a, 0xf2, 0x4e, 0xf1, 0x39,
+ 0xda, 0x33, 0x91, 0xf8, 0x36, 0xe0, 0x8d, 0x3f,
+ 0x1f, 0x3b, 0x56, 0xdc, 0xa0, 0x8f, 0x3c, 0x9d,
+ 0x71, 0x52, 0xa7, 0xb8, 0xc0, 0xa5, 0xc6, 0xa2,
+ 0x73, 0xda, 0xf4, 0x4b, 0x74, 0x5b, 0x00, 0x3d,
+ 0x99, 0xd7, 0x96, 0xba, 0xe6, 0xe1, 0xa6, 0x96,
+ 0x38, 0xad, 0xb3, 0xc0, 0xd2, 0xba, 0x91, 0x6b,
+ 0xf9, 0x19, 0xdd, 0x3b, 0xbe, 0xbe, 0x9c, 0x20,
+ 0x50, 0xba, 0xa1, 0xd0, 0xce, 0x11, 0xbd, 0x95,
+ 0xd8, 0xd1, 0xdd, 0x33, 0x85, 0x74, 0xdc, 0xdb,
+ 0x66, 0x76, 0x44, 0xdc, 0x03, 0x74, 0x48, 0x35,
+ 0x98, 0xb1, 0x18, 0x47, 0x94, 0x7d, 0xff, 0x62,
+ 0xe4, 0x58, 0x78, 0xab, 0xed, 0x95, 0x36, 0xd9,
+ 0x84, 0x91, 0x82, 0x64, 0x41, 0xbb, 0x58, 0xe6,
+ 0x1c, 0x20, 0x6d, 0x15, 0x6b, 0x13, 0x96, 0xe8,
+ 0x35, 0x7f, 0xdc, 0x40, 0x2c, 0xe9, 0xbc, 0x8a,
+ 0x4f, 0x92, 0xec, 0x06, 0x2d, 0x50, 0xdf, 0x93,
+ 0x5d, 0x65, 0x5a, 0xa8, 0xfc, 0x20, 0x50, 0x14,
+ 0xa9, 0x8a, 0x7e, 0x1d, 0x08, 0x1f, 0xe2, 0x99,
+ 0xd0, 0xbe, 0xfb, 0x3a, 0x21, 0x9d, 0xad, 0x86,
+ 0x54, 0xfd, 0x0d, 0x98, 0x1c, 0x5a, 0x6f, 0x1f,
+ 0x9a, 0x40, 0xcd, 0xa2, 0xff, 0x6a, 0xf1, 0x54
+};
+static const u8 enc_assoc008[] __initconst = { };
+static const u8 enc_nonce008[] __initconst = {
+ 0x0e, 0x0d, 0x57, 0xbb, 0x7b, 0x40, 0x54, 0x02
+};
+static const u8 enc_key008[] __initconst = {
+ 0xf2, 0xaa, 0x4f, 0x99, 0xfd, 0x3e, 0xa8, 0x53,
+ 0xc1, 0x44, 0xe9, 0x81, 0x18, 0xdc, 0xf5, 0xf0,
+ 0x3e, 0x44, 0x15, 0x59, 0xe0, 0xc5, 0x44, 0x86,
+ 0xc3, 0x91, 0xa8, 0x75, 0xc0, 0x12, 0x46, 0xba
+};
+
+static const u8 enc_input009[] __initconst = {
+ 0xe6, 0xc3, 0xdb, 0x63, 0x55, 0x15, 0xe3, 0x5b,
+ 0xb7, 0x4b, 0x27, 0x8b, 0x5a, 0xdd, 0xc2, 0xe8,
+ 0x3a, 0x6b, 0xd7, 0x81, 0x96, 0x35, 0x97, 0xca,
+ 0xd7, 0x68, 0xe8, 0xef, 0xce, 0xab, 0xda, 0x09,
+ 0x6e, 0xd6, 0x8e, 0xcb, 0x55, 0xb5, 0xe1, 0xe5,
+ 0x57, 0xfd, 0xc4, 0xe3, 0xe0, 0x18, 0x4f, 0x85,
+ 0xf5, 0x3f, 0x7e, 0x4b, 0x88, 0xc9, 0x52, 0x44,
+ 0x0f, 0xea, 0xaf, 0x1f, 0x71, 0x48, 0x9f, 0x97,
+ 0x6d, 0xb9, 0x6f, 0x00, 0xa6, 0xde, 0x2b, 0x77,
+ 0x8b, 0x15, 0xad, 0x10, 0xa0, 0x2b, 0x7b, 0x41,
+ 0x90, 0x03, 0x2d, 0x69, 0xae, 0xcc, 0x77, 0x7c,
+ 0xa5, 0x9d, 0x29, 0x22, 0xc2, 0xea, 0xb4, 0x00,
+ 0x1a, 0xd2, 0x7a, 0x98, 0x8a, 0xf9, 0xf7, 0x82,
+ 0xb0, 0xab, 0xd8, 0xa6, 0x94, 0x8d, 0x58, 0x2f,
+ 0x01, 0x9e, 0x00, 0x20, 0xfc, 0x49, 0xdc, 0x0e,
+ 0x03, 0xe8, 0x45, 0x10, 0xd6, 0xa8, 0xda, 0x55,
+ 0x10, 0x9a, 0xdf, 0x67, 0x22, 0x8b, 0x43, 0xab,
+ 0x00, 0xbb, 0x02, 0xc8, 0xdd, 0x7b, 0x97, 0x17,
+ 0xd7, 0x1d, 0x9e, 0x02, 0x5e, 0x48, 0xde, 0x8e,
+ 0xcf, 0x99, 0x07, 0x95, 0x92, 0x3c, 0x5f, 0x9f,
+ 0xc5, 0x8a, 0xc0, 0x23, 0xaa, 0xd5, 0x8c, 0x82,
+ 0x6e, 0x16, 0x92, 0xb1, 0x12, 0x17, 0x07, 0xc3,
+ 0xfb, 0x36, 0xf5, 0x6c, 0x35, 0xd6, 0x06, 0x1f,
+ 0x9f, 0xa7, 0x94, 0xa2, 0x38, 0x63, 0x9c, 0xb0,
+ 0x71, 0xb3, 0xa5, 0xd2, 0xd8, 0xba, 0x9f, 0x08,
+ 0x01, 0xb3, 0xff, 0x04, 0x97, 0x73, 0x45, 0x1b,
+ 0xd5, 0xa9, 0x9c, 0x80, 0xaf, 0x04, 0x9a, 0x85,
+ 0xdb, 0x32, 0x5b, 0x5d, 0x1a, 0xc1, 0x36, 0x28,
+ 0x10, 0x79, 0xf1, 0x3c, 0xbf, 0x1a, 0x41, 0x5c,
+ 0x4e, 0xdf, 0xb2, 0x7c, 0x79, 0x3b, 0x7a, 0x62,
+ 0x3d, 0x4b, 0xc9, 0x9b, 0x2a, 0x2e, 0x7c, 0xa2,
+ 0xb1, 0x11, 0x98, 0xa7, 0x34, 0x1a, 0x00, 0xf3,
+ 0xd1, 0xbc, 0x18, 0x22, 0xba, 0x02, 0x56, 0x62,
+ 0x31, 0x10, 0x11, 0x6d, 0xe0, 0x54, 0x9d, 0x40,
+ 0x1f, 0x26, 0x80, 0x41, 0xca, 0x3f, 0x68, 0x0f,
+ 0x32, 0x1d, 0x0a, 0x8e, 0x79, 0xd8, 0xa4, 0x1b,
+ 0x29, 0x1c, 0x90, 0x8e, 0xc5, 0xe3, 0xb4, 0x91,
+ 0x37, 0x9a, 0x97, 0x86, 0x99, 0xd5, 0x09, 0xc5,
+ 0xbb, 0xa3, 0x3f, 0x21, 0x29, 0x82, 0x14, 0x5c,
+ 0xab, 0x25, 0xfb, 0xf2, 0x4f, 0x58, 0x26, 0xd4,
+ 0x83, 0xaa, 0x66, 0x89, 0x67, 0x7e, 0xc0, 0x49,
+ 0xe1, 0x11, 0x10, 0x7f, 0x7a, 0xda, 0x29, 0x04,
+ 0xff, 0xf0, 0xcb, 0x09, 0x7c, 0x9d, 0xfa, 0x03,
+ 0x6f, 0x81, 0x09, 0x31, 0x60, 0xfb, 0x08, 0xfa,
+ 0x74, 0xd3, 0x64, 0x44, 0x7c, 0x55, 0x85, 0xec,
+ 0x9c, 0x6e, 0x25, 0xb7, 0x6c, 0xc5, 0x37, 0xb6,
+ 0x83, 0x87, 0x72, 0x95, 0x8b, 0x9d, 0xe1, 0x69,
+ 0x5c, 0x31, 0x95, 0x42, 0xa6, 0x2c, 0xd1, 0x36,
+ 0x47, 0x1f, 0xec, 0x54, 0xab, 0xa2, 0x1c, 0xd8,
+ 0x00, 0xcc, 0xbc, 0x0d, 0x65, 0xe2, 0x67, 0xbf,
+ 0xbc, 0xea, 0xee, 0x9e, 0xe4, 0x36, 0x95, 0xbe,
+ 0x73, 0xd9, 0xa6, 0xd9, 0x0f, 0xa0, 0xcc, 0x82,
+ 0x76, 0x26, 0xad, 0x5b, 0x58, 0x6c, 0x4e, 0xab,
+ 0x29, 0x64, 0xd3, 0xd9, 0xa9, 0x08, 0x8c, 0x1d,
+ 0xa1, 0x4f, 0x80, 0xd8, 0x3f, 0x94, 0xfb, 0xd3,
+ 0x7b, 0xfc, 0xd1, 0x2b, 0xc3, 0x21, 0xeb, 0xe5,
+ 0x1c, 0x84, 0x23, 0x7f, 0x4b, 0xfa, 0xdb, 0x34,
+ 0x18, 0xa2, 0xc2, 0xe5, 0x13, 0xfe, 0x6c, 0x49,
+ 0x81, 0xd2, 0x73, 0xe7, 0xe2, 0xd7, 0xe4, 0x4f,
+ 0x4b, 0x08, 0x6e, 0xb1, 0x12, 0x22, 0x10, 0x9d,
+ 0xac, 0x51, 0x1e, 0x17, 0xd9, 0x8a, 0x0b, 0x42,
+ 0x88, 0x16, 0x81, 0x37, 0x7c, 0x6a, 0xf7, 0xef,
+ 0x2d, 0xe3, 0xd9, 0xf8, 0x5f, 0xe0, 0x53, 0x27,
+ 0x74, 0xb9, 0xe2, 0xd6, 0x1c, 0x80, 0x2c, 0x52,
+ 0x65
+};
+static const u8 enc_output009[] __initconst = {
+ 0xfd, 0x81, 0x8d, 0xd0, 0x3d, 0xb4, 0xd5, 0xdf,
+ 0xd3, 0x42, 0x47, 0x5a, 0x6d, 0x19, 0x27, 0x66,
+ 0x4b, 0x2e, 0x0c, 0x27, 0x9c, 0x96, 0x4c, 0x72,
+ 0x02, 0xa3, 0x65, 0xc3, 0xb3, 0x6f, 0x2e, 0xbd,
+ 0x63, 0x8a, 0x4a, 0x5d, 0x29, 0xa2, 0xd0, 0x28,
+ 0x48, 0xc5, 0x3d, 0x98, 0xa3, 0xbc, 0xe0, 0xbe,
+ 0x3b, 0x3f, 0xe6, 0x8a, 0xa4, 0x7f, 0x53, 0x06,
+ 0xfa, 0x7f, 0x27, 0x76, 0x72, 0x31, 0xa1, 0xf5,
+ 0xd6, 0x0c, 0x52, 0x47, 0xba, 0xcd, 0x4f, 0xd7,
+ 0xeb, 0x05, 0x48, 0x0d, 0x7c, 0x35, 0x4a, 0x09,
+ 0xc9, 0x76, 0x71, 0x02, 0xa3, 0xfb, 0xb7, 0x1a,
+ 0x65, 0xb7, 0xed, 0x98, 0xc6, 0x30, 0x8a, 0x00,
+ 0xae, 0xa1, 0x31, 0xe5, 0xb5, 0x9e, 0x6d, 0x62,
+ 0xda, 0xda, 0x07, 0x0f, 0x38, 0x38, 0xd3, 0xcb,
+ 0xc1, 0xb0, 0xad, 0xec, 0x72, 0xec, 0xb1, 0xa2,
+ 0x7b, 0x59, 0xf3, 0x3d, 0x2b, 0xef, 0xcd, 0x28,
+ 0x5b, 0x83, 0xcc, 0x18, 0x91, 0x88, 0xb0, 0x2e,
+ 0xf9, 0x29, 0x31, 0x18, 0xf9, 0x4e, 0xe9, 0x0a,
+ 0x91, 0x92, 0x9f, 0xae, 0x2d, 0xad, 0xf4, 0xe6,
+ 0x1a, 0xe2, 0xa4, 0xee, 0x47, 0x15, 0xbf, 0x83,
+ 0x6e, 0xd7, 0x72, 0x12, 0x3b, 0x2d, 0x24, 0xe9,
+ 0xb2, 0x55, 0xcb, 0x3c, 0x10, 0xf0, 0x24, 0x8a,
+ 0x4a, 0x02, 0xea, 0x90, 0x25, 0xf0, 0xb4, 0x79,
+ 0x3a, 0xef, 0x6e, 0xf5, 0x52, 0xdf, 0xb0, 0x0a,
+ 0xcd, 0x24, 0x1c, 0xd3, 0x2e, 0x22, 0x74, 0xea,
+ 0x21, 0x6f, 0xe9, 0xbd, 0xc8, 0x3e, 0x36, 0x5b,
+ 0x19, 0xf1, 0xca, 0x99, 0x0a, 0xb4, 0xa7, 0x52,
+ 0x1a, 0x4e, 0xf2, 0xad, 0x8d, 0x56, 0x85, 0xbb,
+ 0x64, 0x89, 0xba, 0x26, 0xf9, 0xc7, 0xe1, 0x89,
+ 0x19, 0x22, 0x77, 0xc3, 0xa8, 0xfc, 0xff, 0xad,
+ 0xfe, 0xb9, 0x48, 0xae, 0x12, 0x30, 0x9f, 0x19,
+ 0xfb, 0x1b, 0xef, 0x14, 0x87, 0x8a, 0x78, 0x71,
+ 0xf3, 0xf4, 0xb7, 0x00, 0x9c, 0x1d, 0xb5, 0x3d,
+ 0x49, 0x00, 0x0c, 0x06, 0xd4, 0x50, 0xf9, 0x54,
+ 0x45, 0xb2, 0x5b, 0x43, 0xdb, 0x6d, 0xcf, 0x1a,
+ 0xe9, 0x7a, 0x7a, 0xcf, 0xfc, 0x8a, 0x4e, 0x4d,
+ 0x0b, 0x07, 0x63, 0x28, 0xd8, 0xe7, 0x08, 0x95,
+ 0xdf, 0xa6, 0x72, 0x93, 0x2e, 0xbb, 0xa0, 0x42,
+ 0x89, 0x16, 0xf1, 0xd9, 0x0c, 0xf9, 0xa1, 0x16,
+ 0xfd, 0xd9, 0x03, 0xb4, 0x3b, 0x8a, 0xf5, 0xf6,
+ 0xe7, 0x6b, 0x2e, 0x8e, 0x4c, 0x3d, 0xe2, 0xaf,
+ 0x08, 0x45, 0x03, 0xff, 0x09, 0xb6, 0xeb, 0x2d,
+ 0xc6, 0x1b, 0x88, 0x94, 0xac, 0x3e, 0xf1, 0x9f,
+ 0x0e, 0x0e, 0x2b, 0xd5, 0x00, 0x4d, 0x3f, 0x3b,
+ 0x53, 0xae, 0xaf, 0x1c, 0x33, 0x5f, 0x55, 0x6e,
+ 0x8d, 0xaf, 0x05, 0x7a, 0x10, 0x34, 0xc9, 0xf4,
+ 0x66, 0xcb, 0x62, 0x12, 0xa6, 0xee, 0xe8, 0x1c,
+ 0x5d, 0x12, 0x86, 0xdb, 0x6f, 0x1c, 0x33, 0xc4,
+ 0x1c, 0xda, 0x82, 0x2d, 0x3b, 0x59, 0xfe, 0xb1,
+ 0xa4, 0x59, 0x41, 0x86, 0xd0, 0xef, 0xae, 0xfb,
+ 0xda, 0x6d, 0x11, 0xb8, 0xca, 0xe9, 0x6e, 0xff,
+ 0xf7, 0xa9, 0xd9, 0x70, 0x30, 0xfc, 0x53, 0xe2,
+ 0xd7, 0xa2, 0x4e, 0xc7, 0x91, 0xd9, 0x07, 0x06,
+ 0xaa, 0xdd, 0xb0, 0x59, 0x28, 0x1d, 0x00, 0x66,
+ 0xc5, 0x54, 0xc2, 0xfc, 0x06, 0xda, 0x05, 0x90,
+ 0x52, 0x1d, 0x37, 0x66, 0xee, 0xf0, 0xb2, 0x55,
+ 0x8a, 0x5d, 0xd2, 0x38, 0x86, 0x94, 0x9b, 0xfc,
+ 0x10, 0x4c, 0xa1, 0xb9, 0x64, 0x3e, 0x44, 0xb8,
+ 0x5f, 0xb0, 0x0c, 0xec, 0xe0, 0xc9, 0xe5, 0x62,
+ 0x75, 0x3f, 0x09, 0xd5, 0xf5, 0xd9, 0x26, 0xba,
+ 0x9e, 0xd2, 0xf4, 0xb9, 0x48, 0x0a, 0xbc, 0xa2,
+ 0xd6, 0x7c, 0x36, 0x11, 0x7d, 0x26, 0x81, 0x89,
+ 0xcf, 0xa4, 0xad, 0x73, 0x0e, 0xee, 0xcc, 0x06,
+ 0xa9, 0xdb, 0xb1, 0xfd, 0xfb, 0x09, 0x7f, 0x90,
+ 0x42, 0x37, 0x2f, 0xe1, 0x9c, 0x0f, 0x6f, 0xcf,
+ 0x43, 0xb5, 0xd9, 0x90, 0xe1, 0x85, 0xf5, 0xa8,
+ 0xae
+};
+static const u8 enc_assoc009[] __initconst = {
+ 0x5a, 0x27, 0xff, 0xeb, 0xdf, 0x84, 0xb2, 0x9e,
+ 0xef
+};
+static const u8 enc_nonce009[] __initconst = {
+ 0xef, 0x2d, 0x63, 0xee, 0x6b, 0x80, 0x8b, 0x78
+};
+static const u8 enc_key009[] __initconst = {
+ 0xea, 0xbc, 0x56, 0x99, 0xe3, 0x50, 0xff, 0xc5,
+ 0xcc, 0x1a, 0xd7, 0xc1, 0x57, 0x72, 0xea, 0x86,
+ 0x5b, 0x89, 0x88, 0x61, 0x3d, 0x2f, 0x9b, 0xb2,
+ 0xe7, 0x9c, 0xec, 0x74, 0x6e, 0x3e, 0xf4, 0x3b
+};
+
+static const u8 enc_input010[] __initconst = {
+ 0x42, 0x93, 0xe4, 0xeb, 0x97, 0xb0, 0x57, 0xbf,
+ 0x1a, 0x8b, 0x1f, 0xe4, 0x5f, 0x36, 0x20, 0x3c,
+ 0xef, 0x0a, 0xa9, 0x48, 0x5f, 0x5f, 0x37, 0x22,
+ 0x3a, 0xde, 0xe3, 0xae, 0xbe, 0xad, 0x07, 0xcc,
+ 0xb1, 0xf6, 0xf5, 0xf9, 0x56, 0xdd, 0xe7, 0x16,
+ 0x1e, 0x7f, 0xdf, 0x7a, 0x9e, 0x75, 0xb7, 0xc7,
+ 0xbe, 0xbe, 0x8a, 0x36, 0x04, 0xc0, 0x10, 0xf4,
+ 0x95, 0x20, 0x03, 0xec, 0xdc, 0x05, 0xa1, 0x7d,
+ 0xc4, 0xa9, 0x2c, 0x82, 0xd0, 0xbc, 0x8b, 0xc5,
+ 0xc7, 0x45, 0x50, 0xf6, 0xa2, 0x1a, 0xb5, 0x46,
+ 0x3b, 0x73, 0x02, 0xa6, 0x83, 0x4b, 0x73, 0x82,
+ 0x58, 0x5e, 0x3b, 0x65, 0x2f, 0x0e, 0xfd, 0x2b,
+ 0x59, 0x16, 0xce, 0xa1, 0x60, 0x9c, 0xe8, 0x3a,
+ 0x99, 0xed, 0x8d, 0x5a, 0xcf, 0xf6, 0x83, 0xaf,
+ 0xba, 0xd7, 0x73, 0x73, 0x40, 0x97, 0x3d, 0xca,
+ 0xef, 0x07, 0x57, 0xe6, 0xd9, 0x70, 0x0e, 0x95,
+ 0xae, 0xa6, 0x8d, 0x04, 0xcc, 0xee, 0xf7, 0x09,
+ 0x31, 0x77, 0x12, 0xa3, 0x23, 0x97, 0x62, 0xb3,
+ 0x7b, 0x32, 0xfb, 0x80, 0x14, 0x48, 0x81, 0xc3,
+ 0xe5, 0xea, 0x91, 0x39, 0x52, 0x81, 0xa2, 0x4f,
+ 0xe4, 0xb3, 0x09, 0xff, 0xde, 0x5e, 0xe9, 0x58,
+ 0x84, 0x6e, 0xf9, 0x3d, 0xdf, 0x25, 0xea, 0xad,
+ 0xae, 0xe6, 0x9a, 0xd1, 0x89, 0x55, 0xd3, 0xde,
+ 0x6c, 0x52, 0xdb, 0x70, 0xfe, 0x37, 0xce, 0x44,
+ 0x0a, 0xa8, 0x25, 0x5f, 0x92, 0xc1, 0x33, 0x4a,
+ 0x4f, 0x9b, 0x62, 0x35, 0xff, 0xce, 0xc0, 0xa9,
+ 0x60, 0xce, 0x52, 0x00, 0x97, 0x51, 0x35, 0x26,
+ 0x2e, 0xb9, 0x36, 0xa9, 0x87, 0x6e, 0x1e, 0xcc,
+ 0x91, 0x78, 0x53, 0x98, 0x86, 0x5b, 0x9c, 0x74,
+ 0x7d, 0x88, 0x33, 0xe1, 0xdf, 0x37, 0x69, 0x2b,
+ 0xbb, 0xf1, 0x4d, 0xf4, 0xd1, 0xf1, 0x39, 0x93,
+ 0x17, 0x51, 0x19, 0xe3, 0x19, 0x1e, 0x76, 0x37,
+ 0x25, 0xfb, 0x09, 0x27, 0x6a, 0xab, 0x67, 0x6f,
+ 0x14, 0x12, 0x64, 0xe7, 0xc4, 0x07, 0xdf, 0x4d,
+ 0x17, 0xbb, 0x6d, 0xe0, 0xe9, 0xb9, 0xab, 0xca,
+ 0x10, 0x68, 0xaf, 0x7e, 0xb7, 0x33, 0x54, 0x73,
+ 0x07, 0x6e, 0xf7, 0x81, 0x97, 0x9c, 0x05, 0x6f,
+ 0x84, 0x5f, 0xd2, 0x42, 0xfb, 0x38, 0xcf, 0xd1,
+ 0x2f, 0x14, 0x30, 0x88, 0x98, 0x4d, 0x5a, 0xa9,
+ 0x76, 0xd5, 0x4f, 0x3e, 0x70, 0x6c, 0x85, 0x76,
+ 0xd7, 0x01, 0xa0, 0x1a, 0xc8, 0x4e, 0xaa, 0xac,
+ 0x78, 0xfe, 0x46, 0xde, 0x6a, 0x05, 0x46, 0xa7,
+ 0x43, 0x0c, 0xb9, 0xde, 0xb9, 0x68, 0xfb, 0xce,
+ 0x42, 0x99, 0x07, 0x4d, 0x0b, 0x3b, 0x5a, 0x30,
+ 0x35, 0xa8, 0xf9, 0x3a, 0x73, 0xef, 0x0f, 0xdb,
+ 0x1e, 0x16, 0x42, 0xc4, 0xba, 0xae, 0x58, 0xaa,
+ 0xf8, 0xe5, 0x75, 0x2f, 0x1b, 0x15, 0x5c, 0xfd,
+ 0x0a, 0x97, 0xd0, 0xe4, 0x37, 0x83, 0x61, 0x5f,
+ 0x43, 0xa6, 0xc7, 0x3f, 0x38, 0x59, 0xe6, 0xeb,
+ 0xa3, 0x90, 0xc3, 0xaa, 0xaa, 0x5a, 0xd3, 0x34,
+ 0xd4, 0x17, 0xc8, 0x65, 0x3e, 0x57, 0xbc, 0x5e,
+ 0xdd, 0x9e, 0xb7, 0xf0, 0x2e, 0x5b, 0xb2, 0x1f,
+ 0x8a, 0x08, 0x0d, 0x45, 0x91, 0x0b, 0x29, 0x53,
+ 0x4f, 0x4c, 0x5a, 0x73, 0x56, 0xfe, 0xaf, 0x41,
+ 0x01, 0x39, 0x0a, 0x24, 0x3c, 0x7e, 0xbe, 0x4e,
+ 0x53, 0xf3, 0xeb, 0x06, 0x66, 0x51, 0x28, 0x1d,
+ 0xbd, 0x41, 0x0a, 0x01, 0xab, 0x16, 0x47, 0x27,
+ 0x47, 0x47, 0xf7, 0xcb, 0x46, 0x0a, 0x70, 0x9e,
+ 0x01, 0x9c, 0x09, 0xe1, 0x2a, 0x00, 0x1a, 0xd8,
+ 0xd4, 0x79, 0x9d, 0x80, 0x15, 0x8e, 0x53, 0x2a,
+ 0x65, 0x83, 0x78, 0x3e, 0x03, 0x00, 0x07, 0x12,
+ 0x1f, 0x33, 0x3e, 0x7b, 0x13, 0x37, 0xf1, 0xc3,
+ 0xef, 0xb7, 0xc1, 0x20, 0x3c, 0x3e, 0x67, 0x66,
+ 0x5d, 0x88, 0xa7, 0x7d, 0x33, 0x50, 0x77, 0xb0,
+ 0x28, 0x8e, 0xe7, 0x2c, 0x2e, 0x7a, 0xf4, 0x3c,
+ 0x8d, 0x74, 0x83, 0xaf, 0x8e, 0x87, 0x0f, 0xe4,
+ 0x50, 0xff, 0x84, 0x5c, 0x47, 0x0c, 0x6a, 0x49,
+ 0xbf, 0x42, 0x86, 0x77, 0x15, 0x48, 0xa5, 0x90,
+ 0x5d, 0x93, 0xd6, 0x2a, 0x11, 0xd5, 0xd5, 0x11,
+ 0xaa, 0xce, 0xe7, 0x6f, 0xa5, 0xb0, 0x09, 0x2c,
+ 0x8d, 0xd3, 0x92, 0xf0, 0x5a, 0x2a, 0xda, 0x5b,
+ 0x1e, 0xd5, 0x9a, 0xc4, 0xc4, 0xf3, 0x49, 0x74,
+ 0x41, 0xca, 0xe8, 0xc1, 0xf8, 0x44, 0xd6, 0x3c,
+ 0xae, 0x6c, 0x1d, 0x9a, 0x30, 0x04, 0x4d, 0x27,
+ 0x0e, 0xb1, 0x5f, 0x59, 0xa2, 0x24, 0xe8, 0xe1,
+ 0x98, 0xc5, 0x6a, 0x4c, 0xfe, 0x41, 0xd2, 0x27,
+ 0x42, 0x52, 0xe1, 0xe9, 0x7d, 0x62, 0xe4, 0x88,
+ 0x0f, 0xad, 0xb2, 0x70, 0xcb, 0x9d, 0x4c, 0x27,
+ 0x2e, 0x76, 0x1e, 0x1a, 0x63, 0x65, 0xf5, 0x3b,
+ 0xf8, 0x57, 0x69, 0xeb, 0x5b, 0x38, 0x26, 0x39,
+ 0x33, 0x25, 0x45, 0x3e, 0x91, 0xb8, 0xd8, 0xc7,
+ 0xd5, 0x42, 0xc0, 0x22, 0x31, 0x74, 0xf4, 0xbc,
+ 0x0c, 0x23, 0xf1, 0xca, 0xc1, 0x8d, 0xd7, 0xbe,
+ 0xc9, 0x62, 0xe4, 0x08, 0x1a, 0xcf, 0x36, 0xd5,
+ 0xfe, 0x55, 0x21, 0x59, 0x91, 0x87, 0x87, 0xdf,
+ 0x06, 0xdb, 0xdf, 0x96, 0x45, 0x58, 0xda, 0x05,
+ 0xcd, 0x50, 0x4d, 0xd2, 0x7d, 0x05, 0x18, 0x73,
+ 0x6a, 0x8d, 0x11, 0x85, 0xa6, 0x88, 0xe8, 0xda,
+ 0xe6, 0x30, 0x33, 0xa4, 0x89, 0x31, 0x75, 0xbe,
+ 0x69, 0x43, 0x84, 0x43, 0x50, 0x87, 0xdd, 0x71,
+ 0x36, 0x83, 0xc3, 0x78, 0x74, 0x24, 0x0a, 0xed,
+ 0x7b, 0xdb, 0xa4, 0x24, 0x0b, 0xb9, 0x7e, 0x5d,
+ 0xff, 0xde, 0xb1, 0xef, 0x61, 0x5a, 0x45, 0x33,
+ 0xf6, 0x17, 0x07, 0x08, 0x98, 0x83, 0x92, 0x0f,
+ 0x23, 0x6d, 0xe6, 0xaa, 0x17, 0x54, 0xad, 0x6a,
+ 0xc8, 0xdb, 0x26, 0xbe, 0xb8, 0xb6, 0x08, 0xfa,
+ 0x68, 0xf1, 0xd7, 0x79, 0x6f, 0x18, 0xb4, 0x9e,
+ 0x2d, 0x3f, 0x1b, 0x64, 0xaf, 0x8d, 0x06, 0x0e,
+ 0x49, 0x28, 0xe0, 0x5d, 0x45, 0x68, 0x13, 0x87,
+ 0xfa, 0xde, 0x40, 0x7b, 0xd2, 0xc3, 0x94, 0xd5,
+ 0xe1, 0xd9, 0xc2, 0xaf, 0x55, 0x89, 0xeb, 0xb4,
+ 0x12, 0x59, 0xa8, 0xd4, 0xc5, 0x29, 0x66, 0x38,
+ 0xe6, 0xac, 0x22, 0x22, 0xd9, 0x64, 0x9b, 0x34,
+ 0x0a, 0x32, 0x9f, 0xc2, 0xbf, 0x17, 0x6c, 0x3f,
+ 0x71, 0x7a, 0x38, 0x6b, 0x98, 0xfb, 0x49, 0x36,
+ 0x89, 0xc9, 0xe2, 0xd6, 0xc7, 0x5d, 0xd0, 0x69,
+ 0x5f, 0x23, 0x35, 0xc9, 0x30, 0xe2, 0xfd, 0x44,
+ 0x58, 0x39, 0xd7, 0x97, 0xfb, 0x5c, 0x00, 0xd5,
+ 0x4f, 0x7a, 0x1a, 0x95, 0x8b, 0x62, 0x4b, 0xce,
+ 0xe5, 0x91, 0x21, 0x7b, 0x30, 0x00, 0xd6, 0xdd,
+ 0x6d, 0x02, 0x86, 0x49, 0x0f, 0x3c, 0x1a, 0x27,
+ 0x3c, 0xd3, 0x0e, 0x71, 0xf2, 0xff, 0xf5, 0x2f,
+ 0x87, 0xac, 0x67, 0x59, 0x81, 0xa3, 0xf7, 0xf8,
+ 0xd6, 0x11, 0x0c, 0x84, 0xa9, 0x03, 0xee, 0x2a,
+ 0xc4, 0xf3, 0x22, 0xab, 0x7c, 0xe2, 0x25, 0xf5,
+ 0x67, 0xa3, 0xe4, 0x11, 0xe0, 0x59, 0xb3, 0xca,
+ 0x87, 0xa0, 0xae, 0xc9, 0xa6, 0x62, 0x1b, 0x6e,
+ 0x4d, 0x02, 0x6b, 0x07, 0x9d, 0xfd, 0xd0, 0x92,
+ 0x06, 0xe1, 0xb2, 0x9a, 0x4a, 0x1f, 0x1f, 0x13,
+ 0x49, 0x99, 0x97, 0x08, 0xde, 0x7f, 0x98, 0xaf,
+ 0x51, 0x98, 0xee, 0x2c, 0xcb, 0xf0, 0x0b, 0xc6,
+ 0xb6, 0xb7, 0x2d, 0x9a, 0xb1, 0xac, 0xa6, 0xe3,
+ 0x15, 0x77, 0x9d, 0x6b, 0x1a, 0xe4, 0xfc, 0x8b,
+ 0xf2, 0x17, 0x59, 0x08, 0x04, 0x58, 0x81, 0x9d,
+ 0x1b, 0x1b, 0x69, 0x55, 0xc2, 0xb4, 0x3c, 0x1f,
+ 0x50, 0xf1, 0x7f, 0x77, 0x90, 0x4c, 0x66, 0x40,
+ 0x5a, 0xc0, 0x33, 0x1f, 0xcb, 0x05, 0x6d, 0x5c,
+ 0x06, 0x87, 0x52, 0xa2, 0x8f, 0x26, 0xd5, 0x4f
+};
+static const u8 enc_output010[] __initconst = {
+ 0xe5, 0x26, 0xa4, 0x3d, 0xbd, 0x33, 0xd0, 0x4b,
+ 0x6f, 0x05, 0xa7, 0x6e, 0x12, 0x7a, 0xd2, 0x74,
+ 0xa6, 0xdd, 0xbd, 0x95, 0xeb, 0xf9, 0xa4, 0xf1,
+ 0x59, 0x93, 0x91, 0x70, 0xd9, 0xfe, 0x9a, 0xcd,
+ 0x53, 0x1f, 0x3a, 0xab, 0xa6, 0x7c, 0x9f, 0xa6,
+ 0x9e, 0xbd, 0x99, 0xd9, 0xb5, 0x97, 0x44, 0xd5,
+ 0x14, 0x48, 0x4d, 0x9d, 0xc0, 0xd0, 0x05, 0x96,
+ 0xeb, 0x4c, 0x78, 0x55, 0x09, 0x08, 0x01, 0x02,
+ 0x30, 0x90, 0x7b, 0x96, 0x7a, 0x7b, 0x5f, 0x30,
+ 0x41, 0x24, 0xce, 0x68, 0x61, 0x49, 0x86, 0x57,
+ 0x82, 0xdd, 0x53, 0x1c, 0x51, 0x28, 0x2b, 0x53,
+ 0x6e, 0x2d, 0xc2, 0x20, 0x4c, 0xdd, 0x8f, 0x65,
+ 0x10, 0x20, 0x50, 0xdd, 0x9d, 0x50, 0xe5, 0x71,
+ 0x40, 0x53, 0x69, 0xfc, 0x77, 0x48, 0x11, 0xb9,
+ 0xde, 0xa4, 0x8d, 0x58, 0xe4, 0xa6, 0x1a, 0x18,
+ 0x47, 0x81, 0x7e, 0xfc, 0xdd, 0xf6, 0xef, 0xce,
+ 0x2f, 0x43, 0x68, 0xd6, 0x06, 0xe2, 0x74, 0x6a,
+ 0xad, 0x90, 0xf5, 0x37, 0xf3, 0x3d, 0x82, 0x69,
+ 0x40, 0xe9, 0x6b, 0xa7, 0x3d, 0xa8, 0x1e, 0xd2,
+ 0x02, 0x7c, 0xb7, 0x9b, 0xe4, 0xda, 0x8f, 0x95,
+ 0x06, 0xc5, 0xdf, 0x73, 0xa3, 0x20, 0x9a, 0x49,
+ 0xde, 0x9c, 0xbc, 0xee, 0x14, 0x3f, 0x81, 0x5e,
+ 0xf8, 0x3b, 0x59, 0x3c, 0xe1, 0x68, 0x12, 0x5a,
+ 0x3a, 0x76, 0x3a, 0x3f, 0xf7, 0x87, 0x33, 0x0a,
+ 0x01, 0xb8, 0xd4, 0xed, 0xb6, 0xbe, 0x94, 0x5e,
+ 0x70, 0x40, 0x56, 0x67, 0x1f, 0x50, 0x44, 0x19,
+ 0xce, 0x82, 0x70, 0x10, 0x87, 0x13, 0x20, 0x0b,
+ 0x4c, 0x5a, 0xb6, 0xf6, 0xa7, 0xae, 0x81, 0x75,
+ 0x01, 0x81, 0xe6, 0x4b, 0x57, 0x7c, 0xdd, 0x6d,
+ 0xf8, 0x1c, 0x29, 0x32, 0xf7, 0xda, 0x3c, 0x2d,
+ 0xf8, 0x9b, 0x25, 0x6e, 0x00, 0xb4, 0xf7, 0x2f,
+ 0xf7, 0x04, 0xf7, 0xa1, 0x56, 0xac, 0x4f, 0x1a,
+ 0x64, 0xb8, 0x47, 0x55, 0x18, 0x7b, 0x07, 0x4d,
+ 0xbd, 0x47, 0x24, 0x80, 0x5d, 0xa2, 0x70, 0xc5,
+ 0xdd, 0x8e, 0x82, 0xd4, 0xeb, 0xec, 0xb2, 0x0c,
+ 0x39, 0xd2, 0x97, 0xc1, 0xcb, 0xeb, 0xf4, 0x77,
+ 0x59, 0xb4, 0x87, 0xef, 0xcb, 0x43, 0x2d, 0x46,
+ 0x54, 0xd1, 0xa7, 0xd7, 0x15, 0x99, 0x0a, 0x43,
+ 0xa1, 0xe0, 0x99, 0x33, 0x71, 0xc1, 0xed, 0xfe,
+ 0x72, 0x46, 0x33, 0x8e, 0x91, 0x08, 0x9f, 0xc8,
+ 0x2e, 0xca, 0xfa, 0xdc, 0x59, 0xd5, 0xc3, 0x76,
+ 0x84, 0x9f, 0xa3, 0x37, 0x68, 0xc3, 0xf0, 0x47,
+ 0x2c, 0x68, 0xdb, 0x5e, 0xc3, 0x49, 0x4c, 0xe8,
+ 0x92, 0x85, 0xe2, 0x23, 0xd3, 0x3f, 0xad, 0x32,
+ 0xe5, 0x2b, 0x82, 0xd7, 0x8f, 0x99, 0x0a, 0x59,
+ 0x5c, 0x45, 0xd9, 0xb4, 0x51, 0x52, 0xc2, 0xae,
+ 0xbf, 0x80, 0xcf, 0xc9, 0xc9, 0x51, 0x24, 0x2a,
+ 0x3b, 0x3a, 0x4d, 0xae, 0xeb, 0xbd, 0x22, 0xc3,
+ 0x0e, 0x0f, 0x59, 0x25, 0x92, 0x17, 0xe9, 0x74,
+ 0xc7, 0x8b, 0x70, 0x70, 0x36, 0x55, 0x95, 0x75,
+ 0x4b, 0xad, 0x61, 0x2b, 0x09, 0xbc, 0x82, 0xf2,
+ 0x6e, 0x94, 0x43, 0xae, 0xc3, 0xd5, 0xcd, 0x8e,
+ 0xfe, 0x5b, 0x9a, 0x88, 0x43, 0x01, 0x75, 0xb2,
+ 0x23, 0x09, 0xf7, 0x89, 0x83, 0xe7, 0xfa, 0xf9,
+ 0xb4, 0x9b, 0xf8, 0xef, 0xbd, 0x1c, 0x92, 0xc1,
+ 0xda, 0x7e, 0xfe, 0x05, 0xba, 0x5a, 0xcd, 0x07,
+ 0x6a, 0x78, 0x9e, 0x5d, 0xfb, 0x11, 0x2f, 0x79,
+ 0x38, 0xb6, 0xc2, 0x5b, 0x6b, 0x51, 0xb4, 0x71,
+ 0xdd, 0xf7, 0x2a, 0xe4, 0xf4, 0x72, 0x76, 0xad,
+ 0xc2, 0xdd, 0x64, 0x5d, 0x79, 0xb6, 0xf5, 0x7a,
+ 0x77, 0x20, 0x05, 0x3d, 0x30, 0x06, 0xd4, 0x4c,
+ 0x0a, 0x2c, 0x98, 0x5a, 0xb9, 0xd4, 0x98, 0xa9,
+ 0x3f, 0xc6, 0x12, 0xea, 0x3b, 0x4b, 0xc5, 0x79,
+ 0x64, 0x63, 0x6b, 0x09, 0x54, 0x3b, 0x14, 0x27,
+ 0xba, 0x99, 0x80, 0xc8, 0x72, 0xa8, 0x12, 0x90,
+ 0x29, 0xba, 0x40, 0x54, 0x97, 0x2b, 0x7b, 0xfe,
+ 0xeb, 0xcd, 0x01, 0x05, 0x44, 0x72, 0xdb, 0x99,
+ 0xe4, 0x61, 0xc9, 0x69, 0xd6, 0xb9, 0x28, 0xd1,
+ 0x05, 0x3e, 0xf9, 0x0b, 0x49, 0x0a, 0x49, 0xe9,
+ 0x8d, 0x0e, 0xa7, 0x4a, 0x0f, 0xaf, 0x32, 0xd0,
+ 0xe0, 0xb2, 0x3a, 0x55, 0x58, 0xfe, 0x5c, 0x28,
+ 0x70, 0x51, 0x23, 0xb0, 0x7b, 0x6a, 0x5f, 0x1e,
+ 0xb8, 0x17, 0xd7, 0x94, 0x15, 0x8f, 0xee, 0x20,
+ 0xc7, 0x42, 0x25, 0x3e, 0x9a, 0x14, 0xd7, 0x60,
+ 0x72, 0x39, 0x47, 0x48, 0xa9, 0xfe, 0xdd, 0x47,
+ 0x0a, 0xb1, 0xe6, 0x60, 0x28, 0x8c, 0x11, 0x68,
+ 0xe1, 0xff, 0xd7, 0xce, 0xc8, 0xbe, 0xb3, 0xfe,
+ 0x27, 0x30, 0x09, 0x70, 0xd7, 0xfa, 0x02, 0x33,
+ 0x3a, 0x61, 0x2e, 0xc7, 0xff, 0xa4, 0x2a, 0xa8,
+ 0x6e, 0xb4, 0x79, 0x35, 0x6d, 0x4c, 0x1e, 0x38,
+ 0xf8, 0xee, 0xd4, 0x84, 0x4e, 0x6e, 0x28, 0xa7,
+ 0xce, 0xc8, 0xc1, 0xcf, 0x80, 0x05, 0xf3, 0x04,
+ 0xef, 0xc8, 0x18, 0x28, 0x2e, 0x8d, 0x5e, 0x0c,
+ 0xdf, 0xb8, 0x5f, 0x96, 0xe8, 0xc6, 0x9c, 0x2f,
+ 0xe5, 0xa6, 0x44, 0xd7, 0xe7, 0x99, 0x44, 0x0c,
+ 0xec, 0xd7, 0x05, 0x60, 0x97, 0xbb, 0x74, 0x77,
+ 0x58, 0xd5, 0xbb, 0x48, 0xde, 0x5a, 0xb2, 0x54,
+ 0x7f, 0x0e, 0x46, 0x70, 0x6a, 0x6f, 0x78, 0xa5,
+ 0x08, 0x89, 0x05, 0x4e, 0x7e, 0xa0, 0x69, 0xb4,
+ 0x40, 0x60, 0x55, 0x77, 0x75, 0x9b, 0x19, 0xf2,
+ 0xd5, 0x13, 0x80, 0x77, 0xf9, 0x4b, 0x3f, 0x1e,
+ 0xee, 0xe6, 0x76, 0x84, 0x7b, 0x8c, 0xe5, 0x27,
+ 0xa8, 0x0a, 0x91, 0x01, 0x68, 0x71, 0x8a, 0x3f,
+ 0x06, 0xab, 0xf6, 0xa9, 0xa5, 0xe6, 0x72, 0x92,
+ 0xe4, 0x67, 0xe2, 0xa2, 0x46, 0x35, 0x84, 0x55,
+ 0x7d, 0xca, 0xa8, 0x85, 0xd0, 0xf1, 0x3f, 0xbe,
+ 0xd7, 0x34, 0x64, 0xfc, 0xae, 0xe3, 0xe4, 0x04,
+ 0x9f, 0x66, 0x02, 0xb9, 0x88, 0x10, 0xd9, 0xc4,
+ 0x4c, 0x31, 0x43, 0x7a, 0x93, 0xe2, 0x9b, 0x56,
+ 0x43, 0x84, 0xdc, 0xdc, 0xde, 0x1d, 0xa4, 0x02,
+ 0x0e, 0xc2, 0xef, 0xc3, 0xf8, 0x78, 0xd1, 0xb2,
+ 0x6b, 0x63, 0x18, 0xc9, 0xa9, 0xe5, 0x72, 0xd8,
+ 0xf3, 0xb9, 0xd1, 0x8a, 0xc7, 0x1a, 0x02, 0x27,
+ 0x20, 0x77, 0x10, 0xe5, 0xc8, 0xd4, 0x4a, 0x47,
+ 0xe5, 0xdf, 0x5f, 0x01, 0xaa, 0xb0, 0xd4, 0x10,
+ 0xbb, 0x69, 0xe3, 0x36, 0xc8, 0xe1, 0x3d, 0x43,
+ 0xfb, 0x86, 0xcd, 0xcc, 0xbf, 0xf4, 0x88, 0xe0,
+ 0x20, 0xca, 0xb7, 0x1b, 0xf1, 0x2f, 0x5c, 0xee,
+ 0xd4, 0xd3, 0xa3, 0xcc, 0xa4, 0x1e, 0x1c, 0x47,
+ 0xfb, 0xbf, 0xfc, 0xa2, 0x41, 0x55, 0x9d, 0xf6,
+ 0x5a, 0x5e, 0x65, 0x32, 0x34, 0x7b, 0x52, 0x8d,
+ 0xd5, 0xd0, 0x20, 0x60, 0x03, 0xab, 0x3f, 0x8c,
+ 0xd4, 0x21, 0xea, 0x2a, 0xd9, 0xc4, 0xd0, 0xd3,
+ 0x65, 0xd8, 0x7a, 0x13, 0x28, 0x62, 0x32, 0x4b,
+ 0x2c, 0x87, 0x93, 0xa8, 0xb4, 0x52, 0x45, 0x09,
+ 0x44, 0xec, 0xec, 0xc3, 0x17, 0xdb, 0x9a, 0x4d,
+ 0x5c, 0xa9, 0x11, 0xd4, 0x7d, 0xaf, 0x9e, 0xf1,
+ 0x2d, 0xb2, 0x66, 0xc5, 0x1d, 0xed, 0xb7, 0xcd,
+ 0x0b, 0x25, 0x5e, 0x30, 0x47, 0x3f, 0x40, 0xf4,
+ 0xa1, 0xa0, 0x00, 0x94, 0x10, 0xc5, 0x6a, 0x63,
+ 0x1a, 0xd5, 0x88, 0x92, 0x8e, 0x82, 0x39, 0x87,
+ 0x3c, 0x78, 0x65, 0x58, 0x42, 0x75, 0x5b, 0xdd,
+ 0x77, 0x3e, 0x09, 0x4e, 0x76, 0x5b, 0xe6, 0x0e,
+ 0x4d, 0x38, 0xb2, 0xc0, 0xb8, 0x95, 0x01, 0x7a,
+ 0x10, 0xe0, 0xfb, 0x07, 0xf2, 0xab, 0x2d, 0x8c,
+ 0x32, 0xed, 0x2b, 0xc0, 0x46, 0xc2, 0xf5, 0x38,
+ 0x83, 0xf0, 0x17, 0xec, 0xc1, 0x20, 0x6a, 0x9a,
+ 0x0b, 0x00, 0xa0, 0x98, 0x22, 0x50, 0x23, 0xd5,
+ 0x80, 0x6b, 0xf6, 0x1f, 0xc3, 0xcc, 0x97, 0xc9,
+ 0x24, 0x9f, 0xf3, 0xaf, 0x43, 0x14, 0xd5, 0xa0
+};
+static const u8 enc_assoc010[] __initconst = {
+ 0xd2, 0xa1, 0x70, 0xdb, 0x7a, 0xf8, 0xfa, 0x27,
+ 0xba, 0x73, 0x0f, 0xbf, 0x3d, 0x1e, 0x82, 0xb2
+};
+static const u8 enc_nonce010[] __initconst = {
+ 0xdb, 0x92, 0x0f, 0x7f, 0x17, 0x54, 0x0c, 0x30
+};
+static const u8 enc_key010[] __initconst = {
+ 0x47, 0x11, 0xeb, 0x86, 0x2b, 0x2c, 0xab, 0x44,
+ 0x34, 0xda, 0x7f, 0x57, 0x03, 0x39, 0x0c, 0xaf,
+ 0x2c, 0x14, 0xfd, 0x65, 0x23, 0xe9, 0x8e, 0x74,
+ 0xd5, 0x08, 0x68, 0x08, 0xe7, 0xb4, 0x72, 0xd7
+};
+
+static const u8 enc_input011[] __initconst = {
+ 0x7a, 0x57, 0xf2, 0xc7, 0x06, 0x3f, 0x50, 0x7b,
+ 0x36, 0x1a, 0x66, 0x5c, 0xb9, 0x0e, 0x5e, 0x3b,
+ 0x45, 0x60, 0xbe, 0x9a, 0x31, 0x9f, 0xff, 0x5d,
+ 0x66, 0x34, 0xb4, 0xdc, 0xfb, 0x9d, 0x8e, 0xee,
+ 0x6a, 0x33, 0xa4, 0x07, 0x3c, 0xf9, 0x4c, 0x30,
+ 0xa1, 0x24, 0x52, 0xf9, 0x50, 0x46, 0x88, 0x20,
+ 0x02, 0x32, 0x3a, 0x0e, 0x99, 0x63, 0xaf, 0x1f,
+ 0x15, 0x28, 0x2a, 0x05, 0xff, 0x57, 0x59, 0x5e,
+ 0x18, 0xa1, 0x1f, 0xd0, 0x92, 0x5c, 0x88, 0x66,
+ 0x1b, 0x00, 0x64, 0xa5, 0x93, 0x8d, 0x06, 0x46,
+ 0xb0, 0x64, 0x8b, 0x8b, 0xef, 0x99, 0x05, 0x35,
+ 0x85, 0xb3, 0xf3, 0x33, 0xbb, 0xec, 0x66, 0xb6,
+ 0x3d, 0x57, 0x42, 0xe3, 0xb4, 0xc6, 0xaa, 0xb0,
+ 0x41, 0x2a, 0xb9, 0x59, 0xa9, 0xf6, 0x3e, 0x15,
+ 0x26, 0x12, 0x03, 0x21, 0x4c, 0x74, 0x43, 0x13,
+ 0x2a, 0x03, 0x27, 0x09, 0xb4, 0xfb, 0xe7, 0xb7,
+ 0x40, 0xff, 0x5e, 0xce, 0x48, 0x9a, 0x60, 0xe3,
+ 0x8b, 0x80, 0x8c, 0x38, 0x2d, 0xcb, 0x93, 0x37,
+ 0x74, 0x05, 0x52, 0x6f, 0x73, 0x3e, 0xc3, 0xbc,
+ 0xca, 0x72, 0x0a, 0xeb, 0xf1, 0x3b, 0xa0, 0x95,
+ 0xdc, 0x8a, 0xc4, 0xa9, 0xdc, 0xca, 0x44, 0xd8,
+ 0x08, 0x63, 0x6a, 0x36, 0xd3, 0x3c, 0xb8, 0xac,
+ 0x46, 0x7d, 0xfd, 0xaa, 0xeb, 0x3e, 0x0f, 0x45,
+ 0x8f, 0x49, 0xda, 0x2b, 0xf2, 0x12, 0xbd, 0xaf,
+ 0x67, 0x8a, 0x63, 0x48, 0x4b, 0x55, 0x5f, 0x6d,
+ 0x8c, 0xb9, 0x76, 0x34, 0x84, 0xae, 0xc2, 0xfc,
+ 0x52, 0x64, 0x82, 0xf7, 0xb0, 0x06, 0xf0, 0x45,
+ 0x73, 0x12, 0x50, 0x30, 0x72, 0xea, 0x78, 0x9a,
+ 0xa8, 0xaf, 0xb5, 0xe3, 0xbb, 0x77, 0x52, 0xec,
+ 0x59, 0x84, 0xbf, 0x6b, 0x8f, 0xce, 0x86, 0x5e,
+ 0x1f, 0x23, 0xe9, 0xfb, 0x08, 0x86, 0xf7, 0x10,
+ 0xb9, 0xf2, 0x44, 0x96, 0x44, 0x63, 0xa9, 0xa8,
+ 0x78, 0x00, 0x23, 0xd6, 0xc7, 0xe7, 0x6e, 0x66,
+ 0x4f, 0xcc, 0xee, 0x15, 0xb3, 0xbd, 0x1d, 0xa0,
+ 0xe5, 0x9c, 0x1b, 0x24, 0x2c, 0x4d, 0x3c, 0x62,
+ 0x35, 0x9c, 0x88, 0x59, 0x09, 0xdd, 0x82, 0x1b,
+ 0xcf, 0x0a, 0x83, 0x6b, 0x3f, 0xae, 0x03, 0xc4,
+ 0xb4, 0xdd, 0x7e, 0x5b, 0x28, 0x76, 0x25, 0x96,
+ 0xd9, 0xc9, 0x9d, 0x5f, 0x86, 0xfa, 0xf6, 0xd7,
+ 0xd2, 0xe6, 0x76, 0x1d, 0x0f, 0xa1, 0xdc, 0x74,
+ 0x05, 0x1b, 0x1d, 0xe0, 0xcd, 0x16, 0xb0, 0xa8,
+ 0x8a, 0x34, 0x7b, 0x15, 0x11, 0x77, 0xe5, 0x7b,
+ 0x7e, 0x20, 0xf7, 0xda, 0x38, 0xda, 0xce, 0x70,
+ 0xe9, 0xf5, 0x6c, 0xd9, 0xbe, 0x0c, 0x4c, 0x95,
+ 0x4c, 0xc2, 0x9b, 0x34, 0x55, 0x55, 0xe1, 0xf3,
+ 0x46, 0x8e, 0x48, 0x74, 0x14, 0x4f, 0x9d, 0xc9,
+ 0xf5, 0xe8, 0x1a, 0xf0, 0x11, 0x4a, 0xc1, 0x8d,
+ 0xe0, 0x93, 0xa0, 0xbe, 0x09, 0x1c, 0x2b, 0x4e,
+ 0x0f, 0xb2, 0x87, 0x8b, 0x84, 0xfe, 0x92, 0x32,
+ 0x14, 0xd7, 0x93, 0xdf, 0xe7, 0x44, 0xbc, 0xc5,
+ 0xae, 0x53, 0x69, 0xd8, 0xb3, 0x79, 0x37, 0x80,
+ 0xe3, 0x17, 0x5c, 0xec, 0x53, 0x00, 0x9a, 0xe3,
+ 0x8e, 0xdc, 0x38, 0xb8, 0x66, 0xf0, 0xd3, 0xad,
+ 0x1d, 0x02, 0x96, 0x86, 0x3e, 0x9d, 0x3b, 0x5d,
+ 0xa5, 0x7f, 0x21, 0x10, 0xf1, 0x1f, 0x13, 0x20,
+ 0xf9, 0x57, 0x87, 0x20, 0xf5, 0x5f, 0xf1, 0x17,
+ 0x48, 0x0a, 0x51, 0x5a, 0xcd, 0x19, 0x03, 0xa6,
+ 0x5a, 0xd1, 0x12, 0x97, 0xe9, 0x48, 0xe2, 0x1d,
+ 0x83, 0x75, 0x50, 0xd9, 0x75, 0x7d, 0x6a, 0x82,
+ 0xa1, 0xf9, 0x4e, 0x54, 0x87, 0x89, 0xc9, 0x0c,
+ 0xb7, 0x5b, 0x6a, 0x91, 0xc1, 0x9c, 0xb2, 0xa9,
+ 0xdc, 0x9a, 0xa4, 0x49, 0x0a, 0x6d, 0x0d, 0xbb,
+ 0xde, 0x86, 0x44, 0xdd, 0x5d, 0x89, 0x2b, 0x96,
+ 0x0f, 0x23, 0x95, 0xad, 0xcc, 0xa2, 0xb3, 0xb9,
+ 0x7e, 0x74, 0x38, 0xba, 0x9f, 0x73, 0xae, 0x5f,
+ 0xf8, 0x68, 0xa2, 0xe0, 0xa9, 0xce, 0xbd, 0x40,
+ 0xd4, 0x4c, 0x6b, 0xd2, 0x56, 0x62, 0xb0, 0xcc,
+ 0x63, 0x7e, 0x5b, 0xd3, 0xae, 0xd1, 0x75, 0xce,
+ 0xbb, 0xb4, 0x5b, 0xa8, 0xf8, 0xb4, 0xac, 0x71,
+ 0x75, 0xaa, 0xc9, 0x9f, 0xbb, 0x6c, 0xad, 0x0f,
+ 0x55, 0x5d, 0xe8, 0x85, 0x7d, 0xf9, 0x21, 0x35,
+ 0xea, 0x92, 0x85, 0x2b, 0x00, 0xec, 0x84, 0x90,
+ 0x0a, 0x63, 0x96, 0xe4, 0x6b, 0xa9, 0x77, 0xb8,
+ 0x91, 0xf8, 0x46, 0x15, 0x72, 0x63, 0x70, 0x01,
+ 0x40, 0xa3, 0xa5, 0x76, 0x62, 0x2b, 0xbf, 0xf1,
+ 0xe5, 0x8d, 0x9f, 0xa3, 0xfa, 0x9b, 0x03, 0xbe,
+ 0xfe, 0x65, 0x6f, 0xa2, 0x29, 0x0d, 0x54, 0xb4,
+ 0x71, 0xce, 0xa9, 0xd6, 0x3d, 0x88, 0xf9, 0xaf,
+ 0x6b, 0xa8, 0x9e, 0xf4, 0x16, 0x96, 0x36, 0xb9,
+ 0x00, 0xdc, 0x10, 0xab, 0xb5, 0x08, 0x31, 0x1f,
+ 0x00, 0xb1, 0x3c, 0xd9, 0x38, 0x3e, 0xc6, 0x04,
+ 0xa7, 0x4e, 0xe8, 0xae, 0xed, 0x98, 0xc2, 0xf7,
+ 0xb9, 0x00, 0x5f, 0x8c, 0x60, 0xd1, 0xe5, 0x15,
+ 0xf7, 0xae, 0x1e, 0x84, 0x88, 0xd1, 0xf6, 0xbc,
+ 0x3a, 0x89, 0x35, 0x22, 0x83, 0x7c, 0xca, 0xf0,
+ 0x33, 0x82, 0x4c, 0x79, 0x3c, 0xfd, 0xb1, 0xae,
+ 0x52, 0x62, 0x55, 0xd2, 0x41, 0x60, 0xc6, 0xbb,
+ 0xfa, 0x0e, 0x59, 0xd6, 0xa8, 0xfe, 0x5d, 0xed,
+ 0x47, 0x3d, 0xe0, 0xea, 0x1f, 0x6e, 0x43, 0x51,
+ 0xec, 0x10, 0x52, 0x56, 0x77, 0x42, 0x6b, 0x52,
+ 0x87, 0xd8, 0xec, 0xe0, 0xaa, 0x76, 0xa5, 0x84,
+ 0x2a, 0x22, 0x24, 0xfd, 0x92, 0x40, 0x88, 0xd5,
+ 0x85, 0x1c, 0x1f, 0x6b, 0x47, 0xa0, 0xc4, 0xe4,
+ 0xef, 0xf4, 0xea, 0xd7, 0x59, 0xac, 0x2a, 0x9e,
+ 0x8c, 0xfa, 0x1f, 0x42, 0x08, 0xfe, 0x4f, 0x74,
+ 0xa0, 0x26, 0xf5, 0xb3, 0x84, 0xf6, 0x58, 0x5f,
+ 0x26, 0x66, 0x3e, 0xd7, 0xe4, 0x22, 0x91, 0x13,
+ 0xc8, 0xac, 0x25, 0x96, 0x23, 0xd8, 0x09, 0xea,
+ 0x45, 0x75, 0x23, 0xb8, 0x5f, 0xc2, 0x90, 0x8b,
+ 0x09, 0xc4, 0xfc, 0x47, 0x6c, 0x6d, 0x0a, 0xef,
+ 0x69, 0xa4, 0x38, 0x19, 0xcf, 0x7d, 0xf9, 0x09,
+ 0x73, 0x9b, 0x60, 0x5a, 0xf7, 0x37, 0xb5, 0xfe,
+ 0x9f, 0xe3, 0x2b, 0x4c, 0x0d, 0x6e, 0x19, 0xf1,
+ 0xd6, 0xc0, 0x70, 0xf3, 0x9d, 0x22, 0x3c, 0xf9,
+ 0x49, 0xce, 0x30, 0x8e, 0x44, 0xb5, 0x76, 0x15,
+ 0x8f, 0x52, 0xfd, 0xa5, 0x04, 0xb8, 0x55, 0x6a,
+ 0x36, 0x59, 0x7c, 0xc4, 0x48, 0xb8, 0xd7, 0xab,
+ 0x05, 0x66, 0xe9, 0x5e, 0x21, 0x6f, 0x6b, 0x36,
+ 0x29, 0xbb, 0xe9, 0xe3, 0xa2, 0x9a, 0xa8, 0xcd,
+ 0x55, 0x25, 0x11, 0xba, 0x5a, 0x58, 0xa0, 0xde,
+ 0xae, 0x19, 0x2a, 0x48, 0x5a, 0xff, 0x36, 0xcd,
+ 0x6d, 0x16, 0x7a, 0x73, 0x38, 0x46, 0xe5, 0x47,
+ 0x59, 0xc8, 0xa2, 0xf6, 0xe2, 0x6c, 0x83, 0xc5,
+ 0x36, 0x2c, 0x83, 0x7d, 0xb4, 0x01, 0x05, 0x69,
+ 0xe7, 0xaf, 0x5c, 0xc4, 0x64, 0x82, 0x12, 0x21,
+ 0xef, 0xf7, 0xd1, 0x7d, 0xb8, 0x8d, 0x8c, 0x98,
+ 0x7c, 0x5f, 0x7d, 0x92, 0x88, 0xb9, 0x94, 0x07,
+ 0x9c, 0xd8, 0xe9, 0x9c, 0x17, 0x38, 0xe3, 0x57,
+ 0x6c, 0xe0, 0xdc, 0xa5, 0x92, 0x42, 0xb3, 0xbd,
+ 0x50, 0xa2, 0x7e, 0xb5, 0xb1, 0x52, 0x72, 0x03,
+ 0x97, 0xd8, 0xaa, 0x9a, 0x1e, 0x75, 0x41, 0x11,
+ 0xa3, 0x4f, 0xcc, 0xd4, 0xe3, 0x73, 0xad, 0x96,
+ 0xdc, 0x47, 0x41, 0x9f, 0xb0, 0xbe, 0x79, 0x91,
+ 0xf5, 0xb6, 0x18, 0xfe, 0xc2, 0x83, 0x18, 0x7d,
+ 0x73, 0xd9, 0x4f, 0x83, 0x84, 0x03, 0xb3, 0xf0,
+ 0x77, 0x66, 0x3d, 0x83, 0x63, 0x2e, 0x2c, 0xf9,
+ 0xdd, 0xa6, 0x1f, 0x89, 0x82, 0xb8, 0x23, 0x42,
+ 0xeb, 0xe2, 0xca, 0x70, 0x82, 0x61, 0x41, 0x0a,
+ 0x6d, 0x5f, 0x75, 0xc5, 0xe2, 0xc4, 0x91, 0x18,
+ 0x44, 0x22, 0xfa, 0x34, 0x10, 0xf5, 0x20, 0xdc,
+ 0xb7, 0xdd, 0x2a, 0x20, 0x77, 0xf5, 0xf9, 0xce,
+ 0xdb, 0xa0, 0x0a, 0x52, 0x2a, 0x4e, 0xdd, 0xcc,
+ 0x97, 0xdf, 0x05, 0xe4, 0x5e, 0xb7, 0xaa, 0xf0,
+ 0xe2, 0x80, 0xff, 0xba, 0x1a, 0x0f, 0xac, 0xdf,
+ 0x02, 0x32, 0xe6, 0xf7, 0xc7, 0x17, 0x13, 0xb7,
+ 0xfc, 0x98, 0x48, 0x8c, 0x0d, 0x82, 0xc9, 0x80,
+ 0x7a, 0xe2, 0x0a, 0xc5, 0xb4, 0xde, 0x7c, 0x3c,
+ 0x79, 0x81, 0x0e, 0x28, 0x65, 0x79, 0x67, 0x82,
+ 0x69, 0x44, 0x66, 0x09, 0xf7, 0x16, 0x1a, 0xf9,
+ 0x7d, 0x80, 0xa1, 0x79, 0x14, 0xa9, 0xc8, 0x20,
+ 0xfb, 0xa2, 0x46, 0xbe, 0x08, 0x35, 0x17, 0x58,
+ 0xc1, 0x1a, 0xda, 0x2a, 0x6b, 0x2e, 0x1e, 0xe6,
+ 0x27, 0x55, 0x7b, 0x19, 0xe2, 0xfb, 0x64, 0xfc,
+ 0x5e, 0x15, 0x54, 0x3c, 0xe7, 0xc2, 0x11, 0x50,
+ 0x30, 0xb8, 0x72, 0x03, 0x0b, 0x1a, 0x9f, 0x86,
+ 0x27, 0x11, 0x5c, 0x06, 0x2b, 0xbd, 0x75, 0x1a,
+ 0x0a, 0xda, 0x01, 0xfa, 0x5c, 0x4a, 0xc1, 0x80,
+ 0x3a, 0x6e, 0x30, 0xc8, 0x2c, 0xeb, 0x56, 0xec,
+ 0x89, 0xfa, 0x35, 0x7b, 0xb2, 0xf0, 0x97, 0x08,
+ 0x86, 0x53, 0xbe, 0xbd, 0x40, 0x41, 0x38, 0x1c,
+ 0xb4, 0x8b, 0x79, 0x2e, 0x18, 0x96, 0x94, 0xde,
+ 0xe8, 0xca, 0xe5, 0x9f, 0x92, 0x9f, 0x15, 0x5d,
+ 0x56, 0x60, 0x5c, 0x09, 0xf9, 0x16, 0xf4, 0x17,
+ 0x0f, 0xf6, 0x4c, 0xda, 0xe6, 0x67, 0x89, 0x9f,
+ 0xca, 0x6c, 0xe7, 0x9b, 0x04, 0x62, 0x0e, 0x26,
+ 0xa6, 0x52, 0xbd, 0x29, 0xff, 0xc7, 0xa4, 0x96,
+ 0xe6, 0x6a, 0x02, 0xa5, 0x2e, 0x7b, 0xfe, 0x97,
+ 0x68, 0x3e, 0x2e, 0x5f, 0x3b, 0x0f, 0x36, 0xd6,
+ 0x98, 0x19, 0x59, 0x48, 0xd2, 0xc6, 0xe1, 0x55,
+ 0x1a, 0x6e, 0xd6, 0xed, 0x2c, 0xba, 0xc3, 0x9e,
+ 0x64, 0xc9, 0x95, 0x86, 0x35, 0x5e, 0x3e, 0x88,
+ 0x69, 0x99, 0x4b, 0xee, 0xbe, 0x9a, 0x99, 0xb5,
+ 0x6e, 0x58, 0xae, 0xdd, 0x22, 0xdb, 0xdd, 0x6b,
+ 0xfc, 0xaf, 0x90, 0xa3, 0x3d, 0xa4, 0xc1, 0x15,
+ 0x92, 0x18, 0x8d, 0xd2, 0x4b, 0x7b, 0x06, 0xd1,
+ 0x37, 0xb5, 0xe2, 0x7c, 0x2c, 0xf0, 0x25, 0xe4,
+ 0x94, 0x2a, 0xbd, 0xe3, 0x82, 0x70, 0x78, 0xa3,
+ 0x82, 0x10, 0x5a, 0x90, 0xd7, 0xa4, 0xfa, 0xaf,
+ 0x1a, 0x88, 0x59, 0xdc, 0x74, 0x12, 0xb4, 0x8e,
+ 0xd7, 0x19, 0x46, 0xf4, 0x84, 0x69, 0x9f, 0xbb,
+ 0x70, 0xa8, 0x4c, 0x52, 0x81, 0xa9, 0xff, 0x76,
+ 0x1c, 0xae, 0xd8, 0x11, 0x3d, 0x7f, 0x7d, 0xc5,
+ 0x12, 0x59, 0x28, 0x18, 0xc2, 0xa2, 0xb7, 0x1c,
+ 0x88, 0xf8, 0xd6, 0x1b, 0xa6, 0x7d, 0x9e, 0xde,
+ 0x29, 0xf8, 0xed, 0xff, 0xeb, 0x92, 0x24, 0x4f,
+ 0x05, 0xaa, 0xd9, 0x49, 0xba, 0x87, 0x59, 0x51,
+ 0xc9, 0x20, 0x5c, 0x9b, 0x74, 0xcf, 0x03, 0xd9,
+ 0x2d, 0x34, 0xc7, 0x5b, 0xa5, 0x40, 0xb2, 0x99,
+ 0xf5, 0xcb, 0xb4, 0xf6, 0xb7, 0x72, 0x4a, 0xd6,
+ 0xbd, 0xb0, 0xf3, 0x93, 0xe0, 0x1b, 0xa8, 0x04,
+ 0x1e, 0x35, 0xd4, 0x80, 0x20, 0xf4, 0x9c, 0x31,
+ 0x6b, 0x45, 0xb9, 0x15, 0xb0, 0x5e, 0xdd, 0x0a,
+ 0x33, 0x9c, 0x83, 0xcd, 0x58, 0x89, 0x50, 0x56,
+ 0xbb, 0x81, 0x00, 0x91, 0x32, 0xf3, 0x1b, 0x3e,
+ 0xcf, 0x45, 0xe1, 0xf9, 0xe1, 0x2c, 0x26, 0x78,
+ 0x93, 0x9a, 0x60, 0x46, 0xc9, 0xb5, 0x5e, 0x6a,
+ 0x28, 0x92, 0x87, 0x3f, 0x63, 0x7b, 0xdb, 0xf7,
+ 0xd0, 0x13, 0x9d, 0x32, 0x40, 0x5e, 0xcf, 0xfb,
+ 0x79, 0x68, 0x47, 0x4c, 0xfd, 0x01, 0x17, 0xe6,
+ 0x97, 0x93, 0x78, 0xbb, 0xa6, 0x27, 0xa3, 0xe8,
+ 0x1a, 0xe8, 0x94, 0x55, 0x7d, 0x08, 0xe5, 0xdc,
+ 0x66, 0xa3, 0x69, 0xc8, 0xca, 0xc5, 0xa1, 0x84,
+ 0x55, 0xde, 0x08, 0x91, 0x16, 0x3a, 0x0c, 0x86,
+ 0xab, 0x27, 0x2b, 0x64, 0x34, 0x02, 0x6c, 0x76,
+ 0x8b, 0xc6, 0xaf, 0xcc, 0xe1, 0xd6, 0x8c, 0x2a,
+ 0x18, 0x3d, 0xa6, 0x1b, 0x37, 0x75, 0x45, 0x73,
+ 0xc2, 0x75, 0xd7, 0x53, 0x78, 0x3a, 0xd6, 0xe8,
+ 0x29, 0xd2, 0x4a, 0xa8, 0x1e, 0x82, 0xf6, 0xb6,
+ 0x81, 0xde, 0x21, 0xed, 0x2b, 0x56, 0xbb, 0xf2,
+ 0xd0, 0x57, 0xc1, 0x7c, 0xd2, 0x6a, 0xd2, 0x56,
+ 0xf5, 0x13, 0x5f, 0x1c, 0x6a, 0x0b, 0x74, 0xfb,
+ 0xe9, 0xfe, 0x9e, 0xea, 0x95, 0xb2, 0x46, 0xab,
+ 0x0a, 0xfc, 0xfd, 0xf3, 0xbb, 0x04, 0x2b, 0x76,
+ 0x1b, 0xa4, 0x74, 0xb0, 0xc1, 0x78, 0xc3, 0x69,
+ 0xe2, 0xb0, 0x01, 0xe1, 0xde, 0x32, 0x4c, 0x8d,
+ 0x1a, 0xb3, 0x38, 0x08, 0xd5, 0xfc, 0x1f, 0xdc,
+ 0x0e, 0x2c, 0x9c, 0xb1, 0xa1, 0x63, 0x17, 0x22,
+ 0xf5, 0x6c, 0x93, 0x70, 0x74, 0x00, 0xf8, 0x39,
+ 0x01, 0x94, 0xd1, 0x32, 0x23, 0x56, 0x5d, 0xa6,
+ 0x02, 0x76, 0x76, 0x93, 0xce, 0x2f, 0x19, 0xe9,
+ 0x17, 0x52, 0xae, 0x6e, 0x2c, 0x6d, 0x61, 0x7f,
+ 0x3b, 0xaa, 0xe0, 0x52, 0x85, 0xc5, 0x65, 0xc1,
+ 0xbb, 0x8e, 0x5b, 0x21, 0xd5, 0xc9, 0x78, 0x83,
+ 0x07, 0x97, 0x4c, 0x62, 0x61, 0x41, 0xd4, 0xfc,
+ 0xc9, 0x39, 0xe3, 0x9b, 0xd0, 0xcc, 0x75, 0xc4,
+ 0x97, 0xe6, 0xdd, 0x2a, 0x5f, 0xa6, 0xe8, 0x59,
+ 0x6c, 0x98, 0xb9, 0x02, 0xe2, 0xa2, 0xd6, 0x68,
+ 0xee, 0x3b, 0x1d, 0xe3, 0x4d, 0x5b, 0x30, 0xef,
+ 0x03, 0xf2, 0xeb, 0x18, 0x57, 0x36, 0xe8, 0xa1,
+ 0xf4, 0x47, 0xfb, 0xcb, 0x8f, 0xcb, 0xc8, 0xf3,
+ 0x4f, 0x74, 0x9d, 0x9d, 0xb1, 0x8d, 0x14, 0x44,
+ 0xd9, 0x19, 0xb4, 0x54, 0x4f, 0x75, 0x19, 0x09,
+ 0xa0, 0x75, 0xbc, 0x3b, 0x82, 0xc6, 0x3f, 0xb8,
+ 0x83, 0x19, 0x6e, 0xd6, 0x37, 0xfe, 0x6e, 0x8a,
+ 0x4e, 0xe0, 0x4a, 0xab, 0x7b, 0xc8, 0xb4, 0x1d,
+ 0xf4, 0xed, 0x27, 0x03, 0x65, 0xa2, 0xa1, 0xae,
+ 0x11, 0xe7, 0x98, 0x78, 0x48, 0x91, 0xd2, 0xd2,
+ 0xd4, 0x23, 0x78, 0x50, 0xb1, 0x5b, 0x85, 0x10,
+ 0x8d, 0xca, 0x5f, 0x0f, 0x71, 0xae, 0x72, 0x9a,
+ 0xf6, 0x25, 0x19, 0x60, 0x06, 0xf7, 0x10, 0x34,
+ 0x18, 0x0d, 0xc9, 0x9f, 0x7b, 0x0c, 0x9b, 0x8f,
+ 0x91, 0x1b, 0x9f, 0xcd, 0x10, 0xee, 0x75, 0xf9,
+ 0x97, 0x66, 0xfc, 0x4d, 0x33, 0x6e, 0x28, 0x2b,
+ 0x92, 0x85, 0x4f, 0xab, 0x43, 0x8d, 0x8f, 0x7d,
+ 0x86, 0xa7, 0xc7, 0xd8, 0xd3, 0x0b, 0x8b, 0x57,
+ 0xb6, 0x1d, 0x95, 0x0d, 0xe9, 0xbc, 0xd9, 0x03,
+ 0xd9, 0x10, 0x19, 0xc3, 0x46, 0x63, 0x55, 0x87,
+ 0x61, 0x79, 0x6c, 0x95, 0x0e, 0x9c, 0xdd, 0xca,
+ 0xc3, 0xf3, 0x64, 0xf0, 0x7d, 0x76, 0xb7, 0x53,
+ 0x67, 0x2b, 0x1e, 0x44, 0x56, 0x81, 0xea, 0x8f,
+ 0x5c, 0x42, 0x16, 0xb8, 0x28, 0xeb, 0x1b, 0x61,
+ 0x10, 0x1e, 0xbf, 0xec, 0xa8
+};
+static const u8 enc_output011[] __initconst = {
+ 0x6a, 0xfc, 0x4b, 0x25, 0xdf, 0xc0, 0xe4, 0xe8,
+ 0x17, 0x4d, 0x4c, 0xc9, 0x7e, 0xde, 0x3a, 0xcc,
+ 0x3c, 0xba, 0x6a, 0x77, 0x47, 0xdb, 0xe3, 0x74,
+ 0x7a, 0x4d, 0x5f, 0x8d, 0x37, 0x55, 0x80, 0x73,
+ 0x90, 0x66, 0x5d, 0x3a, 0x7d, 0x5d, 0x86, 0x5e,
+ 0x8d, 0xfd, 0x83, 0xff, 0x4e, 0x74, 0x6f, 0xf9,
+ 0xe6, 0x70, 0x17, 0x70, 0x3e, 0x96, 0xa7, 0x7e,
+ 0xcb, 0xab, 0x8f, 0x58, 0x24, 0x9b, 0x01, 0xfd,
+ 0xcb, 0xe6, 0x4d, 0x9b, 0xf0, 0x88, 0x94, 0x57,
+ 0x66, 0xef, 0x72, 0x4c, 0x42, 0x6e, 0x16, 0x19,
+ 0x15, 0xea, 0x70, 0x5b, 0xac, 0x13, 0xdb, 0x9f,
+ 0x18, 0xe2, 0x3c, 0x26, 0x97, 0xbc, 0xdc, 0x45,
+ 0x8c, 0x6c, 0x24, 0x69, 0x9c, 0xf7, 0x65, 0x1e,
+ 0x18, 0x59, 0x31, 0x7c, 0xe4, 0x73, 0xbc, 0x39,
+ 0x62, 0xc6, 0x5c, 0x9f, 0xbf, 0xfa, 0x90, 0x03,
+ 0xc9, 0x72, 0x26, 0xb6, 0x1b, 0xc2, 0xb7, 0x3f,
+ 0xf2, 0x13, 0x77, 0xf2, 0x8d, 0xb9, 0x47, 0xd0,
+ 0x53, 0xdd, 0xc8, 0x91, 0x83, 0x8b, 0xb1, 0xce,
+ 0xa3, 0xfe, 0xcd, 0xd9, 0xdd, 0x92, 0x7b, 0xdb,
+ 0xb8, 0xfb, 0xc9, 0x2d, 0x01, 0x59, 0x39, 0x52,
+ 0xad, 0x1b, 0xec, 0xcf, 0xd7, 0x70, 0x13, 0x21,
+ 0xf5, 0x47, 0xaa, 0x18, 0x21, 0x5c, 0xc9, 0x9a,
+ 0xd2, 0x6b, 0x05, 0x9c, 0x01, 0xa1, 0xda, 0x35,
+ 0x5d, 0xb3, 0x70, 0xe6, 0xa9, 0x80, 0x8b, 0x91,
+ 0xb7, 0xb3, 0x5f, 0x24, 0x9a, 0xb7, 0xd1, 0x6b,
+ 0xa1, 0x1c, 0x50, 0xba, 0x49, 0xe0, 0xee, 0x2e,
+ 0x75, 0xac, 0x69, 0xc0, 0xeb, 0x03, 0xdd, 0x19,
+ 0xe5, 0xf6, 0x06, 0xdd, 0xc3, 0xd7, 0x2b, 0x07,
+ 0x07, 0x30, 0xa7, 0x19, 0x0c, 0xbf, 0xe6, 0x18,
+ 0xcc, 0xb1, 0x01, 0x11, 0x85, 0x77, 0x1d, 0x96,
+ 0xa7, 0xa3, 0x00, 0x84, 0x02, 0xa2, 0x83, 0x68,
+ 0xda, 0x17, 0x27, 0xc8, 0x7f, 0x23, 0xb7, 0xf4,
+ 0x13, 0x85, 0xcf, 0xdd, 0x7a, 0x7d, 0x24, 0x57,
+ 0xfe, 0x05, 0x93, 0xf5, 0x74, 0xce, 0xed, 0x0c,
+ 0x20, 0x98, 0x8d, 0x92, 0x30, 0xa1, 0x29, 0x23,
+ 0x1a, 0xa0, 0x4f, 0x69, 0x56, 0x4c, 0xe1, 0xc8,
+ 0xce, 0xf6, 0x9a, 0x0c, 0xa4, 0xfa, 0x04, 0xf6,
+ 0x62, 0x95, 0xf2, 0xfa, 0xc7, 0x40, 0x68, 0x40,
+ 0x8f, 0x41, 0xda, 0xb4, 0x26, 0x6f, 0x70, 0xab,
+ 0x40, 0x61, 0xa4, 0x0e, 0x75, 0xfb, 0x86, 0xeb,
+ 0x9d, 0x9a, 0x1f, 0xec, 0x76, 0x99, 0xe7, 0xea,
+ 0xaa, 0x1e, 0x2d, 0xb5, 0xd4, 0xa6, 0x1a, 0xb8,
+ 0x61, 0x0a, 0x1d, 0x16, 0x5b, 0x98, 0xc2, 0x31,
+ 0x40, 0xe7, 0x23, 0x1d, 0x66, 0x99, 0xc8, 0xc0,
+ 0xd7, 0xce, 0xf3, 0x57, 0x40, 0x04, 0x3f, 0xfc,
+ 0xea, 0xb3, 0xfc, 0xd2, 0xd3, 0x99, 0xa4, 0x94,
+ 0x69, 0xa0, 0xef, 0xd1, 0x85, 0xb3, 0xa6, 0xb1,
+ 0x28, 0xbf, 0x94, 0x67, 0x22, 0xc3, 0x36, 0x46,
+ 0xf8, 0xd2, 0x0f, 0x5f, 0xf4, 0x59, 0x80, 0xe6,
+ 0x2d, 0x43, 0x08, 0x7d, 0x19, 0x09, 0x97, 0xa7,
+ 0x4c, 0x3d, 0x8d, 0xba, 0x65, 0x62, 0xa3, 0x71,
+ 0x33, 0x29, 0x62, 0xdb, 0xc1, 0x33, 0x34, 0x1a,
+ 0x63, 0x33, 0x16, 0xb6, 0x64, 0x7e, 0xab, 0x33,
+ 0xf0, 0xe6, 0x26, 0x68, 0xba, 0x1d, 0x2e, 0x38,
+ 0x08, 0xe6, 0x02, 0xd3, 0x25, 0x2c, 0x47, 0x23,
+ 0x58, 0x34, 0x0f, 0x9d, 0x63, 0x4f, 0x63, 0xbb,
+ 0x7f, 0x3b, 0x34, 0x38, 0xa7, 0xb5, 0x8d, 0x65,
+ 0xd9, 0x9f, 0x79, 0x55, 0x3e, 0x4d, 0xe7, 0x73,
+ 0xd8, 0xf6, 0x98, 0x97, 0x84, 0x60, 0x9c, 0xc8,
+ 0xa9, 0x3c, 0xf6, 0xdc, 0x12, 0x5c, 0xe1, 0xbb,
+ 0x0b, 0x8b, 0x98, 0x9c, 0x9d, 0x26, 0x7c, 0x4a,
+ 0xe6, 0x46, 0x36, 0x58, 0x21, 0x4a, 0xee, 0xca,
+ 0xd7, 0x3b, 0xc2, 0x6c, 0x49, 0x2f, 0xe5, 0xd5,
+ 0x03, 0x59, 0x84, 0x53, 0xcb, 0xfe, 0x92, 0x71,
+ 0x2e, 0x7c, 0x21, 0xcc, 0x99, 0x85, 0x7f, 0xb8,
+ 0x74, 0x90, 0x13, 0x42, 0x3f, 0xe0, 0x6b, 0x1d,
+ 0xf2, 0x4d, 0x54, 0xd4, 0xfc, 0x3a, 0x05, 0xe6,
+ 0x74, 0xaf, 0xa6, 0xa0, 0x2a, 0x20, 0x23, 0x5d,
+ 0x34, 0x5c, 0xd9, 0x3e, 0x4e, 0xfa, 0x93, 0xe7,
+ 0xaa, 0xe9, 0x6f, 0x08, 0x43, 0x67, 0x41, 0xc5,
+ 0xad, 0xfb, 0x31, 0x95, 0x82, 0x73, 0x32, 0xd8,
+ 0xa6, 0xa3, 0xed, 0x0e, 0x2d, 0xf6, 0x5f, 0xfd,
+ 0x80, 0xa6, 0x7a, 0xe0, 0xdf, 0x78, 0x15, 0x29,
+ 0x74, 0x33, 0xd0, 0x9e, 0x83, 0x86, 0x72, 0x22,
+ 0x57, 0x29, 0xb9, 0x9e, 0x5d, 0xd3, 0x1a, 0xb5,
+ 0x96, 0x72, 0x41, 0x3d, 0xf1, 0x64, 0x43, 0x67,
+ 0xee, 0xaa, 0x5c, 0xd3, 0x9a, 0x96, 0x13, 0x11,
+ 0x5d, 0xf3, 0x0c, 0x87, 0x82, 0x1e, 0x41, 0x9e,
+ 0xd0, 0x27, 0xd7, 0x54, 0x3b, 0x67, 0x73, 0x09,
+ 0x91, 0xe9, 0xd5, 0x36, 0xa7, 0xb5, 0x55, 0xe4,
+ 0xf3, 0x21, 0x51, 0x49, 0x22, 0x07, 0x55, 0x4f,
+ 0x44, 0x4b, 0xd2, 0x15, 0x93, 0x17, 0x2a, 0xfa,
+ 0x4d, 0x4a, 0x57, 0xdb, 0x4c, 0xa6, 0xeb, 0xec,
+ 0x53, 0x25, 0x6c, 0x21, 0xed, 0x00, 0x4c, 0x3b,
+ 0xca, 0x14, 0x57, 0xa9, 0xd6, 0x6a, 0xcd, 0x8d,
+ 0x5e, 0x74, 0xac, 0x72, 0xc1, 0x97, 0xe5, 0x1b,
+ 0x45, 0x4e, 0xda, 0xfc, 0xcc, 0x40, 0xe8, 0x48,
+ 0x88, 0x0b, 0xa3, 0xe3, 0x8d, 0x83, 0x42, 0xc3,
+ 0x23, 0xfd, 0x68, 0xb5, 0x8e, 0xf1, 0x9d, 0x63,
+ 0x77, 0xe9, 0xa3, 0x8e, 0x8c, 0x26, 0x6b, 0xbd,
+ 0x72, 0x73, 0x35, 0x0c, 0x03, 0xf8, 0x43, 0x78,
+ 0x52, 0x71, 0x15, 0x1f, 0x71, 0x5d, 0x6e, 0xed,
+ 0xb9, 0xcc, 0x86, 0x30, 0xdb, 0x2b, 0xd3, 0x82,
+ 0x88, 0x23, 0x71, 0x90, 0x53, 0x5c, 0xa9, 0x2f,
+ 0x76, 0x01, 0xb7, 0x9a, 0xfe, 0x43, 0x55, 0xa3,
+ 0x04, 0x9b, 0x0e, 0xe4, 0x59, 0xdf, 0xc9, 0xe9,
+ 0xb1, 0xea, 0x29, 0x28, 0x3c, 0x5c, 0xae, 0x72,
+ 0x84, 0xb6, 0xc6, 0xeb, 0x0c, 0x27, 0x07, 0x74,
+ 0x90, 0x0d, 0x31, 0xb0, 0x00, 0x77, 0xe9, 0x40,
+ 0x70, 0x6f, 0x68, 0xa7, 0xfd, 0x06, 0xec, 0x4b,
+ 0xc0, 0xb7, 0xac, 0xbc, 0x33, 0xb7, 0x6d, 0x0a,
+ 0xbd, 0x12, 0x1b, 0x59, 0xcb, 0xdd, 0x32, 0xf5,
+ 0x1d, 0x94, 0x57, 0x76, 0x9e, 0x0c, 0x18, 0x98,
+ 0x71, 0xd7, 0x2a, 0xdb, 0x0b, 0x7b, 0xa7, 0x71,
+ 0xb7, 0x67, 0x81, 0x23, 0x96, 0xae, 0xb9, 0x7e,
+ 0x32, 0x43, 0x92, 0x8a, 0x19, 0xa0, 0xc4, 0xd4,
+ 0x3b, 0x57, 0xf9, 0x4a, 0x2c, 0xfb, 0x51, 0x46,
+ 0xbb, 0xcb, 0x5d, 0xb3, 0xef, 0x13, 0x93, 0x6e,
+ 0x68, 0x42, 0x54, 0x57, 0xd3, 0x6a, 0x3a, 0x8f,
+ 0x9d, 0x66, 0xbf, 0xbd, 0x36, 0x23, 0xf5, 0x93,
+ 0x83, 0x7b, 0x9c, 0xc0, 0xdd, 0xc5, 0x49, 0xc0,
+ 0x64, 0xed, 0x07, 0x12, 0xb3, 0xe6, 0xe4, 0xe5,
+ 0x38, 0x95, 0x23, 0xb1, 0xa0, 0x3b, 0x1a, 0x61,
+ 0xda, 0x17, 0xac, 0xc3, 0x58, 0xdd, 0x74, 0x64,
+ 0x22, 0x11, 0xe8, 0x32, 0x1d, 0x16, 0x93, 0x85,
+ 0x99, 0xa5, 0x9c, 0x34, 0x55, 0xb1, 0xe9, 0x20,
+ 0x72, 0xc9, 0x28, 0x7b, 0x79, 0x00, 0xa1, 0xa6,
+ 0xa3, 0x27, 0x40, 0x18, 0x8a, 0x54, 0xe0, 0xcc,
+ 0xe8, 0x4e, 0x8e, 0x43, 0x96, 0xe7, 0x3f, 0xc8,
+ 0xe9, 0xb2, 0xf9, 0xc9, 0xda, 0x04, 0x71, 0x50,
+ 0x47, 0xe4, 0xaa, 0xce, 0xa2, 0x30, 0xc8, 0xe4,
+ 0xac, 0xc7, 0x0d, 0x06, 0x2e, 0xe6, 0xe8, 0x80,
+ 0x36, 0x29, 0x9e, 0x01, 0xb8, 0xc3, 0xf0, 0xa0,
+ 0x5d, 0x7a, 0xca, 0x4d, 0xa0, 0x57, 0xbd, 0x2a,
+ 0x45, 0xa7, 0x7f, 0x9c, 0x93, 0x07, 0x8f, 0x35,
+ 0x67, 0x92, 0xe3, 0xe9, 0x7f, 0xa8, 0x61, 0x43,
+ 0x9e, 0x25, 0x4f, 0x33, 0x76, 0x13, 0x6e, 0x12,
+ 0xb9, 0xdd, 0xa4, 0x7c, 0x08, 0x9f, 0x7c, 0xe7,
+ 0x0a, 0x8d, 0x84, 0x06, 0xa4, 0x33, 0x17, 0x34,
+ 0x5e, 0x10, 0x7c, 0xc0, 0xa8, 0x3d, 0x1f, 0x42,
+ 0x20, 0x51, 0x65, 0x5d, 0x09, 0xc3, 0xaa, 0xc0,
+ 0xc8, 0x0d, 0xf0, 0x79, 0xbc, 0x20, 0x1b, 0x95,
+ 0xe7, 0x06, 0x7d, 0x47, 0x20, 0x03, 0x1a, 0x74,
+ 0xdd, 0xe2, 0xd4, 0xae, 0x38, 0x71, 0x9b, 0xf5,
+ 0x80, 0xec, 0x08, 0x4e, 0x56, 0xba, 0x76, 0x12,
+ 0x1a, 0xdf, 0x48, 0xf3, 0xae, 0xb3, 0xe6, 0xe6,
+ 0xbe, 0xc0, 0x91, 0x2e, 0x01, 0xb3, 0x01, 0x86,
+ 0xa2, 0xb9, 0x52, 0xd1, 0x21, 0xae, 0xd4, 0x97,
+ 0x1d, 0xef, 0x41, 0x12, 0x95, 0x3d, 0x48, 0x45,
+ 0x1c, 0x56, 0x32, 0x8f, 0xb8, 0x43, 0xbb, 0x19,
+ 0xf3, 0xca, 0xe9, 0xeb, 0x6d, 0x84, 0xbe, 0x86,
+ 0x06, 0xe2, 0x36, 0xb2, 0x62, 0x9d, 0xd3, 0x4c,
+ 0x48, 0x18, 0x54, 0x13, 0x4e, 0xcf, 0xfd, 0xba,
+ 0x84, 0xb9, 0x30, 0x53, 0xcf, 0xfb, 0xb9, 0x29,
+ 0x8f, 0xdc, 0x9f, 0xef, 0x60, 0x0b, 0x64, 0xf6,
+ 0x8b, 0xee, 0xa6, 0x91, 0xc2, 0x41, 0x6c, 0xf6,
+ 0xfa, 0x79, 0x67, 0x4b, 0xc1, 0x3f, 0xaf, 0x09,
+ 0x81, 0xd4, 0x5d, 0xcb, 0x09, 0xdf, 0x36, 0x31,
+ 0xc0, 0x14, 0x3c, 0x7c, 0x0e, 0x65, 0x95, 0x99,
+ 0x6d, 0xa3, 0xf4, 0xd7, 0x38, 0xee, 0x1a, 0x2b,
+ 0x37, 0xe2, 0xa4, 0x3b, 0x4b, 0xd0, 0x65, 0xca,
+ 0xf8, 0xc3, 0xe8, 0x15, 0x20, 0xef, 0xf2, 0x00,
+ 0xfd, 0x01, 0x09, 0xc5, 0xc8, 0x17, 0x04, 0x93,
+ 0xd0, 0x93, 0x03, 0x55, 0xc5, 0xfe, 0x32, 0xa3,
+ 0x3e, 0x28, 0x2d, 0x3b, 0x93, 0x8a, 0xcc, 0x07,
+ 0x72, 0x80, 0x8b, 0x74, 0x16, 0x24, 0xbb, 0xda,
+ 0x94, 0x39, 0x30, 0x8f, 0xb1, 0xcd, 0x4a, 0x90,
+ 0x92, 0x7c, 0x14, 0x8f, 0x95, 0x4e, 0xac, 0x9b,
+ 0xd8, 0x8f, 0x1a, 0x87, 0xa4, 0x32, 0x27, 0x8a,
+ 0xba, 0xf7, 0x41, 0xcf, 0x84, 0x37, 0x19, 0xe6,
+ 0x06, 0xf5, 0x0e, 0xcf, 0x36, 0xf5, 0x9e, 0x6c,
+ 0xde, 0xbc, 0xff, 0x64, 0x7e, 0x4e, 0x59, 0x57,
+ 0x48, 0xfe, 0x14, 0xf7, 0x9c, 0x93, 0x5d, 0x15,
+ 0xad, 0xcc, 0x11, 0xb1, 0x17, 0x18, 0xb2, 0x7e,
+ 0xcc, 0xab, 0xe9, 0xce, 0x7d, 0x77, 0x5b, 0x51,
+ 0x1b, 0x1e, 0x20, 0xa8, 0x32, 0x06, 0x0e, 0x75,
+ 0x93, 0xac, 0xdb, 0x35, 0x37, 0x1f, 0xe9, 0x19,
+ 0x1d, 0xb4, 0x71, 0x97, 0xd6, 0x4e, 0x2c, 0x08,
+ 0xa5, 0x13, 0xf9, 0x0e, 0x7e, 0x78, 0x6e, 0x14,
+ 0xe0, 0xa9, 0xb9, 0x96, 0x4c, 0x80, 0x82, 0xba,
+ 0x17, 0xb3, 0x9d, 0x69, 0xb0, 0x84, 0x46, 0xff,
+ 0xf9, 0x52, 0x79, 0x94, 0x58, 0x3a, 0x62, 0x90,
+ 0x15, 0x35, 0x71, 0x10, 0x37, 0xed, 0xa1, 0x8e,
+ 0x53, 0x6e, 0xf4, 0x26, 0x57, 0x93, 0x15, 0x93,
+ 0xf6, 0x81, 0x2c, 0x5a, 0x10, 0xda, 0x92, 0xad,
+ 0x2f, 0xdb, 0x28, 0x31, 0x2d, 0x55, 0x04, 0xd2,
+ 0x06, 0x28, 0x8c, 0x1e, 0xdc, 0xea, 0x54, 0xac,
+ 0xff, 0xb7, 0x6c, 0x30, 0x15, 0xd4, 0xb4, 0x0d,
+ 0x00, 0x93, 0x57, 0xdd, 0xd2, 0x07, 0x07, 0x06,
+ 0xd9, 0x43, 0x9b, 0xcd, 0x3a, 0xf4, 0x7d, 0x4c,
+ 0x36, 0x5d, 0x23, 0xa2, 0xcc, 0x57, 0x40, 0x91,
+ 0xe9, 0x2c, 0x2f, 0x2c, 0xd5, 0x30, 0x9b, 0x17,
+ 0xb0, 0xc9, 0xf7, 0xa7, 0x2f, 0xd1, 0x93, 0x20,
+ 0x6b, 0xc6, 0xc1, 0xe4, 0x6f, 0xcb, 0xd1, 0xe7,
+ 0x09, 0x0f, 0x9e, 0xdc, 0xaa, 0x9f, 0x2f, 0xdf,
+ 0x56, 0x9f, 0xd4, 0x33, 0x04, 0xaf, 0xd3, 0x6c,
+ 0x58, 0x61, 0xf0, 0x30, 0xec, 0xf2, 0x7f, 0xf2,
+ 0x9c, 0xdf, 0x39, 0xbb, 0x6f, 0xa2, 0x8c, 0x7e,
+ 0xc4, 0x22, 0x51, 0x71, 0xc0, 0x4d, 0x14, 0x1a,
+ 0xc4, 0xcd, 0x04, 0xd9, 0x87, 0x08, 0x50, 0x05,
+ 0xcc, 0xaf, 0xf6, 0xf0, 0x8f, 0x92, 0x54, 0x58,
+ 0xc2, 0xc7, 0x09, 0x7a, 0x59, 0x02, 0x05, 0xe8,
+ 0xb0, 0x86, 0xd9, 0xbf, 0x7b, 0x35, 0x51, 0x4d,
+ 0xaf, 0x08, 0x97, 0x2c, 0x65, 0xda, 0x2a, 0x71,
+ 0x3a, 0xa8, 0x51, 0xcc, 0xf2, 0x73, 0x27, 0xc3,
+ 0xfd, 0x62, 0xcf, 0xe3, 0xb2, 0xca, 0xcb, 0xbe,
+ 0x1a, 0x0a, 0xa1, 0x34, 0x7b, 0x77, 0xc4, 0x62,
+ 0x68, 0x78, 0x5f, 0x94, 0x07, 0x04, 0x65, 0x16,
+ 0x4b, 0x61, 0xcb, 0xff, 0x75, 0x26, 0x50, 0x66,
+ 0x1f, 0x6e, 0x93, 0xf8, 0xc5, 0x51, 0xeb, 0xa4,
+ 0x4a, 0x48, 0x68, 0x6b, 0xe2, 0x5e, 0x44, 0xb2,
+ 0x50, 0x2c, 0x6c, 0xae, 0x79, 0x4e, 0x66, 0x35,
+ 0x81, 0x50, 0xac, 0xbc, 0x3f, 0xb1, 0x0c, 0xf3,
+ 0x05, 0x3c, 0x4a, 0xa3, 0x6c, 0x2a, 0x79, 0xb4,
+ 0xb7, 0xab, 0xca, 0xc7, 0x9b, 0x8e, 0xcd, 0x5f,
+ 0x11, 0x03, 0xcb, 0x30, 0xa3, 0xab, 0xda, 0xfe,
+ 0x64, 0xb9, 0xbb, 0xd8, 0x5e, 0x3a, 0x1a, 0x56,
+ 0xe5, 0x05, 0x48, 0x90, 0x1e, 0x61, 0x69, 0x1b,
+ 0x22, 0xe6, 0x1a, 0x3c, 0x75, 0xad, 0x1f, 0x37,
+ 0x28, 0xdc, 0xe4, 0x6d, 0xbd, 0x42, 0xdc, 0xd3,
+ 0xc8, 0xb6, 0x1c, 0x48, 0xfe, 0x94, 0x77, 0x7f,
+ 0xbd, 0x62, 0xac, 0xa3, 0x47, 0x27, 0xcf, 0x5f,
+ 0xd9, 0xdb, 0xaf, 0xec, 0xf7, 0x5e, 0xc1, 0xb0,
+ 0x9d, 0x01, 0x26, 0x99, 0x7e, 0x8f, 0x03, 0x70,
+ 0xb5, 0x42, 0xbe, 0x67, 0x28, 0x1b, 0x7c, 0xbd,
+ 0x61, 0x21, 0x97, 0xcc, 0x5c, 0xe1, 0x97, 0x8f,
+ 0x8d, 0xde, 0x2b, 0xaa, 0xa7, 0x71, 0x1d, 0x1e,
+ 0x02, 0x73, 0x70, 0x58, 0x32, 0x5b, 0x1d, 0x67,
+ 0x3d, 0xe0, 0x74, 0x4f, 0x03, 0xf2, 0x70, 0x51,
+ 0x79, 0xf1, 0x61, 0x70, 0x15, 0x74, 0x9d, 0x23,
+ 0x89, 0xde, 0xac, 0xfd, 0xde, 0xd0, 0x1f, 0xc3,
+ 0x87, 0x44, 0x35, 0x4b, 0xe5, 0xb0, 0x60, 0xc5,
+ 0x22, 0xe4, 0x9e, 0xca, 0xeb, 0xd5, 0x3a, 0x09,
+ 0x45, 0xa4, 0xdb, 0xfa, 0x3f, 0xeb, 0x1b, 0xc7,
+ 0xc8, 0x14, 0x99, 0x51, 0x92, 0x10, 0xed, 0xed,
+ 0x28, 0xe0, 0xa1, 0xf8, 0x26, 0xcf, 0xcd, 0xcb,
+ 0x63, 0xa1, 0x3b, 0xe3, 0xdf, 0x7e, 0xfe, 0xa6,
+ 0xf0, 0x81, 0x9a, 0xbf, 0x55, 0xde, 0x54, 0xd5,
+ 0x56, 0x60, 0x98, 0x10, 0x68, 0xf4, 0x38, 0x96,
+ 0x8e, 0x6f, 0x1d, 0x44, 0x7f, 0xd6, 0x2f, 0xfe,
+ 0x55, 0xfb, 0x0c, 0x7e, 0x67, 0xe2, 0x61, 0x44,
+ 0xed, 0xf2, 0x35, 0x30, 0x5d, 0xe9, 0xc7, 0xd6,
+ 0x6d, 0xe0, 0xa0, 0xed, 0xf3, 0xfc, 0xd8, 0x3e,
+ 0x0a, 0x7b, 0xcd, 0xaf, 0x65, 0x68, 0x18, 0xc0,
+ 0xec, 0x04, 0x1c, 0x74, 0x6d, 0xe2, 0x6e, 0x79,
+ 0xd4, 0x11, 0x2b, 0x62, 0xd5, 0x27, 0xad, 0x4f,
+ 0x01, 0x59, 0x73, 0xcc, 0x6a, 0x53, 0xfb, 0x2d,
+ 0xd5, 0x4e, 0x99, 0x21, 0x65, 0x4d, 0xf5, 0x82,
+ 0xf7, 0xd8, 0x42, 0xce, 0x6f, 0x3d, 0x36, 0x47,
+ 0xf1, 0x05, 0x16, 0xe8, 0x1b, 0x6a, 0x8f, 0x93,
+ 0xf2, 0x8f, 0x37, 0x40, 0x12, 0x28, 0xa3, 0xe6,
+ 0xb9, 0x17, 0x4a, 0x1f, 0xb1, 0xd1, 0x66, 0x69,
+ 0x86, 0xc4, 0xfc, 0x97, 0xae, 0x3f, 0x8f, 0x1e,
+ 0x2b, 0xdf, 0xcd, 0xf9, 0x3c
+};
+static const u8 enc_assoc011[] __initconst = {
+ 0xd6, 0x31, 0xda, 0x5d, 0x42, 0x5e, 0xd7
+};
+static const u8 enc_nonce011[] __initconst = {
+ 0xfd, 0x87, 0xd4, 0xd8, 0x62, 0xfd, 0xec, 0xaa
+};
+static const u8 enc_key011[] __initconst = {
+ 0x35, 0x4e, 0xb5, 0x70, 0x50, 0x42, 0x8a, 0x85,
+ 0xf2, 0xfb, 0xed, 0x7b, 0xd0, 0x9e, 0x97, 0xca,
+ 0xfa, 0x98, 0x66, 0x63, 0xee, 0x37, 0xcc, 0x52,
+ 0xfe, 0xd1, 0xdf, 0x95, 0x15, 0x34, 0x29, 0x38
+};
+
+static const u8 enc_input012[] __initconst = {
+ 0x74, 0xa6, 0x3e, 0xe4, 0xb1, 0xcb, 0xaf, 0xb0,
+ 0x40, 0xe5, 0x0f, 0x9e, 0xf1, 0xf2, 0x89, 0xb5,
+ 0x42, 0x34, 0x8a, 0xa1, 0x03, 0xb7, 0xe9, 0x57,
+ 0x46, 0xbe, 0x20, 0xe4, 0x6e, 0xb0, 0xeb, 0xff,
+ 0xea, 0x07, 0x7e, 0xef, 0xe2, 0x55, 0x9f, 0xe5,
+ 0x78, 0x3a, 0xb7, 0x83, 0xc2, 0x18, 0x40, 0x7b,
+ 0xeb, 0xcd, 0x81, 0xfb, 0x90, 0x12, 0x9e, 0x46,
+ 0xa9, 0xd6, 0x4a, 0xba, 0xb0, 0x62, 0xdb, 0x6b,
+ 0x99, 0xc4, 0xdb, 0x54, 0x4b, 0xb8, 0xa5, 0x71,
+ 0xcb, 0xcd, 0x63, 0x32, 0x55, 0xfb, 0x31, 0xf0,
+ 0x38, 0xf5, 0xbe, 0x78, 0xe4, 0x45, 0xce, 0x1b,
+ 0x6a, 0x5b, 0x0e, 0xf4, 0x16, 0xe4, 0xb1, 0x3d,
+ 0xf6, 0x63, 0x7b, 0xa7, 0x0c, 0xde, 0x6f, 0x8f,
+ 0x74, 0xdf, 0xe0, 0x1e, 0x9d, 0xce, 0x8f, 0x24,
+ 0xef, 0x23, 0x35, 0x33, 0x7b, 0x83, 0x34, 0x23,
+ 0x58, 0x74, 0x14, 0x77, 0x1f, 0xc2, 0x4f, 0x4e,
+ 0xc6, 0x89, 0xf9, 0x52, 0x09, 0x37, 0x64, 0x14,
+ 0xc4, 0x01, 0x6b, 0x9d, 0x77, 0xe8, 0x90, 0x5d,
+ 0xa8, 0x4a, 0x2a, 0xef, 0x5c, 0x7f, 0xeb, 0xbb,
+ 0xb2, 0xc6, 0x93, 0x99, 0x66, 0xdc, 0x7f, 0xd4,
+ 0x9e, 0x2a, 0xca, 0x8d, 0xdb, 0xe7, 0x20, 0xcf,
+ 0xe4, 0x73, 0xae, 0x49, 0x7d, 0x64, 0x0f, 0x0e,
+ 0x28, 0x46, 0xa9, 0xa8, 0x32, 0xe4, 0x0e, 0xf6,
+ 0x51, 0x53, 0xb8, 0x3c, 0xb1, 0xff, 0xa3, 0x33,
+ 0x41, 0x75, 0xff, 0xf1, 0x6f, 0xf1, 0xfb, 0xbb,
+ 0x83, 0x7f, 0x06, 0x9b, 0xe7, 0x1b, 0x0a, 0xe0,
+ 0x5c, 0x33, 0x60, 0x5b, 0xdb, 0x5b, 0xed, 0xfe,
+ 0xa5, 0x16, 0x19, 0x72, 0xa3, 0x64, 0x23, 0x00,
+ 0x02, 0xc7, 0xf3, 0x6a, 0x81, 0x3e, 0x44, 0x1d,
+ 0x79, 0x15, 0x5f, 0x9a, 0xde, 0xe2, 0xfd, 0x1b,
+ 0x73, 0xc1, 0xbc, 0x23, 0xba, 0x31, 0xd2, 0x50,
+ 0xd5, 0xad, 0x7f, 0x74, 0xa7, 0xc9, 0xf8, 0x3e,
+ 0x2b, 0x26, 0x10, 0xf6, 0x03, 0x36, 0x74, 0xe4,
+ 0x0e, 0x6a, 0x72, 0xb7, 0x73, 0x0a, 0x42, 0x28,
+ 0xc2, 0xad, 0x5e, 0x03, 0xbe, 0xb8, 0x0b, 0xa8,
+ 0x5b, 0xd4, 0xb8, 0xba, 0x52, 0x89, 0xb1, 0x9b,
+ 0xc1, 0xc3, 0x65, 0x87, 0xed, 0xa5, 0xf4, 0x86,
+ 0xfd, 0x41, 0x80, 0x91, 0x27, 0x59, 0x53, 0x67,
+ 0x15, 0x78, 0x54, 0x8b, 0x2d, 0x3d, 0xc7, 0xff,
+ 0x02, 0x92, 0x07, 0x5f, 0x7a, 0x4b, 0x60, 0x59,
+ 0x3c, 0x6f, 0x5c, 0xd8, 0xec, 0x95, 0xd2, 0xfe,
+ 0xa0, 0x3b, 0xd8, 0x3f, 0xd1, 0x69, 0xa6, 0xd6,
+ 0x41, 0xb2, 0xf4, 0x4d, 0x12, 0xf4, 0x58, 0x3e,
+ 0x66, 0x64, 0x80, 0x31, 0x9b, 0xa8, 0x4c, 0x8b,
+ 0x07, 0xb2, 0xec, 0x66, 0x94, 0x66, 0x47, 0x50,
+ 0x50, 0x5f, 0x18, 0x0b, 0x0e, 0xd6, 0xc0, 0x39,
+ 0x21, 0x13, 0x9e, 0x33, 0xbc, 0x79, 0x36, 0x02,
+ 0x96, 0x70, 0xf0, 0x48, 0x67, 0x2f, 0x26, 0xe9,
+ 0x6d, 0x10, 0xbb, 0xd6, 0x3f, 0xd1, 0x64, 0x7a,
+ 0x2e, 0xbe, 0x0c, 0x61, 0xf0, 0x75, 0x42, 0x38,
+ 0x23, 0xb1, 0x9e, 0x9f, 0x7c, 0x67, 0x66, 0xd9,
+ 0x58, 0x9a, 0xf1, 0xbb, 0x41, 0x2a, 0x8d, 0x65,
+ 0x84, 0x94, 0xfc, 0xdc, 0x6a, 0x50, 0x64, 0xdb,
+ 0x56, 0x33, 0x76, 0x00, 0x10, 0xed, 0xbe, 0xd2,
+ 0x12, 0xf6, 0xf6, 0x1b, 0xa2, 0x16, 0xde, 0xae,
+ 0x31, 0x95, 0xdd, 0xb1, 0x08, 0x7e, 0x4e, 0xee,
+ 0xe7, 0xf9, 0xa5, 0xfb, 0x5b, 0x61, 0x43, 0x00,
+ 0x40, 0xf6, 0x7e, 0x02, 0x04, 0x32, 0x4e, 0x0c,
+ 0xe2, 0x66, 0x0d, 0xd7, 0x07, 0x98, 0x0e, 0xf8,
+ 0x72, 0x34, 0x6d, 0x95, 0x86, 0xd7, 0xcb, 0x31,
+ 0x54, 0x47, 0xd0, 0x38, 0x29, 0x9c, 0x5a, 0x68,
+ 0xd4, 0x87, 0x76, 0xc9, 0xe7, 0x7e, 0xe3, 0xf4,
+ 0x81, 0x6d, 0x18, 0xcb, 0xc9, 0x05, 0xaf, 0xa0,
+ 0xfb, 0x66, 0xf7, 0xf1, 0x1c, 0xc6, 0x14, 0x11,
+ 0x4f, 0x2b, 0x79, 0x42, 0x8b, 0xbc, 0xac, 0xe7,
+ 0x6c, 0xfe, 0x0f, 0x58, 0xe7, 0x7c, 0x78, 0x39,
+ 0x30, 0xb0, 0x66, 0x2c, 0x9b, 0x6d, 0x3a, 0xe1,
+ 0xcf, 0xc9, 0xa4, 0x0e, 0x6d, 0x6d, 0x8a, 0xa1,
+ 0x3a, 0xe7, 0x28, 0xd4, 0x78, 0x4c, 0xa6, 0xa2,
+ 0x2a, 0xa6, 0x03, 0x30, 0xd7, 0xa8, 0x25, 0x66,
+ 0x87, 0x2f, 0x69, 0x5c, 0x4e, 0xdd, 0xa5, 0x49,
+ 0x5d, 0x37, 0x4a, 0x59, 0xc4, 0xaf, 0x1f, 0xa2,
+ 0xe4, 0xf8, 0xa6, 0x12, 0x97, 0xd5, 0x79, 0xf5,
+ 0xe2, 0x4a, 0x2b, 0x5f, 0x61, 0xe4, 0x9e, 0xe3,
+ 0xee, 0xb8, 0xa7, 0x5b, 0x2f, 0xf4, 0x9e, 0x6c,
+ 0xfb, 0xd1, 0xc6, 0x56, 0x77, 0xba, 0x75, 0xaa,
+ 0x3d, 0x1a, 0xa8, 0x0b, 0xb3, 0x68, 0x24, 0x00,
+ 0x10, 0x7f, 0xfd, 0xd7, 0xa1, 0x8d, 0x83, 0x54,
+ 0x4f, 0x1f, 0xd8, 0x2a, 0xbe, 0x8a, 0x0c, 0x87,
+ 0xab, 0xa2, 0xde, 0xc3, 0x39, 0xbf, 0x09, 0x03,
+ 0xa5, 0xf3, 0x05, 0x28, 0xe1, 0xe1, 0xee, 0x39,
+ 0x70, 0x9c, 0xd8, 0x81, 0x12, 0x1e, 0x02, 0x40,
+ 0xd2, 0x6e, 0xf0, 0xeb, 0x1b, 0x3d, 0x22, 0xc6,
+ 0xe5, 0xe3, 0xb4, 0x5a, 0x98, 0xbb, 0xf0, 0x22,
+ 0x28, 0x8d, 0xe5, 0xd3, 0x16, 0x48, 0x24, 0xa5,
+ 0xe6, 0x66, 0x0c, 0xf9, 0x08, 0xf9, 0x7e, 0x1e,
+ 0xe1, 0x28, 0x26, 0x22, 0xc7, 0xc7, 0x0a, 0x32,
+ 0x47, 0xfa, 0xa3, 0xbe, 0x3c, 0xc4, 0xc5, 0x53,
+ 0x0a, 0xd5, 0x94, 0x4a, 0xd7, 0x93, 0xd8, 0x42,
+ 0x99, 0xb9, 0x0a, 0xdb, 0x56, 0xf7, 0xb9, 0x1c,
+ 0x53, 0x4f, 0xfa, 0xd3, 0x74, 0xad, 0xd9, 0x68,
+ 0xf1, 0x1b, 0xdf, 0x61, 0xc6, 0x5e, 0xa8, 0x48,
+ 0xfc, 0xd4, 0x4a, 0x4c, 0x3c, 0x32, 0xf7, 0x1c,
+ 0x96, 0x21, 0x9b, 0xf9, 0xa3, 0xcc, 0x5a, 0xce,
+ 0xd5, 0xd7, 0x08, 0x24, 0xf6, 0x1c, 0xfd, 0xdd,
+ 0x38, 0xc2, 0x32, 0xe9, 0xb8, 0xe7, 0xb6, 0xfa,
+ 0x9d, 0x45, 0x13, 0x2c, 0x83, 0xfd, 0x4a, 0x69,
+ 0x82, 0xcd, 0xdc, 0xb3, 0x76, 0x0c, 0x9e, 0xd8,
+ 0xf4, 0x1b, 0x45, 0x15, 0xb4, 0x97, 0xe7, 0x58,
+ 0x34, 0xe2, 0x03, 0x29, 0x5a, 0xbf, 0xb6, 0xe0,
+ 0x5d, 0x13, 0xd9, 0x2b, 0xb4, 0x80, 0xb2, 0x45,
+ 0x81, 0x6a, 0x2e, 0x6c, 0x89, 0x7d, 0xee, 0xbb,
+ 0x52, 0xdd, 0x1f, 0x18, 0xe7, 0x13, 0x6b, 0x33,
+ 0x0e, 0xea, 0x36, 0x92, 0x77, 0x7b, 0x6d, 0x9c,
+ 0x5a, 0x5f, 0x45, 0x7b, 0x7b, 0x35, 0x62, 0x23,
+ 0xd1, 0xbf, 0x0f, 0xd0, 0x08, 0x1b, 0x2b, 0x80,
+ 0x6b, 0x7e, 0xf1, 0x21, 0x47, 0xb0, 0x57, 0xd1,
+ 0x98, 0x72, 0x90, 0x34, 0x1c, 0x20, 0x04, 0xff,
+ 0x3d, 0x5c, 0xee, 0x0e, 0x57, 0x5f, 0x6f, 0x24,
+ 0x4e, 0x3c, 0xea, 0xfc, 0xa5, 0xa9, 0x83, 0xc9,
+ 0x61, 0xb4, 0x51, 0x24, 0xf8, 0x27, 0x5e, 0x46,
+ 0x8c, 0xb1, 0x53, 0x02, 0x96, 0x35, 0xba, 0xb8,
+ 0x4c, 0x71, 0xd3, 0x15, 0x59, 0x35, 0x22, 0x20,
+ 0xad, 0x03, 0x9f, 0x66, 0x44, 0x3b, 0x9c, 0x35,
+ 0x37, 0x1f, 0x9b, 0xbb, 0xf3, 0xdb, 0x35, 0x63,
+ 0x30, 0x64, 0xaa, 0xa2, 0x06, 0xa8, 0x5d, 0xbb,
+ 0xe1, 0x9f, 0x70, 0xec, 0x82, 0x11, 0x06, 0x36,
+ 0xec, 0x8b, 0x69, 0x66, 0x24, 0x44, 0xc9, 0x4a,
+ 0x57, 0xbb, 0x9b, 0x78, 0x13, 0xce, 0x9c, 0x0c,
+ 0xba, 0x92, 0x93, 0x63, 0xb8, 0xe2, 0x95, 0x0f,
+ 0x0f, 0x16, 0x39, 0x52, 0xfd, 0x3a, 0x6d, 0x02,
+ 0x4b, 0xdf, 0x13, 0xd3, 0x2a, 0x22, 0xb4, 0x03,
+ 0x7c, 0x54, 0x49, 0x96, 0x68, 0x54, 0x10, 0xfa,
+ 0xef, 0xaa, 0x6c, 0xe8, 0x22, 0xdc, 0x71, 0x16,
+ 0x13, 0x1a, 0xf6, 0x28, 0xe5, 0x6d, 0x77, 0x3d,
+ 0xcd, 0x30, 0x63, 0xb1, 0x70, 0x52, 0xa1, 0xc5,
+ 0x94, 0x5f, 0xcf, 0xe8, 0xb8, 0x26, 0x98, 0xf7,
+ 0x06, 0xa0, 0x0a, 0x70, 0xfa, 0x03, 0x80, 0xac,
+ 0xc1, 0xec, 0xd6, 0x4c, 0x54, 0xd7, 0xfe, 0x47,
+ 0xb6, 0x88, 0x4a, 0xf7, 0x71, 0x24, 0xee, 0xf3,
+ 0xd2, 0xc2, 0x4a, 0x7f, 0xfe, 0x61, 0xc7, 0x35,
+ 0xc9, 0x37, 0x67, 0xcb, 0x24, 0x35, 0xda, 0x7e,
+ 0xca, 0x5f, 0xf3, 0x8d, 0xd4, 0x13, 0x8e, 0xd6,
+ 0xcb, 0x4d, 0x53, 0x8f, 0x53, 0x1f, 0xc0, 0x74,
+ 0xf7, 0x53, 0xb9, 0x5e, 0x23, 0x37, 0xba, 0x6e,
+ 0xe3, 0x9d, 0x07, 0x55, 0x25, 0x7b, 0xe6, 0x2a,
+ 0x64, 0xd1, 0x32, 0xdd, 0x54, 0x1b, 0x4b, 0xc0,
+ 0xe1, 0xd7, 0x69, 0x58, 0xf8, 0x93, 0x29, 0xc4,
+ 0xdd, 0x23, 0x2f, 0xa5, 0xfc, 0x9d, 0x7e, 0xf8,
+ 0xd4, 0x90, 0xcd, 0x82, 0x55, 0xdc, 0x16, 0x16,
+ 0x9f, 0x07, 0x52, 0x9b, 0x9d, 0x25, 0xed, 0x32,
+ 0xc5, 0x7b, 0xdf, 0xf6, 0x83, 0x46, 0x3d, 0x65,
+ 0xb7, 0xef, 0x87, 0x7a, 0x12, 0x69, 0x8f, 0x06,
+ 0x7c, 0x51, 0x15, 0x4a, 0x08, 0xe8, 0xac, 0x9a,
+ 0x0c, 0x24, 0xa7, 0x27, 0xd8, 0x46, 0x2f, 0xe7,
+ 0x01, 0x0e, 0x1c, 0xc6, 0x91, 0xb0, 0x6e, 0x85,
+ 0x65, 0xf0, 0x29, 0x0d, 0x2e, 0x6b, 0x3b, 0xfb,
+ 0x4b, 0xdf, 0xe4, 0x80, 0x93, 0x03, 0x66, 0x46,
+ 0x3e, 0x8a, 0x6e, 0xf3, 0x5e, 0x4d, 0x62, 0x0e,
+ 0x49, 0x05, 0xaf, 0xd4, 0xf8, 0x21, 0x20, 0x61,
+ 0x1d, 0x39, 0x17, 0xf4, 0x61, 0x47, 0x95, 0xfb,
+ 0x15, 0x2e, 0xb3, 0x4f, 0xd0, 0x5d, 0xf5, 0x7d,
+ 0x40, 0xda, 0x90, 0x3c, 0x6b, 0xcb, 0x17, 0x00,
+ 0x13, 0x3b, 0x64, 0x34, 0x1b, 0xf0, 0xf2, 0xe5,
+ 0x3b, 0xb2, 0xc7, 0xd3, 0x5f, 0x3a, 0x44, 0xa6,
+ 0x9b, 0xb7, 0x78, 0x0e, 0x42, 0x5d, 0x4c, 0xc1,
+ 0xe9, 0xd2, 0xcb, 0xb7, 0x78, 0xd1, 0xfe, 0x9a,
+ 0xb5, 0x07, 0xe9, 0xe0, 0xbe, 0xe2, 0x8a, 0xa7,
+ 0x01, 0x83, 0x00, 0x8c, 0x5c, 0x08, 0xe6, 0x63,
+ 0x12, 0x92, 0xb7, 0xb7, 0xa6, 0x19, 0x7d, 0x38,
+ 0x13, 0x38, 0x92, 0x87, 0x24, 0xf9, 0x48, 0xb3,
+ 0x5e, 0x87, 0x6a, 0x40, 0x39, 0x5c, 0x3f, 0xed,
+ 0x8f, 0xee, 0xdb, 0x15, 0x82, 0x06, 0xda, 0x49,
+ 0x21, 0x2b, 0xb5, 0xbf, 0x32, 0x7c, 0x9f, 0x42,
+ 0x28, 0x63, 0xcf, 0xaf, 0x1e, 0xf8, 0xc6, 0xa0,
+ 0xd1, 0x02, 0x43, 0x57, 0x62, 0xec, 0x9b, 0x0f,
+ 0x01, 0x9e, 0x71, 0xd8, 0x87, 0x9d, 0x01, 0xc1,
+ 0x58, 0x77, 0xd9, 0xaf, 0xb1, 0x10, 0x7e, 0xdd,
+ 0xa6, 0x50, 0x96, 0xe5, 0xf0, 0x72, 0x00, 0x6d,
+ 0x4b, 0xf8, 0x2a, 0x8f, 0x19, 0xf3, 0x22, 0x88,
+ 0x11, 0x4a, 0x8b, 0x7c, 0xfd, 0xb7, 0xed, 0xe1,
+ 0xf6, 0x40, 0x39, 0xe0, 0xe9, 0xf6, 0x3d, 0x25,
+ 0xe6, 0x74, 0x3c, 0x58, 0x57, 0x7f, 0xe1, 0x22,
+ 0x96, 0x47, 0x31, 0x91, 0xba, 0x70, 0x85, 0x28,
+ 0x6b, 0x9f, 0x6e, 0x25, 0xac, 0x23, 0x66, 0x2f,
+ 0x29, 0x88, 0x28, 0xce, 0x8c, 0x5c, 0x88, 0x53,
+ 0xd1, 0x3b, 0xcc, 0x6a, 0x51, 0xb2, 0xe1, 0x28,
+ 0x3f, 0x91, 0xb4, 0x0d, 0x00, 0x3a, 0xe3, 0xf8,
+ 0xc3, 0x8f, 0xd7, 0x96, 0x62, 0x0e, 0x2e, 0xfc,
+ 0xc8, 0x6c, 0x77, 0xa6, 0x1d, 0x22, 0xc1, 0xb8,
+ 0xe6, 0x61, 0xd7, 0x67, 0x36, 0x13, 0x7b, 0xbb,
+ 0x9b, 0x59, 0x09, 0xa6, 0xdf, 0xf7, 0x6b, 0xa3,
+ 0x40, 0x1a, 0xf5, 0x4f, 0xb4, 0xda, 0xd3, 0xf3,
+ 0x81, 0x93, 0xc6, 0x18, 0xd9, 0x26, 0xee, 0xac,
+ 0xf0, 0xaa, 0xdf, 0xc5, 0x9c, 0xca, 0xc2, 0xa2,
+ 0xcc, 0x7b, 0x5c, 0x24, 0xb0, 0xbc, 0xd0, 0x6a,
+ 0x4d, 0x89, 0x09, 0xb8, 0x07, 0xfe, 0x87, 0xad,
+ 0x0a, 0xea, 0xb8, 0x42, 0xf9, 0x5e, 0xb3, 0x3e,
+ 0x36, 0x4c, 0xaf, 0x75, 0x9e, 0x1c, 0xeb, 0xbd,
+ 0xbc, 0xbb, 0x80, 0x40, 0xa7, 0x3a, 0x30, 0xbf,
+ 0xa8, 0x44, 0xf4, 0xeb, 0x38, 0xad, 0x29, 0xba,
+ 0x23, 0xed, 0x41, 0x0c, 0xea, 0xd2, 0xbb, 0x41,
+ 0x18, 0xd6, 0xb9, 0xba, 0x65, 0x2b, 0xa3, 0x91,
+ 0x6d, 0x1f, 0xa9, 0xf4, 0xd1, 0x25, 0x8d, 0x4d,
+ 0x38, 0xff, 0x64, 0xa0, 0xec, 0xde, 0xa6, 0xb6,
+ 0x79, 0xab, 0x8e, 0x33, 0x6c, 0x47, 0xde, 0xaf,
+ 0x94, 0xa4, 0xa5, 0x86, 0x77, 0x55, 0x09, 0x92,
+ 0x81, 0x31, 0x76, 0xc7, 0x34, 0x22, 0x89, 0x8e,
+ 0x3d, 0x26, 0x26, 0xd7, 0xfc, 0x1e, 0x16, 0x72,
+ 0x13, 0x33, 0x63, 0xd5, 0x22, 0xbe, 0xb8, 0x04,
+ 0x34, 0x84, 0x41, 0xbb, 0x80, 0xd0, 0x9f, 0x46,
+ 0x48, 0x07, 0xa7, 0xfc, 0x2b, 0x3a, 0x75, 0x55,
+ 0x8c, 0xc7, 0x6a, 0xbd, 0x7e, 0x46, 0x08, 0x84,
+ 0x0f, 0xd5, 0x74, 0xc0, 0x82, 0x8e, 0xaa, 0x61,
+ 0x05, 0x01, 0xb2, 0x47, 0x6e, 0x20, 0x6a, 0x2d,
+ 0x58, 0x70, 0x48, 0x32, 0xa7, 0x37, 0xd2, 0xb8,
+ 0x82, 0x1a, 0x51, 0xb9, 0x61, 0xdd, 0xfd, 0x9d,
+ 0x6b, 0x0e, 0x18, 0x97, 0xf8, 0x45, 0x5f, 0x87,
+ 0x10, 0xcf, 0x34, 0x72, 0x45, 0x26, 0x49, 0x70,
+ 0xe7, 0xa3, 0x78, 0xe0, 0x52, 0x89, 0x84, 0x94,
+ 0x83, 0x82, 0xc2, 0x69, 0x8f, 0xe3, 0xe1, 0x3f,
+ 0x60, 0x74, 0x88, 0xc4, 0xf7, 0x75, 0x2c, 0xfb,
+ 0xbd, 0xb6, 0xc4, 0x7e, 0x10, 0x0a, 0x6c, 0x90,
+ 0x04, 0x9e, 0xc3, 0x3f, 0x59, 0x7c, 0xce, 0x31,
+ 0x18, 0x60, 0x57, 0x73, 0x46, 0x94, 0x7d, 0x06,
+ 0xa0, 0x6d, 0x44, 0xec, 0xa2, 0x0a, 0x9e, 0x05,
+ 0x15, 0xef, 0xca, 0x5c, 0xbf, 0x00, 0xeb, 0xf7,
+ 0x3d, 0x32, 0xd4, 0xa5, 0xef, 0x49, 0x89, 0x5e,
+ 0x46, 0xb0, 0xa6, 0x63, 0x5b, 0x8a, 0x73, 0xae,
+ 0x6f, 0xd5, 0x9d, 0xf8, 0x4f, 0x40, 0xb5, 0xb2,
+ 0x6e, 0xd3, 0xb6, 0x01, 0xa9, 0x26, 0xa2, 0x21,
+ 0xcf, 0x33, 0x7a, 0x3a, 0xa4, 0x23, 0x13, 0xb0,
+ 0x69, 0x6a, 0xee, 0xce, 0xd8, 0x9d, 0x01, 0x1d,
+ 0x50, 0xc1, 0x30, 0x6c, 0xb1, 0xcd, 0xa0, 0xf0,
+ 0xf0, 0xa2, 0x64, 0x6f, 0xbb, 0xbf, 0x5e, 0xe6,
+ 0xab, 0x87, 0xb4, 0x0f, 0x4f, 0x15, 0xaf, 0xb5,
+ 0x25, 0xa1, 0xb2, 0xd0, 0x80, 0x2c, 0xfb, 0xf9,
+ 0xfe, 0xd2, 0x33, 0xbb, 0x76, 0xfe, 0x7c, 0xa8,
+ 0x66, 0xf7, 0xe7, 0x85, 0x9f, 0x1f, 0x85, 0x57,
+ 0x88, 0xe1, 0xe9, 0x63, 0xe4, 0xd8, 0x1c, 0xa1,
+ 0xfb, 0xda, 0x44, 0x05, 0x2e, 0x1d, 0x3a, 0x1c,
+ 0xff, 0xc8, 0x3b, 0xc0, 0xfe, 0xda, 0x22, 0x0b,
+ 0x43, 0xd6, 0x88, 0x39, 0x4c, 0x4a, 0xa6, 0x69,
+ 0x18, 0x93, 0x42, 0x4e, 0xb5, 0xcc, 0x66, 0x0d,
+ 0x09, 0xf8, 0x1e, 0x7c, 0xd3, 0x3c, 0x99, 0x0d,
+ 0x50, 0x1d, 0x62, 0xe9, 0x57, 0x06, 0xbf, 0x19,
+ 0x88, 0xdd, 0xad, 0x7b, 0x4f, 0xf9, 0xc7, 0x82,
+ 0x6d, 0x8d, 0xc8, 0xc4, 0xc5, 0x78, 0x17, 0x20,
+ 0x15, 0xc5, 0x52, 0x41, 0xcf, 0x5b, 0xd6, 0x7f,
+ 0x94, 0x02, 0x41, 0xe0, 0x40, 0x22, 0x03, 0x5e,
+ 0xd1, 0x53, 0xd4, 0x86, 0xd3, 0x2c, 0x9f, 0x0f,
+ 0x96, 0xe3, 0x6b, 0x9a, 0x76, 0x32, 0x06, 0x47,
+ 0x4b, 0x11, 0xb3, 0xdd, 0x03, 0x65, 0xbd, 0x9b,
+ 0x01, 0xda, 0x9c, 0xb9, 0x7e, 0x3f, 0x6a, 0xc4,
+ 0x7b, 0xea, 0xd4, 0x3c, 0xb9, 0xfb, 0x5c, 0x6b,
+ 0x64, 0x33, 0x52, 0xba, 0x64, 0x78, 0x8f, 0xa4,
+ 0xaf, 0x7a, 0x61, 0x8d, 0xbc, 0xc5, 0x73, 0xe9,
+ 0x6b, 0x58, 0x97, 0x4b, 0xbf, 0x63, 0x22, 0xd3,
+ 0x37, 0x02, 0x54, 0xc5, 0xb9, 0x16, 0x4a, 0xf0,
+ 0x19, 0xd8, 0x94, 0x57, 0xb8, 0x8a, 0xb3, 0x16,
+ 0x3b, 0xd0, 0x84, 0x8e, 0x67, 0xa6, 0xa3, 0x7d,
+ 0x78, 0xec, 0x00
+};
+static const u8 enc_output012[] __initconst = {
+ 0x52, 0x34, 0xb3, 0x65, 0x3b, 0xb7, 0xe5, 0xd3,
+ 0xab, 0x49, 0x17, 0x60, 0xd2, 0x52, 0x56, 0xdf,
+ 0xdf, 0x34, 0x56, 0x82, 0xe2, 0xbe, 0xe5, 0xe1,
+ 0x28, 0xd1, 0x4e, 0x5f, 0x4f, 0x01, 0x7d, 0x3f,
+ 0x99, 0x6b, 0x30, 0x6e, 0x1a, 0x7c, 0x4c, 0x8e,
+ 0x62, 0x81, 0xae, 0x86, 0x3f, 0x6b, 0xd0, 0xb5,
+ 0xa9, 0xcf, 0x50, 0xf1, 0x02, 0x12, 0xa0, 0x0b,
+ 0x24, 0xe9, 0xe6, 0x72, 0x89, 0x2c, 0x52, 0x1b,
+ 0x34, 0x38, 0xf8, 0x75, 0x5f, 0xa0, 0x74, 0xe2,
+ 0x99, 0xdd, 0xa6, 0x4b, 0x14, 0x50, 0x4e, 0xf1,
+ 0xbe, 0xd6, 0x9e, 0xdb, 0xb2, 0x24, 0x27, 0x74,
+ 0x12, 0x4a, 0x78, 0x78, 0x17, 0xa5, 0x58, 0x8e,
+ 0x2f, 0xf9, 0xf4, 0x8d, 0xee, 0x03, 0x88, 0xae,
+ 0xb8, 0x29, 0xa1, 0x2f, 0x4b, 0xee, 0x92, 0xbd,
+ 0x87, 0xb3, 0xce, 0x34, 0x21, 0x57, 0x46, 0x04,
+ 0x49, 0x0c, 0x80, 0xf2, 0x01, 0x13, 0xa1, 0x55,
+ 0xb3, 0xff, 0x44, 0x30, 0x3c, 0x1c, 0xd0, 0xef,
+ 0xbc, 0x18, 0x74, 0x26, 0xad, 0x41, 0x5b, 0x5b,
+ 0x3e, 0x9a, 0x7a, 0x46, 0x4f, 0x16, 0xd6, 0x74,
+ 0x5a, 0xb7, 0x3a, 0x28, 0x31, 0xd8, 0xae, 0x26,
+ 0xac, 0x50, 0x53, 0x86, 0xf2, 0x56, 0xd7, 0x3f,
+ 0x29, 0xbc, 0x45, 0x68, 0x8e, 0xcb, 0x98, 0x64,
+ 0xdd, 0xc9, 0xba, 0xb8, 0x4b, 0x7b, 0x82, 0xdd,
+ 0x14, 0xa7, 0xcb, 0x71, 0x72, 0x00, 0x5c, 0xad,
+ 0x7b, 0x6a, 0x89, 0xa4, 0x3d, 0xbf, 0xb5, 0x4b,
+ 0x3e, 0x7c, 0x5a, 0xcf, 0xb8, 0xa1, 0xc5, 0x6e,
+ 0xc8, 0xb6, 0x31, 0x57, 0x7b, 0xdf, 0xa5, 0x7e,
+ 0xb1, 0xd6, 0x42, 0x2a, 0x31, 0x36, 0xd1, 0xd0,
+ 0x3f, 0x7a, 0xe5, 0x94, 0xd6, 0x36, 0xa0, 0x6f,
+ 0xb7, 0x40, 0x7d, 0x37, 0xc6, 0x55, 0x7c, 0x50,
+ 0x40, 0x6d, 0x29, 0x89, 0xe3, 0x5a, 0xae, 0x97,
+ 0xe7, 0x44, 0x49, 0x6e, 0xbd, 0x81, 0x3d, 0x03,
+ 0x93, 0x06, 0x12, 0x06, 0xe2, 0x41, 0x12, 0x4a,
+ 0xf1, 0x6a, 0xa4, 0x58, 0xa2, 0xfb, 0xd2, 0x15,
+ 0xba, 0xc9, 0x79, 0xc9, 0xce, 0x5e, 0x13, 0xbb,
+ 0xf1, 0x09, 0x04, 0xcc, 0xfd, 0xe8, 0x51, 0x34,
+ 0x6a, 0xe8, 0x61, 0x88, 0xda, 0xed, 0x01, 0x47,
+ 0x84, 0xf5, 0x73, 0x25, 0xf9, 0x1c, 0x42, 0x86,
+ 0x07, 0xf3, 0x5b, 0x1a, 0x01, 0xb3, 0xeb, 0x24,
+ 0x32, 0x8d, 0xf6, 0xed, 0x7c, 0x4b, 0xeb, 0x3c,
+ 0x36, 0x42, 0x28, 0xdf, 0xdf, 0xb6, 0xbe, 0xd9,
+ 0x8c, 0x52, 0xd3, 0x2b, 0x08, 0x90, 0x8c, 0xe7,
+ 0x98, 0x31, 0xe2, 0x32, 0x8e, 0xfc, 0x11, 0x48,
+ 0x00, 0xa8, 0x6a, 0x42, 0x4a, 0x02, 0xc6, 0x4b,
+ 0x09, 0xf1, 0xe3, 0x49, 0xf3, 0x45, 0x1f, 0x0e,
+ 0xbc, 0x56, 0xe2, 0xe4, 0xdf, 0xfb, 0xeb, 0x61,
+ 0xfa, 0x24, 0xc1, 0x63, 0x75, 0xbb, 0x47, 0x75,
+ 0xaf, 0xe1, 0x53, 0x16, 0x96, 0x21, 0x85, 0x26,
+ 0x11, 0xb3, 0x76, 0xe3, 0x23, 0xa1, 0x6b, 0x74,
+ 0x37, 0xd0, 0xde, 0x06, 0x90, 0x71, 0x5d, 0x43,
+ 0x88, 0x9b, 0x00, 0x54, 0xa6, 0x75, 0x2f, 0xa1,
+ 0xc2, 0x0b, 0x73, 0x20, 0x1d, 0xb6, 0x21, 0x79,
+ 0x57, 0x3f, 0xfa, 0x09, 0xbe, 0x8a, 0x33, 0xc3,
+ 0x52, 0xf0, 0x1d, 0x82, 0x31, 0xd1, 0x55, 0xb5,
+ 0x6c, 0x99, 0x25, 0xcf, 0x5c, 0x32, 0xce, 0xe9,
+ 0x0d, 0xfa, 0x69, 0x2c, 0xd5, 0x0d, 0xc5, 0x6d,
+ 0x86, 0xd0, 0x0c, 0x3b, 0x06, 0x50, 0x79, 0xe8,
+ 0xc3, 0xae, 0x04, 0xe6, 0xcd, 0x51, 0xe4, 0x26,
+ 0x9b, 0x4f, 0x7e, 0xa6, 0x0f, 0xab, 0xd8, 0xe5,
+ 0xde, 0xa9, 0x00, 0x95, 0xbe, 0xa3, 0x9d, 0x5d,
+ 0xb2, 0x09, 0x70, 0x18, 0x1c, 0xf0, 0xac, 0x29,
+ 0x23, 0x02, 0x29, 0x28, 0xd2, 0x74, 0x35, 0x57,
+ 0x62, 0x0f, 0x24, 0xea, 0x5e, 0x33, 0xc2, 0x92,
+ 0xf3, 0x78, 0x4d, 0x30, 0x1e, 0xa1, 0x99, 0xa9,
+ 0x82, 0xb0, 0x42, 0x31, 0x8d, 0xad, 0x8a, 0xbc,
+ 0xfc, 0xd4, 0x57, 0x47, 0x3e, 0xb4, 0x50, 0xdd,
+ 0x6e, 0x2c, 0x80, 0x4d, 0x22, 0xf1, 0xfb, 0x57,
+ 0xc4, 0xdd, 0x17, 0xe1, 0x8a, 0x36, 0x4a, 0xb3,
+ 0x37, 0xca, 0xc9, 0x4e, 0xab, 0xd5, 0x69, 0xc4,
+ 0xf4, 0xbc, 0x0b, 0x3b, 0x44, 0x4b, 0x29, 0x9c,
+ 0xee, 0xd4, 0x35, 0x22, 0x21, 0xb0, 0x1f, 0x27,
+ 0x64, 0xa8, 0x51, 0x1b, 0xf0, 0x9f, 0x19, 0x5c,
+ 0xfb, 0x5a, 0x64, 0x74, 0x70, 0x45, 0x09, 0xf5,
+ 0x64, 0xfe, 0x1a, 0x2d, 0xc9, 0x14, 0x04, 0x14,
+ 0xcf, 0xd5, 0x7d, 0x60, 0xaf, 0x94, 0x39, 0x94,
+ 0xe2, 0x7d, 0x79, 0x82, 0xd0, 0x65, 0x3b, 0x6b,
+ 0x9c, 0x19, 0x84, 0xb4, 0x6d, 0xb3, 0x0c, 0x99,
+ 0xc0, 0x56, 0xa8, 0xbd, 0x73, 0xce, 0x05, 0x84,
+ 0x3e, 0x30, 0xaa, 0xc4, 0x9b, 0x1b, 0x04, 0x2a,
+ 0x9f, 0xd7, 0x43, 0x2b, 0x23, 0xdf, 0xbf, 0xaa,
+ 0xd5, 0xc2, 0x43, 0x2d, 0x70, 0xab, 0xdc, 0x75,
+ 0xad, 0xac, 0xf7, 0xc0, 0xbe, 0x67, 0xb2, 0x74,
+ 0xed, 0x67, 0x10, 0x4a, 0x92, 0x60, 0xc1, 0x40,
+ 0x50, 0x19, 0x8a, 0x8a, 0x8c, 0x09, 0x0e, 0x72,
+ 0xe1, 0x73, 0x5e, 0xe8, 0x41, 0x85, 0x63, 0x9f,
+ 0x3f, 0xd7, 0x7d, 0xc4, 0xfb, 0x22, 0x5d, 0x92,
+ 0x6c, 0xb3, 0x1e, 0xe2, 0x50, 0x2f, 0x82, 0xa8,
+ 0x28, 0xc0, 0xb5, 0xd7, 0x5f, 0x68, 0x0d, 0x2c,
+ 0x2d, 0xaf, 0x7e, 0xfa, 0x2e, 0x08, 0x0f, 0x1f,
+ 0x70, 0x9f, 0xe9, 0x19, 0x72, 0x55, 0xf8, 0xfb,
+ 0x51, 0xd2, 0x33, 0x5d, 0xa0, 0xd3, 0x2b, 0x0a,
+ 0x6c, 0xbc, 0x4e, 0xcf, 0x36, 0x4d, 0xdc, 0x3b,
+ 0xe9, 0x3e, 0x81, 0x7c, 0x61, 0xdb, 0x20, 0x2d,
+ 0x3a, 0xc3, 0xb3, 0x0c, 0x1e, 0x00, 0xb9, 0x7c,
+ 0xf5, 0xca, 0x10, 0x5f, 0x3a, 0x71, 0xb3, 0xe4,
+ 0x20, 0xdb, 0x0c, 0x2a, 0x98, 0x63, 0x45, 0x00,
+ 0x58, 0xf6, 0x68, 0xe4, 0x0b, 0xda, 0x13, 0x3b,
+ 0x60, 0x5c, 0x76, 0xdb, 0xb9, 0x97, 0x71, 0xe4,
+ 0xd9, 0xb7, 0xdb, 0xbd, 0x68, 0xc7, 0x84, 0x84,
+ 0xaa, 0x7c, 0x68, 0x62, 0x5e, 0x16, 0xfc, 0xba,
+ 0x72, 0xaa, 0x9a, 0xa9, 0xeb, 0x7c, 0x75, 0x47,
+ 0x97, 0x7e, 0xad, 0xe2, 0xd9, 0x91, 0xe8, 0xe4,
+ 0xa5, 0x31, 0xd7, 0x01, 0x8e, 0xa2, 0x11, 0x88,
+ 0x95, 0xb9, 0xf2, 0x9b, 0xd3, 0x7f, 0x1b, 0x81,
+ 0x22, 0xf7, 0x98, 0x60, 0x0a, 0x64, 0xa6, 0xc1,
+ 0xf6, 0x49, 0xc7, 0xe3, 0x07, 0x4d, 0x94, 0x7a,
+ 0xcf, 0x6e, 0x68, 0x0c, 0x1b, 0x3f, 0x6e, 0x2e,
+ 0xee, 0x92, 0xfa, 0x52, 0xb3, 0x59, 0xf8, 0xf1,
+ 0x8f, 0x6a, 0x66, 0xa3, 0x82, 0x76, 0x4a, 0x07,
+ 0x1a, 0xc7, 0xdd, 0xf5, 0xda, 0x9c, 0x3c, 0x24,
+ 0xbf, 0xfd, 0x42, 0xa1, 0x10, 0x64, 0x6a, 0x0f,
+ 0x89, 0xee, 0x36, 0xa5, 0xce, 0x99, 0x48, 0x6a,
+ 0xf0, 0x9f, 0x9e, 0x69, 0xa4, 0x40, 0x20, 0xe9,
+ 0x16, 0x15, 0xf7, 0xdb, 0x75, 0x02, 0xcb, 0xe9,
+ 0x73, 0x8b, 0x3b, 0x49, 0x2f, 0xf0, 0xaf, 0x51,
+ 0x06, 0x5c, 0xdf, 0x27, 0x27, 0x49, 0x6a, 0xd1,
+ 0xcc, 0xc7, 0xb5, 0x63, 0xb5, 0xfc, 0xb8, 0x5c,
+ 0x87, 0x7f, 0x84, 0xb4, 0xcc, 0x14, 0xa9, 0x53,
+ 0xda, 0xa4, 0x56, 0xf8, 0xb6, 0x1b, 0xcc, 0x40,
+ 0x27, 0x52, 0x06, 0x5a, 0x13, 0x81, 0xd7, 0x3a,
+ 0xd4, 0x3b, 0xfb, 0x49, 0x65, 0x31, 0x33, 0xb2,
+ 0xfa, 0xcd, 0xad, 0x58, 0x4e, 0x2b, 0xae, 0xd2,
+ 0x20, 0xfb, 0x1a, 0x48, 0xb4, 0x3f, 0x9a, 0xd8,
+ 0x7a, 0x35, 0x4a, 0xc8, 0xee, 0x88, 0x5e, 0x07,
+ 0x66, 0x54, 0xb9, 0xec, 0x9f, 0xa3, 0xe3, 0xb9,
+ 0x37, 0xaa, 0x49, 0x76, 0x31, 0xda, 0x74, 0x2d,
+ 0x3c, 0xa4, 0x65, 0x10, 0x32, 0x38, 0xf0, 0xde,
+ 0xd3, 0x99, 0x17, 0xaa, 0x71, 0xaa, 0x8f, 0x0f,
+ 0x8c, 0xaf, 0xa2, 0xf8, 0x5d, 0x64, 0xba, 0x1d,
+ 0xa3, 0xef, 0x96, 0x73, 0xe8, 0xa1, 0x02, 0x8d,
+ 0x0c, 0x6d, 0xb8, 0x06, 0x90, 0xb8, 0x08, 0x56,
+ 0x2c, 0xa7, 0x06, 0xc9, 0xc2, 0x38, 0xdb, 0x7c,
+ 0x63, 0xb1, 0x57, 0x8e, 0xea, 0x7c, 0x79, 0xf3,
+ 0x49, 0x1d, 0xfe, 0x9f, 0xf3, 0x6e, 0xb1, 0x1d,
+ 0xba, 0x19, 0x80, 0x1a, 0x0a, 0xd3, 0xb0, 0x26,
+ 0x21, 0x40, 0xb1, 0x7c, 0xf9, 0x4d, 0x8d, 0x10,
+ 0xc1, 0x7e, 0xf4, 0xf6, 0x3c, 0xa8, 0xfd, 0x7c,
+ 0xa3, 0x92, 0xb2, 0x0f, 0xaa, 0xcc, 0xa6, 0x11,
+ 0xfe, 0x04, 0xe3, 0xd1, 0x7a, 0x32, 0x89, 0xdf,
+ 0x0d, 0xc4, 0x8f, 0x79, 0x6b, 0xca, 0x16, 0x7c,
+ 0x6e, 0xf9, 0xad, 0x0f, 0xf6, 0xfe, 0x27, 0xdb,
+ 0xc4, 0x13, 0x70, 0xf1, 0x62, 0x1a, 0x4f, 0x79,
+ 0x40, 0xc9, 0x9b, 0x8b, 0x21, 0xea, 0x84, 0xfa,
+ 0xf5, 0xf1, 0x89, 0xce, 0xb7, 0x55, 0x0a, 0x80,
+ 0x39, 0x2f, 0x55, 0x36, 0x16, 0x9c, 0x7b, 0x08,
+ 0xbd, 0x87, 0x0d, 0xa5, 0x32, 0xf1, 0x52, 0x7c,
+ 0xe8, 0x55, 0x60, 0x5b, 0xd7, 0x69, 0xe4, 0xfc,
+ 0xfa, 0x12, 0x85, 0x96, 0xea, 0x50, 0x28, 0xab,
+ 0x8a, 0xf7, 0xbb, 0x0e, 0x53, 0x74, 0xca, 0xa6,
+ 0x27, 0x09, 0xc2, 0xb5, 0xde, 0x18, 0x14, 0xd9,
+ 0xea, 0xe5, 0x29, 0x1c, 0x40, 0x56, 0xcf, 0xd7,
+ 0xae, 0x05, 0x3f, 0x65, 0xaf, 0x05, 0x73, 0xe2,
+ 0x35, 0x96, 0x27, 0x07, 0x14, 0xc0, 0xad, 0x33,
+ 0xf1, 0xdc, 0x44, 0x7a, 0x89, 0x17, 0x77, 0xd2,
+ 0x9c, 0x58, 0x60, 0xf0, 0x3f, 0x7b, 0x2d, 0x2e,
+ 0x57, 0x95, 0x54, 0x87, 0xed, 0xf2, 0xc7, 0x4c,
+ 0xf0, 0xae, 0x56, 0x29, 0x19, 0x7d, 0x66, 0x4b,
+ 0x9b, 0x83, 0x84, 0x42, 0x3b, 0x01, 0x25, 0x66,
+ 0x8e, 0x02, 0xde, 0xb9, 0x83, 0x54, 0x19, 0xf6,
+ 0x9f, 0x79, 0x0d, 0x67, 0xc5, 0x1d, 0x7a, 0x44,
+ 0x02, 0x98, 0xa7, 0x16, 0x1c, 0x29, 0x0d, 0x74,
+ 0xff, 0x85, 0x40, 0x06, 0xef, 0x2c, 0xa9, 0xc6,
+ 0xf5, 0x53, 0x07, 0x06, 0xae, 0xe4, 0xfa, 0x5f,
+ 0xd8, 0x39, 0x4d, 0xf1, 0x9b, 0x6b, 0xd9, 0x24,
+ 0x84, 0xfe, 0x03, 0x4c, 0xb2, 0x3f, 0xdf, 0xa1,
+ 0x05, 0x9e, 0x50, 0x14, 0x5a, 0xd9, 0x1a, 0xa2,
+ 0xa7, 0xfa, 0xfa, 0x17, 0xf7, 0x78, 0xd6, 0xb5,
+ 0x92, 0x61, 0x91, 0xac, 0x36, 0xfa, 0x56, 0x0d,
+ 0x38, 0x32, 0x18, 0x85, 0x08, 0x58, 0x37, 0xf0,
+ 0x4b, 0xdb, 0x59, 0xe7, 0xa4, 0x34, 0xc0, 0x1b,
+ 0x01, 0xaf, 0x2d, 0xde, 0xa1, 0xaa, 0x5d, 0xd3,
+ 0xec, 0xe1, 0xd4, 0xf7, 0xe6, 0x54, 0x68, 0xf0,
+ 0x51, 0x97, 0xa7, 0x89, 0xea, 0x24, 0xad, 0xd3,
+ 0x6e, 0x47, 0x93, 0x8b, 0x4b, 0xb4, 0xf7, 0x1c,
+ 0x42, 0x06, 0x67, 0xe8, 0x99, 0xf6, 0xf5, 0x7b,
+ 0x85, 0xb5, 0x65, 0xb5, 0xb5, 0xd2, 0x37, 0xf5,
+ 0xf3, 0x02, 0xa6, 0x4d, 0x11, 0xa7, 0xdc, 0x51,
+ 0x09, 0x7f, 0xa0, 0xd8, 0x88, 0x1c, 0x13, 0x71,
+ 0xae, 0x9c, 0xb7, 0x7b, 0x34, 0xd6, 0x4e, 0x68,
+ 0x26, 0x83, 0x51, 0xaf, 0x1d, 0xee, 0x8b, 0xbb,
+ 0x69, 0x43, 0x2b, 0x9e, 0x8a, 0xbc, 0x02, 0x0e,
+ 0xa0, 0x1b, 0xe0, 0xa8, 0x5f, 0x6f, 0xaf, 0x1b,
+ 0x8f, 0xe7, 0x64, 0x71, 0x74, 0x11, 0x7e, 0xa8,
+ 0xd8, 0xf9, 0x97, 0x06, 0xc3, 0xb6, 0xfb, 0xfb,
+ 0xb7, 0x3d, 0x35, 0x9d, 0x3b, 0x52, 0xed, 0x54,
+ 0xca, 0xf4, 0x81, 0x01, 0x2d, 0x1b, 0xc3, 0xa7,
+ 0x00, 0x3d, 0x1a, 0x39, 0x54, 0xe1, 0xf6, 0xff,
+ 0xed, 0x6f, 0x0b, 0x5a, 0x68, 0xda, 0x58, 0xdd,
+ 0xa9, 0xcf, 0x5c, 0x4a, 0xe5, 0x09, 0x4e, 0xde,
+ 0x9d, 0xbc, 0x3e, 0xee, 0x5a, 0x00, 0x3b, 0x2c,
+ 0x87, 0x10, 0x65, 0x60, 0xdd, 0xd7, 0x56, 0xd1,
+ 0x4c, 0x64, 0x45, 0xe4, 0x21, 0xec, 0x78, 0xf8,
+ 0x25, 0x7a, 0x3e, 0x16, 0x5d, 0x09, 0x53, 0x14,
+ 0xbe, 0x4f, 0xae, 0x87, 0xd8, 0xd1, 0xaa, 0x3c,
+ 0xf6, 0x3e, 0xa4, 0x70, 0x8c, 0x5e, 0x70, 0xa4,
+ 0xb3, 0x6b, 0x66, 0x73, 0xd3, 0xbf, 0x31, 0x06,
+ 0x19, 0x62, 0x93, 0x15, 0xf2, 0x86, 0xe4, 0x52,
+ 0x7e, 0x53, 0x4c, 0x12, 0x38, 0xcc, 0x34, 0x7d,
+ 0x57, 0xf6, 0x42, 0x93, 0x8a, 0xc4, 0xee, 0x5c,
+ 0x8a, 0xe1, 0x52, 0x8f, 0x56, 0x64, 0xf6, 0xa6,
+ 0xd1, 0x91, 0x57, 0x70, 0xcd, 0x11, 0x76, 0xf5,
+ 0x59, 0x60, 0x60, 0x3c, 0xc1, 0xc3, 0x0b, 0x7f,
+ 0x58, 0x1a, 0x50, 0x91, 0xf1, 0x68, 0x8f, 0x6e,
+ 0x74, 0x74, 0xa8, 0x51, 0x0b, 0xf7, 0x7a, 0x98,
+ 0x37, 0xf2, 0x0a, 0x0e, 0xa4, 0x97, 0x04, 0xb8,
+ 0x9b, 0xfd, 0xa0, 0xea, 0xf7, 0x0d, 0xe1, 0xdb,
+ 0x03, 0xf0, 0x31, 0x29, 0xf8, 0xdd, 0x6b, 0x8b,
+ 0x5d, 0xd8, 0x59, 0xa9, 0x29, 0xcf, 0x9a, 0x79,
+ 0x89, 0x19, 0x63, 0x46, 0x09, 0x79, 0x6a, 0x11,
+ 0xda, 0x63, 0x68, 0x48, 0x77, 0x23, 0xfb, 0x7d,
+ 0x3a, 0x43, 0xcb, 0x02, 0x3b, 0x7a, 0x6d, 0x10,
+ 0x2a, 0x9e, 0xac, 0xf1, 0xd4, 0x19, 0xf8, 0x23,
+ 0x64, 0x1d, 0x2c, 0x5f, 0xf2, 0xb0, 0x5c, 0x23,
+ 0x27, 0xf7, 0x27, 0x30, 0x16, 0x37, 0xb1, 0x90,
+ 0xab, 0x38, 0xfb, 0x55, 0xcd, 0x78, 0x58, 0xd4,
+ 0x7d, 0x43, 0xf6, 0x45, 0x5e, 0x55, 0x8d, 0xb1,
+ 0x02, 0x65, 0x58, 0xb4, 0x13, 0x4b, 0x36, 0xf7,
+ 0xcc, 0xfe, 0x3d, 0x0b, 0x82, 0xe2, 0x12, 0x11,
+ 0xbb, 0xe6, 0xb8, 0x3a, 0x48, 0x71, 0xc7, 0x50,
+ 0x06, 0x16, 0x3a, 0xe6, 0x7c, 0x05, 0xc7, 0xc8,
+ 0x4d, 0x2f, 0x08, 0x6a, 0x17, 0x9a, 0x95, 0x97,
+ 0x50, 0x68, 0xdc, 0x28, 0x18, 0xc4, 0x61, 0x38,
+ 0xb9, 0xe0, 0x3e, 0x78, 0xdb, 0x29, 0xe0, 0x9f,
+ 0x52, 0xdd, 0xf8, 0x4f, 0x91, 0xc1, 0xd0, 0x33,
+ 0xa1, 0x7a, 0x8e, 0x30, 0x13, 0x82, 0x07, 0x9f,
+ 0xd3, 0x31, 0x0f, 0x23, 0xbe, 0x32, 0x5a, 0x75,
+ 0xcf, 0x96, 0xb2, 0xec, 0xb5, 0x32, 0xac, 0x21,
+ 0xd1, 0x82, 0x33, 0xd3, 0x15, 0x74, 0xbd, 0x90,
+ 0xf1, 0x2c, 0xe6, 0x5f, 0x8d, 0xe3, 0x02, 0xe8,
+ 0xe9, 0xc4, 0xca, 0x96, 0xeb, 0x0e, 0xbc, 0x91,
+ 0xf4, 0xb9, 0xea, 0xd9, 0x1b, 0x75, 0xbd, 0xe1,
+ 0xac, 0x2a, 0x05, 0x37, 0x52, 0x9b, 0x1b, 0x3f,
+ 0x5a, 0xdc, 0x21, 0xc3, 0x98, 0xbb, 0xaf, 0xa3,
+ 0xf2, 0x00, 0xbf, 0x0d, 0x30, 0x89, 0x05, 0xcc,
+ 0xa5, 0x76, 0xf5, 0x06, 0xf0, 0xc6, 0x54, 0x8a,
+ 0x5d, 0xd4, 0x1e, 0xc1, 0xf2, 0xce, 0xb0, 0x62,
+ 0xc8, 0xfc, 0x59, 0x42, 0x9a, 0x90, 0x60, 0x55,
+ 0xfe, 0x88, 0xa5, 0x8b, 0xb8, 0x33, 0x0c, 0x23,
+ 0x24, 0x0d, 0x15, 0x70, 0x37, 0x1e, 0x3d, 0xf6,
+ 0xd2, 0xea, 0x92, 0x10, 0xb2, 0xc4, 0x51, 0xac,
+ 0xf2, 0xac, 0xf3, 0x6b, 0x6c, 0xaa, 0xcf, 0x12,
+ 0xc5, 0x6c, 0x90, 0x50, 0xb5, 0x0c, 0xfc, 0x1a,
+ 0x15, 0x52, 0xe9, 0x26, 0xc6, 0x52, 0xa4, 0xe7,
+ 0x81, 0x69, 0xe1, 0xe7, 0x9e, 0x30, 0x01, 0xec,
+ 0x84, 0x89, 0xb2, 0x0d, 0x66, 0xdd, 0xce, 0x28,
+ 0x5c, 0xec, 0x98, 0x46, 0x68, 0x21, 0x9f, 0x88,
+ 0x3f, 0x1f, 0x42, 0x77, 0xce, 0xd0, 0x61, 0xd4,
+ 0x20, 0xa7, 0xff, 0x53, 0xad, 0x37, 0xd0, 0x17,
+ 0x35, 0xc9, 0xfc, 0xba, 0x0a, 0x78, 0x3f, 0xf2,
+ 0xcc, 0x86, 0x89, 0xe8, 0x4b, 0x3c, 0x48, 0x33,
+ 0x09, 0x7f, 0xc6, 0xc0, 0xdd, 0xb8, 0xfd, 0x7a,
+ 0x66, 0x66, 0x65, 0xeb, 0x47, 0xa7, 0x04, 0x28,
+ 0xa3, 0x19, 0x8e, 0xa9, 0xb1, 0x13, 0x67, 0x62,
+ 0x70, 0xcf, 0xd6
+};
+static const u8 enc_assoc012[] __initconst = {
+ 0xb1, 0x69, 0x83, 0x87, 0x30, 0xaa, 0x5d, 0xb8,
+ 0x77, 0xe8, 0x21, 0xff, 0x06, 0x59, 0x35, 0xce,
+ 0x75, 0xfe, 0x38, 0xef, 0xb8, 0x91, 0x43, 0x8c,
+ 0xcf, 0x70, 0xdd, 0x0a, 0x68, 0xbf, 0xd4, 0xbc,
+ 0x16, 0x76, 0x99, 0x36, 0x1e, 0x58, 0x79, 0x5e,
+ 0xd4, 0x29, 0xf7, 0x33, 0x93, 0x48, 0xdb, 0x5f,
+ 0x01, 0xae, 0x9c, 0xb6, 0xe4, 0x88, 0x6d, 0x2b,
+ 0x76, 0x75, 0xe0, 0xf3, 0x74, 0xe2, 0xc9
+};
+static const u8 enc_nonce012[] __initconst = {
+ 0x05, 0xa3, 0x93, 0xed, 0x30, 0xc5, 0xa2, 0x06
+};
+static const u8 enc_key012[] __initconst = {
+ 0xb3, 0x35, 0x50, 0x03, 0x54, 0x2e, 0x40, 0x5e,
+ 0x8f, 0x59, 0x8e, 0xc5, 0x90, 0xd5, 0x27, 0x2d,
+ 0xba, 0x29, 0x2e, 0xcb, 0x1b, 0x70, 0x44, 0x1e,
+ 0x65, 0x91, 0x6e, 0x2a, 0x79, 0x22, 0xda, 0x64
+};
+
+/* wycheproof - rfc7539 */
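+/*
+ * This is the AEAD example from RFC 7539: the "Ladies and Gentlemen of the
+ * class of '99 ..." plaintext with the 0x80..0x9f key, kept as a known-answer
+ * test.
+ */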
+static const u8 enc_input013[] __initconst = {
+ 0x4c, 0x61, 0x64, 0x69, 0x65, 0x73, 0x20, 0x61,
+ 0x6e, 0x64, 0x20, 0x47, 0x65, 0x6e, 0x74, 0x6c,
+ 0x65, 0x6d, 0x65, 0x6e, 0x20, 0x6f, 0x66, 0x20,
+ 0x74, 0x68, 0x65, 0x20, 0x63, 0x6c, 0x61, 0x73,
+ 0x73, 0x20, 0x6f, 0x66, 0x20, 0x27, 0x39, 0x39,
+ 0x3a, 0x20, 0x49, 0x66, 0x20, 0x49, 0x20, 0x63,
+ 0x6f, 0x75, 0x6c, 0x64, 0x20, 0x6f, 0x66, 0x66,
+ 0x65, 0x72, 0x20, 0x79, 0x6f, 0x75, 0x20, 0x6f,
+ 0x6e, 0x6c, 0x79, 0x20, 0x6f, 0x6e, 0x65, 0x20,
+ 0x74, 0x69, 0x70, 0x20, 0x66, 0x6f, 0x72, 0x20,
+ 0x74, 0x68, 0x65, 0x20, 0x66, 0x75, 0x74, 0x75,
+ 0x72, 0x65, 0x2c, 0x20, 0x73, 0x75, 0x6e, 0x73,
+ 0x63, 0x72, 0x65, 0x65, 0x6e, 0x20, 0x77, 0x6f,
+ 0x75, 0x6c, 0x64, 0x20, 0x62, 0x65, 0x20, 0x69,
+ 0x74, 0x2e
+};
+static const u8 enc_output013[] __initconst = {
+ 0xd3, 0x1a, 0x8d, 0x34, 0x64, 0x8e, 0x60, 0xdb,
+ 0x7b, 0x86, 0xaf, 0xbc, 0x53, 0xef, 0x7e, 0xc2,
+ 0xa4, 0xad, 0xed, 0x51, 0x29, 0x6e, 0x08, 0xfe,
+ 0xa9, 0xe2, 0xb5, 0xa7, 0x36, 0xee, 0x62, 0xd6,
+ 0x3d, 0xbe, 0xa4, 0x5e, 0x8c, 0xa9, 0x67, 0x12,
+ 0x82, 0xfa, 0xfb, 0x69, 0xda, 0x92, 0x72, 0x8b,
+ 0x1a, 0x71, 0xde, 0x0a, 0x9e, 0x06, 0x0b, 0x29,
+ 0x05, 0xd6, 0xa5, 0xb6, 0x7e, 0xcd, 0x3b, 0x36,
+ 0x92, 0xdd, 0xbd, 0x7f, 0x2d, 0x77, 0x8b, 0x8c,
+ 0x98, 0x03, 0xae, 0xe3, 0x28, 0x09, 0x1b, 0x58,
+ 0xfa, 0xb3, 0x24, 0xe4, 0xfa, 0xd6, 0x75, 0x94,
+ 0x55, 0x85, 0x80, 0x8b, 0x48, 0x31, 0xd7, 0xbc,
+ 0x3f, 0xf4, 0xde, 0xf0, 0x8e, 0x4b, 0x7a, 0x9d,
+ 0xe5, 0x76, 0xd2, 0x65, 0x86, 0xce, 0xc6, 0x4b,
+ 0x61, 0x16, 0x1a, 0xe1, 0x0b, 0x59, 0x4f, 0x09,
+ 0xe2, 0x6a, 0x7e, 0x90, 0x2e, 0xcb, 0xd0, 0x60,
+ 0x06, 0x91
+};
+static const u8 enc_assoc013[] __initconst = {
+ 0x50, 0x51, 0x52, 0x53, 0xc0, 0xc1, 0xc2, 0xc3,
+ 0xc4, 0xc5, 0xc6, 0xc7
+};
+static const u8 enc_nonce013[] __initconst = {
+ 0x07, 0x00, 0x00, 0x00, 0x40, 0x41, 0x42, 0x43,
+ 0x44, 0x45, 0x46, 0x47
+};
+static const u8 enc_key013[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
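+/* empty plaintext and AAD; the 16-byte output is just the Poly1305 tag */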
+static const u8 enc_input014[] __initconst = { };
+static const u8 enc_output014[] __initconst = {
+ 0x76, 0xac, 0xb3, 0x42, 0xcf, 0x31, 0x66, 0xa5,
+ 0xb6, 0x3c, 0x0c, 0x0e, 0xa1, 0x38, 0x3c, 0x8d
+};
+static const u8 enc_assoc014[] __initconst = { };
+static const u8 enc_nonce014[] __initconst = {
+ 0x4d, 0xa5, 0xbf, 0x8d, 0xfd, 0x58, 0x52, 0xc1,
+ 0xea, 0x12, 0x37, 0x9d
+};
+static const u8 enc_key014[] __initconst = {
+ 0x80, 0xba, 0x31, 0x92, 0xc8, 0x03, 0xce, 0x96,
+ 0x5e, 0xa3, 0x71, 0xd5, 0xff, 0x07, 0x3c, 0xf0,
+ 0xf4, 0x3b, 0x6a, 0x2a, 0xb5, 0x76, 0xb2, 0x08,
+ 0x42, 0x6e, 0x11, 0x40, 0x9c, 0x09, 0xb9, 0xb0
+};
+
+/* wycheproof - misc */
+static const u8 enc_input015[] __initconst = { };
+static const u8 enc_output015[] __initconst = {
+ 0x90, 0x6f, 0xa6, 0x28, 0x4b, 0x52, 0xf8, 0x7b,
+ 0x73, 0x59, 0xcb, 0xaa, 0x75, 0x63, 0xc7, 0x09
+};
+static const u8 enc_assoc015[] __initconst = {
+ 0xbd, 0x50, 0x67, 0x64, 0xf2, 0xd2, 0xc4, 0x10
+};
+static const u8 enc_nonce015[] __initconst = {
+ 0xa9, 0x2e, 0xf0, 0xac, 0x99, 0x1d, 0xd5, 0x16,
+ 0xa3, 0xc6, 0xf6, 0x89
+};
+static const u8 enc_key015[] __initconst = {
+ 0x7a, 0x4c, 0xd7, 0x59, 0x17, 0x2e, 0x02, 0xeb,
+ 0x20, 0x4d, 0xb2, 0xc3, 0xf5, 0xc7, 0x46, 0x22,
+ 0x7d, 0xf5, 0x84, 0xfc, 0x13, 0x45, 0x19, 0x63,
+ 0x91, 0xdb, 0xb9, 0x57, 0x7a, 0x25, 0x07, 0x42
+};
+
+/* wycheproof - misc */
+static const u8 enc_input016[] __initconst = {
+ 0x2a
+};
+static const u8 enc_output016[] __initconst = {
+ 0x3a, 0xca, 0xc2, 0x7d, 0xec, 0x09, 0x68, 0x80,
+ 0x1e, 0x9f, 0x6e, 0xde, 0xd6, 0x9d, 0x80, 0x75,
+ 0x22
+};
+static const u8 enc_assoc016[] __initconst = { };
+static const u8 enc_nonce016[] __initconst = {
+ 0x99, 0xe2, 0x3e, 0xc4, 0x89, 0x85, 0xbc, 0xcd,
+ 0xee, 0xab, 0x60, 0xf1
+};
+static const u8 enc_key016[] __initconst = {
+ 0xcc, 0x56, 0xb6, 0x80, 0x55, 0x2e, 0xb7, 0x50,
+ 0x08, 0xf5, 0x48, 0x4b, 0x4c, 0xb8, 0x03, 0xfa,
+ 0x50, 0x63, 0xeb, 0xd6, 0xea, 0xb9, 0x1f, 0x6a,
+ 0xb6, 0xae, 0xf4, 0x91, 0x6a, 0x76, 0x62, 0x73
+};
+
+/* wycheproof - misc */
+static const u8 enc_input017[] __initconst = {
+ 0x51
+};
+static const u8 enc_output017[] __initconst = {
+ 0xc4, 0x16, 0x83, 0x10, 0xca, 0x45, 0xb1, 0xf7,
+ 0xc6, 0x6c, 0xad, 0x4e, 0x99, 0xe4, 0x3f, 0x72,
+ 0xb9
+};
+static const u8 enc_assoc017[] __initconst = {
+ 0x91, 0xca, 0x6c, 0x59, 0x2c, 0xbc, 0xca, 0x53
+};
+static const u8 enc_nonce017[] __initconst = {
+ 0xab, 0x0d, 0xca, 0x71, 0x6e, 0xe0, 0x51, 0xd2,
+ 0x78, 0x2f, 0x44, 0x03
+};
+static const u8 enc_key017[] __initconst = {
+ 0x46, 0xf0, 0x25, 0x49, 0x65, 0xf7, 0x69, 0xd5,
+ 0x2b, 0xdb, 0x4a, 0x70, 0xb4, 0x43, 0x19, 0x9f,
+ 0x8e, 0xf2, 0x07, 0x52, 0x0d, 0x12, 0x20, 0xc5,
+ 0x5e, 0x4b, 0x70, 0xf0, 0xfd, 0xa6, 0x20, 0xee
+};
+
+/* wycheproof - misc */
+static const u8 enc_input018[] __initconst = {
+ 0x5c, 0x60
+};
+static const u8 enc_output018[] __initconst = {
+ 0x4d, 0x13, 0x91, 0xe8, 0xb6, 0x1e, 0xfb, 0x39,
+ 0xc1, 0x22, 0x19, 0x54, 0x53, 0x07, 0x7b, 0x22,
+ 0xe5, 0xe2
+};
+static const u8 enc_assoc018[] __initconst = { };
+static const u8 enc_nonce018[] __initconst = {
+ 0x46, 0x1a, 0xf1, 0x22, 0xe9, 0xf2, 0xe0, 0x34,
+ 0x7e, 0x03, 0xf2, 0xdb
+};
+static const u8 enc_key018[] __initconst = {
+ 0x2f, 0x7f, 0x7e, 0x4f, 0x59, 0x2b, 0xb3, 0x89,
+ 0x19, 0x49, 0x89, 0x74, 0x35, 0x07, 0xbf, 0x3e,
+ 0xe9, 0xcb, 0xde, 0x17, 0x86, 0xb6, 0x69, 0x5f,
+ 0xe6, 0xc0, 0x25, 0xfd, 0x9b, 0xa4, 0xc1, 0x00
+};
+
+/* wycheproof - misc */
+static const u8 enc_input019[] __initconst = {
+ 0xdd, 0xf2
+};
+static const u8 enc_output019[] __initconst = {
+ 0xb6, 0x0d, 0xea, 0xd0, 0xfd, 0x46, 0x97, 0xec,
+ 0x2e, 0x55, 0x58, 0x23, 0x77, 0x19, 0xd0, 0x24,
+ 0x37, 0xa2
+};
+static const u8 enc_assoc019[] __initconst = {
+ 0x88, 0x36, 0x4f, 0xc8, 0x06, 0x05, 0x18, 0xbf
+};
+static const u8 enc_nonce019[] __initconst = {
+ 0x61, 0x54, 0x6b, 0xa5, 0xf1, 0x72, 0x05, 0x90,
+ 0xb6, 0x04, 0x0a, 0xc6
+};
+static const u8 enc_key019[] __initconst = {
+ 0xc8, 0x83, 0x3d, 0xce, 0x5e, 0xa9, 0xf2, 0x48,
+ 0xaa, 0x20, 0x30, 0xea, 0xcf, 0xe7, 0x2b, 0xff,
+ 0xe6, 0x9a, 0x62, 0x0c, 0xaf, 0x79, 0x33, 0x44,
+ 0xe5, 0x71, 0x8f, 0xe0, 0xd7, 0xab, 0x1a, 0x58
+};
+
+/* wycheproof - misc */
+static const u8 enc_input020[] __initconst = {
+ 0xab, 0x85, 0xe9, 0xc1, 0x57, 0x17, 0x31
+};
+static const u8 enc_output020[] __initconst = {
+ 0x5d, 0xfe, 0x34, 0x40, 0xdb, 0xb3, 0xc3, 0xed,
+ 0x7a, 0x43, 0x4e, 0x26, 0x02, 0xd3, 0x94, 0x28,
+ 0x1e, 0x0a, 0xfa, 0x9f, 0xb7, 0xaa, 0x42
+};
+static const u8 enc_assoc020[] __initconst = { };
+static const u8 enc_nonce020[] __initconst = {
+ 0x3c, 0x4e, 0x65, 0x4d, 0x66, 0x3f, 0xa4, 0x59,
+ 0x6d, 0xc5, 0x5b, 0xb7
+};
+static const u8 enc_key020[] __initconst = {
+ 0x55, 0x56, 0x81, 0x58, 0xd3, 0xa6, 0x48, 0x3f,
+ 0x1f, 0x70, 0x21, 0xea, 0xb6, 0x9b, 0x70, 0x3f,
+ 0x61, 0x42, 0x51, 0xca, 0xdc, 0x1a, 0xf5, 0xd3,
+ 0x4a, 0x37, 0x4f, 0xdb, 0xfc, 0x5a, 0xda, 0xc7
+};
+
+/* wycheproof - misc */
+static const u8 enc_input021[] __initconst = {
+ 0x4e, 0xe5, 0xcd, 0xa2, 0x0d, 0x42, 0x90
+};
+static const u8 enc_output021[] __initconst = {
+ 0x4b, 0xd4, 0x72, 0x12, 0x94, 0x1c, 0xe3, 0x18,
+ 0x5f, 0x14, 0x08, 0xee, 0x7f, 0xbf, 0x18, 0xf5,
+ 0xab, 0xad, 0x6e, 0x22, 0x53, 0xa1, 0xba
+};
+static const u8 enc_assoc021[] __initconst = {
+ 0x84, 0xe4, 0x6b, 0xe8, 0xc0, 0x91, 0x90, 0x53
+};
+static const u8 enc_nonce021[] __initconst = {
+ 0x58, 0x38, 0x93, 0x75, 0xc6, 0x9e, 0xe3, 0x98,
+ 0xde, 0x94, 0x83, 0x96
+};
+static const u8 enc_key021[] __initconst = {
+ 0xe3, 0xc0, 0x9e, 0x7f, 0xab, 0x1a, 0xef, 0xb5,
+ 0x16, 0xda, 0x6a, 0x33, 0x02, 0x2a, 0x1d, 0xd4,
+ 0xeb, 0x27, 0x2c, 0x80, 0xd5, 0x40, 0xc5, 0xda,
+ 0x52, 0xa7, 0x30, 0xf3, 0x4d, 0x84, 0x0d, 0x7f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input022[] __initconst = {
+ 0xbe, 0x33, 0x08, 0xf7, 0x2a, 0x2c, 0x6a, 0xed
+};
+static const u8 enc_output022[] __initconst = {
+ 0x8e, 0x94, 0x39, 0xa5, 0x6e, 0xee, 0xc8, 0x17,
+ 0xfb, 0xe8, 0xa6, 0xed, 0x8f, 0xab, 0xb1, 0x93,
+ 0x75, 0x39, 0xdd, 0x6c, 0x00, 0xe9, 0x00, 0x21
+};
+static const u8 enc_assoc022[] __initconst = { };
+static const u8 enc_nonce022[] __initconst = {
+ 0x4f, 0x07, 0xaf, 0xed, 0xfd, 0xc3, 0xb6, 0xc2,
+ 0x36, 0x18, 0x23, 0xd3
+};
+static const u8 enc_key022[] __initconst = {
+ 0x51, 0xe4, 0xbf, 0x2b, 0xad, 0x92, 0xb7, 0xaf,
+ 0xf1, 0xa4, 0xbc, 0x05, 0x55, 0x0b, 0xa8, 0x1d,
+ 0xf4, 0xb9, 0x6f, 0xab, 0xf4, 0x1c, 0x12, 0xc7,
+ 0xb0, 0x0e, 0x60, 0xe4, 0x8d, 0xb7, 0xe1, 0x52
+};
+
+/* wycheproof - misc */
+static const u8 enc_input023[] __initconst = {
+ 0xa4, 0xc9, 0xc2, 0x80, 0x1b, 0x71, 0xf7, 0xdf
+};
+static const u8 enc_output023[] __initconst = {
+ 0xb9, 0xb9, 0x10, 0x43, 0x3a, 0xf0, 0x52, 0xb0,
+ 0x45, 0x30, 0xf5, 0x1a, 0xee, 0xe0, 0x24, 0xe0,
+ 0xa4, 0x45, 0xa6, 0x32, 0x8f, 0xa6, 0x7a, 0x18
+};
+static const u8 enc_assoc023[] __initconst = {
+ 0x66, 0xc0, 0xae, 0x70, 0x07, 0x6c, 0xb1, 0x4d
+};
+static const u8 enc_nonce023[] __initconst = {
+ 0xb4, 0xea, 0x66, 0x6e, 0xe1, 0x19, 0x56, 0x33,
+ 0x66, 0x48, 0x4a, 0x78
+};
+static const u8 enc_key023[] __initconst = {
+ 0x11, 0x31, 0xc1, 0x41, 0x85, 0x77, 0xa0, 0x54,
+ 0xde, 0x7a, 0x4a, 0xc5, 0x51, 0x95, 0x0f, 0x1a,
+ 0x05, 0x3f, 0x9a, 0xe4, 0x6e, 0x5b, 0x75, 0xfe,
+ 0x4a, 0xbd, 0x56, 0x08, 0xd7, 0xcd, 0xda, 0xdd
+};
+
+/* wycheproof - misc */
+static const u8 enc_input024[] __initconst = {
+ 0x42, 0xba, 0xae, 0x59, 0x78, 0xfe, 0xaf, 0x5c,
+ 0x36, 0x8d, 0x14, 0xe0
+};
+static const u8 enc_output024[] __initconst = {
+ 0xff, 0x7d, 0xc2, 0x03, 0xb2, 0x6c, 0x46, 0x7a,
+ 0x6b, 0x50, 0xdb, 0x33, 0x57, 0x8c, 0x0f, 0x27,
+ 0x58, 0xc2, 0xe1, 0x4e, 0x36, 0xd4, 0xfc, 0x10,
+ 0x6d, 0xcb, 0x29, 0xb4
+};
+static const u8 enc_assoc024[] __initconst = { };
+static const u8 enc_nonce024[] __initconst = {
+ 0x9a, 0x59, 0xfc, 0xe2, 0x6d, 0xf0, 0x00, 0x5e,
+ 0x07, 0x53, 0x86, 0x56
+};
+static const u8 enc_key024[] __initconst = {
+ 0x99, 0xb6, 0x2b, 0xd5, 0xaf, 0xbe, 0x3f, 0xb0,
+ 0x15, 0xbd, 0xe9, 0x3f, 0x0a, 0xbf, 0x48, 0x39,
+ 0x57, 0xa1, 0xc3, 0xeb, 0x3c, 0xa5, 0x9c, 0xb5,
+ 0x0b, 0x39, 0xf7, 0xf8, 0xa9, 0xcc, 0x51, 0xbe
+};
+
+/* wycheproof - misc */
+static const u8 enc_input025[] __initconst = {
+ 0xfd, 0xc8, 0x5b, 0x94, 0xa4, 0xb2, 0xa6, 0xb7,
+ 0x59, 0xb1, 0xa0, 0xda
+};
+static const u8 enc_output025[] __initconst = {
+ 0x9f, 0x88, 0x16, 0xde, 0x09, 0x94, 0xe9, 0x38,
+ 0xd9, 0xe5, 0x3f, 0x95, 0xd0, 0x86, 0xfc, 0x6c,
+ 0x9d, 0x8f, 0xa9, 0x15, 0xfd, 0x84, 0x23, 0xa7,
+ 0xcf, 0x05, 0x07, 0x2f
+};
+static const u8 enc_assoc025[] __initconst = {
+ 0xa5, 0x06, 0xe1, 0xa5, 0xc6, 0x90, 0x93, 0xf9
+};
+static const u8 enc_nonce025[] __initconst = {
+ 0x58, 0xdb, 0xd4, 0xad, 0x2c, 0x4a, 0xd3, 0x5d,
+ 0xd9, 0x06, 0xe9, 0xce
+};
+static const u8 enc_key025[] __initconst = {
+ 0x85, 0xf3, 0x5b, 0x62, 0x82, 0xcf, 0xf4, 0x40,
+ 0xbc, 0x10, 0x20, 0xc8, 0x13, 0x6f, 0xf2, 0x70,
+ 0x31, 0x11, 0x0f, 0xa6, 0x3e, 0xc1, 0x6f, 0x1e,
+ 0x82, 0x51, 0x18, 0xb0, 0x06, 0xb9, 0x12, 0x57
+};
+
+/* wycheproof - misc */
+static const u8 enc_input026[] __initconst = {
+ 0x51, 0xf8, 0xc1, 0xf7, 0x31, 0xea, 0x14, 0xac,
+ 0xdb, 0x21, 0x0a, 0x6d, 0x97, 0x3e, 0x07
+};
+static const u8 enc_output026[] __initconst = {
+ 0x0b, 0x29, 0x63, 0x8e, 0x1f, 0xbd, 0xd6, 0xdf,
+ 0x53, 0x97, 0x0b, 0xe2, 0x21, 0x00, 0x42, 0x2a,
+ 0x91, 0x34, 0x08, 0x7d, 0x67, 0xa4, 0x6e, 0x79,
+ 0x17, 0x8d, 0x0a, 0x93, 0xf5, 0xe1, 0xd2
+};
+static const u8 enc_assoc026[] __initconst = { };
+static const u8 enc_nonce026[] __initconst = {
+ 0x68, 0xab, 0x7f, 0xdb, 0xf6, 0x19, 0x01, 0xda,
+ 0xd4, 0x61, 0xd2, 0x3c
+};
+static const u8 enc_key026[] __initconst = {
+ 0x67, 0x11, 0x96, 0x27, 0xbd, 0x98, 0x8e, 0xda,
+ 0x90, 0x62, 0x19, 0xe0, 0x8c, 0x0d, 0x0d, 0x77,
+ 0x9a, 0x07, 0xd2, 0x08, 0xce, 0x8a, 0x4f, 0xe0,
+ 0x70, 0x9a, 0xf7, 0x55, 0xee, 0xec, 0x6d, 0xcb
+};
+
+/* wycheproof - misc */
+static const u8 enc_input027[] __initconst = {
+ 0x97, 0x46, 0x9d, 0xa6, 0x67, 0xd6, 0x11, 0x0f,
+ 0x9c, 0xbd, 0xa1, 0xd1, 0xa2, 0x06, 0x73
+};
+static const u8 enc_output027[] __initconst = {
+ 0x32, 0xdb, 0x66, 0xc4, 0xa3, 0x81, 0x9d, 0x81,
+ 0x55, 0x74, 0x55, 0xe5, 0x98, 0x0f, 0xed, 0xfe,
+ 0xae, 0x30, 0xde, 0xc9, 0x4e, 0x6a, 0xd3, 0xa9,
+ 0xee, 0xa0, 0x6a, 0x0d, 0x70, 0x39, 0x17
+};
+static const u8 enc_assoc027[] __initconst = {
+ 0x64, 0x53, 0xa5, 0x33, 0x84, 0x63, 0x22, 0x12
+};
+static const u8 enc_nonce027[] __initconst = {
+ 0xd9, 0x5b, 0x32, 0x43, 0xaf, 0xae, 0xf7, 0x14,
+ 0xc5, 0x03, 0x5b, 0x6a
+};
+static const u8 enc_key027[] __initconst = {
+ 0xe6, 0xf1, 0x11, 0x8d, 0x41, 0xe4, 0xb4, 0x3f,
+ 0xb5, 0x82, 0x21, 0xb7, 0xed, 0x79, 0x67, 0x38,
+ 0x34, 0xe0, 0xd8, 0xac, 0x5c, 0x4f, 0xa6, 0x0b,
+ 0xbc, 0x8b, 0xc4, 0x89, 0x3a, 0x58, 0x89, 0x4d
+};
+
+/* wycheproof - misc */
+static const u8 enc_input028[] __initconst = {
+ 0x54, 0x9b, 0x36, 0x5a, 0xf9, 0x13, 0xf3, 0xb0,
+ 0x81, 0x13, 0x1c, 0xcb, 0x6b, 0x82, 0x55, 0x88
+};
+static const u8 enc_output028[] __initconst = {
+ 0xe9, 0x11, 0x0e, 0x9f, 0x56, 0xab, 0x3c, 0xa4,
+ 0x83, 0x50, 0x0c, 0xea, 0xba, 0xb6, 0x7a, 0x13,
+ 0x83, 0x6c, 0xca, 0xbf, 0x15, 0xa6, 0xa2, 0x2a,
+ 0x51, 0xc1, 0x07, 0x1c, 0xfa, 0x68, 0xfa, 0x0c
+};
+static const u8 enc_assoc028[] __initconst = { };
+static const u8 enc_nonce028[] __initconst = {
+ 0x2f, 0xcb, 0x1b, 0x38, 0xa9, 0x9e, 0x71, 0xb8,
+ 0x47, 0x40, 0xad, 0x9b
+};
+static const u8 enc_key028[] __initconst = {
+ 0x59, 0xd4, 0xea, 0xfb, 0x4d, 0xe0, 0xcf, 0xc7,
+ 0xd3, 0xdb, 0x99, 0xa8, 0xf5, 0x4b, 0x15, 0xd7,
+ 0xb3, 0x9f, 0x0a, 0xcc, 0x8d, 0xa6, 0x97, 0x63,
+ 0xb0, 0x19, 0xc1, 0x69, 0x9f, 0x87, 0x67, 0x4a
+};
+
+/* wycheproof - misc */
+static const u8 enc_input029[] __initconst = {
+ 0x55, 0xa4, 0x65, 0x64, 0x4f, 0x5b, 0x65, 0x09,
+ 0x28, 0xcb, 0xee, 0x7c, 0x06, 0x32, 0x14, 0xd6
+};
+static const u8 enc_output029[] __initconst = {
+ 0xe4, 0xb1, 0x13, 0xcb, 0x77, 0x59, 0x45, 0xf3,
+ 0xd3, 0xa8, 0xae, 0x9e, 0xc1, 0x41, 0xc0, 0x0c,
+ 0x7c, 0x43, 0xf1, 0x6c, 0xe0, 0x96, 0xd0, 0xdc,
+ 0x27, 0xc9, 0x58, 0x49, 0xdc, 0x38, 0x3b, 0x7d
+};
+static const u8 enc_assoc029[] __initconst = {
+ 0x03, 0x45, 0x85, 0x62, 0x1a, 0xf8, 0xd7, 0xff
+};
+static const u8 enc_nonce029[] __initconst = {
+ 0x11, 0x8a, 0x69, 0x64, 0xc2, 0xd3, 0xe3, 0x80,
+ 0x07, 0x1f, 0x52, 0x66
+};
+static const u8 enc_key029[] __initconst = {
+ 0xb9, 0x07, 0xa4, 0x50, 0x75, 0x51, 0x3f, 0xe8,
+ 0xa8, 0x01, 0x9e, 0xde, 0xe3, 0xf2, 0x59, 0x14,
+ 0x87, 0xb2, 0xa0, 0x30, 0xb0, 0x3c, 0x6e, 0x1d,
+ 0x77, 0x1c, 0x86, 0x25, 0x71, 0xd2, 0xea, 0x1e
+};
+
+/* wycheproof - misc */
+static const u8 enc_input030[] __initconst = {
+ 0x3f, 0xf1, 0x51, 0x4b, 0x1c, 0x50, 0x39, 0x15,
+ 0x91, 0x8f, 0x0c, 0x0c, 0x31, 0x09, 0x4a, 0x6e,
+ 0x1f
+};
+static const u8 enc_output030[] __initconst = {
+ 0x02, 0xcc, 0x3a, 0xcb, 0x5e, 0xe1, 0xfc, 0xdd,
+ 0x12, 0xa0, 0x3b, 0xb8, 0x57, 0x97, 0x64, 0x74,
+ 0xd3, 0xd8, 0x3b, 0x74, 0x63, 0xa2, 0xc3, 0x80,
+ 0x0f, 0xe9, 0x58, 0xc2, 0x8e, 0xaa, 0x29, 0x08,
+ 0x13
+};
+static const u8 enc_assoc030[] __initconst = { };
+static const u8 enc_nonce030[] __initconst = {
+ 0x45, 0xaa, 0xa3, 0xe5, 0xd1, 0x6d, 0x2d, 0x42,
+ 0xdc, 0x03, 0x44, 0x5d
+};
+static const u8 enc_key030[] __initconst = {
+ 0x3b, 0x24, 0x58, 0xd8, 0x17, 0x6e, 0x16, 0x21,
+ 0xc0, 0xcc, 0x24, 0xc0, 0xc0, 0xe2, 0x4c, 0x1e,
+ 0x80, 0xd7, 0x2f, 0x7e, 0xe9, 0x14, 0x9a, 0x4b,
+ 0x16, 0x61, 0x76, 0x62, 0x96, 0x16, 0xd0, 0x11
+};
+
+/* wycheproof - misc */
+static const u8 enc_input031[] __initconst = {
+ 0x63, 0x85, 0x8c, 0xa3, 0xe2, 0xce, 0x69, 0x88,
+ 0x7b, 0x57, 0x8a, 0x3c, 0x16, 0x7b, 0x42, 0x1c,
+ 0x9c
+};
+static const u8 enc_output031[] __initconst = {
+ 0x35, 0x76, 0x64, 0x88, 0xd2, 0xbc, 0x7c, 0x2b,
+ 0x8d, 0x17, 0xcb, 0xbb, 0x9a, 0xbf, 0xad, 0x9e,
+ 0x6d, 0x1f, 0x39, 0x1e, 0x65, 0x7b, 0x27, 0x38,
+ 0xdd, 0xa0, 0x84, 0x48, 0xcb, 0xa2, 0x81, 0x1c,
+ 0xeb
+};
+static const u8 enc_assoc031[] __initconst = {
+ 0x9a, 0xaf, 0x29, 0x9e, 0xee, 0xa7, 0x8f, 0x79
+};
+static const u8 enc_nonce031[] __initconst = {
+ 0xf0, 0x38, 0x4f, 0xb8, 0x76, 0x12, 0x14, 0x10,
+ 0x63, 0x3d, 0x99, 0x3d
+};
+static const u8 enc_key031[] __initconst = {
+ 0xf6, 0x0c, 0x6a, 0x1b, 0x62, 0x57, 0x25, 0xf7,
+ 0x6c, 0x70, 0x37, 0xb4, 0x8f, 0xe3, 0x57, 0x7f,
+ 0xa7, 0xf7, 0xb8, 0x7b, 0x1b, 0xd5, 0xa9, 0x82,
+ 0x17, 0x6d, 0x18, 0x23, 0x06, 0xff, 0xb8, 0x70
+};
+
+/* wycheproof - misc */
+static const u8 enc_input032[] __initconst = {
+ 0x10, 0xf1, 0xec, 0xf9, 0xc6, 0x05, 0x84, 0x66,
+ 0x5d, 0x9a, 0xe5, 0xef, 0xe2, 0x79, 0xe7, 0xf7,
+ 0x37, 0x7e, 0xea, 0x69, 0x16, 0xd2, 0xb1, 0x11
+};
+static const u8 enc_output032[] __initconst = {
+ 0x42, 0xf2, 0x6c, 0x56, 0xcb, 0x4b, 0xe2, 0x1d,
+ 0x9d, 0x8d, 0x0c, 0x80, 0xfc, 0x99, 0xdd, 0xe0,
+ 0x0d, 0x75, 0xf3, 0x80, 0x74, 0xbf, 0xe7, 0x64,
+ 0x54, 0xaa, 0x7e, 0x13, 0xd4, 0x8f, 0xff, 0x7d,
+ 0x75, 0x57, 0x03, 0x94, 0x57, 0x04, 0x0a, 0x3a
+};
+static const u8 enc_assoc032[] __initconst = { };
+static const u8 enc_nonce032[] __initconst = {
+ 0xe6, 0xb1, 0xad, 0xf2, 0xfd, 0x58, 0xa8, 0x76,
+ 0x2c, 0x65, 0xf3, 0x1b
+};
+static const u8 enc_key032[] __initconst = {
+ 0x02, 0x12, 0xa8, 0xde, 0x50, 0x07, 0xed, 0x87,
+ 0xb3, 0x3f, 0x1a, 0x70, 0x90, 0xb6, 0x11, 0x4f,
+ 0x9e, 0x08, 0xce, 0xfd, 0x96, 0x07, 0xf2, 0xc2,
+ 0x76, 0xbd, 0xcf, 0xdb, 0xc5, 0xce, 0x9c, 0xd7
+};
+
+/* wycheproof - misc */
+static const u8 enc_input033[] __initconst = {
+ 0x92, 0x22, 0xf9, 0x01, 0x8e, 0x54, 0xfd, 0x6d,
+ 0xe1, 0x20, 0x08, 0x06, 0xa9, 0xee, 0x8e, 0x4c,
+ 0xc9, 0x04, 0xd2, 0x9f, 0x25, 0xcb, 0xa1, 0x93
+};
+static const u8 enc_output033[] __initconst = {
+ 0x12, 0x30, 0x32, 0x43, 0x7b, 0x4b, 0xfd, 0x69,
+ 0x20, 0xe8, 0xf7, 0xe7, 0xe0, 0x08, 0x7a, 0xe4,
+ 0x88, 0x9e, 0xbe, 0x7a, 0x0a, 0xd0, 0xe9, 0x00,
+ 0x3c, 0xf6, 0x8f, 0x17, 0x95, 0x50, 0xda, 0x63,
+ 0xd3, 0xb9, 0x6c, 0x2d, 0x55, 0x41, 0x18, 0x65
+};
+static const u8 enc_assoc033[] __initconst = {
+ 0x3e, 0x8b, 0xc5, 0xad, 0xe1, 0x82, 0xff, 0x08
+};
+static const u8 enc_nonce033[] __initconst = {
+ 0x6b, 0x28, 0x2e, 0xbe, 0xcc, 0x54, 0x1b, 0xcd,
+ 0x78, 0x34, 0xed, 0x55
+};
+static const u8 enc_key033[] __initconst = {
+ 0xc5, 0xbc, 0x09, 0x56, 0x56, 0x46, 0xe7, 0xed,
+ 0xda, 0x95, 0x4f, 0x1f, 0x73, 0x92, 0x23, 0xda,
+ 0xda, 0x20, 0xb9, 0x5c, 0x44, 0xab, 0x03, 0x3d,
+ 0x0f, 0xae, 0x4b, 0x02, 0x83, 0xd1, 0x8b, 0xe3
+};
+
+/* wycheproof - misc */
+static const u8 enc_input034[] __initconst = {
+ 0xb0, 0x53, 0x99, 0x92, 0x86, 0xa2, 0x82, 0x4f,
+ 0x42, 0xcc, 0x8c, 0x20, 0x3a, 0xb2, 0x4e, 0x2c,
+ 0x97, 0xa6, 0x85, 0xad, 0xcc, 0x2a, 0xd3, 0x26,
+ 0x62, 0x55, 0x8e, 0x55, 0xa5, 0xc7, 0x29
+};
+static const u8 enc_output034[] __initconst = {
+ 0x45, 0xc7, 0xd6, 0xb5, 0x3a, 0xca, 0xd4, 0xab,
+ 0xb6, 0x88, 0x76, 0xa6, 0xe9, 0x6a, 0x48, 0xfb,
+ 0x59, 0x52, 0x4d, 0x2c, 0x92, 0xc9, 0xd8, 0xa1,
+ 0x89, 0xc9, 0xfd, 0x2d, 0xb9, 0x17, 0x46, 0x56,
+ 0x6d, 0x3c, 0xa1, 0x0e, 0x31, 0x1b, 0x69, 0x5f,
+ 0x3e, 0xae, 0x15, 0x51, 0x65, 0x24, 0x93
+};
+static const u8 enc_assoc034[] __initconst = { };
+static const u8 enc_nonce034[] __initconst = {
+ 0x04, 0xa9, 0xbe, 0x03, 0x50, 0x8a, 0x5f, 0x31,
+ 0x37, 0x1a, 0x6f, 0xd2
+};
+static const u8 enc_key034[] __initconst = {
+ 0x2e, 0xb5, 0x1c, 0x46, 0x9a, 0xa8, 0xeb, 0x9e,
+ 0x6c, 0x54, 0xa8, 0x34, 0x9b, 0xae, 0x50, 0xa2,
+ 0x0f, 0x0e, 0x38, 0x27, 0x11, 0xbb, 0xa1, 0x15,
+ 0x2c, 0x42, 0x4f, 0x03, 0xb6, 0x67, 0x1d, 0x71
+};
+
+/* wycheproof - misc */
+static const u8 enc_input035[] __initconst = {
+ 0xf4, 0x52, 0x06, 0xab, 0xc2, 0x55, 0x52, 0xb2,
+ 0xab, 0xc9, 0xab, 0x7f, 0xa2, 0x43, 0x03, 0x5f,
+ 0xed, 0xaa, 0xdd, 0xc3, 0xb2, 0x29, 0x39, 0x56,
+ 0xf1, 0xea, 0x6e, 0x71, 0x56, 0xe7, 0xeb
+};
+static const u8 enc_output035[] __initconst = {
+ 0x46, 0xa8, 0x0c, 0x41, 0x87, 0x02, 0x47, 0x20,
+ 0x08, 0x46, 0x27, 0x58, 0x00, 0x80, 0xdd, 0xe5,
+ 0xa3, 0xf4, 0xa1, 0x10, 0x93, 0xa7, 0x07, 0x6e,
+ 0xd6, 0xf3, 0xd3, 0x26, 0xbc, 0x7b, 0x70, 0x53,
+ 0x4d, 0x4a, 0xa2, 0x83, 0x5a, 0x52, 0xe7, 0x2d,
+ 0x14, 0xdf, 0x0e, 0x4f, 0x47, 0xf2, 0x5f
+};
+static const u8 enc_assoc035[] __initconst = {
+ 0x37, 0x46, 0x18, 0xa0, 0x6e, 0xa9, 0x8a, 0x48
+};
+static const u8 enc_nonce035[] __initconst = {
+ 0x47, 0x0a, 0x33, 0x9e, 0xcb, 0x32, 0x19, 0xb8,
+ 0xb8, 0x1a, 0x1f, 0x8b
+};
+static const u8 enc_key035[] __initconst = {
+ 0x7f, 0x5b, 0x74, 0xc0, 0x7e, 0xd1, 0xb4, 0x0f,
+ 0xd1, 0x43, 0x58, 0xfe, 0x2f, 0xf2, 0xa7, 0x40,
+ 0xc1, 0x16, 0xc7, 0x70, 0x65, 0x10, 0xe6, 0xa4,
+ 0x37, 0xf1, 0x9e, 0xa4, 0x99, 0x11, 0xce, 0xc4
+};
+
+/* wycheproof - misc */
+static const u8 enc_input036[] __initconst = {
+ 0xb9, 0xc5, 0x54, 0xcb, 0xc3, 0x6a, 0xc1, 0x8a,
+ 0xe8, 0x97, 0xdf, 0x7b, 0xee, 0xca, 0xc1, 0xdb,
+ 0xeb, 0x4e, 0xaf, 0xa1, 0x56, 0xbb, 0x60, 0xce,
+ 0x2e, 0x5d, 0x48, 0xf0, 0x57, 0x15, 0xe6, 0x78
+};
+static const u8 enc_output036[] __initconst = {
+ 0xea, 0x29, 0xaf, 0xa4, 0x9d, 0x36, 0xe8, 0x76,
+ 0x0f, 0x5f, 0xe1, 0x97, 0x23, 0xb9, 0x81, 0x1e,
+ 0xd5, 0xd5, 0x19, 0x93, 0x4a, 0x44, 0x0f, 0x50,
+ 0x81, 0xac, 0x43, 0x0b, 0x95, 0x3b, 0x0e, 0x21,
+ 0x22, 0x25, 0x41, 0xaf, 0x46, 0xb8, 0x65, 0x33,
+ 0xc6, 0xb6, 0x8d, 0x2f, 0xf1, 0x08, 0xa7, 0xea
+};
+static const u8 enc_assoc036[] __initconst = { };
+static const u8 enc_nonce036[] __initconst = {
+ 0x72, 0xcf, 0xd9, 0x0e, 0xf3, 0x02, 0x6c, 0xa2,
+ 0x2b, 0x7e, 0x6e, 0x6a
+};
+static const u8 enc_key036[] __initconst = {
+ 0xe1, 0x73, 0x1d, 0x58, 0x54, 0xe1, 0xb7, 0x0c,
+ 0xb3, 0xff, 0xe8, 0xb7, 0x86, 0xa2, 0xb3, 0xeb,
+ 0xf0, 0x99, 0x43, 0x70, 0x95, 0x47, 0x57, 0xb9,
+ 0xdc, 0x8c, 0x7b, 0xc5, 0x35, 0x46, 0x34, 0xa3
+};
+
+/* wycheproof - misc */
+static const u8 enc_input037[] __initconst = {
+ 0x6b, 0x26, 0x04, 0x99, 0x6c, 0xd3, 0x0c, 0x14,
+ 0xa1, 0x3a, 0x52, 0x57, 0xed, 0x6c, 0xff, 0xd3,
+ 0xbc, 0x5e, 0x29, 0xd6, 0xb9, 0x7e, 0xb1, 0x79,
+ 0x9e, 0xb3, 0x35, 0xe2, 0x81, 0xea, 0x45, 0x1e
+};
+static const u8 enc_output037[] __initconst = {
+ 0x6d, 0xad, 0x63, 0x78, 0x97, 0x54, 0x4d, 0x8b,
+ 0xf6, 0xbe, 0x95, 0x07, 0xed, 0x4d, 0x1b, 0xb2,
+ 0xe9, 0x54, 0xbc, 0x42, 0x7e, 0x5d, 0xe7, 0x29,
+ 0xda, 0xf5, 0x07, 0x62, 0x84, 0x6f, 0xf2, 0xf4,
+ 0x7b, 0x99, 0x7d, 0x93, 0xc9, 0x82, 0x18, 0x9d,
+ 0x70, 0x95, 0xdc, 0x79, 0x4c, 0x74, 0x62, 0x32
+};
+static const u8 enc_assoc037[] __initconst = {
+ 0x23, 0x33, 0xe5, 0xce, 0x0f, 0x93, 0xb0, 0x59
+};
+static const u8 enc_nonce037[] __initconst = {
+ 0x26, 0x28, 0x80, 0xd4, 0x75, 0xf3, 0xda, 0xc5,
+ 0x34, 0x0d, 0xd1, 0xb8
+};
+static const u8 enc_key037[] __initconst = {
+ 0x27, 0xd8, 0x60, 0x63, 0x1b, 0x04, 0x85, 0xa4,
+ 0x10, 0x70, 0x2f, 0xea, 0x61, 0xbc, 0x87, 0x3f,
+ 0x34, 0x42, 0x26, 0x0c, 0xad, 0xed, 0x4a, 0xbd,
+ 0xe2, 0x5b, 0x78, 0x6a, 0x2d, 0x97, 0xf1, 0x45
+};
+
+/* wycheproof - misc */
+static const u8 enc_input038[] __initconst = {
+ 0x97, 0x3d, 0x0c, 0x75, 0x38, 0x26, 0xba, 0xe4,
+ 0x66, 0xcf, 0x9a, 0xbb, 0x34, 0x93, 0x15, 0x2e,
+ 0x9d, 0xe7, 0x81, 0x9e, 0x2b, 0xd0, 0xc7, 0x11,
+ 0x71, 0x34, 0x6b, 0x4d, 0x2c, 0xeb, 0xf8, 0x04,
+ 0x1a, 0xa3, 0xce, 0xdc, 0x0d, 0xfd, 0x7b, 0x46,
+ 0x7e, 0x26, 0x22, 0x8b, 0xc8, 0x6c, 0x9a
+};
+static const u8 enc_output038[] __initconst = {
+ 0xfb, 0xa7, 0x8a, 0xe4, 0xf9, 0xd8, 0x08, 0xa6,
+ 0x2e, 0x3d, 0xa4, 0x0b, 0xe2, 0xcb, 0x77, 0x00,
+ 0xc3, 0x61, 0x3d, 0x9e, 0xb2, 0xc5, 0x29, 0xc6,
+ 0x52, 0xe7, 0x6a, 0x43, 0x2c, 0x65, 0x8d, 0x27,
+ 0x09, 0x5f, 0x0e, 0xb8, 0xf9, 0x40, 0xc3, 0x24,
+ 0x98, 0x1e, 0xa9, 0x35, 0xe5, 0x07, 0xf9, 0x8f,
+ 0x04, 0x69, 0x56, 0xdb, 0x3a, 0x51, 0x29, 0x08,
+ 0xbd, 0x7a, 0xfc, 0x8f, 0x2a, 0xb0, 0xa9
+};
+static const u8 enc_assoc038[] __initconst = { };
+static const u8 enc_nonce038[] __initconst = {
+ 0xe7, 0x4a, 0x51, 0x5e, 0x7e, 0x21, 0x02, 0xb9,
+ 0x0b, 0xef, 0x55, 0xd2
+};
+static const u8 enc_key038[] __initconst = {
+ 0xcf, 0x0d, 0x40, 0xa4, 0x64, 0x4e, 0x5f, 0x51,
+ 0x81, 0x51, 0x65, 0xd5, 0x30, 0x1b, 0x22, 0x63,
+ 0x1f, 0x45, 0x44, 0xc4, 0x9a, 0x18, 0x78, 0xe3,
+ 0xa0, 0xa5, 0xe8, 0xe1, 0xaa, 0xe0, 0xf2, 0x64
+};
+
+/* wycheproof - misc */
+static const u8 enc_input039[] __initconst = {
+ 0xa9, 0x89, 0x95, 0x50, 0x4d, 0xf1, 0x6f, 0x74,
+ 0x8b, 0xfb, 0x77, 0x85, 0xff, 0x91, 0xee, 0xb3,
+ 0xb6, 0x60, 0xea, 0x9e, 0xd3, 0x45, 0x0c, 0x3d,
+ 0x5e, 0x7b, 0x0e, 0x79, 0xef, 0x65, 0x36, 0x59,
+ 0xa9, 0x97, 0x8d, 0x75, 0x54, 0x2e, 0xf9, 0x1c,
+ 0x45, 0x67, 0x62, 0x21, 0x56, 0x40, 0xb9
+};
+static const u8 enc_output039[] __initconst = {
+ 0xa1, 0xff, 0xed, 0x80, 0x76, 0x18, 0x29, 0xec,
+ 0xce, 0x24, 0x2e, 0x0e, 0x88, 0xb1, 0x38, 0x04,
+ 0x90, 0x16, 0xbc, 0xa0, 0x18, 0xda, 0x2b, 0x6e,
+ 0x19, 0x98, 0x6b, 0x3e, 0x31, 0x8c, 0xae, 0x8d,
+ 0x80, 0x61, 0x98, 0xfb, 0x4c, 0x52, 0x7c, 0xc3,
+ 0x93, 0x50, 0xeb, 0xdd, 0xea, 0xc5, 0x73, 0xc4,
+ 0xcb, 0xf0, 0xbe, 0xfd, 0xa0, 0xb7, 0x02, 0x42,
+ 0xc6, 0x40, 0xd7, 0xcd, 0x02, 0xd7, 0xa3
+};
+static const u8 enc_assoc039[] __initconst = {
+ 0xb3, 0xe4, 0x06, 0x46, 0x83, 0xb0, 0x2d, 0x84
+};
+static const u8 enc_nonce039[] __initconst = {
+ 0xd4, 0xd8, 0x07, 0x34, 0x16, 0x83, 0x82, 0x5b,
+ 0x31, 0xcd, 0x4d, 0x95
+};
+static const u8 enc_key039[] __initconst = {
+ 0x6c, 0xbf, 0xd7, 0x1c, 0x64, 0x5d, 0x18, 0x4c,
+ 0xf5, 0xd2, 0x3c, 0x40, 0x2b, 0xdb, 0x0d, 0x25,
+ 0xec, 0x54, 0x89, 0x8c, 0x8a, 0x02, 0x73, 0xd4,
+ 0x2e, 0xb5, 0xbe, 0x10, 0x9f, 0xdc, 0xb2, 0xac
+};
+
+/* wycheproof - misc */
+static const u8 enc_input040[] __initconst = {
+ 0xd0, 0x96, 0x80, 0x31, 0x81, 0xbe, 0xef, 0x9e,
+ 0x00, 0x8f, 0xf8, 0x5d, 0x5d, 0xdc, 0x38, 0xdd,
+ 0xac, 0xf0, 0xf0, 0x9e, 0xe5, 0xf7, 0xe0, 0x7f,
+ 0x1e, 0x40, 0x79, 0xcb, 0x64, 0xd0, 0xdc, 0x8f,
+ 0x5e, 0x67, 0x11, 0xcd, 0x49, 0x21, 0xa7, 0x88,
+ 0x7d, 0xe7, 0x6e, 0x26, 0x78, 0xfd, 0xc6, 0x76,
+ 0x18, 0xf1, 0x18, 0x55, 0x86, 0xbf, 0xea, 0x9d,
+ 0x4c, 0x68, 0x5d, 0x50, 0xe4, 0xbb, 0x9a, 0x82
+};
+static const u8 enc_output040[] __initconst = {
+ 0x9a, 0x4e, 0xf2, 0x2b, 0x18, 0x16, 0x77, 0xb5,
+ 0x75, 0x5c, 0x08, 0xf7, 0x47, 0xc0, 0xf8, 0xd8,
+ 0xe8, 0xd4, 0xc1, 0x8a, 0x9c, 0xc2, 0x40, 0x5c,
+ 0x12, 0xbb, 0x51, 0xbb, 0x18, 0x72, 0xc8, 0xe8,
+ 0xb8, 0x77, 0x67, 0x8b, 0xec, 0x44, 0x2c, 0xfc,
+ 0xbb, 0x0f, 0xf4, 0x64, 0xa6, 0x4b, 0x74, 0x33,
+ 0x2c, 0xf0, 0x72, 0x89, 0x8c, 0x7e, 0x0e, 0xdd,
+ 0xf6, 0x23, 0x2e, 0xa6, 0xe2, 0x7e, 0xfe, 0x50,
+ 0x9f, 0xf3, 0x42, 0x7a, 0x0f, 0x32, 0xfa, 0x56,
+ 0x6d, 0x9c, 0xa0, 0xa7, 0x8a, 0xef, 0xc0, 0x13
+};
+static const u8 enc_assoc040[] __initconst = { };
+static const u8 enc_nonce040[] __initconst = {
+ 0xd6, 0x10, 0x40, 0xa3, 0x13, 0xed, 0x49, 0x28,
+ 0x23, 0xcc, 0x06, 0x5b
+};
+static const u8 enc_key040[] __initconst = {
+ 0x5b, 0x1d, 0x10, 0x35, 0xc0, 0xb1, 0x7e, 0xe0,
+ 0xb0, 0x44, 0x47, 0x67, 0xf8, 0x0a, 0x25, 0xb8,
+ 0xc1, 0xb7, 0x41, 0xf4, 0xb5, 0x0a, 0x4d, 0x30,
+ 0x52, 0x22, 0x6b, 0xaa, 0x1c, 0x6f, 0xb7, 0x01
+};
+
+/* wycheproof - misc */
+static const u8 enc_input041[] __initconst = {
+ 0x94, 0xee, 0x16, 0x6d, 0x6d, 0x6e, 0xcf, 0x88,
+ 0x32, 0x43, 0x71, 0x36, 0xb4, 0xae, 0x80, 0x5d,
+ 0x42, 0x88, 0x64, 0x35, 0x95, 0x86, 0xd9, 0x19,
+ 0x3a, 0x25, 0x01, 0x62, 0x93, 0xed, 0xba, 0x44,
+ 0x3c, 0x58, 0xe0, 0x7e, 0x7b, 0x71, 0x95, 0xec,
+ 0x5b, 0xd8, 0x45, 0x82, 0xa9, 0xd5, 0x6c, 0x8d,
+ 0x4a, 0x10, 0x8c, 0x7d, 0x7c, 0xe3, 0x4e, 0x6c,
+ 0x6f, 0x8e, 0xa1, 0xbe, 0xc0, 0x56, 0x73, 0x17
+};
+static const u8 enc_output041[] __initconst = {
+ 0x5f, 0xbb, 0xde, 0xcc, 0x34, 0xbe, 0x20, 0x16,
+ 0x14, 0xf6, 0x36, 0x03, 0x1e, 0xeb, 0x42, 0xf1,
+ 0xca, 0xce, 0x3c, 0x79, 0xa1, 0x2c, 0xff, 0xd8,
+ 0x71, 0xee, 0x8e, 0x73, 0x82, 0x0c, 0x82, 0x97,
+ 0x49, 0xf1, 0xab, 0xb4, 0x29, 0x43, 0x67, 0x84,
+ 0x9f, 0xb6, 0xc2, 0xaa, 0x56, 0xbd, 0xa8, 0xa3,
+ 0x07, 0x8f, 0x72, 0x3d, 0x7c, 0x1c, 0x85, 0x20,
+ 0x24, 0xb0, 0x17, 0xb5, 0x89, 0x73, 0xfb, 0x1e,
+ 0x09, 0x26, 0x3d, 0xa7, 0xb4, 0xcb, 0x92, 0x14,
+ 0x52, 0xf9, 0x7d, 0xca, 0x40, 0xf5, 0x80, 0xec
+};
+static const u8 enc_assoc041[] __initconst = {
+ 0x71, 0x93, 0xf6, 0x23, 0x66, 0x33, 0x21, 0xa2
+};
+static const u8 enc_nonce041[] __initconst = {
+ 0xd3, 0x1c, 0x21, 0xab, 0xa1, 0x75, 0xb7, 0x0d,
+ 0xe4, 0xeb, 0xb1, 0x9c
+};
+static const u8 enc_key041[] __initconst = {
+ 0x97, 0xd6, 0x35, 0xc4, 0xf4, 0x75, 0x74, 0xd9,
+ 0x99, 0x8a, 0x90, 0x87, 0x5d, 0xa1, 0xd3, 0xa2,
+ 0x84, 0xb7, 0x55, 0xb2, 0xd3, 0x92, 0x97, 0xa5,
+ 0x72, 0x52, 0x35, 0x19, 0x0e, 0x10, 0xa9, 0x7e
+};
+
+/* wycheproof - misc */
+static const u8 enc_input042[] __initconst = {
+ 0xb4, 0x29, 0xeb, 0x80, 0xfb, 0x8f, 0xe8, 0xba,
+ 0xed, 0xa0, 0xc8, 0x5b, 0x9c, 0x33, 0x34, 0x58,
+ 0xe7, 0xc2, 0x99, 0x2e, 0x55, 0x84, 0x75, 0x06,
+ 0x9d, 0x12, 0xd4, 0x5c, 0x22, 0x21, 0x75, 0x64,
+ 0x12, 0x15, 0x88, 0x03, 0x22, 0x97, 0xef, 0xf5,
+ 0x67, 0x83, 0x74, 0x2a, 0x5f, 0xc2, 0x2d, 0x74,
+ 0x10, 0xff, 0xb2, 0x9d, 0x66, 0x09, 0x86, 0x61,
+ 0xd7, 0x6f, 0x12, 0x6c, 0x3c, 0x27, 0x68, 0x9e,
+ 0x43, 0xb3, 0x72, 0x67, 0xca, 0xc5, 0xa3, 0xa6,
+ 0xd3, 0xab, 0x49, 0xe3, 0x91, 0xda, 0x29, 0xcd,
+ 0x30, 0x54, 0xa5, 0x69, 0x2e, 0x28, 0x07, 0xe4,
+ 0xc3, 0xea, 0x46, 0xc8, 0x76, 0x1d, 0x50, 0xf5,
+ 0x92
+};
+static const u8 enc_output042[] __initconst = {
+ 0xd0, 0x10, 0x2f, 0x6c, 0x25, 0x8b, 0xf4, 0x97,
+ 0x42, 0xce, 0xc3, 0x4c, 0xf2, 0xd0, 0xfe, 0xdf,
+ 0x23, 0xd1, 0x05, 0xfb, 0x4c, 0x84, 0xcf, 0x98,
+ 0x51, 0x5e, 0x1b, 0xc9, 0xa6, 0x4f, 0x8a, 0xd5,
+ 0xbe, 0x8f, 0x07, 0x21, 0xbd, 0xe5, 0x06, 0x45,
+ 0xd0, 0x00, 0x83, 0xc3, 0xa2, 0x63, 0xa3, 0x10,
+ 0x53, 0xb7, 0x60, 0x24, 0x5f, 0x52, 0xae, 0x28,
+ 0x66, 0xa5, 0xec, 0x83, 0xb1, 0x9f, 0x61, 0xbe,
+ 0x1d, 0x30, 0xd5, 0xc5, 0xd9, 0xfe, 0xcc, 0x4c,
+ 0xbb, 0xe0, 0x8f, 0xd3, 0x85, 0x81, 0x3a, 0x2a,
+ 0xa3, 0x9a, 0x00, 0xff, 0x9c, 0x10, 0xf7, 0xf2,
+ 0x37, 0x02, 0xad, 0xd1, 0xe4, 0xb2, 0xff, 0xa3,
+ 0x1c, 0x41, 0x86, 0x5f, 0xc7, 0x1d, 0xe1, 0x2b,
+ 0x19, 0x61, 0x21, 0x27, 0xce, 0x49, 0x99, 0x3b,
+ 0xb0
+};
+static const u8 enc_assoc042[] __initconst = { };
+static const u8 enc_nonce042[] __initconst = {
+ 0x17, 0xc8, 0x6a, 0x8a, 0xbb, 0xb7, 0xe0, 0x03,
+ 0xac, 0xde, 0x27, 0x99
+};
+static const u8 enc_key042[] __initconst = {
+ 0xfe, 0x6e, 0x55, 0xbd, 0xae, 0xd1, 0xf7, 0x28,
+ 0x4c, 0xa5, 0xfc, 0x0f, 0x8c, 0x5f, 0x2b, 0x8d,
+ 0xf5, 0x6d, 0xc0, 0xf4, 0x9e, 0x8c, 0xa6, 0x6a,
+ 0x41, 0x99, 0x5e, 0x78, 0x33, 0x51, 0xf9, 0x01
+};
+
+/* wycheproof - misc */
+static const u8 enc_input043[] __initconst = {
+ 0xce, 0xb5, 0x34, 0xce, 0x50, 0xdc, 0x23, 0xff,
+ 0x63, 0x8a, 0xce, 0x3e, 0xf6, 0x3a, 0xb2, 0xcc,
+ 0x29, 0x73, 0xee, 0xad, 0xa8, 0x07, 0x85, 0xfc,
+ 0x16, 0x5d, 0x06, 0xc2, 0xf5, 0x10, 0x0f, 0xf5,
+ 0xe8, 0xab, 0x28, 0x82, 0xc4, 0x75, 0xaf, 0xcd,
+ 0x05, 0xcc, 0xd4, 0x9f, 0x2e, 0x7d, 0x8f, 0x55,
+ 0xef, 0x3a, 0x72, 0xe3, 0xdc, 0x51, 0xd6, 0x85,
+ 0x2b, 0x8e, 0x6b, 0x9e, 0x7a, 0xec, 0xe5, 0x7b,
+ 0xe6, 0x55, 0x6b, 0x0b, 0x6d, 0x94, 0x13, 0xe3,
+ 0x3f, 0xc5, 0xfc, 0x24, 0xa9, 0xa2, 0x05, 0xad,
+ 0x59, 0x57, 0x4b, 0xb3, 0x9d, 0x94, 0x4a, 0x92,
+ 0xdc, 0x47, 0x97, 0x0d, 0x84, 0xa6, 0xad, 0x31,
+ 0x76
+};
+static const u8 enc_output043[] __initconst = {
+ 0x75, 0x45, 0x39, 0x1b, 0x51, 0xde, 0x01, 0xd5,
+ 0xc5, 0x3d, 0xfa, 0xca, 0x77, 0x79, 0x09, 0x06,
+ 0x3e, 0x58, 0xed, 0xee, 0x4b, 0xb1, 0x22, 0x7e,
+ 0x71, 0x10, 0xac, 0x4d, 0x26, 0x20, 0xc2, 0xae,
+ 0xc2, 0xf8, 0x48, 0xf5, 0x6d, 0xee, 0xb0, 0x37,
+ 0xa8, 0xdc, 0xed, 0x75, 0xaf, 0xa8, 0xa6, 0xc8,
+ 0x90, 0xe2, 0xde, 0xe4, 0x2f, 0x95, 0x0b, 0xb3,
+ 0x3d, 0x9e, 0x24, 0x24, 0xd0, 0x8a, 0x50, 0x5d,
+ 0x89, 0x95, 0x63, 0x97, 0x3e, 0xd3, 0x88, 0x70,
+ 0xf3, 0xde, 0x6e, 0xe2, 0xad, 0xc7, 0xfe, 0x07,
+ 0x2c, 0x36, 0x6c, 0x14, 0xe2, 0xcf, 0x7c, 0xa6,
+ 0x2f, 0xb3, 0xd3, 0x6b, 0xee, 0x11, 0x68, 0x54,
+ 0x61, 0xb7, 0x0d, 0x44, 0xef, 0x8c, 0x66, 0xc5,
+ 0xc7, 0xbb, 0xf1, 0x0d, 0xca, 0xdd, 0x7f, 0xac,
+ 0xf6
+};
+static const u8 enc_assoc043[] __initconst = {
+ 0xa1, 0x1c, 0x40, 0xb6, 0x03, 0x76, 0x73, 0x30
+};
+static const u8 enc_nonce043[] __initconst = {
+ 0x46, 0x36, 0x2f, 0x45, 0xd6, 0x37, 0x9e, 0x63,
+ 0xe5, 0x22, 0x94, 0x60
+};
+static const u8 enc_key043[] __initconst = {
+ 0xaa, 0xbc, 0x06, 0x34, 0x74, 0xe6, 0x5c, 0x4c,
+ 0x3e, 0x9b, 0xdc, 0x48, 0x0d, 0xea, 0x97, 0xb4,
+ 0x51, 0x10, 0xc8, 0x61, 0x88, 0x46, 0xff, 0x6b,
+ 0x15, 0xbd, 0xd2, 0xa4, 0xa5, 0x68, 0x2c, 0x4e
+};
+
+/* wycheproof - misc */
+static const u8 enc_input044[] __initconst = {
+ 0xe5, 0xcc, 0xaa, 0x44, 0x1b, 0xc8, 0x14, 0x68,
+ 0x8f, 0x8f, 0x6e, 0x8f, 0x28, 0xb5, 0x00, 0xb2
+};
+static const u8 enc_output044[] __initconst = {
+ 0x7e, 0x72, 0xf5, 0xa1, 0x85, 0xaf, 0x16, 0xa6,
+ 0x11, 0x92, 0x1b, 0x43, 0x8f, 0x74, 0x9f, 0x0b,
+ 0x12, 0x42, 0xc6, 0x70, 0x73, 0x23, 0x34, 0x02,
+ 0x9a, 0xdf, 0xe1, 0xc5, 0x00, 0x16, 0x51, 0xe4
+};
+static const u8 enc_assoc044[] __initconst = {
+ 0x02
+};
+static const u8 enc_nonce044[] __initconst = {
+ 0x87, 0x34, 0x5f, 0x10, 0x55, 0xfd, 0x9e, 0x21,
+ 0x02, 0xd5, 0x06, 0x56
+};
+static const u8 enc_key044[] __initconst = {
+ 0x7d, 0x00, 0xb4, 0x80, 0x95, 0xad, 0xfa, 0x32,
+ 0x72, 0x05, 0x06, 0x07, 0xb2, 0x64, 0x18, 0x50,
+ 0x02, 0xba, 0x99, 0x95, 0x7c, 0x49, 0x8b, 0xe0,
+ 0x22, 0x77, 0x0f, 0x2c, 0xe2, 0xf3, 0x14, 0x3c
+};
+
+/* wycheproof - misc */
+static const u8 enc_input045[] __initconst = {
+ 0x02, 0xcd, 0xe1, 0x68, 0xfb, 0xa3, 0xf5, 0x44,
+ 0xbb, 0xd0, 0x33, 0x2f, 0x7a, 0xde, 0xad, 0xa8
+};
+static const u8 enc_output045[] __initconst = {
+ 0x85, 0xf2, 0x9a, 0x71, 0x95, 0x57, 0xcd, 0xd1,
+ 0x4d, 0x1f, 0x8f, 0xff, 0xab, 0x6d, 0x9e, 0x60,
+ 0x73, 0x2c, 0xa3, 0x2b, 0xec, 0xd5, 0x15, 0xa1,
+ 0xed, 0x35, 0x3f, 0x54, 0x2e, 0x99, 0x98, 0x58
+};
+static const u8 enc_assoc045[] __initconst = {
+ 0xb6, 0x48
+};
+static const u8 enc_nonce045[] __initconst = {
+ 0x87, 0xa3, 0x16, 0x3e, 0xc0, 0x59, 0x8a, 0xd9,
+ 0x5b, 0x3a, 0xa7, 0x13
+};
+static const u8 enc_key045[] __initconst = {
+ 0x64, 0x32, 0x71, 0x7f, 0x1d, 0xb8, 0x5e, 0x41,
+ 0xac, 0x78, 0x36, 0xbc, 0xe2, 0x51, 0x85, 0xa0,
+ 0x80, 0xd5, 0x76, 0x2b, 0x9e, 0x2b, 0x18, 0x44,
+ 0x4b, 0x6e, 0xc7, 0x2c, 0x3b, 0xd8, 0xe4, 0xdc
+};
+
+/* wycheproof - misc */
+static const u8 enc_input046[] __initconst = {
+ 0x16, 0xdd, 0xd2, 0x3f, 0xf5, 0x3f, 0x3d, 0x23,
+ 0xc0, 0x63, 0x34, 0x48, 0x70, 0x40, 0xeb, 0x47
+};
+static const u8 enc_output046[] __initconst = {
+ 0xc1, 0xb2, 0x95, 0x93, 0x6d, 0x56, 0xfa, 0xda,
+ 0xc0, 0x3e, 0x5f, 0x74, 0x2b, 0xff, 0x73, 0xa1,
+ 0x39, 0xc4, 0x57, 0xdb, 0xab, 0x66, 0x38, 0x2b,
+ 0xab, 0xb3, 0xb5, 0x58, 0x00, 0xcd, 0xa5, 0xb8
+};
+static const u8 enc_assoc046[] __initconst = {
+ 0xbd, 0x4c, 0xd0, 0x2f, 0xc7, 0x50, 0x2b, 0xbd,
+ 0xbd, 0xf6, 0xc9, 0xa3, 0xcb, 0xe8, 0xf0
+};
+static const u8 enc_nonce046[] __initconst = {
+ 0x6f, 0x57, 0x3a, 0xa8, 0x6b, 0xaa, 0x49, 0x2b,
+ 0xa4, 0x65, 0x96, 0xdf
+};
+static const u8 enc_key046[] __initconst = {
+ 0x8e, 0x34, 0xcf, 0x73, 0xd2, 0x45, 0xa1, 0x08,
+ 0x2a, 0x92, 0x0b, 0x86, 0x36, 0x4e, 0xb8, 0x96,
+ 0xc4, 0x94, 0x64, 0x67, 0xbc, 0xb3, 0xd5, 0x89,
+ 0x29, 0xfc, 0xb3, 0x66, 0x90, 0xe6, 0x39, 0x4f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input047[] __initconst = {
+ 0x62, 0x3b, 0x78, 0x50, 0xc3, 0x21, 0xe2, 0xcf,
+ 0x0c, 0x6f, 0xbc, 0xc8, 0xdf, 0xd1, 0xaf, 0xf2
+};
+static const u8 enc_output047[] __initconst = {
+ 0xc8, 0x4c, 0x9b, 0xb7, 0xc6, 0x1c, 0x1b, 0xcb,
+ 0x17, 0x77, 0x2a, 0x1c, 0x50, 0x0c, 0x50, 0x95,
+ 0xdb, 0xad, 0xf7, 0xa5, 0x13, 0x8c, 0xa0, 0x34,
+ 0x59, 0xa2, 0xcd, 0x65, 0x83, 0x1e, 0x09, 0x2f
+};
+static const u8 enc_assoc047[] __initconst = {
+ 0x89, 0xcc, 0xe9, 0xfb, 0x47, 0x44, 0x1d, 0x07,
+ 0xe0, 0x24, 0x5a, 0x66, 0xfe, 0x8b, 0x77, 0x8b
+};
+static const u8 enc_nonce047[] __initconst = {
+ 0x1a, 0x65, 0x18, 0xf0, 0x2e, 0xde, 0x1d, 0xa6,
+ 0x80, 0x92, 0x66, 0xd9
+};
+static const u8 enc_key047[] __initconst = {
+ 0xcb, 0x55, 0x75, 0xf5, 0xc7, 0xc4, 0x5c, 0x91,
+ 0xcf, 0x32, 0x0b, 0x13, 0x9f, 0xb5, 0x94, 0x23,
+ 0x75, 0x60, 0xd0, 0xa3, 0xe6, 0xf8, 0x65, 0xa6,
+ 0x7d, 0x4f, 0x63, 0x3f, 0x2c, 0x08, 0xf0, 0x16
+};
+
+/* wycheproof - misc */
+static const u8 enc_input048[] __initconst = {
+ 0x87, 0xb3, 0xa4, 0xd7, 0xb2, 0x6d, 0x8d, 0x32,
+ 0x03, 0xa0, 0xde, 0x1d, 0x64, 0xef, 0x82, 0xe3
+};
+static const u8 enc_output048[] __initconst = {
+ 0x94, 0xbc, 0x80, 0x62, 0x1e, 0xd1, 0xe7, 0x1b,
+ 0x1f, 0xd2, 0xb5, 0xc3, 0xa1, 0x5e, 0x35, 0x68,
+ 0x33, 0x35, 0x11, 0x86, 0x17, 0x96, 0x97, 0x84,
+ 0x01, 0x59, 0x8b, 0x96, 0x37, 0x22, 0xf5, 0xb3
+};
+static const u8 enc_assoc048[] __initconst = {
+ 0xd1, 0x9f, 0x2d, 0x98, 0x90, 0x95, 0xf7, 0xab,
+ 0x03, 0xa5, 0xfd, 0xe8, 0x44, 0x16, 0xe0, 0x0c,
+ 0x0e
+};
+static const u8 enc_nonce048[] __initconst = {
+ 0x56, 0x4d, 0xee, 0x49, 0xab, 0x00, 0xd2, 0x40,
+ 0xfc, 0x10, 0x68, 0xc3
+};
+static const u8 enc_key048[] __initconst = {
+ 0xa5, 0x56, 0x9e, 0x72, 0x9a, 0x69, 0xb2, 0x4b,
+ 0xa6, 0xe0, 0xff, 0x15, 0xc4, 0x62, 0x78, 0x97,
+ 0x43, 0x68, 0x24, 0xc9, 0x41, 0xe9, 0xd0, 0x0b,
+ 0x2e, 0x93, 0xfd, 0xdc, 0x4b, 0xa7, 0x76, 0x57
+};
+
+/* wycheproof - misc */
+static const u8 enc_input049[] __initconst = {
+ 0xe6, 0x01, 0xb3, 0x85, 0x57, 0x79, 0x7d, 0xa2,
+ 0xf8, 0xa4, 0x10, 0x6a, 0x08, 0x9d, 0x1d, 0xa6
+};
+static const u8 enc_output049[] __initconst = {
+ 0x29, 0x9b, 0x5d, 0x3f, 0x3d, 0x03, 0xc0, 0x87,
+ 0x20, 0x9a, 0x16, 0xe2, 0x85, 0x14, 0x31, 0x11,
+ 0x4b, 0x45, 0x4e, 0xd1, 0x98, 0xde, 0x11, 0x7e,
+ 0x83, 0xec, 0x49, 0xfa, 0x8d, 0x85, 0x08, 0xd6
+};
+static const u8 enc_assoc049[] __initconst = {
+ 0x5e, 0x64, 0x70, 0xfa, 0xcd, 0x99, 0xc1, 0xd8,
+ 0x1e, 0x37, 0xcd, 0x44, 0x01, 0x5f, 0xe1, 0x94,
+ 0x80, 0xa2, 0xa4, 0xd3, 0x35, 0x2a, 0x4f, 0xf5,
+ 0x60, 0xc0, 0x64, 0x0f, 0xdb, 0xda
+};
+static const u8 enc_nonce049[] __initconst = {
+ 0xdf, 0x87, 0x13, 0xe8, 0x7e, 0xc3, 0xdb, 0xcf,
+ 0xad, 0x14, 0xd5, 0x3e
+};
+static const u8 enc_key049[] __initconst = {
+ 0x56, 0x20, 0x74, 0x65, 0xb4, 0xe4, 0x8e, 0x6d,
+ 0x04, 0x63, 0x0f, 0x4a, 0x42, 0xf3, 0x5c, 0xfc,
+ 0x16, 0x3a, 0xb2, 0x89, 0xc2, 0x2a, 0x2b, 0x47,
+ 0x84, 0xf6, 0xf9, 0x29, 0x03, 0x30, 0xbe, 0xe0
+};
+
+/* wycheproof - misc */
+static const u8 enc_input050[] __initconst = {
+ 0xdc, 0x9e, 0x9e, 0xaf, 0x11, 0xe3, 0x14, 0x18,
+ 0x2d, 0xf6, 0xa4, 0xeb, 0xa1, 0x7a, 0xec, 0x9c
+};
+static const u8 enc_output050[] __initconst = {
+ 0x60, 0x5b, 0xbf, 0x90, 0xae, 0xb9, 0x74, 0xf6,
+ 0x60, 0x2b, 0xc7, 0x78, 0x05, 0x6f, 0x0d, 0xca,
+ 0x38, 0xea, 0x23, 0xd9, 0x90, 0x54, 0xb4, 0x6b,
+ 0x42, 0xff, 0xe0, 0x04, 0x12, 0x9d, 0x22, 0x04
+};
+static const u8 enc_assoc050[] __initconst = {
+ 0xba, 0x44, 0x6f, 0x6f, 0x9a, 0x0c, 0xed, 0x22,
+ 0x45, 0x0f, 0xeb, 0x10, 0x73, 0x7d, 0x90, 0x07,
+ 0xfd, 0x69, 0xab, 0xc1, 0x9b, 0x1d, 0x4d, 0x90,
+ 0x49, 0xa5, 0x55, 0x1e, 0x86, 0xec, 0x2b, 0x37
+};
+static const u8 enc_nonce050[] __initconst = {
+ 0x8d, 0xf4, 0xb1, 0x5a, 0x88, 0x8c, 0x33, 0x28,
+ 0x6a, 0x7b, 0x76, 0x51
+};
+static const u8 enc_key050[] __initconst = {
+ 0x39, 0x37, 0x98, 0x6a, 0xf8, 0x6d, 0xaf, 0xc1,
+ 0xba, 0x0c, 0x46, 0x72, 0xd8, 0xab, 0xc4, 0x6c,
+ 0x20, 0x70, 0x62, 0x68, 0x2d, 0x9c, 0x26, 0x4a,
+ 0xb0, 0x6d, 0x6c, 0x58, 0x07, 0x20, 0x51, 0x30
+};
+
+/* wycheproof - misc */
+static const u8 enc_input051[] __initconst = {
+ 0x81, 0xce, 0x84, 0xed, 0xe9, 0xb3, 0x58, 0x59,
+ 0xcc, 0x8c, 0x49, 0xa8, 0xf6, 0xbe, 0x7d, 0xc6
+};
+static const u8 enc_output051[] __initconst = {
+ 0x7b, 0x7c, 0xe0, 0xd8, 0x24, 0x80, 0x9a, 0x70,
+ 0xde, 0x32, 0x56, 0x2c, 0xcf, 0x2c, 0x2b, 0xbd,
+ 0x15, 0xd4, 0x4a, 0x00, 0xce, 0x0d, 0x19, 0xb4,
+ 0x23, 0x1f, 0x92, 0x1e, 0x22, 0xbc, 0x0a, 0x43
+};
+static const u8 enc_assoc051[] __initconst = {
+ 0xd4, 0x1a, 0x82, 0x8d, 0x5e, 0x71, 0x82, 0x92,
+ 0x47, 0x02, 0x19, 0x05, 0x40, 0x2e, 0xa2, 0x57,
+ 0xdc, 0xcb, 0xc3, 0xb8, 0x0f, 0xcd, 0x56, 0x75,
+ 0x05, 0x6b, 0x68, 0xbb, 0x59, 0xe6, 0x2e, 0x88,
+ 0x73
+};
+static const u8 enc_nonce051[] __initconst = {
+ 0xbe, 0x40, 0xe5, 0xf1, 0xa1, 0x18, 0x17, 0xa0,
+ 0xa8, 0xfa, 0x89, 0x49
+};
+static const u8 enc_key051[] __initconst = {
+ 0x36, 0x37, 0x2a, 0xbc, 0xdb, 0x78, 0xe0, 0x27,
+ 0x96, 0x46, 0xac, 0x3d, 0x17, 0x6b, 0x96, 0x74,
+ 0xe9, 0x15, 0x4e, 0xec, 0xf0, 0xd5, 0x46, 0x9c,
+ 0x65, 0x1e, 0xc7, 0xe1, 0x6b, 0x4c, 0x11, 0x99
+};
+
+/* wycheproof - misc */
+static const u8 enc_input052[] __initconst = {
+ 0xa6, 0x67, 0x47, 0xc8, 0x9e, 0x85, 0x7a, 0xf3,
+ 0xa1, 0x8e, 0x2c, 0x79, 0x50, 0x00, 0x87, 0xed
+};
+static const u8 enc_output052[] __initconst = {
+ 0xca, 0x82, 0xbf, 0xf3, 0xe2, 0xf3, 0x10, 0xcc,
+ 0xc9, 0x76, 0x67, 0x2c, 0x44, 0x15, 0xe6, 0x9b,
+ 0x57, 0x63, 0x8c, 0x62, 0xa5, 0xd8, 0x5d, 0xed,
+ 0x77, 0x4f, 0x91, 0x3c, 0x81, 0x3e, 0xa0, 0x32
+};
+static const u8 enc_assoc052[] __initconst = {
+ 0x3f, 0x2d, 0xd4, 0x9b, 0xbf, 0x09, 0xd6, 0x9a,
+ 0x78, 0xa3, 0xd8, 0x0e, 0xa2, 0x56, 0x66, 0x14,
+ 0xfc, 0x37, 0x94, 0x74, 0x19, 0x6c, 0x1a, 0xae,
+ 0x84, 0x58, 0x3d, 0xa7, 0x3d, 0x7f, 0xf8, 0x5c,
+ 0x6f, 0x42, 0xca, 0x42, 0x05, 0x6a, 0x97, 0x92,
+ 0xcc, 0x1b, 0x9f, 0xb3, 0xc7, 0xd2, 0x61
+};
+static const u8 enc_nonce052[] __initconst = {
+ 0x84, 0xc8, 0x7d, 0xae, 0x4e, 0xee, 0x27, 0x73,
+ 0x0e, 0xc3, 0x5d, 0x12
+};
+static const u8 enc_key052[] __initconst = {
+ 0x9f, 0x14, 0x79, 0xed, 0x09, 0x7d, 0x7f, 0xe5,
+ 0x29, 0xc1, 0x1f, 0x2f, 0x5a, 0xdd, 0x9a, 0xaf,
+ 0xf4, 0xa1, 0xca, 0x0b, 0x68, 0x99, 0x7a, 0x2c,
+ 0xb7, 0xf7, 0x97, 0x49, 0xbd, 0x90, 0xaa, 0xf4
+};
+
+/* wycheproof - misc */
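+/*
+ * The vectors below use an 8-byte nonce, with plaintexts chosen so that the
+ * ciphertext comes out as a repeating pattern (all zeroes, all ones, etc.).
+ */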
+static const u8 enc_input053[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+ 0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+ 0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe
+};
+static const u8 enc_output053[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0xe6, 0xd3, 0xd7, 0x32, 0x4a, 0x1c, 0xbb, 0xa7,
+ 0x77, 0xbb, 0xb0, 0xec, 0xdd, 0xa3, 0x78, 0x07
+};
+static const u8 enc_assoc053[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce053[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key053[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input054[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+ 0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+ 0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe,
+ 0xe3, 0xbc, 0xdb, 0x5b, 0x1e, 0xde, 0xfc, 0xfe,
+ 0x8b, 0xcd, 0xa1, 0xb6, 0xa1, 0x5c, 0x8c, 0x2b,
+ 0x08, 0x69, 0xff, 0xd2, 0xec, 0x5e, 0x26, 0xe5,
+ 0x53, 0xb7, 0xb2, 0x27, 0xfe, 0x87, 0xfd, 0xbd
+};
+static const u8 enc_output054[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x06, 0x2d, 0xe6, 0x79, 0x5f, 0x27, 0x4f, 0xd2,
+ 0xa3, 0x05, 0xd7, 0x69, 0x80, 0xbc, 0x9c, 0xce
+};
+static const u8 enc_assoc054[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce054[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key054[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input055[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+ 0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+ 0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe,
+ 0xe3, 0xbc, 0xdb, 0x5b, 0x1e, 0xde, 0xfc, 0xfe,
+ 0x8b, 0xcd, 0xa1, 0xb6, 0xa1, 0x5c, 0x8c, 0x2b,
+ 0x08, 0x69, 0xff, 0xd2, 0xec, 0x5e, 0x26, 0xe5,
+ 0x53, 0xb7, 0xb2, 0x27, 0xfe, 0x87, 0xfd, 0xbd,
+ 0x7a, 0xda, 0x44, 0x42, 0x42, 0x69, 0xbf, 0xfa,
+ 0x55, 0x27, 0xf2, 0x70, 0xac, 0xf6, 0x85, 0x02,
+ 0xb7, 0x4c, 0x5a, 0xe2, 0xe6, 0x0c, 0x05, 0x80,
+ 0x98, 0x1a, 0x49, 0x38, 0x45, 0x93, 0x92, 0xc4,
+ 0x9b, 0xb2, 0xf2, 0x84, 0xb6, 0x46, 0xef, 0xc7,
+ 0xf3, 0xf0, 0xb1, 0x36, 0x1d, 0xc3, 0x48, 0xed,
+ 0x77, 0xd3, 0x0b, 0xc5, 0x76, 0x92, 0xed, 0x38,
+ 0xfb, 0xac, 0x01, 0x88, 0x38, 0x04, 0x88, 0xc7
+};
+static const u8 enc_output055[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0xd8, 0xb4, 0x79, 0x02, 0xba, 0xae, 0xaf, 0xb3,
+ 0x42, 0x03, 0x05, 0x15, 0x29, 0xaf, 0x28, 0x2e
+};
+static const u8 enc_assoc055[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce055[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key055[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input056[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+ 0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+ 0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41
+};
+static const u8 enc_output056[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xb3, 0x89, 0x1c, 0x84, 0x9c, 0xb5, 0x2c, 0x27,
+ 0x74, 0x7e, 0xdf, 0xcf, 0x31, 0x21, 0x3b, 0xb6
+};
+static const u8 enc_assoc056[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce056[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key056[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input057[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+ 0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+ 0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41,
+ 0x1c, 0x43, 0x24, 0xa4, 0xe1, 0x21, 0x03, 0x01,
+ 0x74, 0x32, 0x5e, 0x49, 0x5e, 0xa3, 0x73, 0xd4,
+ 0xf7, 0x96, 0x00, 0x2d, 0x13, 0xa1, 0xd9, 0x1a,
+ 0xac, 0x48, 0x4d, 0xd8, 0x01, 0x78, 0x02, 0x42
+};
+static const u8 enc_output057[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xf0, 0xc1, 0x2d, 0x26, 0xef, 0x03, 0x02, 0x9b,
+ 0x62, 0xc0, 0x08, 0xda, 0x27, 0xc5, 0xdc, 0x68
+};
+static const u8 enc_assoc057[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce057[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key057[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input058[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+ 0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+ 0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41,
+ 0x1c, 0x43, 0x24, 0xa4, 0xe1, 0x21, 0x03, 0x01,
+ 0x74, 0x32, 0x5e, 0x49, 0x5e, 0xa3, 0x73, 0xd4,
+ 0xf7, 0x96, 0x00, 0x2d, 0x13, 0xa1, 0xd9, 0x1a,
+ 0xac, 0x48, 0x4d, 0xd8, 0x01, 0x78, 0x02, 0x42,
+ 0x85, 0x25, 0xbb, 0xbd, 0xbd, 0x96, 0x40, 0x05,
+ 0xaa, 0xd8, 0x0d, 0x8f, 0x53, 0x09, 0x7a, 0xfd,
+ 0x48, 0xb3, 0xa5, 0x1d, 0x19, 0xf3, 0xfa, 0x7f,
+ 0x67, 0xe5, 0xb6, 0xc7, 0xba, 0x6c, 0x6d, 0x3b,
+ 0x64, 0x4d, 0x0d, 0x7b, 0x49, 0xb9, 0x10, 0x38,
+ 0x0c, 0x0f, 0x4e, 0xc9, 0xe2, 0x3c, 0xb7, 0x12,
+ 0x88, 0x2c, 0xf4, 0x3a, 0x89, 0x6d, 0x12, 0xc7,
+ 0x04, 0x53, 0xfe, 0x77, 0xc7, 0xfb, 0x77, 0x38
+};
+static const u8 enc_output058[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xee, 0x65, 0x78, 0x30, 0x01, 0xc2, 0x56, 0x91,
+ 0xfa, 0x28, 0xd0, 0xf5, 0xf1, 0xc1, 0xd7, 0x62
+};
+static const u8 enc_assoc058[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce058[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key058[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input059[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+ 0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+ 0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+ 0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e
+};
+static const u8 enc_output059[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x79, 0xba, 0x7a, 0x29, 0xf5, 0xa7, 0xbb, 0x75,
+ 0x79, 0x7a, 0xf8, 0x7a, 0x61, 0x01, 0x29, 0xa4
+};
+static const u8 enc_assoc059[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80
+};
+static const u8 enc_nonce059[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key059[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input060[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+ 0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+ 0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+ 0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e,
+ 0xe3, 0xbc, 0xdb, 0xdb, 0x1e, 0xde, 0xfc, 0x7e,
+ 0x8b, 0xcd, 0xa1, 0x36, 0xa1, 0x5c, 0x8c, 0xab,
+ 0x08, 0x69, 0xff, 0x52, 0xec, 0x5e, 0x26, 0x65,
+ 0x53, 0xb7, 0xb2, 0xa7, 0xfe, 0x87, 0xfd, 0x3d
+};
+static const u8 enc_output060[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x36, 0xb1, 0x74, 0x38, 0x19, 0xe1, 0xb9, 0xba,
+ 0x15, 0x51, 0xe8, 0xed, 0x92, 0x2a, 0x95, 0x9a
+};
+static const u8 enc_assoc060[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80
+};
+static const u8 enc_nonce060[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key060[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input061[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+ 0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+ 0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+ 0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e,
+ 0xe3, 0xbc, 0xdb, 0xdb, 0x1e, 0xde, 0xfc, 0x7e,
+ 0x8b, 0xcd, 0xa1, 0x36, 0xa1, 0x5c, 0x8c, 0xab,
+ 0x08, 0x69, 0xff, 0x52, 0xec, 0x5e, 0x26, 0x65,
+ 0x53, 0xb7, 0xb2, 0xa7, 0xfe, 0x87, 0xfd, 0x3d,
+ 0x7a, 0xda, 0x44, 0xc2, 0x42, 0x69, 0xbf, 0x7a,
+ 0x55, 0x27, 0xf2, 0xf0, 0xac, 0xf6, 0x85, 0x82,
+ 0xb7, 0x4c, 0x5a, 0x62, 0xe6, 0x0c, 0x05, 0x00,
+ 0x98, 0x1a, 0x49, 0xb8, 0x45, 0x93, 0x92, 0x44,
+ 0x9b, 0xb2, 0xf2, 0x04, 0xb6, 0x46, 0xef, 0x47,
+ 0xf3, 0xf0, 0xb1, 0xb6, 0x1d, 0xc3, 0x48, 0x6d,
+ 0x77, 0xd3, 0x0b, 0x45, 0x76, 0x92, 0xed, 0xb8,
+ 0xfb, 0xac, 0x01, 0x08, 0x38, 0x04, 0x88, 0x47
+};
+static const u8 enc_output061[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0xfe, 0xac, 0x49, 0x55, 0x55, 0x4e, 0x80, 0x6f,
+ 0x3a, 0x19, 0x02, 0xe2, 0x44, 0x32, 0xc0, 0x8a
+};
+static const u8 enc_assoc061[] __initconst = {
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+ 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80
+};
+static const u8 enc_nonce061[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key061[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input062[] __initconst = {
+ 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+ 0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+ 0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+ 0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1
+};
+static const u8 enc_output062[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0x20, 0xa3, 0x79, 0x8d, 0xf1, 0x29, 0x2c, 0x59,
+ 0x72, 0xbf, 0x97, 0x41, 0xae, 0xc3, 0x8a, 0x19
+};
+static const u8 enc_assoc062[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f
+};
+static const u8 enc_nonce062[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key062[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input063[] __initconst = {
+ 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+ 0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+ 0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+ 0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1,
+ 0x1c, 0x43, 0x24, 0x24, 0xe1, 0x21, 0x03, 0x81,
+ 0x74, 0x32, 0x5e, 0xc9, 0x5e, 0xa3, 0x73, 0x54,
+ 0xf7, 0x96, 0x00, 0xad, 0x13, 0xa1, 0xd9, 0x9a,
+ 0xac, 0x48, 0x4d, 0x58, 0x01, 0x78, 0x02, 0xc2
+};
+static const u8 enc_output063[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xc0, 0x3d, 0x9f, 0x67, 0x35, 0x4a, 0x97, 0xb2,
+ 0xf0, 0x74, 0xf7, 0x55, 0x15, 0x57, 0xe4, 0x9c
+};
+static const u8 enc_assoc063[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f
+};
+static const u8 enc_nonce063[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key063[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input064[] __initconst = {
+ 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+ 0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+ 0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+ 0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1,
+ 0x1c, 0x43, 0x24, 0x24, 0xe1, 0x21, 0x03, 0x81,
+ 0x74, 0x32, 0x5e, 0xc9, 0x5e, 0xa3, 0x73, 0x54,
+ 0xf7, 0x96, 0x00, 0xad, 0x13, 0xa1, 0xd9, 0x9a,
+ 0xac, 0x48, 0x4d, 0x58, 0x01, 0x78, 0x02, 0xc2,
+ 0x85, 0x25, 0xbb, 0x3d, 0xbd, 0x96, 0x40, 0x85,
+ 0xaa, 0xd8, 0x0d, 0x0f, 0x53, 0x09, 0x7a, 0x7d,
+ 0x48, 0xb3, 0xa5, 0x9d, 0x19, 0xf3, 0xfa, 0xff,
+ 0x67, 0xe5, 0xb6, 0x47, 0xba, 0x6c, 0x6d, 0xbb,
+ 0x64, 0x4d, 0x0d, 0xfb, 0x49, 0xb9, 0x10, 0xb8,
+ 0x0c, 0x0f, 0x4e, 0x49, 0xe2, 0x3c, 0xb7, 0x92,
+ 0x88, 0x2c, 0xf4, 0xba, 0x89, 0x6d, 0x12, 0x47,
+ 0x04, 0x53, 0xfe, 0xf7, 0xc7, 0xfb, 0x77, 0xb8
+};
+static const u8 enc_output064[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xc8, 0x6d, 0xa8, 0xdd, 0x65, 0x22, 0x86, 0xd5,
+ 0x02, 0x13, 0xd3, 0x28, 0xd6, 0x3e, 0x40, 0x06
+};
+static const u8 enc_assoc064[] __initconst = {
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+ 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f
+};
+static const u8 enc_nonce064[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key064[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input065[] __initconst = {
+ 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+ 0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+ 0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+ 0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41
+};
+static const u8 enc_output065[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0xbe, 0xde, 0x90, 0x83, 0xce, 0xb3, 0x6d, 0xdf,
+ 0xe5, 0xfa, 0x81, 0x1f, 0x95, 0x47, 0x1c, 0x67
+};
+static const u8 enc_assoc065[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce065[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key065[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input066[] __initconst = {
+ 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+ 0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+ 0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+ 0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41,
+ 0x9c, 0x43, 0x24, 0xa4, 0x61, 0x21, 0x03, 0x01,
+ 0xf4, 0x32, 0x5e, 0x49, 0xde, 0xa3, 0x73, 0xd4,
+ 0x77, 0x96, 0x00, 0x2d, 0x93, 0xa1, 0xd9, 0x1a,
+ 0x2c, 0x48, 0x4d, 0xd8, 0x81, 0x78, 0x02, 0x42
+};
+static const u8 enc_output066[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x30, 0x08, 0x74, 0xbb, 0x06, 0x92, 0xb6, 0x89,
+ 0xde, 0xad, 0x9a, 0xe1, 0x5b, 0x06, 0x73, 0x90
+};
+static const u8 enc_assoc066[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce066[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key066[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input067[] __initconst = {
+ 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+ 0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+ 0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+ 0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41,
+ 0x9c, 0x43, 0x24, 0xa4, 0x61, 0x21, 0x03, 0x01,
+ 0xf4, 0x32, 0x5e, 0x49, 0xde, 0xa3, 0x73, 0xd4,
+ 0x77, 0x96, 0x00, 0x2d, 0x93, 0xa1, 0xd9, 0x1a,
+ 0x2c, 0x48, 0x4d, 0xd8, 0x81, 0x78, 0x02, 0x42,
+ 0x05, 0x25, 0xbb, 0xbd, 0x3d, 0x96, 0x40, 0x05,
+ 0x2a, 0xd8, 0x0d, 0x8f, 0xd3, 0x09, 0x7a, 0xfd,
+ 0xc8, 0xb3, 0xa5, 0x1d, 0x99, 0xf3, 0xfa, 0x7f,
+ 0xe7, 0xe5, 0xb6, 0xc7, 0x3a, 0x6c, 0x6d, 0x3b,
+ 0xe4, 0x4d, 0x0d, 0x7b, 0xc9, 0xb9, 0x10, 0x38,
+ 0x8c, 0x0f, 0x4e, 0xc9, 0x62, 0x3c, 0xb7, 0x12,
+ 0x08, 0x2c, 0xf4, 0x3a, 0x09, 0x6d, 0x12, 0xc7,
+ 0x84, 0x53, 0xfe, 0x77, 0x47, 0xfb, 0x77, 0x38
+};
+static const u8 enc_output067[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x99, 0xca, 0xd8, 0x5f, 0x45, 0xca, 0x40, 0x94,
+ 0x2d, 0x0d, 0x4d, 0x5e, 0x95, 0x0a, 0xde, 0x22
+};
+static const u8 enc_assoc067[] __initconst = {
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+ 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce067[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key067[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input068[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+ 0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+ 0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41
+};
+static const u8 enc_output068[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x8b, 0xbe, 0x14, 0x52, 0x72, 0xe7, 0xc2, 0xd9,
+ 0xa1, 0x89, 0x1a, 0x3a, 0xb0, 0x98, 0x3d, 0x9d
+};
+static const u8 enc_assoc068[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce068[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key068[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input069[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+ 0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+ 0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41,
+ 0xe3, 0xbc, 0xdb, 0x5b, 0xe1, 0x21, 0x03, 0x01,
+ 0x8b, 0xcd, 0xa1, 0xb6, 0x5e, 0xa3, 0x73, 0xd4,
+ 0x08, 0x69, 0xff, 0xd2, 0x13, 0xa1, 0xd9, 0x1a,
+ 0x53, 0xb7, 0xb2, 0x27, 0x01, 0x78, 0x02, 0x42
+};
+static const u8 enc_output069[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x3b, 0x41, 0x86, 0x19, 0x13, 0xa8, 0xf6, 0xde,
+ 0x7f, 0x61, 0xe2, 0x25, 0x63, 0x1b, 0xc3, 0x82
+};
+static const u8 enc_assoc069[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce069[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key069[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input070[] __initconst = {
+ 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+ 0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+ 0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+ 0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41,
+ 0xe3, 0xbc, 0xdb, 0x5b, 0xe1, 0x21, 0x03, 0x01,
+ 0x8b, 0xcd, 0xa1, 0xb6, 0x5e, 0xa3, 0x73, 0xd4,
+ 0x08, 0x69, 0xff, 0xd2, 0x13, 0xa1, 0xd9, 0x1a,
+ 0x53, 0xb7, 0xb2, 0x27, 0x01, 0x78, 0x02, 0x42,
+ 0x7a, 0xda, 0x44, 0x42, 0xbd, 0x96, 0x40, 0x05,
+ 0x55, 0x27, 0xf2, 0x70, 0x53, 0x09, 0x7a, 0xfd,
+ 0xb7, 0x4c, 0x5a, 0xe2, 0x19, 0xf3, 0xfa, 0x7f,
+ 0x98, 0x1a, 0x49, 0x38, 0xba, 0x6c, 0x6d, 0x3b,
+ 0x9b, 0xb2, 0xf2, 0x84, 0x49, 0xb9, 0x10, 0x38,
+ 0xf3, 0xf0, 0xb1, 0x36, 0xe2, 0x3c, 0xb7, 0x12,
+ 0x77, 0xd3, 0x0b, 0xc5, 0x89, 0x6d, 0x12, 0xc7,
+ 0xfb, 0xac, 0x01, 0x88, 0xc7, 0xfb, 0x77, 0x38
+};
+static const u8 enc_output070[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x84, 0x28, 0xbc, 0xf0, 0x23, 0xec, 0x6b, 0xf3,
+ 0x1f, 0xd9, 0xef, 0xb2, 0x03, 0xff, 0x08, 0x71
+};
+static const u8 enc_assoc070[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+ 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce070[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key070[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input071[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+ 0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe
+};
+static const u8 enc_output071[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0x13, 0x9f, 0xdf, 0x64, 0x74, 0xea, 0x24, 0xf5,
+ 0x49, 0xb0, 0x75, 0x82, 0x5f, 0x2c, 0x76, 0x20
+};
+static const u8 enc_assoc071[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce071[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key071[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input072[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+ 0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe,
+ 0x1c, 0x43, 0x24, 0xa4, 0x1e, 0xde, 0xfc, 0xfe,
+ 0x74, 0x32, 0x5e, 0x49, 0xa1, 0x5c, 0x8c, 0x2b,
+ 0xf7, 0x96, 0x00, 0x2d, 0xec, 0x5e, 0x26, 0xe5,
+ 0xac, 0x48, 0x4d, 0xd8, 0xfe, 0x87, 0xfd, 0xbd
+};
+static const u8 enc_output072[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xbb, 0xad, 0x8d, 0x86, 0x3b, 0x83, 0x5a, 0x8e,
+ 0x86, 0x64, 0xfd, 0x1d, 0x45, 0x66, 0xb6, 0xb4
+};
+static const u8 enc_assoc072[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce072[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key072[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - misc */
+static const u8 enc_input073[] __initconst = {
+ 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+ 0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+ 0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+ 0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe,
+ 0x1c, 0x43, 0x24, 0xa4, 0x1e, 0xde, 0xfc, 0xfe,
+ 0x74, 0x32, 0x5e, 0x49, 0xa1, 0x5c, 0x8c, 0x2b,
+ 0xf7, 0x96, 0x00, 0x2d, 0xec, 0x5e, 0x26, 0xe5,
+ 0xac, 0x48, 0x4d, 0xd8, 0xfe, 0x87, 0xfd, 0xbd,
+ 0x85, 0x25, 0xbb, 0xbd, 0x42, 0x69, 0xbf, 0xfa,
+ 0xaa, 0xd8, 0x0d, 0x8f, 0xac, 0xf6, 0x85, 0x02,
+ 0x48, 0xb3, 0xa5, 0x1d, 0xe6, 0x0c, 0x05, 0x80,
+ 0x67, 0xe5, 0xb6, 0xc7, 0x45, 0x93, 0x92, 0xc4,
+ 0x64, 0x4d, 0x0d, 0x7b, 0xb6, 0x46, 0xef, 0xc7,
+ 0x0c, 0x0f, 0x4e, 0xc9, 0x1d, 0xc3, 0x48, 0xed,
+ 0x88, 0x2c, 0xf4, 0x3a, 0x76, 0x92, 0xed, 0x38,
+ 0x04, 0x53, 0xfe, 0x77, 0x38, 0x04, 0x88, 0xc7
+};
+static const u8 enc_output073[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0x42, 0xf2, 0x35, 0x42, 0x97, 0x84, 0x9a, 0x51,
+ 0x1d, 0x53, 0xe5, 0x57, 0x17, 0x72, 0xf7, 0x1f
+};
+static const u8 enc_assoc073[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00
+};
+static const u8 enc_nonce073[] __initconst = {
+ 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00
+};
+static const u8 enc_key073[] __initconst = {
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f
+};
+
+/* wycheproof - checking for int overflows */
+static const u8 enc_input074[] __initconst = {
+ 0xd4, 0x50, 0x0b, 0xf0, 0x09, 0x49, 0x35, 0x51,
+ 0xc3, 0x80, 0xad, 0xf5, 0x2c, 0x57, 0x3a, 0x69,
+ 0xdf, 0x7e, 0x8b, 0x76, 0x24, 0x63, 0x33, 0x0f,
+ 0xac, 0xc1, 0x6a, 0x57, 0x26, 0xbe, 0x71, 0x90,
+ 0xc6, 0x3c, 0x5a, 0x1c, 0x92, 0x65, 0x84, 0xa0,
+ 0x96, 0x75, 0x68, 0x28, 0xdc, 0xdc, 0x64, 0xac,
+ 0xdf, 0x96, 0x3d, 0x93, 0x1b, 0xf1, 0xda, 0xe2,
+ 0x38, 0xf3, 0xf1, 0x57, 0x22, 0x4a, 0xc4, 0xb5,
+ 0x42, 0xd7, 0x85, 0xb0, 0xdd, 0x84, 0xdb, 0x6b,
+ 0xe3, 0xbc, 0x5a, 0x36, 0x63, 0xe8, 0x41, 0x49,
+ 0xff, 0xbe, 0xd0, 0x9e, 0x54, 0xf7, 0x8f, 0x16,
+ 0xa8, 0x22, 0x3b, 0x24, 0xcb, 0x01, 0x9f, 0x58,
+ 0xb2, 0x1b, 0x0e, 0x55, 0x1e, 0x7a, 0xa0, 0x73,
+ 0x27, 0x62, 0x95, 0x51, 0x37, 0x6c, 0xcb, 0xc3,
+ 0x93, 0x76, 0x71, 0xa0, 0x62, 0x9b, 0xd9, 0x5c,
+ 0x99, 0x15, 0xc7, 0x85, 0x55, 0x77, 0x1e, 0x7a
+};
+static const u8 enc_output074[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0x0b, 0x30, 0x0d, 0x8d, 0xa5, 0x6c, 0x21, 0x85,
+ 0x75, 0x52, 0x79, 0x55, 0x3c, 0x4c, 0x82, 0xca
+};
+static const u8 enc_assoc074[] __initconst = {
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+static const u8 enc_nonce074[] __initconst = {
+ 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+ 0x00, 0x02, 0x50, 0x6e
+};
+static const u8 enc_key074[] __initconst = {
+ 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+ 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+ 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+ 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30
+};
+
+/* wycheproof - checking for int overflows */
+static const u8 e